
Generalized Linear Model (GLM)

GLM estimates regression models for outcomes following exponential family distributions. In addition to Gaussian (normal) regression, GLM covers binomial (logistic), multinomial, Poisson, Gamma, Tweedie, and other distributions. H2O’s GLM includes elastic net regularization (a combination of L1/lasso and L2/ridge penalties), making it effective for high-dimensional, sparse data. MOJO Support: GLM supports importing and exporting MOJOs.

Supported Distribution Families

| Family | Response Type | Typical Use Case |
| --- | --- | --- |
| gaussian | Numeric | Standard regression |
| binomial | Binary (0/1) | Logistic regression / binary classification |
| quasibinomial | Proportion [0,1] | Rates and proportions |
| fractional_binomial | Fraction [0,1] | Bounded continuous outcomes |
| multinomial | Multi-class | Multiclass classification |
| poisson | Non-negative integer | Count data |
| gamma | Positive numeric | Insurance claim severity, duration |
| tweedie | Non-negative numeric | Mixed zero/positive outcomes (insurance) |
| negative_binomial | Non-negative integer | Overdispersed count data |
| ordinal | Ordered categories | Ordinal classification |

Generalized Additive Model (GAM)

GAM is a type of GLM where the linear predictor includes smooth functions of one or more predictor variables. H2O’s GAM implementation is based on Simon N. Wood’s “Generalized Additive Models: An Introduction with R.” GAM is useful when the relationship between a predictor and the response is non-linear but you want interpretable, smooth curves rather than a black-box model.
GAM models are currently experimental in H2O-3. GAM inherits all GLM parameters and adds spline-specific controls.
MOJO Support: GAM supports importing and exporting MOJOs.

Key Parameters

GLM Parameters

family
str
default:"auto"
Distribution family. One of: "gaussian", "binomial", "quasibinomial", "fractional_binomial", "multinomial", "poisson", "gamma", "tweedie", "negative_binomial", "ordinal". Set to "auto" to infer from the response column type.
alpha
List[float]
default:"[0.5]"
Elastic net mixing parameter array. alpha=0 gives ridge (L2-only), alpha=1 gives lasso (L1-only). Values between 0 and 1 blend both penalties. Pass a list of values to perform a regularization path search.
lambda_
List[float]
default:"(computed)"
Regularization strength. Larger values produce more regularization. If not specified, H2O computes a regularization path automatically.
lambda_search
bool
default:"False"
When True, H2O performs a full regularization path search from lambda_max to lambda_min. Highly recommended when you don’t know the right regularization strength.
solver
str
default:"AUTO"
Optimization algorithm: "AUTO", "IRLSM" (Iteratively Reweighted Least Squares, good for small/medium wide data), "L_BFGS" (large sparse), "COORDINATE_DESCENT" (multi-threaded CD, best for large data), "COORDINATE_DESCENT_NAIVE", "GRADIENT_DESCENT_LH", "GRADIENT_DESCENT_SQERR".
standardize
bool
default:"True"
Standardize numeric columns before fitting. Strongly recommended when using regularization so that coefficients are on a comparable scale.
remove_collinear_columns
bool
default:"False"
Automatically drop collinear columns. Useful when multicollinearity would otherwise prevent convergence.
compute_p_values
bool
default:"False"
Compute p-values and standard errors for coefficients. Only available for models without regularization (lambda_=0).
tweedie_variance_power
float
default:"0.0"
(Only for family="tweedie") The variance power p. Common values: 1.0 = Poisson, 1.5 = compound Poisson-Gamma, 2.0 = Gamma.

GAM Parameters

gam_columns
List[str]
required
Column names to use as smoothing terms. GAM builds a spline smoother for each column listed. Required for GAM models.
bs
List[int]
default:"[0]"
Spline type per GAM column: 0 = cubic regression spline (default), 1 = thin plate regression with knots, 2 = monotone I-splines, 3 = NBSplineTypeI M-splines.
num_knots
List[int]
Number of knots for each GAM predictor listed in gam_columns. One value per column.
scale
List[float]
Smoothing parameter for each GAM predictor. Must be the same length as gam_columns.

Code Examples

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

h2o.init()

train = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_train_5k.csv")
test  = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv")

y = "response"
x = [c for c in train.columns if c != y]
train[y] = train[y].asfactor()
test[y]  = test[y].asfactor()

glm = H2OGeneralizedLinearEstimator(
    family="binomial",
    alpha=0.5,           # elastic net: mix of L1 and L2
    lambda_search=True,  # search for best lambda automatically
    standardize=True,
    seed=42
)
glm.train(x=x, y=y, training_frame=train, validation_frame=test)

# Coefficients
print(glm.coef())
print(glm.auc(valid=True))

Regularization Paths

GLM supports full elastic net regularization paths. The path runs from a fully regularized model (all coefficients zero) to the least regularized model, selecting the optimal lambda via cross-validation or a validation frame.
# Enable lambda search to automatically find optimal regularization
glm = H2OGeneralizedLinearEstimator(
    family="binomial",
    alpha=0.5,
    lambda_search=True,
    nlambdas=100,          # number of lambda values to try
    lambda_min_ratio=1e-4, # ratio of smallest to largest lambda
)
glm.train(x=x, y=y, training_frame=train)

# Optimal lambda selected
print("Best lambda:", glm.actual_params["lambda"])

# Full scoring history across the regularization path
sh = glm.scoring_history()

For high-dimensional data (many predictors), use alpha=1.0 (lasso) with lambda_search=True for automatic feature selection: coefficients for irrelevant features are driven to exactly zero. For highly correlated features, alpha=0.0 (ridge) or an intermediate alpha value often works better.
