Generalized Linear Model (GLM)
GLM estimates regression models for outcomes following exponential family distributions. In addition to Gaussian (normal) regression, GLM covers binomial (logistic), multinomial, Poisson, Gamma, Tweedie, and other distributions. H2O’s GLM includes elastic net regularization (a combination of L1/lasso and L2/ridge penalties), making it effective for high-dimensional, sparse data.

MOJO Support: GLM supports importing and exporting MOJOs.

Supported Distribution Families
| Family | Response Type | Typical Use Case |
|---|---|---|
| gaussian | Numeric | Standard regression |
| binomial | Binary (0/1) | Logistic regression / binary classification |
| quasibinomial | Proportion [0,1] | Rates and proportions |
| fractional_binomial | Fraction [0,1] | Bounded continuous outcomes |
| multinomial | Multi-class | Multiclass classification |
| poisson | Non-negative integer | Count data |
| gamma | Positive numeric | Insurance claim severity, duration |
| tweedie | Non-negative numeric | Mixed zero/positive outcomes (insurance) |
| negative_binomial | Non-negative integer | Overdispersed count data |
| ordinal | Ordered categories | Ordinal classification |
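Each family pairs the response distribution with a default link function that maps the linear predictor to the response mean. The inverse links for three common families can be sketched in pure Python (an illustration of the concept, not H2O's implementation):

```python
import math

# Illustrative inverse link functions for three common GLM families.
# Conceptual sketch only; H2O applies these internally per family.

def inverse_identity(eta):
    """Gaussian family: the mean equals the linear predictor."""
    return eta

def inverse_logit(eta):
    """Binomial family: the logit link maps eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def inverse_log(eta):
    """Poisson/Gamma/Tweedie families: the log link keeps the mean positive."""
    return math.exp(eta)

print(inverse_identity(2.5))  # 2.5
print(inverse_logit(0.0))     # 0.5 (even odds)
print(inverse_log(0.0))       # 1.0
```

This is why a linear predictor of 0 corresponds to a 50% probability under logistic regression and to a mean of 1 under a log link.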
Generalized Additive Model (GAM)
GAM is a type of GLM where the linear predictor includes smooth functions of one or more predictor variables. H2O’s GAM implementation is based on Simon N. Wood’s “Generalized Additive Models: An Introduction with R.” GAM is useful when the relationship between a predictor and the response is non-linear but you want interpretable, smooth curves rather than a black-box model. GAM models are currently experimental in H2O-3. GAM inherits all GLM parameters and adds spline-specific controls.
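To make "smooth functions of a predictor" concrete, here is a minimal pure-Python sketch of a truncated-power cubic spline basis. This basis is chosen for readability; H2O's default is the cubic regression spline construction from Wood's book, not this exact form:

```python
def cubic_spline_basis(x, knots):
    """Truncated-power cubic basis: 1, x, x^2, x^3, plus (x - k)^3_+ per knot.
    Illustrative only; H2O's GAM uses cubic regression splines internally."""
    row = [1.0, x, x ** 2, x ** 3]
    for k in knots:
        row.append(max(0.0, x - k) ** 3)
    return row

# A smooth term f(x) is a linear combination of these basis functions,
# so the GAM linear predictor remains linear in its coefficients and
# can be fit with the same machinery as a GLM.
basis = cubic_spline_basis(0.5, knots=[0.25, 0.75])
print(basis)  # [1.0, 0.5, 0.25, 0.125, 0.015625, 0.0]
```

Because the knot terms switch on only past each knot, the fitted curve can bend locally while staying smooth everywhere.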
Key Parameters
GLM Parameters
family: Distribution family. One of "gaussian", "binomial", "quasibinomial", "fractional_binomial", "multinomial", "poisson", "gamma", "tweedie", "negative_binomial", "ordinal". Set to "auto" to infer from the response column type.

alpha: Elastic net mixing parameter array. alpha=0 gives ridge (L2-only), alpha=1 gives lasso (L1-only). Values between 0 and 1 blend both penalties. Pass a list of values to perform a regularization path search.

lambda_: Regularization strength. Larger values produce more regularization. If not specified, H2O computes a regularization path automatically. Enable lambda_search=True to search across a path of lambda values automatically.

lambda_search: When True, H2O performs a full regularization path search from lambda_max to lambda_min. Highly recommended when you don’t know the right regularization strength.

solver: Optimization algorithm. One of "AUTO", "IRLSM" (Iteratively Reweighted Least Squares, good for small/medium wide data), "L_BFGS" (large sparse data), "COORDINATE_DESCENT" (multi-threaded coordinate descent, best for large data), "COORDINATE_DESCENT_NAIVE", "GRADIENT_DESCENT_LH", "GRADIENT_DESCENT_SQERR".

standardize: Standardize numeric columns before fitting. Strongly recommended when using regularization so that coefficients are on a comparable scale.
remove_collinear_columns: Automatically drop collinear columns. Useful when multicollinearity would otherwise prevent convergence.
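A quick pure-Python illustration of why collinear columns are a problem (a sketch of the underlying linear algebra, not H2O's detection logic): with a duplicated column, the Gram matrix X^T X is singular, so the normal equations of an unregularized fit have no unique solution:

```python
def gram_2x2(col_a, col_b):
    """Compute X^T X for a two-column design matrix."""
    aa = sum(a * a for a in col_a)
    ab = sum(a * b for a, b in zip(col_a, col_b))
    bb = sum(b * b for b in col_b)
    return [[aa, ab], [ab, bb]]

col = [1.0, 2.0, 3.0]
dup = [2.0, 4.0, 6.0]  # exactly 2x the first column: perfect collinearity
g = gram_2x2(col, dup)
det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
print(det)  # 0.0: singular, so X^T X cannot be inverted
```

Dropping one of the two columns (or adding a ridge penalty) restores a unique solution.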
compute_p_values: Compute p-values and standard errors for coefficients. Only available for models without regularization (lambda_=0).

tweedie_variance_power: (Only for family="tweedie") The variance power p. Common values: 1.0 = Poisson, 1.5 = compound Poisson-Gamma, 2.0 = Gamma.

GAM Parameters
gam_columns: Column names to use as smoothing terms. GAM builds a spline smoother for each column listed. Required for GAM models.
bs: Spline type per GAM column: 0 = cubic regression spline (default), 1 = thin plate regression with knots, 2 = monotone I-splines, 3 = NBSplineTypeI M-splines.

num_knots: Number of knots for each GAM predictor listed in gam_columns. One value per column.

scale: Smoothing parameter for each GAM predictor. Must be the same length as gam_columns.

Code Examples
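What a smoothing parameter trades off can be sketched in pure Python (an illustration of the roughness-penalty idea behind the scale parameter, not H2O's actual penalty matrix): GAM penalizes the wiggliness of the fitted curve, and the smoothing parameter weights that penalty:

```python
def roughness(coefs, scale=1.0):
    """Second-difference roughness penalty:
    scale * sum of (b[i+1] - 2*b[i] + b[i-1])^2 over interior coefficients.
    A larger scale punishes wiggly coefficient sequences more heavily,
    pushing the smooth term toward a straight line."""
    total = sum(
        (coefs[i + 1] - 2 * coefs[i] + coefs[i - 1]) ** 2
        for i in range(1, len(coefs) - 1)
    )
    return scale * total

linear = [0.0, 1.0, 2.0, 3.0]  # coefficients on a straight line
wiggly = [0.0, 2.0, 0.0, 2.0]  # oscillating coefficients
print(roughness(linear))  # 0.0: a line has zero roughness
print(roughness(wiggly))  # 32.0: oscillation is heavily penalized
```

With a tiny smoothing weight the fit chases every data point; with a huge one it collapses toward a line, so the parameter must be tuned (or estimated) per smooth term.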
Regularization Paths
GLM supports full elastic net regularization paths. The path runs from a fully regularized model (all coefficients zero) to the least regularized model, selecting the optimal lambda via cross-validation or a validation frame.
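The mechanics of such a path can be sketched in pure Python (an illustration of the path idea via lasso coordinate descent, not H2O's solver): at lambda_max, soft-thresholding zeroes every coefficient, and decreasing lambda lets coefficients enter the model one by one:

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for lasso on the objective
    (1/2n)||y - Xb||^2 + lam * ||b||_1 (illustrative sketch)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the partial residual
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            col_sq = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / col_sq
    return beta

X = [[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]]
y = [1.0, -1.0, 0.0]
# lambda_max: the smallest lambda at which every coefficient is exactly zero
lam_max = max(
    abs(sum(X[i][j] * y[i] for i in range(len(X)))) / len(X) for j in range(2)
)
for lam in [lam_max, lam_max / 2, lam_max / 100]:
    print(lam, lasso_cd(X, y, lam))  # coefficients grow as lambda shrinks
```

At lam_max the solution is all zeros; as lambda decreases the first coefficient activates and approaches its unregularized value, which is exactly the path H2O traverses (with cross-validation or a validation frame picking the stopping point).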