H2OAutoML automates the supervised machine learning model training process. It trains multiple models — including GBM, XGBoost, GLM, DRF, XRT, DeepLearning, and Stacked Ensembles — cross-validates them by default, and ranks them on a leaderboard.
Constructor
Stopping criteria
Maximum total time (in seconds) for the AutoML run. When both max_runtime_secs and max_models are specified, AutoML stops when either limit is reached. If neither is specified, defaults to 3600 seconds (1 hour).
Maximum time in seconds dedicated to each individual model. 0 disables the per-model limit.
Maximum number of models to build, excluding Stacked Ensemble models. Set this parameter to ensure reproducibility: all models will be trained until convergence and none will be constrained by a time budget.
Metric used for early stopping during the AutoML run. "AUTO" resolves to "logloss" for classification and "deviance" for regression. Other options: "mse", "rmse", "mae", "rmsle", "auc", "aucpr", "misclassification", "mean_per_class_error", "r2".
Relative tolerance for the metric-based stopping criterion. Defaults to 0.001 for datasets with at least 1 million rows; otherwise computed from dataset size.
Stop training new models when the stopping metric has not improved for this many consecutive models. 0 disables this check.
Cross-validation
Number of folds for k-fold cross-validation. -1 lets AutoML decide (uses 5-fold CV or a blending frame depending on dataset size). 0 disables CV. Minimum value when setting explicitly is 2.
Retain cross-validation predictions. Required when continuing an existing AutoML project with repeated train() calls.
Retain cross-validation sub-models. Keeping them consumes more cluster memory.
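The stopping and cross-validation settings described above can be collected in one place before constructing the AutoML object. The following is a minimal sketch, not the library's own code: the helper names check_nfolds, stopping_and_cv_kwargs, and make_automl are hypothetical, the chosen values are illustrative, and the keyword names are those of the H2O Python client as commonly documented, so they should be checked against the installed version.

```python
def check_nfolds(nfolds):
    """Accept -1 (let AutoML decide), 0 (no CV), or an explicit fold count >= 2."""
    if nfolds in (-1, 0) or nfolds >= 2:
        return nfolds
    raise ValueError(f"nfolds must be -1, 0, or >= 2, got {nfolds}")


def stopping_and_cv_kwargs(max_models=20, max_runtime_secs=None, nfolds=5):
    """Build constructor keyword arguments for stopping and cross-validation."""
    kwargs = {
        "max_models": max_models,            # Stacked Ensembles are not counted
        "stopping_metric": "AUTO",           # logloss or deviance, by problem type
        "stopping_rounds": 3,                # illustrative value; 0 disables the check
        "nfolds": check_nfolds(nfolds),
        "keep_cross_validation_predictions": True,   # needed for repeated train() calls
        "keep_cross_validation_models": False,       # saves cluster memory
    }
    if max_runtime_secs is not None:
        # When both limits are set, the run stops at whichever is reached first.
        kwargs["max_runtime_secs"] = max_runtime_secs
    return kwargs


def make_automl(**overrides):
    """Construct H2OAutoML; requires the h2o package and a running cluster."""
    from h2o.automl import H2OAutoML  # imported lazily so the helpers work without h2o
    return H2OAutoML(**{**stopping_and_cv_kwargs(), **overrides})
```

Keeping the settings in a plain dict makes it easy to log or compare configurations across runs; make_automl is only needed once a cluster is available.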
Algorithm selection
Algorithms to skip. Available values: "DRF", "GLM", "XGBoost", "GBM", "DeepLearning", "StackedEnsemble". Cannot be combined with include_algos.
Restrict AutoML to only these algorithms. Cannot be combined with exclude_algos.
Class balancing
Oversample minority classes to balance the class distribution. Only applicable for classification.
Desired over/under-sampling ratios per class (in lexicographic order). Requires balance_classes=True. Auto-computed if not specified.
Maximum relative size of the training dataset after class balancing. Majority classes are undersampled if the oversampled size exceeds this limit.
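Two constraints stated above are easy to violate by accident: include_algos and exclude_algos are mutually exclusive, and per-class sampling ratios require balance_classes=True. The sketch below checks both before any cluster call is made. The function name algo_and_balance_kwargs is hypothetical, and class_sampling_factors is assumed to be the keyword for the sampling ratios, as in the H2O Python client.

```python
def algo_and_balance_kwargs(include_algos=None, exclude_algos=None,
                            balance_classes=False, class_sampling_factors=None):
    """Validate algorithm-selection and class-balancing options, then return kwargs."""
    if include_algos is not None and exclude_algos is not None:
        raise ValueError("include_algos and exclude_algos cannot be combined")
    if class_sampling_factors is not None and not balance_classes:
        raise ValueError("class_sampling_factors requires balance_classes=True")

    kwargs = {"balance_classes": balance_classes}
    if include_algos is not None:
        kwargs["include_algos"] = list(include_algos)
    if exclude_algos is not None:
        kwargs["exclude_algos"] = list(exclude_algos)
    if class_sampling_factors is not None:
        # One ratio per class, in lexicographic order of the class labels.
        kwargs["class_sampling_factors"] = list(class_sampling_factors)
    return kwargs


# Example: skip deep learning and ensembles, oversample minority classes.
options = algo_and_balance_kwargs(
    exclude_algos=["DeepLearning", "StackedEnsemble"],
    balance_classes=True,
)
```

The resulting dict can be unpacked into the constructor, for example H2OAutoML(**options), once h2o is available.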
Reproducibility and project management
Random seed. AutoML guarantees reproducibility only when max_models or early stopping is used, because max_runtime_secs is resource-dependent.
Name for this AutoML project. Auto-generated from the training frame ID if not specified. Reuse the same name to continue training additional models on an existing project.
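A reproducible, resumable configuration might look like the following sketch. It assumes that fixing seed together with max_models (and leaving max_runtime_secs unset) is what the reproducibility note above calls for; the names reproducible_kwargs and continue_project, the default seed, and the project name used in the test values are all illustrative.

```python
def reproducible_kwargs(project_name, seed=1234, max_models=10):
    """Settings that avoid a wall-clock budget, so results do not depend on hardware."""
    return {
        "project_name": project_name,   # reuse this name to add models later
        "seed": seed,
        "max_models": max_models,
        "keep_cross_validation_predictions": True,  # needed when the project is continued
    }


def continue_project(train_frame, y, project_name, extra_models=5):
    """Train once, then add more models to the same project.

    Requires the h2o package, a running cluster, and an H2OFrame as train_frame.
    """
    from h2o.automl import H2OAutoML  # lazy import: not needed for reproducible_kwargs

    first = H2OAutoML(**reproducible_kwargs(project_name))
    first.train(y=y, training_frame=train_frame)

    # Same project_name and training frame: new models join the existing leaderboard.
    second = H2OAutoML(**reproducible_kwargs(project_name, seed=4321,
                                             max_models=extra_models))
    second.train(y=y, training_frame=train_frame)
    return second
```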
Leaderboard
Metric used to sort the leaderboard. "AUTO" resolves to "auc" (binomial), "mean_per_class_error" (multinomial), or "deviance" (regression).
Other
Distribution family used by supporting algorithms. Options: "AUTO", "bernoulli", "multinomial", "gaussian", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber", "custom". Parameterized distributions can be passed as a dict, e.g., dict(type="tweedie", tweedie_power=1.5).
Verbosity of backend messages during training. One of None, "debug", "info", "warn", "error".
Methods
train
Predictor column names or indices. When None, all columns except y are used.
Response column name or index.
Training dataset.
Column with pre-assigned cross-validation fold indices.
Column containing per-row observation weights.
Validation dataset. Only used when nfolds=0. When cross-validation is active, cross-validation metrics take precedence for early stopping.
Test dataset used to score the leaderboard. When omitted, cross-validation metrics are used.
Frame used for training Stacked Ensemble metalearners (blending mode). Only used when nfolds=0.
predict
Data to score.
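The interaction between nfolds and the optional frames passed to train() can be made explicit with a small helper. This is a sketch under the assumptions stated in the descriptions above (validation and blending frames are only used when nfolds=0); the function names train_kwargs and train_and_score are hypothetical, and the keyword names follow the H2O Python client's train() signature as commonly documented.

```python
def train_kwargs(y, training_frame, x=None, nfolds=5, validation_frame=None,
                 leaderboard_frame=None, blending_frame=None):
    """Assemble train() arguments, dropping frames that would not be used."""
    kwargs = {"y": y, "training_frame": training_frame}
    if x is not None:
        kwargs["x"] = x  # when omitted, all columns except y are predictors
    if leaderboard_frame is not None:
        kwargs["leaderboard_frame"] = leaderboard_frame
    if nfolds == 0:
        # Holdout frames only matter when cross-validation is disabled.
        if validation_frame is not None:
            kwargs["validation_frame"] = validation_frame
        if blending_frame is not None:
            kwargs["blending_frame"] = blending_frame
    return kwargs


def train_and_score(aml, test_data, **kwargs):
    """Train an AutoML object and score new data with the resulting leader."""
    aml.train(**kwargs)
    return aml.predict(test_data)
```

With a live cluster, aml would be an H2OAutoML instance constructed with a matching nfolds value, and every frame an H2OFrame.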
get_leaderboard
Extra metric columns to include. Pass "ALL" for all available metrics, or a list of specific metric names.
Properties
leaderboard
model_id and the primary evaluation metric.
leader
predict(), model_performance(), varimp()).
project_name
h2o.automl.get_automl(project_name).
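As an illustration of how the methods and properties above fit together after a run, the sketch below gathers the extended leaderboard, the leader's ID, and the project name, and shows one way to re-attach to a project later. The helper names summarize_run and reattach are hypothetical; get_automl is assumed to be importable from h2o.automl, as in recent H2O releases.

```python
def summarize_run(aml):
    """Collect the main outputs of a finished AutoML run."""
    return {
        "project_name": aml.project_name,
        "leader_id": aml.leader.model_id,
        # "ALL" requests every available extra metric column.
        "leaderboard": aml.get_leaderboard(extra_columns="ALL"),
    }


def reattach(project_name):
    """Look up an existing AutoML project on the connected cluster.

    Requires the h2o package and an active connection (for example via h2o.init()).
    """
    from h2o.automl import get_automl  # lazy import so summarize_run works without h2o
    return get_automl(project_name)
```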