H2O-3 supports three grid search strategies: Cartesian (exhaustive), Random, and Sequential. All of them produce an H2OGridSearch object (Python) / H2OGrid object (R) that stores every trained model along with its metrics.
Search strategies
Cartesian search

Cartesian search trains a model for every combination of the specified hyperparameter values. If you specify 2 values for learn_rate, 3 for max_depth, and 3 for col_sample_rate, Cartesian search trains 2 × 3 × 3 = 18 models. Use Cartesian search when the hyperparameter space is small enough to explore exhaustively.

Random search

Random search samples uniformly from all possible hyperparameter combinations. You control when it stops using max_models, max_runtime_secs, or metric-based early stopping criteria. Use random search when the hyperparameter space is large and exhaustive search would be too slow.

Sequential search

Sequential search iterates through the parameter lists in order (all lists must have the same length), training one model per index. It supports early_stopping to halt when performance plateaus.
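The three strategies differ only in how they enumerate the hyperparameter space. A plain-Python sketch of each enumeration (no H2O calls; the parameter values are just illustrative):

```python
import itertools
import random

params = {
    'learn_rate': [0.01, 0.1],
    'max_depth': [3, 5, 9],
    'col_sample_rate': [0.2, 0.5, 1.0],
}

# Cartesian: every combination -> 2 * 3 * 3 = 18 candidate models
cartesian = list(itertools.product(*params.values()))
assert len(cartesian) == 18

# RandomDiscrete: sample combinations without replacement until a
# stopping criterion (here, max_models) is reached
random.seed(1)
max_models = 5
sampled = random.sample(cartesian, max_models)
assert len(sampled) == 5

# Sequential: zip same-length lists positionally -> one model per index
seq_params = {'learn_rate': [0.01, 0.05, 0.1], 'max_depth': [3, 5, 7]}
sequential = list(zip(*seq_params.values()))
assert sequential == [(0.01, 3), (0.05, 5), (0.1, 7)]
```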
Cartesian grid search example
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch
h2o.init()
# Import data
data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv")
test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv")
y = "response"
x = data.columns
x.remove(y)
data[y] = data[y].asfactor()
test[y] = test[y].asfactor()
ss = data.split_frame(seed=1)
train = ss[0]
valid = ss[1]
# Define hyperparameter space
gbm_params = {
'learn_rate': [0.01, 0.1],
'max_depth': [3, 5, 9],
'sample_rate': [0.8, 1.0],
'col_sample_rate': [0.2, 0.5, 1.0]
}
# Train cartesian grid (all 2×3×2×3 = 36 combinations)
gbm_grid = H2OGridSearch(
model=H2OGradientBoostingEstimator,
grid_id='gbm_grid1',
hyper_params=gbm_params
)
gbm_grid.train(x=x, y=y, training_frame=train, validation_frame=valid, ntrees=100, seed=1)
# Sort results by validation AUC
gbm_gridperf = gbm_grid.get_grid(sort_by='auc', decreasing=True)
gbm_gridperf
Random grid search example
# Larger hyperparameter space, explored with random search
gbm_params2 = {
'learn_rate': [i * 0.01 for i in range(1, 11)],
'max_depth': list(range(2, 11)),
'sample_rate': [i * 0.1 for i in range(5, 11)],
'col_sample_rate': [i * 0.1 for i in range(1, 11)]
}
# Stop after 36 models
search_criteria = {'strategy': 'RandomDiscrete', 'max_models': 36, 'seed': 1}
gbm_grid2 = H2OGridSearch(
model=H2OGradientBoostingEstimator,
grid_id='gbm_grid2',
hyper_params=gbm_params2,
search_criteria=search_criteria
)
gbm_grid2.train(x=x, y=y, training_frame=train, validation_frame=valid, ntrees=100, seed=1)
gbm_gridperf2 = gbm_grid2.get_grid(sort_by='auc', decreasing=True)
gbm_gridperf2
Search criteria
The search_criteria parameter (Python: dict, R: named list) controls the search strategy and stopping conditions.
Strategy options
| Strategy | Description |
|---|---|
| `"Cartesian"` | Default. Trains all combinations of the hyperparameter space. |
| `"RandomDiscrete"` | Randomly samples from the hyperparameter space. Requires at least one stopping criterion. |
| `"Sequential"` | Iterates through parameter lists in sequence (all lists must have the same length). |
Stopping criteria examples
# Stop after 10 models
{'strategy': 'RandomDiscrete', 'max_models': 10, 'seed': 1}
# Stop after 1 hour
{'strategy': 'RandomDiscrete', 'max_runtime_secs': 3600}
# Stop after 42 models or 8 hours, whichever comes first
{'strategy': 'RandomDiscrete', 'max_models': 42, 'max_runtime_secs': 28800}
# Stop when the metric stops improving
{'strategy': 'RandomDiscrete', 'stopping_tolerance': 0.001, 'stopping_rounds': 10}
# Stop when misclassification stops improving (more sensitive threshold)
{'strategy': 'RandomDiscrete', 'stopping_metric': 'misclassification',
'stopping_tolerance': 0.0005, 'stopping_rounds': 5}
Combine max_models and max_runtime_secs to bound grid search by both count and time. Either condition will trigger early termination when reached.
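Metric-based stopping (stopping_rounds with stopping_tolerance) halts the search once the metric stops improving by at least the tolerance over a recent window of models. A simplified plain-Python sketch of that idea (not H2O's exact implementation, which compares moving averages of the scored metric):

```python
def should_stop(scores, stopping_rounds=10, stopping_tolerance=0.001):
    """Simplified sketch: stop when the best score in the most recent
    window fails to beat the best earlier score by at least a relative
    tolerance (assumes higher is better, e.g. AUC)."""
    if len(scores) <= stopping_rounds:
        return False
    recent_best = max(scores[-stopping_rounds:])
    earlier_best = max(scores[:-stopping_rounds])
    return recent_best < earlier_best * (1 + stopping_tolerance)

# A search whose AUC plateaus after the first few models -> stop
aucs = [0.70, 0.74, 0.76, 0.761, 0.7605, 0.7602, 0.7604, 0.7601,
        0.7603, 0.7600, 0.7602, 0.7601, 0.7599, 0.7600]
assert should_stop(aucs, stopping_rounds=10, stopping_tolerance=0.001)

# A search that is still improving -> keep going
improving = [0.70 + 0.005 * i for i in range(14)]
assert not should_stop(improving, stopping_rounds=10, stopping_tolerance=0.001)
```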
Getting the best model
Sort the grid by a metric
Call get_grid() (Python) or h2o.getGrid() (R) with sort_by to rank models.

sorted_grid = gbm_grid.get_grid(sort_by='auc', decreasing=True)
Retrieve the best model
The top model is the first entry in the sorted result.

# Best model (highest AUC)
best_model = sorted_grid.models[0]
Evaluate on a holdout test set
Evaluate the best model on a fresh test set to get an unbiased performance estimate.

best_perf = best_model.model_performance(test)
print(best_perf.auc())
# 0.7781778619721595
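The ranking that get_grid performs can be sketched in plain Python: sort model summaries by the chosen metric, descending, and take the first entry. The model IDs and AUC values below are made up for illustration:

```python
# Hypothetical (model_id, auc) pairs, as a grid might report them
results = [
    ('gbm_grid1_model_3', 0.771),
    ('gbm_grid1_model_1', 0.778),
    ('gbm_grid1_model_2', 0.765),
]

# sort_by='auc', decreasing=True: highest AUC first
ranked = sorted(results, key=lambda m: m[1], reverse=True)
best_id, best_auc = ranked[0]
assert best_id == 'gbm_grid1_model_1'
assert best_auc == 0.778
```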
Parallelism
The parallelism parameter controls how many models are built simultaneously on the H2O leader node.
| Value | Behavior |
|---|---|
| `1` | Sequential model building (default). |
| `0` | Adaptive: H2O decides based on available resources. |
| `N > 1` | Build exactly N models in parallel. |
Parallel grid search speeds things up mainly for small datasets on large clusters. For big data, sequential scheduling yields the highest performance since each model training already uses the full cluster.
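The N > 1 case above behaves like a bounded worker pool: at most N model builds are in flight at once. A plain-Python sketch using ThreadPoolExecutor (train_one is a stand-in for a model build, not an H2O call):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

active = 0
peak = 0
lock = threading.Lock()

def train_one(params):
    """Stand-in for building one model; tracks concurrent builds."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # ... model training would happen here ...
    with lock:
        active -= 1
    return params

grid = [{'max_depth': d} for d in range(2, 11)]  # 9 candidate models

# parallelism=3 analogue: at most 3 builds in flight at any time
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(train_one, grid))

assert len(results) == 9
assert peak <= 3
```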
Saving and loading a grid
H2O-3 can save and reload a grid even after a cluster restart.
Auto-checkpointing
import tempfile
checkpoints_dir = tempfile.mkdtemp()
gbm_grid = H2OGridSearch(
model=H2OGradientBoostingEstimator,
grid_id='gbm_grid',
hyper_params=gbm_params,
export_checkpoints_dir=checkpoints_dir
)
gbm_grid.train(x=x, y=y, training_frame=train, validation_frame=valid, ntrees=100, seed=1)
grid_id = gbm_grid.grid_id
# After a cluster restart, reload the grid
h2o.remove_all()
grid = h2o.load_grid(checkpoints_dir + "/" + grid_id)
Manual save and load
# Save grid manually
saved_path = h2o.save_grid(checkpoints_dir, gbm_grid.grid_id)
# Reload after cluster restart
h2o.remove_all()
grid = h2o.load_grid(saved_path)
Fault-tolerant grid search
Set recovery_dir to enable automatic progress recovery if the cluster crashes mid-training. On restart, reload the grid with load_params_references=True and resume training with the same grid_id and hyperparameter spec.
# Assumes an H2O frame named `iris` and a dict `hyper_parameters`
# defined earlier in the session
gbm_grid = H2OGridSearch(
model=H2OGradientBoostingEstimator,
grid_id='gbm_grid',
hyper_params=hyper_parameters,
recovery_dir="hdfs://nameNode/user/john/gbm_grid_recovery"
)
gbm_grid.train(x=list(range(4)), y=4, training_frame=iris)
# On a new cluster, resume from last successful model
grid = h2o.load_grid(
"hdfs://nameNode/user/john/gbm_grid_recovery/gbm_grid",
load_params_references=True
)
train = h2o.get_frame("iris")
grid.hyper_params = hyper_parameters
grid.train(x=list(range(4)), y=4, training_frame=train)