H2O-3 supports two main grid-search strategies: Cartesian (exhaustive) and Random. Both produce an H2OGridSearch object (Python) / H2OGrid object (R) that stores every trained model together with its metrics.

Cartesian grid search example

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch
h2o.init()

# Import data
data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv")
test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv")

y = "response"
x = data.columns
x.remove(y)

data[y] = data[y].asfactor()
test[y] = test[y].asfactor()

ss = data.split_frame(seed=1)
train = ss[0]
valid = ss[1]

# Define hyperparameter space
gbm_params = {
    'learn_rate': [0.01, 0.1],
    'max_depth': [3, 5, 9],
    'sample_rate': [0.8, 1.0],
    'col_sample_rate': [0.2, 0.5, 1.0]
}

# Train cartesian grid (all 2×3×2×3 = 36 combinations)
gbm_grid = H2OGridSearch(
    model=H2OGradientBoostingEstimator,
    grid_id='gbm_grid1',
    hyper_params=gbm_params
)
gbm_grid.train(x=x, y=y, training_frame=train, validation_frame=valid, ntrees=100, seed=1)

# Sort results by validation AUC
gbm_gridperf = gbm_grid.get_grid(sort_by='auc', decreasing=True)
gbm_gridperf
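For intuition, the number of models a Cartesian grid trains is simply the product of the hyperparameter list lengths. A minimal pure-Python sketch (independent of H2O, using a copy of the space above) that enumerates the same combinations with itertools.product:

```python
import itertools

# Copy of the hyperparameter space defined above
gbm_params = {
    'learn_rate': [0.01, 0.1],
    'max_depth': [3, 5, 9],
    'sample_rate': [0.8, 1.0],
    'col_sample_rate': [0.2, 0.5, 1.0],
}

# Cartesian search trains one model per element of the cross product
names = sorted(gbm_params)
combos = [dict(zip(names, values))
          for values in itertools.product(*(gbm_params[n] for n in names))]

print(len(combos))  # 2 * 3 * 2 * 3 = 36
```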

Random grid search example

# Larger hyperparameter space, explored with random search
gbm_params2 = {
    'learn_rate': [i * 0.01 for i in range(1, 11)],
    'max_depth': list(range(2, 11)),
    'sample_rate': [i * 0.1 for i in range(5, 11)],
    'col_sample_rate': [i * 0.1 for i in range(1, 11)]
}

# Stop after 36 models
search_criteria = {'strategy': 'RandomDiscrete', 'max_models': 36, 'seed': 1}

gbm_grid2 = H2OGridSearch(
    model=H2OGradientBoostingEstimator,
    grid_id='gbm_grid2',
    hyper_params=gbm_params2,
    search_criteria=search_criteria
)
gbm_grid2.train(x=x, y=y, training_frame=train, validation_frame=valid, ntrees=100, seed=1)

gbm_gridperf2 = gbm_grid2.get_grid(sort_by='auc', decreasing=True)
gbm_gridperf2
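Conceptually, RandomDiscrete draws distinct combinations from the full grid at random until a stopping condition (here, max_models) is met. A simplified sketch of that idea, not H2O's actual implementation:

```python
import itertools
import random

def random_discrete(hyper_params, max_models, seed):
    """Sample up to max_models distinct combinations from the grid."""
    names = sorted(hyper_params)
    all_combos = list(itertools.product(*(hyper_params[n] for n in names)))
    rng = random.Random(seed)
    rng.shuffle(all_combos)
    return [dict(zip(names, c)) for c in all_combos[:max_models]]

# Same space as gbm_params2 above: 10 * 9 * 6 * 10 = 5400 combinations
gbm_params2 = {
    'learn_rate': [i * 0.01 for i in range(1, 11)],
    'max_depth': list(range(2, 11)),
    'sample_rate': [i * 0.1 for i in range(5, 11)],
    'col_sample_rate': [i * 0.1 for i in range(1, 11)],
}
sampled = random_discrete(gbm_params2, max_models=36, seed=1)
print(len(sampled))  # 36
```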

Search criteria

The search_criteria parameter (Python: dict, R: named list) controls the search strategy and stopping conditions.

Strategy options

| Strategy | Description |
| --- | --- |
| "Cartesian" | Default. Trains all combinations of the hyperparameter space. |
| "RandomDiscrete" | Randomly samples from the hyperparameter space. Requires at least one stopping criterion. |
| "Sequential" | Iterates through the parameter lists in sequence (all lists must have the same length). |

Stopping criteria examples

# Stop after 10 models
{'strategy': 'RandomDiscrete', 'max_models': 10, 'seed': 1}

# Stop after 1 hour
{'strategy': 'RandomDiscrete', 'max_runtime_secs': 3600}

# Stop after 42 models or 8 hours, whichever comes first
{'strategy': 'RandomDiscrete', 'max_models': 42, 'max_runtime_secs': 28800}

# Stop when the metric stops improving
{'strategy': 'RandomDiscrete', 'stopping_tolerance': 0.001, 'stopping_rounds': 10}

# Stop when misclassification stops improving (more sensitive threshold)
{'strategy': 'RandomDiscrete', 'stopping_metric': 'misclassification',
 'stopping_tolerance': 0.0005, 'stopping_rounds': 5}
Combine max_models and max_runtime_secs to bound grid search by both count and time. Either condition will trigger early termination when reached.
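The metric-based criteria implement a moving-window early stop: search ends when the best score over the most recent stopping_rounds models fails to beat the best of the window before it by at least stopping_tolerance. A simplified sketch of that logic (an approximation; H2O's exact rule differs in details such as metric direction and averaging):

```python
def should_stop(scores, stopping_rounds, stopping_tolerance):
    """Stop when the best score in the latest window no longer beats
    the best of the preceding window by stopping_tolerance.
    Assumes higher scores are better (e.g. AUC)."""
    if len(scores) < 2 * stopping_rounds:
        return False
    recent = max(scores[-stopping_rounds:])
    previous = max(scores[-2 * stopping_rounds:-stopping_rounds])
    return recent - previous < stopping_tolerance

# Validation AUCs flatten out after the first few models
aucs = [0.70, 0.72, 0.74, 0.745, 0.746, 0.7461, 0.7462, 0.7462]
print(should_stop(aucs, stopping_rounds=2, stopping_tolerance=0.001))  # True
```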

Getting the best model

1. Sort the grid by a metric

Call get_grid() (Python) or h2o.getGrid() (R) with sort_by to rank models.

sorted_grid = gbm_grid.get_grid(sort_by='auc', decreasing=True)

2. Retrieve the best model

The top model is the first entry in the sorted result.

# Best model (highest AUC)
best_model = sorted_grid.models[0]

3. Evaluate on a holdout test set

Evaluate the best model on a fresh test set to get an unbiased performance estimate.

best_perf = best_model.model_performance(test)
print(best_perf.auc())
# 0.7781778619721595

Parallelism

The parallelism parameter controls how many models are built simultaneously on the H2O leader node.

| Value | Behavior |
| --- | --- |
| 1 | Sequential building (default). |
| 0 | Adaptive: H2O decides based on available resources. |
| N > 1 | Build exactly N models in parallel. |
Parallel grid search speeds things up mainly for small datasets on large clusters. For big data, sequential model building usually performs best, since each individual model's training already uses the full cluster.
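As a rough analogy, bounded parallelism behaves like a worker pool with N slots. A sketch using Python's ThreadPoolExecutor (illustrative only; unrelated to H2O's internals, with a stand-in training function):

```python
from concurrent.futures import ThreadPoolExecutor

def train_one(params):
    # Stand-in for training a single grid model; the fake AUC just
    # grows with max_depth so the example has a deterministic winner
    return {'params': params, 'auc': 0.7 + 0.01 * params['max_depth']}

param_sets = [{'max_depth': d} for d in range(2, 8)]

# parallelism=N -> at most N models are being built at any moment
with ThreadPoolExecutor(max_workers=3) as pool:
    models = list(pool.map(train_one, param_sets))

best = max(models, key=lambda m: m['auc'])
print(best['params'])  # {'max_depth': 7}
```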

Saving and loading a grid

H2O-3 can save and reload a grid even after a cluster restart.

Auto-checkpointing

import tempfile
checkpoints_dir = tempfile.mkdtemp()

gbm_grid = H2OGridSearch(
    model=H2OGradientBoostingEstimator,
    grid_id='gbm_grid',
    hyper_params=gbm_params,
    export_checkpoints_dir=checkpoints_dir
)
gbm_grid.train(x=x, y=y, training_frame=train, validation_frame=valid, ntrees=100, seed=1)

grid_id = gbm_grid.grid_id

# After a cluster restart, reload the grid
h2o.remove_all()
grid = h2o.load_grid(checkpoints_dir + "/" + grid_id)

Manual save and load

# Save grid manually
saved_path = h2o.save_grid(checkpoints_dir, gbm_grid.grid_id)

# Reload after cluster restart
h2o.remove_all()
grid = h2o.load_grid(saved_path)
Set recovery_dir to enable automatic progress recovery if the cluster crashes mid-training. On restart, reload the grid with load_params_references=True and resume training with the same grid_id and hyperparameter spec.
# Assumes a hyperparameter dict ("hyper_parameters") and an H2OFrame
# named "iris" have been created beforehand
gbm_grid = H2OGridSearch(
    model=H2OGradientBoostingEstimator,
    grid_id='gbm_grid',
    hyper_params=hyper_parameters,
    recovery_dir="hdfs://nameNode/user/john/gbm_grid_recovery"
)
gbm_grid.train(x=list(range(4)), y=4, training_frame=iris)

# On a new cluster, resume from last successful model
grid = h2o.load_grid(
    "hdfs://nameNode/user/john/gbm_grid_recovery/gbm_grid",
    load_params_references=True
)
train = h2o.get_frame("iris")
grid.hyper_params = hyper_parameters
grid.train(x=list(range(4)), y=4, training_frame=train)
