H2O-3 supports three grid search strategies: Cartesian (exhaustive), Random, and Sequential. All of them produce an H2OGridSearch object (Python) / H2OGrid object (R) that stores every trained model along with its metrics.
Search strategies
Cartesian search

Cartesian search trains a model for every combination of the specified hyperparameter values. If you specify 2 values for learn_rate, 3 for max_depth, and 3 for col_sample_rate, Cartesian search trains 2 × 3 × 3 = 18 models. Use Cartesian search when the hyperparameter space is small enough to explore exhaustively.

Random search

Random search samples uniformly from all possible hyperparameter combinations. You control when it stops using max_models, max_runtime_secs, or metric-based early stopping criteria. Use random search when the hyperparameter space is large and exhaustive search would be too slow.

Sequential search

Sequential search iterates through the parameter lists in order (all lists must have the same length), training one model per index. It supports early_stopping to halt when performance plateaus.
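The three strategies differ only in how they enumerate the hyperparameter space. A plain-Python sketch of each enumeration (no H2O calls; the parameter values are just illustrative):

```python
import itertools
import random

params = {
    'learn_rate': [0.01, 0.1],
    'max_depth': [3, 5, 9],
    'col_sample_rate': [0.2, 0.5, 1.0],
}

# Cartesian: every combination -> 2 * 3 * 3 = 18 candidate models
cartesian = list(itertools.product(*params.values()))
assert len(cartesian) == 18

# RandomDiscrete: sample combinations without replacement until a
# stopping criterion (here, max_models) is reached
random.seed(1)
max_models = 5
sampled = random.sample(cartesian, max_models)
assert len(sampled) == 5

# Sequential: zip same-length lists positionally -> one model per index
seq_params = {'learn_rate': [0.01, 0.05, 0.1], 'max_depth': [3, 5, 7]}
sequential = list(zip(*seq_params.values()))
assert sequential == [(0.01, 3), (0.05, 5), (0.1, 7)]
```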
Cartesian grid search example
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch
h2o.init()
# Import data
data = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_train_10k.csv")
test = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/higgs/higgs_test_5k.csv")
y = "response"
x = data.columns
x.remove(y)
data[y] = data[y].asfactor()
test[y] = test[y].asfactor()
ss = data.split_frame(seed=1)
train = ss[0]
valid = ss[1]
# Define hyperparameter space
gbm_params = {
'learn_rate': [0.01, 0.1],
'max_depth': [3, 5, 9],
'sample_rate': [0.8, 1.0],
'col_sample_rate': [0.2, 0.5, 1.0]
}
# Train cartesian grid (all 2×3×2×3 = 36 combinations)
gbm_grid = H2OGridSearch(
model=H2OGradientBoostingEstimator,
grid_id='gbm_grid1',
hyper_params=gbm_params
)
gbm_grid.train(x=x, y=y, training_frame=train, validation_frame=valid, ntrees=100, seed=1)
# Sort results by validation AUC
gbm_gridperf = gbm_grid.get_grid(sort_by='auc', decreasing=True)
gbm_gridperf
Random grid search example
# Larger hyperparameter space, explored with random search
gbm_params2 = {
'learn_rate': [i * 0.01 for i in range(1, 11)],
'max_depth': list(range(2, 11)),
'sample_rate': [i * 0.1 for i in range(5, 11)],
'col_sample_rate': [i * 0.1 for i in range(1, 11)]
}
# Stop after 36 models
search_criteria = {'strategy': 'RandomDiscrete', 'max_models': 36, 'seed': 1}
gbm_grid2 = H2OGridSearch(
model=H2OGradientBoostingEstimator,
grid_id='gbm_grid2',
hyper_params=gbm_params2,
search_criteria=search_criteria
)
gbm_grid2.train(x=x, y=y, training_frame=train, validation_frame=valid, ntrees=100, seed=1)
gbm_gridperf2 = gbm_grid2.get_grid(sort_by='auc', decreasing=True)
gbm_gridperf2
Search criteria
The search_criteria parameter (Python: dict, R: named list) controls the search strategy and stopping conditions.
Strategy options
| Strategy | Description |
|---|---|
| `"Cartesian"` | Default. Trains all combinations of the hyperparameter space. |
| `"RandomDiscrete"` | Randomly samples from the hyperparameter space. Requires at least one stopping criterion. |
| `"Sequential"` | Iterates through parameter lists in sequence (all lists must have the same length). |
Stopping criteria examples
# Stop after 10 models
{'strategy': 'RandomDiscrete', 'max_models': 10, 'seed': 1}
# Stop after 1 hour
{'strategy': 'RandomDiscrete', 'max_runtime_secs': 3600}
# Stop after 42 models or 8 hours, whichever comes first
{'strategy': 'RandomDiscrete', 'max_models': 42, 'max_runtime_secs': 28800}
# Stop when the metric stops improving
{'strategy': 'RandomDiscrete', 'stopping_tolerance': 0.001, 'stopping_rounds': 10}
# Stop when misclassification stops improving (more sensitive threshold)
{'strategy': 'RandomDiscrete', 'stopping_metric': 'misclassification',
'stopping_tolerance': 0.0005, 'stopping_rounds': 5}
Combine max_models and max_runtime_secs to bound grid search by both count and time. Either condition will trigger early termination when reached.
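Metric-based stopping (stopping_rounds with stopping_tolerance) halts the search once the metric stops improving by at least the tolerance over a recent window of models. A simplified plain-Python sketch of that idea (not H2O's exact implementation, which compares moving averages of the scored metric):

```python
def should_stop(scores, stopping_rounds=10, stopping_tolerance=0.001):
    """Simplified sketch: stop when the best score in the most recent
    window fails to beat the best earlier score by at least a relative
    tolerance (assumes higher is better, e.g. AUC)."""
    if len(scores) <= stopping_rounds:
        return False
    recent_best = max(scores[-stopping_rounds:])
    earlier_best = max(scores[:-stopping_rounds])
    return recent_best < earlier_best * (1 + stopping_tolerance)

# A search whose AUC plateaus after the first few models -> stop
aucs = [0.70, 0.74, 0.76, 0.761, 0.7605, 0.7602, 0.7604, 0.7601,
        0.7603, 0.7600, 0.7602, 0.7601, 0.7599, 0.7600]
assert should_stop(aucs, stopping_rounds=10, stopping_tolerance=0.001)

# A search that is still improving -> keep going
improving = [0.70 + 0.005 * i for i in range(14)]
assert not should_stop(improving, stopping_rounds=10, stopping_tolerance=0.001)
```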
Getting the best model
Sort the grid by a metric
Call get_grid() (Python) or h2o.getGrid() (R) with sort_by to rank models.

sorted_grid = gbm_grid.get_grid(sort_by='auc', decreasing=True)
Retrieve the best model
The top model is the first entry in the sorted result.

# Best model (highest AUC)
best_model = sorted_grid.models[0]
Evaluate on a holdout test set
Evaluate the best model on a fresh test set to get an unbiased performance estimate.

best_perf = best_model.model_performance(test)
print(best_perf.auc())
# 0.7781778619721595
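The ranking that get_grid performs can be sketched in plain Python: sort model summaries by the chosen metric, descending, and take the first entry. The model IDs and AUC values below are made up for illustration:

```python
# Hypothetical (model_id, auc) pairs, as a grid might report them
results = [
    ('gbm_grid1_model_3', 0.771),
    ('gbm_grid1_model_1', 0.778),
    ('gbm_grid1_model_2', 0.765),
]

# sort_by='auc', decreasing=True: highest AUC first
ranked = sorted(results, key=lambda m: m[1], reverse=True)
best_id, best_auc = ranked[0]
assert best_id == 'gbm_grid1_model_1'
assert best_auc == 0.778
```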
Parallelism
The parallelism parameter controls how many models are built simultaneously on the H2O leader node.
| Value | Behavior |
|---|---|
| `1` | Sequential model building (default). |
| `0` | Adaptive: H2O decides based on available resources. |
| `N > 1` | Build exactly N models in parallel. |
Parallel grid search speeds things up mainly for small datasets on large clusters. For big data, sequential scheduling yields the highest performance since each model training already uses the full cluster.
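The N > 1 case above behaves like a bounded worker pool: at most N model builds are in flight at once. A plain-Python sketch using ThreadPoolExecutor (train_one is a stand-in for a model build, not an H2O call):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

active = 0
peak = 0
lock = threading.Lock()

def train_one(params):
    """Stand-in for building one model; tracks concurrent builds."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # ... model training would happen here ...
    with lock:
        active -= 1
    return params

grid = [{'max_depth': d} for d in range(2, 11)]  # 9 candidate models

# parallelism=3 analogue: at most 3 builds in flight at any time
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(train_one, grid))

assert len(results) == 9
assert peak <= 3
```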
Saving and loading a grid
H2O-3 can save and reload a grid even after a cluster restart.
Auto-checkpointing
import tempfile
checkpoints_dir = tempfile.mkdtemp()
gbm_grid = H2OGridSearch(
model=H2OGradientBoostingEstimator,
grid_id='gbm_grid',
hyper_params=gbm_params,
export_checkpoints_dir=checkpoints_dir
)
gbm_grid.train(x=x, y=y, training_frame=train, validation_frame=valid, ntrees=100, seed=1)
grid_id = gbm_grid.grid_id
# After a cluster restart, reload the grid
h2o.remove_all()
grid = h2o.load_grid(checkpoints_dir + "/" + grid_id)
Manual save and load
# Save grid manually
saved_path = h2o.save_grid(checkpoints_dir, gbm_grid.grid_id)
# Reload after cluster restart
h2o.remove_all()
grid = h2o.load_grid(saved_path)
Fault-tolerant grid search
Set recovery_dir to enable automatic progress recovery if the cluster crashes mid-training. On restart, reload the grid with load_params_references=True and resume training with the same grid_id and hyperparameter spec.
# Assumes an H2O frame named `iris` and a dict `hyper_parameters`
# defined earlier in the session
gbm_grid = H2OGridSearch(
model=H2OGradientBoostingEstimator,
grid_id='gbm_grid',
hyper_params=hyper_parameters,
recovery_dir="hdfs://nameNode/user/john/gbm_grid_recovery"
)
gbm_grid.train(x=list(range(4)), y=4, training_frame=iris)
# On a new cluster, resume from last successful model
grid = h2o.load_grid(
"hdfs://nameNode/user/john/gbm_grid_recovery/gbm_grid",
load_params_references=True
)
train = h2o.get_frame("iris")
grid.hyper_params = hyper_parameters
grid.train(x=list(range(4)), y=4, training_frame=train)