H2O-3 provides several functions for persisting trained models. The right approach depends on whether you need the model for continued H2O use (binary model) or for production scoring without an H2O cluster (MOJO).

Binary models

A binary model saves the full H2O model object, including all internal state. Binary models are version-specific: a model saved with one H2O version can only be loaded by that same version.
If you upgrade H2O, you must retrain and re-save your binary models. For production deployment, use the MOJO or POJO format instead.
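
Because a binary model only loads under the H2O version that wrote it, it can help to record that version next to the saved file and check it before loading. The following is a minimal sketch, assuming the model path is on a filesystem your Python session can write to. The helper names and the ".version.json" sidecar convention are hypothetical and not part of the H2O API.

```python
import json
from pathlib import Path


def write_version_sidecar(model_path, h2o_version):
    """Record the H2O version that produced model_path in a JSON sidecar file."""
    sidecar = Path(str(model_path) + ".version.json")
    sidecar.write_text(json.dumps({"h2o_version": h2o_version}))
    return sidecar


def check_version_sidecar(model_path, current_version):
    """Return True only if the recorded version matches the running version."""
    sidecar = Path(str(model_path) + ".version.json")
    if not sidecar.exists():
        return False  # no record: the model cannot be verified as loadable
    recorded = json.loads(sidecar.read_text())["h2o_version"]
    return recorded == current_version
```

In a real session you would pass h2o.__version__ as the version string, calling write_version_sidecar() right after h2o.save_model() and check_version_sidecar() before h2o.load_model().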

Saving and loading locally

import h2o
from h2o.estimators import H2ODeepLearningEstimator
h2o.init()

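# Train a model (iris is the public dataset used later on this page)
df = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
model = H2ODeepLearningEstimator()
model.train(x=list(range(4)), y="class", training_frame=df)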

# Save to /tmp/mymodel/
model_path = h2o.save_model(model=model, path="/tmp/mymodel", force=True)
print(model_path)
# /tmp/mymodel/DeepLearning_model_python_1441838096933

# Load the saved model
saved_model = h2o.load_model(model_path)

h2o.save_model() parameters

Parameter                              Description
model                                  The H2O model object to save.
path                                   Directory path to save to. Supports local paths, hdfs://, s3://, and gs://. Defaults to the current working directory.
force                                  If True, overwrite existing files at the destination.
export_cross_validation_predictions    If True, include CV holdout predictions in the saved artifact.
filename                               Custom filename for the saved model. Defaults to model.model_id.
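
The optional parameters can be combined in a single call. The sketch below assumes a trained model and a running cluster. The function name save_named_model and the filename "churn_model" are hypothetical.

```python
def save_named_model(model, directory, name="churn_model"):
    """Save a binary model under a fixed filename and return the saved path."""
    import h2o  # requires the h2o package and a running cluster when called

    return h2o.save_model(
        model=model,
        path=directory,
        force=True,  # overwrite an existing file at the destination
        export_cross_validation_predictions=True,  # include CV holdout predictions
        filename=name,  # used instead of the default model.model_id
    )
```

A fixed filename makes the saved path predictable, which is convenient when another job needs to pick up the file later.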

Downloading and uploading models

Use download_model() and upload_model() when the H2O cluster is remote and you need to transfer model files to or from the local machine running your Python/R session.

# Download model from the H2O cluster to your local machine
my_local_model = h2o.download_model(model, path="/Users/UserName/Desktop")

# Upload a previously downloaded model back to the H2O cluster
uploaded_model = h2o.upload_model(my_local_model)

The owner of a saved file (via save_model) is the user running the H2O cluster process. The owner of a downloaded file (via download_model) is the user running the Python/R session.

Saving to cloud storage

Prefix the path with the appropriate URI scheme to save directly to distributed storage.

# h2o_glm is a previously trained model
hdfs_name_node = "node-1"
hdfs_model_path = "hdfs://" + hdfs_name_node + "/tmp/models"
new_model_path = h2o.save_model(h2o_glm, hdfs_model_path)
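
The same pattern applies to the other schemes listed for the path parameter. The following is a small sketch of a helper that builds the destination URI. The helper name model_uri and the bucket name in the usage lines are hypothetical.

```python
SUPPORTED_SCHEMES = ("hdfs", "s3", "gs")


def model_uri(scheme, location, directory):
    """Build a storage URI such as hdfs://node-1/tmp/models."""
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError("unsupported scheme: " + scheme)
    return scheme + "://" + location.strip("/") + "/" + directory.strip("/")


print(model_uri("hdfs", "node-1", "/tmp/models"))  # hdfs://node-1/tmp/models
print(model_uri("s3", "my-bucket", "models/glm"))  # s3://my-bucket/models/glm
```

The returned string can be passed as the path argument of h2o.save_model(), as in the HDFS example.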

MOJO models

A MOJO (Model Object, Optimized) is a portable, self-contained model archive. Unlike binary models, MOJOs:
  • Do not require an H2O cluster to score
  • Are not tied to a specific H2O version
  • Can be deployed in Java environments via the h2o-genmodel library
Use MOJOs for production deployment. They are more compact and faster than POJOs, and support the widest range of algorithms.
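
Because a MOJO is written as a zip file (the examples on this page save it as model.zip), its contents can be listed with the Python standard library alone, without a cluster. This sketch uses an illustrative function name, and the entries you see will depend on the algorithm and H2O version.

```python
import zipfile


def list_mojo_entries(mojo_path):
    """Return the sorted names of the files stored inside a MOJO zip archive."""
    with zipfile.ZipFile(mojo_path) as archive:
        return sorted(archive.namelist())
```

This only inspects the archive. Scoring still goes through H2O or the h2o-genmodel library.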

Supported algorithms

The following algorithms support MOJO export and/or import:
Algorithm                    Exportable    Importable
GBM                          Yes           Yes
DRF                          Yes           Yes
GLM                          Yes           Yes
XGBoost                      Yes           Yes
Deep Learning                Yes           Yes
Stacked Ensemble             Yes           Yes
AutoML                       Yes           Yes
GAM                          Yes           Yes
CoxPH                        Yes           Yes
RuleFit                      Yes           Yes
Uplift DRF                   Yes           Yes
Isolation Forest             Yes           Yes
Extended Isolation Forest    Yes           Yes
GLRM                         Yes           No
PCA                          Yes           No
K-Means                      Yes           No
Naïve Bayes                  No            No
SVM                          No            No

AutoML will always produce a model with a MOJO, though the exact model type depends on the run. In most cases you will get a Stacked Ensemble. All individual models within an AutoML run are importable, but only individual (non-AutoML) models are exportable as MOJOs.

Saving and importing MOJOs

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

data = h2o.import_file(path='training_dataset.csv')
original_model = H2OGeneralizedLinearEstimator()
original_model.train(
    x=["Some column", "Another column"],
    y="response",
    training_frame=data
)

# Save as MOJO. save_mojo() takes a directory and returns the full path of the written .zip file
mojo_path = original_model.save_mojo('/path/to/model/directory')

# Import the MOJO back into H2O for scoring
imported_model = h2o.import_mojo(mojo_path)
new_observations = h2o.import_file(path='new_observations.csv')
predictions = imported_model.predict(new_observations)

Downloading and uploading MOJOs

Use download_mojo() and upload_mojo() when the H2O cluster is remote.

import h2o
from h2o.estimators import H2OGradientBoostingEstimator
h2o.init()

df = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
model = H2OGradientBoostingEstimator()
model.train(x=list(range(4)), y="class", training_frame=df)

# Download MOJO to local machine
my_mojo = model.download_mojo(path="/Users/UserName/Desktop")

# Upload the MOJO to the H2O cluster
mojo_model = h2o.upload_mojo(my_mojo)

Model checkpointing

Several H2O-3 algorithms (GBM, DRF, Deep Learning, XGBoost) support checkpointing: resuming training from a previously saved model. Pass the checkpoint parameter with the model ID of an existing model.

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

# predictors, response, and train are placeholders for your column list,
# response column, and training H2OFrame.

# Initial training run (50 trees)
gbm_v1 = H2OGradientBoostingEstimator(ntrees=50, seed=42)
gbm_v1.train(x=predictors, y=response, training_frame=train)

# Save the model
model_path = h2o.save_model(gbm_v1, path="/tmp/checkpoints", force=True)

# Resume training from the checkpoint (add 50 more trees)
gbm_v2 = H2OGradientBoostingEstimator(
    ntrees=100,           # total trees including the checkpoint
    checkpoint=gbm_v1.model_id,
    seed=42
)
gbm_v2.train(x=predictors, y=response, training_frame=train)

Checkpointing is useful for incrementally training large models without restarting from scratch, or for fine-tuning a base model on new data.
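
The checkpoint parameter takes the model ID of a model in the running cluster, so in a fresh session the saved model has to be loaded first. The sketch below works under that assumption. The function name is hypothetical, and predictors, response, and train stand for your own column list, response column, and H2OFrame.

```python
def resume_gbm_from_file(model_path, predictors, response, train, total_trees):
    """Load a saved GBM and continue training it up to total_trees trees."""
    import h2o  # requires the h2o package and a running cluster when called
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    base = h2o.load_model(model_path)  # bring the checkpoint model into the cluster
    resumed = H2OGradientBoostingEstimator(
        ntrees=total_trees,        # total trees, including those in the checkpoint
        checkpoint=base.model_id,  # model ID of the loaded model
        seed=42,                   # same seed as the initial run
    )
    resumed.train(x=predictors, y=response, training_frame=train)
    return resumed
```

Here total_trees should be larger than the number of trees already in the checkpoint, since it counts the existing trees as well as the new ones.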

Advanced: lazy MOJO import with Generic model

The Generic model provides fine-grained control over MOJO loading. Upload the MOJO bytes once, then instantiate multiple models for scoring from the same upload without re-uploading.

import h2o
h2o.init()

# Download the MOJO from a previously trained model (original_model)
path = '/path/to/model/directory/model.zip'
original_model.download_mojo(path)

# Upload MOJO bytes once (lazy import)
imported_mojo_key = h2o.lazy_import(path)

# Build the generic model from already-uploaded bytes
from h2o.estimators import H2OGenericEstimator
generic_model = H2OGenericEstimator(
    model_key=h2o.get_frame(imported_mojo_key[0])
)
new_observations = h2o.import_file(path='new_observations.csv')
predictions = generic_model.predict(new_observations)
