The h2o module is the top-level entry point for the H2O Python client. Import it with:
import h2o

Connection

h2o.init

h2o.init(
    url=None, ip=None, port=None, name=None,
    https=None, cacert=None, insecure=None,
    username=None, password=None, cookies=None,
    proxy=None, start_h2o=True, nthreads=-1,
    ice_root=None, log_dir=None, log_level=None,
    max_log_file_size=None, enable_assertions=True,
    max_mem_size=None, min_mem_size=None,
    strict_version_check=None, ignore_config=False,
    extra_classpath=None, jvm_custom_args=None,
    bind_to_localhost=True, verbose=True
)
Attempt to connect to a local H2O server, or start a new one if no existing server is found.
url
string
Full URL of the server to connect to. Can be used instead of ip + port + https.
ip
string
The IP address or hostname of the server where H2O is running.
port
integer | string
Port number that H2O is listening on.
name
string
Cluster name. If None, the cluster name is not checked when connecting to an existing cluster. When starting a local cluster, a random name is generated if None.
https
boolean
Set to True to connect via HTTPS instead of HTTP.
insecure
boolean
When using HTTPS, set to True to disable SSL certificate verification.
username
string
Username for basic authentication.
password
string
Password for basic authentication.
proxy
dict
Proxy server address as a {scheme: address} dictionary.
start_h2o
boolean
default:"True"
If False, do not attempt to start a local H2O server when connection fails.
nthreads
integer
default:"-1"
Number of threads for the new H2O server. -1 uses all available cores.
log_dir
string
Directory for H2O server logs when a new instance is started.
log_level
string
Logger level for H2O. One of "TRACE", "DEBUG", "INFO", "WARN", "ERRR", "FATA". Defaults to "INFO".
max_mem_size
integer | string
Maximum memory for the new H2O server. An integer is treated as gigabytes; a string can specify other units with a suffix, for example "160M" or "4G".
min_mem_size
integer | string
Minimum memory for the new H2O server. Uses the same format as max_mem_size.
strict_version_check
boolean
If True, raise an error when client and server versions do not match.
verbose
boolean
default:"True"
Set to False to suppress connection status messages.
import h2o
h2o.init(ip="localhost", port=54321)
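When a new local server is started, nthreads and the memory bounds are usually the first arguments worth setting explicitly. The following is a minimal sketch under stated assumptions: `mem_size_arg` and `start_local_cluster` are hypothetical helpers that are not part of h2o, and the thread and memory values are arbitrary. The h2o import is deferred so the block can be loaded on a machine without the h2o package or a Java runtime.

```python
def mem_size_arg(value):
    """Render a memory size in the string form described for max_mem_size.

    Hypothetical helper, not H2O's own parsing code: integers are read as
    gigabytes, strings such as "160M" or "4G" are passed through unchanged.
    """
    if isinstance(value, int):
        return f"{value}G"
    return value


def start_local_cluster(max_mem=4, min_mem="2G", threads=2):
    """Connect to a local H2O server, starting one with these limits if needed."""
    import h2o  # deferred import: requires the h2o package and a Java runtime

    h2o.init(
        nthreads=threads,                    # -1 would use all available cores
        max_mem_size=mem_size_arg(max_mem),  # for example 4 becomes "4G"
        min_mem_size=mem_size_arg(min_mem),
        log_level="WARN",
        verbose=False,                       # suppress connection status messages
    )
    return h2o.cluster()
```

The limits only take effect when h2o.init actually starts a server; an existing server keeps the settings it was launched with.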

h2o.connect

h2o.connect(
    server=None, url=None, ip=None, port=None,
    https=None, verify_ssl_certificates=None, cacert=None,
    auth=None, proxy=None, cookies=None,
    verbose=True, config=None, strict_version_check=False
)
Connect to an existing H2O server. Unlike h2o.init(), this function does not attempt to start a new server.
url
string
Full URL of the server to connect to.
ip
string
IP address or hostname of the H2O server.
port
integer
Port that H2O is listening on.
https
boolean
Connect via HTTPS when True.
auth
tuple
A (username, password) tuple for basic authentication, or a requests-compatible auth object.
strict_version_check
boolean
default:"False"
Raise an error if the client and server versions do not match.
conn = h2o.connect(url="http://127.0.0.1:54321")
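The url argument carries the same information as ip, port, and https combined. The sketch below builds one form from the other and adds basic authentication. `server_url` and `connect_with_auth` are hypothetical helpers, and any host name or credentials passed to them are placeholders chosen by the caller.

```python
def server_url(ip, port, https=False):
    """Combine ip, port, and https into the single string accepted by url."""
    scheme = "https" if https else "http"
    return f"{scheme}://{ip}:{port}"


def connect_with_auth(ip, port, username, password, https=True):
    """Connect to a server that is already running; no server is started."""
    import h2o  # deferred import so server_url works without h2o installed

    return h2o.connect(
        url=server_url(ip, port, https),
        auth=(username, password),   # (username, password) tuple
        strict_version_check=True,   # raise on a client/server version mismatch
    )
```

For example, `server_url("127.0.0.1", 54321)` yields the same address used in the h2o.connect call above.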

Data import

h2o.import_file

h2o.import_file(
    path=None, destination_frame=None, parse=True,
    header=0, sep=None, col_names=None, col_types=None,
    na_strings=None, pattern=None, skipped_columns=None,
    force_col_types=False, custom_non_data_line_markers=None,
    partition_by=None, quotechar=None, escapechar=None,
    tz_adjust_to_local=False
)
Import one or more files into the H2O cluster using a distributed, multi-threaded pull. The path must be accessible from every node in the cluster.
path
string | list
required
Path to the file or directory to import. Accepts a URL, local path, or S3/HDFS URI. A list of paths is also accepted.
destination_frame
string
Key to assign to the imported frame. Auto-generated if not provided.
parse
boolean
default:"True"
If False, returns a list of raw frame paths without parsing.
header
integer
default:"0"
-1 means first line is data, 0 means guess, 1 means first line is header.
sep
string
Field separator character. Auto-detected if not provided.
col_names
string[]
Column names for the imported frame.
col_types
string[] | object
Column types as a list or {column_name: type} dict. Valid types: "unknown", "uuid", "string", "numeric", "enum", "time".
na_strings
string[] | string[][] | object
Values to interpret as missing. Accepts a flat list, a list-of-lists per column, or a {column: list} dict.
birds = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")
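Auto-detection of the header, separator, and column types can be overridden per import. The sketch below shows one way to keep explicit parse settings together; the column names (id, species, weight) describe a hypothetical CSV file, and `import_with_options` is an illustrative wrapper rather than part of h2o.

```python
# Parse options for a hypothetical CSV with columns id, species, weight.
IMPORT_OPTIONS = {
    "header": 1,                     # first line is the header
    "sep": ",",
    "col_types": {"id": "string", "species": "enum", "weight": "numeric"},
    "na_strings": ["NA", "?", ""],   # flat list of values read as missing
}


def import_with_options(path, destination_frame=None):
    """Import a file with explicit parse settings instead of auto-detection."""
    import h2o  # deferred import; a cluster connection is assumed to exist

    return h2o.import_file(
        path, destination_frame=destination_frame, **IMPORT_OPTIONS
    )
```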

h2o.upload_file

h2o.upload_file(
    path, destination_frame=None, header=0, sep=None,
    col_names=None, col_types=None, na_strings=None,
    skipped_columns=None, force_col_types=False,
    quotechar=None, escapechar=None
)
Upload a local file to the H2O cluster using a single-threaded push. Use import_file for large files where parallel ingestion is preferred.
path
string
required
Local path to the file to upload.
destination_frame
string
Key to assign to the resulting frame.
header
integer
default:"0"
Header detection: -1 (first line is data), 0 (guess), 1 (first line is header).
sep
string
Field separator character.
iris = h2o.upload_file("~/data/iris.csv")
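Because import_file requires a path that every cluster node can reach while upload_file pushes a file from the client, one possible convention is to choose between them by URI scheme. This is only a sketch of that convention: `pick_loader`, `load_frame`, and the `REMOTE_SCHEMES` set are hypothetical, and a plain local path is also valid for import_file when the cluster runs on the same machine or shares the file system.

```python
from urllib.parse import urlparse

# Schemes assumed here to be reachable by the cluster itself.
REMOTE_SCHEMES = {"http", "https", "s3", "hdfs"}


def pick_loader(path):
    """Return the name of the h2o function suited to this path."""
    scheme = urlparse(path).scheme.lower()
    return "import_file" if scheme in REMOTE_SCHEMES else "upload_file"


def load_frame(path, **parse_options):
    """Load a frame with import_file or upload_file depending on the path."""
    import h2o  # deferred import; a cluster connection is assumed to exist

    loader = getattr(h2o, pick_loader(path))
    return loader(path, **parse_options)
```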

Model persistence

h2o.save_model

h2o.save_model(model, path="", force=False, export_cross_validation_predictions=False, filename=None)
Save an H2O model object to disk on the server’s file system. The saved file can be loaded back with h2o.load_model().
model
H2OModel
required
The trained model object to save.
path
string
default:""
Path to the directory where the model will be saved (local, HDFS, or S3). Defaults to the current working directory.
force
boolean
default:"False"
If True, overwrite an existing file at the destination.
export_cross_validation_predictions
boolean
default:"False"
Include cross-validation holdout predictions in the saved model.
filename
string
Custom filename for the saved model. Defaults to the model ID.
model_path = h2o.save_model(model, path="/tmp/models", force=True)

h2o.load_model

h2o.load_model(path)
Load a saved H2O model from disk back into the cluster.
path
string
required
The full path to the saved model file on the server’s file system.
model = h2o.load_model("/tmp/models/GBM_model_python_1234567890")

h2o.download_model

h2o.download_model(model, path="", export_cross_validation_predictions=False, filename=None)
Download a model binary from the H2O cluster to the local machine running this Python session.
model
H2OModel
required
The trained model object to download.
path
string
default:""
Local directory to save the model. Defaults to the current working directory.
filename
string
Custom filename. Defaults to the model ID.

h2o.upload_model

h2o.upload_model(path)
Upload a binary model from the local machine to the H2O cluster.
path
string
required
Local path to the previously downloaded binary model file.
model = h2o.upload_model("/tmp/GBM_model_python_1234567890")
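The two pairs of functions differ in where the file lives: save_model and load_model work on the server's file system, while download_model and upload_model work on the machine running the Python session. The sketch below wraps that choice in a single flag. `persist` and `restore` are hypothetical helpers, and the `where` argument is an illustrative name.

```python
def persist(model, directory, where="server"):
    """Write a model binary and return the path reported by h2o."""
    if where not in ("server", "client"):
        raise ValueError("where must be 'server' or 'client'")
    import h2o  # deferred import; a cluster connection is assumed to exist

    if where == "server":
        # File is written on the server's file system.
        return h2o.save_model(model, path=directory, force=True)
    # File is written on the machine running this Python session.
    return h2o.download_model(model, path=directory)


def restore(path, where="server"):
    """Load a model binary written by persist with the same where value."""
    if where not in ("server", "client"):
        raise ValueError("where must be 'server' or 'client'")
    import h2o

    if where == "server":
        return h2o.load_model(path)
    return h2o.upload_model(path)
```

When client and server run on the same machine the two locations coincide, so either pair works.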

Cluster management

h2o.cluster_info

h2o.cluster_info()
Deprecated. Use h2o.cluster().show_status() instead.
Display basic information about the currently connected H2O cluster.

h2o.shutdown

h2o.shutdown(prompt=False)
Deprecated. Use h2o.cluster().shutdown() instead.
Shut down the H2O cluster.
prompt
boolean
default:"False"
If True, prompt for confirmation before shutting down.
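The replacements for both deprecated functions are methods on the object returned by h2o.cluster(). A minimal sketch of using them together follows; `h2o_session` is a hypothetical context manager, and it assumes the caller owns the cluster, since shutting down on exit would also stop a shared server.

```python
from contextlib import contextmanager


@contextmanager
def h2o_session(**init_kwargs):
    """Start or attach to a cluster, print its status, and shut it down on exit."""
    import h2o  # deferred import: requires the h2o package and a Java runtime

    h2o.init(**init_kwargs)
    cluster = h2o.cluster()
    try:
        cluster.show_status()            # replaces h2o.cluster_info()
        yield cluster
    finally:
        cluster.shutdown(prompt=False)   # replaces h2o.shutdown()
```

A typical use would be `with h2o_session(nthreads=2): ...`, with all training and prediction inside the block.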

Object management

h2o.ls

h2o.ls()
List all keys (frames and models) stored in the H2O cluster. Returns a pandas DataFrame.
h2o.ls()

h2o.remove

h2o.remove(x, cascade=True)
Remove one or more objects from the H2O cluster.
x
H2OFrame | H2OEstimator | string | list
required
The object(s) to remove. Accepts a frame, model, key string, or list of any combination.
cascade
boolean
default:"True"
When True, also remove dependent objects (e.g. submodels).
h2o.remove(frame)
h2o.remove([frame, model])

h2o.remove_all

h2o.remove_all(retained=None)
Remove all objects from the H2O cluster, with an option to retain specific keys.
retained
string[]
Keys of models or frames to keep. Training and validation frames of retained models are also kept.
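Since retained expects key strings rather than objects, the keys have to be collected first. The sketch below does this from the `model_id` and `frame_id` attributes of models and frames; `retained_keys` and `clear_cluster_except` are hypothetical helper names.

```python
def retained_keys(models=(), frames=()):
    """Collect the cluster keys of the objects that should survive a cleanup."""
    keys = [m.model_id for m in models]
    keys += [f.frame_id for f in frames]
    return keys


def clear_cluster_except(models=(), frames=()):
    """Remove everything from the cluster except the given models and frames."""
    import h2o  # deferred import; a cluster connection is assumed to exist

    h2o.remove_all(retained=retained_keys(models, frames))
```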

h2o.get_model

h2o.get_model(model_id)
Retrieve a model that already exists in the H2O cluster.
model_id
string
required
The model ID string.
model = h2o.get_model("GBM_model_python_1234567890")

h2o.get_frame

h2o.get_frame(frame_id)
Retrieve a handle to an existing frame in the H2O cluster.
frame_id
string
required
The frame ID string.
frame = h2o.get_frame("iris.hex")

h2o.grid

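Grid search is performed via the H2OGridSearch class in the h2o.grid module. The example below shows the typical usage pattern.
from h2o.grid.grid_search import H2OGridSearch
from h2o.estimators.gbm import H2OGradientBoostingEstimator

# Candidate values per hyperparameter: 2 * 3 * 2 = 12 combinations in total.
hyper_params = {
    "ntrees": [50, 100],
    "max_depth": [3, 5, 7],
    "learn_rate": [0.01, 0.1],
}

# RandomDiscrete samples combinations at random and stops after max_models.
grid = H2OGridSearch(
    model=H2OGradientBoostingEstimator,
    hyper_params=hyper_params,
    search_criteria={"strategy": "RandomDiscrete", "max_models": 10, "seed": 42},
)
# x (list of predictor column names), y (response column name), and
# train (an H2OFrame) are assumed to be defined earlier in the session.
grid.train(x=x, y=y, training_frame=train)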
model
H2OEstimator class
required
The estimator class (not an instance) to tune.
hyper_params
object
required
Dictionary mapping parameter names to lists of values to search over.
search_criteria
object
Dictionary controlling the search strategy. Keys include strategy ("Cartesian" or "RandomDiscrete"), max_models, max_runtime_secs, and seed.
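The size of a search follows directly from hyper_params: a Cartesian search trains one model per combination, while RandomDiscrete stops once max_models is reached. The sketch below works through that arithmetic for the values used above and adds a hypothetical `best_model` helper; the "auc" default assumes a binary classification problem.

```python
from itertools import product

hyper_params = {
    "ntrees": [50, 100],
    "max_depth": [3, 5, 7],
    "learn_rate": [0.01, 0.1],
}

# Cartesian search would train one model per combination: 2 * 3 * 2.
combinations = list(product(*hyper_params.values()))
n_cartesian = len(combinations)

# RandomDiscrete with max_models caps the number of models actually trained.
max_models = 10
n_trained_at_most = min(n_cartesian, max_models)


def best_model(grid, metric="auc"):
    """Return the top model of a trained H2OGridSearch, ranked by metric."""
    ranked = grid.get_grid(sort_by=metric, decreasing=True)
    return ranked.models[0]
```

With these values the full grid has 12 combinations, so a RandomDiscrete search limited to 10 models leaves two of them untried.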
