h2o Module

The h2o module is the top-level entry point for the H2O Python client. Import it with:

import h2o

Connection

h2o.init

h2o.init(
    url=None, ip=None, port=None, name=None,
    https=None, cacert=None, insecure=None,
    username=None, password=None, cookies=None,
    proxy=None, start_h2o=True, nthreads=-1,
    ice_root=None, log_dir=None, log_level=None,
    max_log_file_size=None, enable_assertions=True,
    max_mem_size=None, min_mem_size=None,
    strict_version_check=None, ignore_config=False,
    extra_classpath=None, jvm_custom_args=None,
    bind_to_localhost=True, verbose=True
)

Attempt to connect to a local H2O server, or start a new one if no existing server is found.

url

string

Full URL of the server to connect to. Can be used instead of ip + port + https.

ip

string

The IP address or hostname of the server where H2O is running.

port

integer | string

Port number that H2O is listening on.

name

string

Cluster name. If None, the cluster name is not checked when connecting to an existing cluster. When starting a local cluster, a random name is generated if None.

https

boolean

Set to True to connect via HTTPS instead of HTTP.

insecure

boolean

When using HTTPS, set to True to disable SSL certificate verification.

username

string

Username for basic authentication.

password

string

Password for basic authentication.

proxy

dict

Proxy server address as a {scheme: address} dictionary.

start_h2o

boolean

default:"True"

If False, do not attempt to start a local H2O server when connection fails.

nthreads

integer

default:"-1"

Number of threads for the new H2O server. -1 uses all available cores.

log_dir

string

Directory for H2O server logs when a new instance is started.

log_level

string

Logger level for H2O. One of "TRACE", "DEBUG", "INFO", "WARN", "ERRR", "FATA". Defaults to "INFO".

max_mem_size

integer | string

Maximum memory for the new H2O server. Integer input is treated as gigabytes. String values support suffixes: "160M", "4G".

min_mem_size

integer | string

Minimum memory for the new H2O server. Uses the same format as max_mem_size.

strict_version_check

boolean

If True, raise an error when client and server versions do not match.

verbose

boolean

default:"True"

Set to False to suppress connection status messages.

import h2o
h2o.init(ip="localhost", port=54321)
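The max_mem_size and min_mem_size formats described above can be illustrated with a small pure-Python sketch. The helper name mem_size_to_bytes is hypothetical and not part of the h2o package, and the use of binary units (1024-based, as with JVM heap flags) is an assumption; the client's own parsing may differ.

```python
import re

# Hypothetical helper (not part of h2o): illustrates the documented rule that
# bare integers mean gigabytes and strings carry an explicit "M" or "G" suffix.
_UNITS = {"M": 1024 ** 2, "G": 1024 ** 3}  # assumption: binary units


def mem_size_to_bytes(value):
    """Convert a max_mem_size/min_mem_size style value to a byte count."""
    if isinstance(value, int):
        return value * _UNITS["G"]  # bare integers are read as gigabytes
    match = re.fullmatch(r"(\d+)([MG])", value.strip().upper())
    if match is None:
        raise ValueError(f"unsupported memory size: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit]


# Under this reading, 4 and "4G" request the same amount of memory.
print(mem_size_to_bytes(4) == mem_size_to_bytes("4G"))
print(mem_size_to_bytes("160M"))
```

Passing a string is the safer choice whenever the intended unit is not gigabytes.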

h2o.connect

h2o.connect(
    server=None, url=None, ip=None, port=None,
    https=None, verify_ssl_certificates=None, cacert=None,
    auth=None, proxy=None, cookies=None,
    verbose=True, config=None, strict_version_check=False
)

Connect to an existing H2O server. Unlike h2o.init(), this function does not attempt to start a new server.

url

string

Full URL of the server to connect to.

ip

string

IP address or hostname of the H2O server.

port

integer

Port that H2O is listening on.

https

boolean

Connect via HTTPS when True.

auth

tuple

A (username, password) tuple for basic authentication, or a requests-compatible auth object.

strict_version_check

boolean

default:"False"

Raise an error if the client and server versions do not match.

conn = h2o.connect(url="http://127.0.0.1:54321")
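A full URL carries the same information as the separate ip, port, and https arguments. The sketch below uses only the standard library to show that correspondence; split_server_url is a hypothetical helper, not an h2o function.

```python
from urllib.parse import urlsplit


def split_server_url(url):
    """Break a full server URL into its ip, port, and https pieces."""
    parts = urlsplit(url)
    return {
        "ip": parts.hostname,
        "port": parts.port,
        "https": parts.scheme == "https",
    }


pieces = split_server_url("http://127.0.0.1:54321")
print(pieces)
```

Under that reading, h2o.connect(url="http://127.0.0.1:54321") and h2o.connect(ip="127.0.0.1", port=54321, https=False) should address the same server.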

Data import

h2o.import_file

h2o.import_file(
    path=None, destination_frame=None, parse=True,
    header=0, sep=None, col_names=None, col_types=None,
    na_strings=None, pattern=None, skipped_columns=None,
    force_col_types=False, custom_non_data_line_markers=None,
    partition_by=None, quotechar=None, escapechar=None,
    tz_adjust_to_local=False
)

Import one or more files into the H2O cluster using a distributed, multi-threaded pull. The path must be accessible from every node in the cluster.

path

string | list

required

Path to the file or directory to import. Accepts a URL, local path, or S3/HDFS URI. A list of paths is also accepted.

destination_frame

string

Key to assign to the imported frame. Auto-generated if not provided.

parse

boolean

default:"True"

If False, returns a list of raw frame paths without parsing.

header

integer

default:"0"

-1 means first line is data, 0 means guess, 1 means first line is header.

sep

string

Field separator character. Auto-detected if not provided.

col_names

string[]

Column names for the imported frame.

col_types

string[] | object

Column types as a list or {column_name: type} dict. Valid types: "unknown", "uuid", "string", "numeric", "enum", "time".

na_strings

string[] | string[][] | object

Values to interpret as missing. Accepts a flat list, a list-of-lists per column, or a {column: list} dict.

birds = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/birds.csv")
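Because col_types accepts either a list or a {column_name: type} dict, a quick check against the documented type names can catch typos before a file is parsed. This is a minimal sketch: invalid_col_types and the column names are hypothetical, and only the six type names listed above are treated as valid.

```python
# Type names listed in the col_types description above.
VALID_COL_TYPES = {"unknown", "uuid", "string", "numeric", "enum", "time"}


def invalid_col_types(col_types):
    """Return the entries of col_types that are not valid type names.

    Accepts either a list of types or a {column_name: type} dict and
    returns the offending entries in the same shape.
    """
    if isinstance(col_types, dict):
        return {name: t for name, t in col_types.items() if t not in VALID_COL_TYPES}
    return [t for t in col_types if t not in VALID_COL_TYPES]


# Hypothetical column names, used only for illustration.
as_dict = {"species": "enum", "length": "numeric", "seen_at": "datetime"}
as_list = ["enum", "numeric", "time"]

print(invalid_col_types(as_dict))  # "datetime" is not a listed type name
print(invalid_col_types(as_list))
```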

h2o.upload_file

h2o.upload_file(
    path, destination_frame=None, header=0, sep=None,
    col_names=None, col_types=None, na_strings=None,
    skipped_columns=None, force_col_types=False,
    quotechar=None, escapechar=None
)

Upload a local file to the H2O cluster using a single-threaded push. Use import_file for large files where parallel ingestion is preferred.

path

string

required

Local path to the file to upload.

destination_frame

string

Key to assign to the resulting frame.

header

integer

default:"0"

Header detection: -1 (data), 0 (guess), 1 (header).

sep

string

Field separator character.

iris = h2o.upload_file("~/data/iris.csv")
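The three header codes are easy to mix up at a call site. One option, sketched here, is to give them names with an IntEnum. The HeaderMode class and its member names are hypothetical and not part of h2o.

```python
from enum import IntEnum


class HeaderMode(IntEnum):
    """Hypothetical names for the documented header values (not part of h2o)."""

    FIRST_LINE_IS_DATA = -1
    GUESS = 0
    FIRST_LINE_IS_HEADER = 1


# IntEnum members compare equal to plain integers and convert cleanly with int().
print(HeaderMode.GUESS == 0)
print(int(HeaderMode.FIRST_LINE_IS_HEADER))
```

A call could then read h2o.upload_file(path, header=int(HeaderMode.FIRST_LINE_IS_HEADER)); converting with int() avoids relying on how the client validates the argument.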

Model persistence

h2o.save_model

h2o.save_model(model, path="", force=False, export_cross_validation_predictions=False, filename=None)

Save an H2O model object to disk on the server’s file system. The saved file can be loaded back with h2o.load_model().

model

H2OModel

required

The trained model object to save.

path

string

default:""

Path to the directory where the model will be saved (local, HDFS, or S3). Defaults to the current working directory.

force

boolean

default:"False"

If True, overwrite an existing file at the destination.

export_cross_validation_predictions

boolean

default:"False"

Include CV holdout frame predictions in the saved artifact.

filename

string

Custom filename for the saved model. Defaults to the model ID.

model_path = h2o.save_model(model, path="/tmp/models", force=True)

h2o.load_model

h2o.load_model(path)

Load a saved H2O model from disk back into the cluster.

path

string

required

The full path to the saved model file on the server’s file system.

model = h2o.load_model("/tmp/models/GBM_model_python_1234567890")

h2o.download_model

h2o.download_model(model, path="", export_cross_validation_predictions=False, filename=None)

Download a model binary from the H2O cluster to the local machine running this Python session.

model

H2OModel

required

The trained model object to download.

path

string

default:""

Local directory to save the model. Defaults to the current working directory.

filename

string

Custom filename. Defaults to the model ID.

h2o.upload_model

h2o.upload_model(path)

Upload a binary model from the local machine to the H2O cluster.

path

string

required

Local path to the previously downloaded binary model file.

model = h2o.upload_model("/tmp/GBM_model_python_1234567890")
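h2o.download_model and h2o.upload_model are the client-side counterparts of save_model and load_model, so they can be chained to move a model through the machine running Python. The function below is a minimal sketch: round_trip_model is a hypothetical name, it assumes a connection already exists, and it assumes download_model returns the path of the file it wrote.

```python
def round_trip_model(model, local_dir):
    """Download a model to the client machine, then upload it again.

    Assumes h2o.init() or h2o.connect() has already been called and that
    `model` is a trained H2O model.
    """
    import h2o  # imported lazily so defining this sketch needs no cluster

    # Write the model binary to the local directory; this sketch assumes the
    # call returns the path of the written file.
    local_path = h2o.download_model(model, path=local_dir)
    # Push that local binary back to the cluster and return the loaded model.
    return h2o.upload_model(local_path)
```

The same two calls also work across clusters: download from one connection, connect to another, then upload.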

Cluster management

h2o.cluster_info

h2o.cluster_info()

Deprecated. Use h2o.cluster().show_status() instead.

Display basic information about the currently connected H2O cluster.

h2o.shutdown

h2o.shutdown(prompt=False)

Deprecated. Use h2o.cluster().shutdown() instead.

Shut down the H2O cluster.

prompt

boolean

default:"False"

If True, prompt for confirmation before shutting down.
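Since both functions in this section are deprecated, new code would normally go through the object returned by h2o.cluster(). A minimal sketch of the replacement calls, assuming a connection is already established (the wrapper name show_status_and_shutdown is hypothetical):

```python
def show_status_and_shutdown(prompt=False):
    """Print cluster status, then shut the cluster down.

    Uses the h2o.cluster() replacements named in the deprecation notes.
    Assumes h2o.init() or h2o.connect() has already been called.
    """
    import h2o  # imported lazily so defining this sketch needs no cluster

    cluster = h2o.cluster()
    cluster.show_status()            # replaces h2o.cluster_info()
    cluster.shutdown(prompt=prompt)  # replaces h2o.shutdown(prompt=...)
```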

Object management

h2o.ls

h2o.ls()

List all keys (frames and models) stored in the H2O cluster. Returns a Pandas DataFrame.

h2o.ls()

h2o.remove

h2o.remove(x, cascade=True)

Remove one or more objects from the H2O cluster.

x

H2OFrame | H2OEstimator | string | list

required

The object(s) to remove. Accepts a frame, model, key string, or list of any combination.

cascade

boolean

default:"True"

When True, also remove dependent objects (e.g. submodels).

h2o.remove(frame)
h2o.remove([frame, model])

h2o.remove_all

h2o.remove_all(retained=None)

Remove all objects from the H2O cluster, with an option to retain specific keys.

retained

string[]

Keys of models or frames to keep. Training and validation frames of retained models are also kept.
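A typical use of retained is clearing scratch objects between experiments while keeping one model. The sketch below wraps that pattern; remove_all_except is a hypothetical name, and it assumes a connection already exists.

```python
def remove_all_except(keep_keys):
    """Remove everything from the cluster except the given keys.

    `keep_keys` is an iterable of model or frame key strings. Returns the
    output of h2o.ls() so the caller can inspect what is left.
    """
    import h2o  # imported lazily so defining this sketch needs no cluster

    h2o.remove_all(retained=list(keep_keys))
    return h2o.ls()
```

For example, remove_all_except(["GBM_model_python_1234567890"]) would keep that model and, per the description above, its training and validation frames.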

h2o.get_model

h2o.get_model(model_id)

Retrieve a model that already exists in the H2O cluster.

model_id

string

required

The model ID string.

model = h2o.get_model("GBM_model_python_1234567890")

h2o.get_frame

h2o.get_frame(frame_id)

Retrieve a handle to an existing frame in the H2O cluster.

frame_id

string

required

The frame ID string.

frame = h2o.get_frame("iris.hex")

Grid search

h2o.grid

Grid search is performed via H2OGridSearch. See the example below for the typical usage pattern.

from h2o.grid.grid_search import H2OGridSearch
from h2o.estimators.gbm import H2OGradientBoostingEstimator

hyper_params = {
    "ntrees": [50, 100],
    "max_depth": [3, 5, 7],
    "learn_rate": [0.01, 0.1],
}

grid = H2OGridSearch(
    model=H2OGradientBoostingEstimator,
    hyper_params=hyper_params,
    search_criteria={"strategy": "RandomDiscrete", "max_models": 10, "seed": 42},
)
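# x (predictor column names), y (response column name), and train (an H2OFrame)
# are assumed to be defined earlier in the session.
grid.train(x=x, y=y, training_frame=train)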

model

H2OEstimator class

required

The estimator class (not an instance) to tune.

hyper_params

object

required

Dictionary mapping parameter names to lists of values to search over.

search_criteria

object

Dictionary controlling the search strategy. Keys include strategy ("Cartesian" or "RandomDiscrete"), max_models, max_runtime_secs, and seed.
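To see why max_models matters for the example above, it helps to count the search space. This pure-Python sketch enumerates the combinations a "Cartesian" strategy would cover for the same hyper_params; it does not call H2O, and it says nothing about which combinations "RandomDiscrete" would actually pick.

```python
from itertools import product

hyper_params = {
    "ntrees": [50, 100],
    "max_depth": [3, 5, 7],
    "learn_rate": [0.01, 0.1],
}

# One dict per point in the grid: the full Cartesian product of all value lists.
names = list(hyper_params)
combinations = [dict(zip(names, values)) for values in product(*hyper_params.values())]
print(len(combinations))  # 2 * 3 * 2 = 12

# With max_models=10, a random search stops before covering the whole grid.
max_models = 10
print(min(max_models, len(combinations)))
```

Adding one more value to any list grows the grid multiplicatively, which is the usual reason to cap a random search with max_models or max_runtime_secs.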
