The h2o module is the top-level entry point for the H2O Python client. Import it with import h2o.
Connection
h2o.init
Full URL of the server to connect to. Can be used instead of ip + port + https.
The IP address or hostname of the server where H2O is running.
Port number that H2O is listening to.
Cluster name. If None, the cluster name is not checked when connecting to an existing cluster. When starting a local cluster, a random name is generated if None.
Set to True to connect via HTTPS instead of HTTP.
When using HTTPS, set to True to disable SSL certificate verification.
Username for basic authentication.
Password for basic authentication.
Proxy server address as a {scheme: address} dictionary.
If False, do not attempt to start a local H2O server when connection fails.
Number of threads for the new H2O server. -1 uses all available cores.
Directory for H2O server logs when a new instance is started.
Logger level for H2O. One of "TRACE", "DEBUG", "INFO", "WARN", "ERRR", "FATA". Defaults to "INFO".
Maximum memory for the new H2O server. Integer input is treated as gigabytes. String values support suffixes: "160M", "4G".
Minimum memory for the new H2O server. Uses the same format as max_mem_size.
If True, raise an error when client and server versions do not match.
Set to False to suppress connection status messages.
h2o.connect
Unlike h2o.init(), this function does not attempt to start a new server.
Full URL of the server to connect to.
IP address or hostname of the H2O server.
Port that H2O is listening on.
Connect via HTTPS when True.
A (username, password) tuple for basic authentication, or a requests-compatible auth object.
Raise an error if the client and server versions do not match.
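The two connection functions above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the keyword names (ip, port, nthreads, max_mem_size, log_level, strict_version_check, verbose, url, auth) are assumed to match the installed h2o client and should be checked against its help output, and the helper names and all values are hypothetical. The h2o import is deferred into the functions so the argument-building helper can be inspected without a running cluster.

```python
def build_init_kwargs(max_mem_gb=4, nthreads=-1, verbose=True):
    """Assemble keyword arguments for h2o.init(); all values are illustrative."""
    return {
        "ip": "localhost",
        "port": 54321,
        "nthreads": nthreads,              # -1 uses all available cores
        "max_mem_size": f"{max_mem_gb}G",  # string form with a size suffix
        "log_level": "INFO",
        "strict_version_check": True,      # fail on client/server version mismatch
        "verbose": verbose,
    }


def start_or_attach(**overrides):
    """Connect to a cluster, starting a local one if none is reachable."""
    import h2o  # deferred so this module loads without h2o installed

    kwargs = build_init_kwargs()
    kwargs.update(overrides)
    h2o.init(**kwargs)
    return h2o.cluster()


def attach_only(url, username=None, password=None):
    """Connect to an existing cluster; never starts a new server."""
    import h2o

    auth = (username, password) if username is not None else None
    return h2o.connect(url=url, auth=auth, strict_version_check=True)
```

Calling start_or_attach(port=54323) would override only the port and keep the other illustrative defaults; attach_only("http://localhost:54321") raises if no server answers at that URL.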
Data import
h2o.import_file
Path to the file or directory to import. Accepts a URL, local path, or S3/HDFS URI. A list of paths is also accepted.
Key to assign to the imported frame. Auto-generated if not provided.
If False, returns a list of raw frame paths without parsing.
-1 means first line is data, 0 means guess, 1 means first line is header.
Field separator character. Auto-detected if not provided.
Column names for the imported frame.
Column types as a list or {column_name: type} dict. Valid types: "unknown", "uuid", "string", "numeric", "enum", "time".
Values to interpret as missing. Accepts a flat list, a list-of-lists per column, or a {column: list} dict.
h2o.upload_file
Use import_file for large files where parallel ingestion is preferred.
Local path to the file to upload.
Key to assign to the resulting frame.
Header detection: -1 (data), 0 (guess), 1 (header).
Field separator character.
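As a rough sketch of how the import options above fit together, the helper below builds arguments for h2o.import_file and falls back to h2o.upload_file when the file lives on the client machine. The keyword names (path, destination_frame, header, sep, col_types, na_strings) are assumed to match the installed client; the frame key, column names, and helper names are hypothetical.

```python
def build_import_kwargs(path):
    """Assemble keyword arguments for h2o.import_file(); values are illustrative."""
    return {
        "path": path,
        "destination_frame": "flights_hex",  # hypothetical frame key
        "header": 1,                         # first line is a header
        "sep": ",",
        # Per-column types given as a {column_name: type} dict.
        "col_types": {"carrier": "enum", "dep_delay": "numeric"},
        # Per-column missing-value markers given as a {column: list} dict.
        "na_strings": {"dep_delay": ["NA", ""]},
    }


def load_frame(path, from_client=False):
    """Load a file into the cluster and return the resulting frame."""
    import h2o  # deferred so this module loads without h2o installed

    if from_client:
        # The file is on the machine running Python: push it to the cluster.
        return h2o.upload_file(path, destination_frame="flights_hex",
                               header=1, sep=",")
    # The path is readable by the cluster itself (local, URL, S3, HDFS).
    return h2o.import_file(**build_import_kwargs(path))
```

Keeping the arguments in a dictionary makes it easy to log or reuse the exact parse settings for a second file with the same layout.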
Model persistence
h2o.save_model
The saved model can be loaded again with h2o.load_model().
The trained model object to save.
Path to the directory where the model will be saved (local, HDFS, or S3). Defaults to the current working directory.
If True, overwrite an existing file at the destination.
Include CV holdout frame predictions in the saved artifact.
Custom filename for the saved model. Defaults to the model ID.
h2o.load_model
The full path to the saved model file on the server’s file system.
h2o.download_model
The trained model object to download.
Local directory to save the model. Defaults to the current working directory.
Custom filename. Defaults to the model ID.
h2o.upload_model
Local path to the previously downloaded binary model file.
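The four persistence functions pair up naturally: save_model with load_model on the server side, and download_model with upload_model through the client. The sketch below illustrates both round trips under the assumption that save_model and download_model return the path they wrote and accept path, force, and filename keywords; the helper names and the versioned naming scheme are hypothetical.

```python
def versioned_filename(model_id, version):
    """Build a custom filename such as 'gbm_churn_v3' (hypothetical scheme)."""
    return f"{model_id}_v{version}"


def save_and_reload(model, directory, version=1):
    """Save a model on the server's file system, then load it back."""
    import h2o  # deferred so this module loads without h2o installed

    saved_path = h2o.save_model(
        model=model,
        path=directory,
        force=True,  # overwrite an existing file at the destination
        filename=versioned_filename(model.model_id, version),
    )
    # load_model expects the full path as seen by the server.
    return h2o.load_model(saved_path)


def move_via_client(model, local_dir):
    """Download a model to the client machine, then upload it again."""
    import h2o

    local_path = h2o.download_model(model, path=local_dir)
    return h2o.upload_model(local_path)
```

The second round trip is the one to reach for when the Python session and the cluster do not share a file system.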
Cluster management
h2o.cluster_info
h2o.shutdown
If True, prompt for confirmation before shutting down.
Object management
h2o.ls
h2o.remove
The object(s) to remove. Accepts a frame, model, key string, or list of any combination.
When True, also remove dependent objects (e.g. submodels).
h2o.remove_all
Keys of models or frames to keep. Training and validation frames of retained models are also kept.
h2o.get_model
The model ID string.
h2o.get_frame
The frame ID string.
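A possible cleanup routine built from the object-management functions above is sketched here. It assumes the keyword names cascade (for h2o.remove) and retained (for h2o.remove_all) match the installed client; the helper names are hypothetical.

```python
def as_key_list(keys):
    """Normalize a single key string or an iterable of keys into a list."""
    if isinstance(keys, str):
        return [keys]
    return list(keys)


def drop_objects(keys):
    """Remove the given frames or models along with their dependent objects."""
    import h2o  # deferred so this module loads without h2o installed

    h2o.remove(as_key_list(keys), cascade=True)
    return h2o.ls()  # remaining keys on the cluster


def keep_only(keys):
    """Remove everything except the given keys."""
    import h2o

    h2o.remove_all(retained=as_key_list(keys))
    return h2o.ls()


def fetch(model_id, frame_id):
    """Look up a model and a frame by their ID strings."""
    import h2o

    return h2o.get_model(model_id), h2o.get_frame(frame_id)
```

Because remove_all is destructive, passing an explicit list of keys to keep_only is safer than calling h2o.remove_all() with no arguments in a shared cluster.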
Grid search
h2o.grid
Grid search is performed via H2OGridSearch. See the example below for the typical usage pattern.
The estimator class (not an instance) to tune.
Dictionary mapping parameter names to lists of values to search over.
Dictionary controlling the search strategy. Keys include strategy ("Cartesian" or "RandomDiscrete"), max_models, max_runtime_secs, and seed.
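A typical grid search might look like the sketch below. It assumes H2OGridSearch is importable from h2o.grid.grid_search, accepts model, hyper_params, and search_criteria keywords, and exposes train and get_grid; the choice of a gradient boosting estimator, the hyperparameter values, and the helper names are illustrative only.

```python
def build_grid_spec():
    """Return illustrative hyperparameters and search criteria."""
    hyper_params = {
        "max_depth": [3, 5, 7],
        "learn_rate": [0.05, 0.1],
    }
    search_criteria = {
        "strategy": "RandomDiscrete",  # sample combinations instead of trying all
        "max_models": 4,
        "max_runtime_secs": 600,
        "seed": 42,
    }
    return hyper_params, search_criteria


def count_combinations(hyper_params):
    """Number of models a full Cartesian search would train."""
    total = 1
    for values in hyper_params.values():
        total *= len(values)
    return total


def run_grid(train, x, y, sort_by):
    """Train the grid on an H2OFrame and return it sorted by a metric."""
    # Deferred imports so this module loads without h2o installed.
    from h2o.estimators import H2OGradientBoostingEstimator
    from h2o.grid.grid_search import H2OGridSearch

    hyper_params, search_criteria = build_grid_spec()
    grid = H2OGridSearch(
        model=H2OGradientBoostingEstimator,  # the class, not an instance
        hyper_params=hyper_params,
        search_criteria=search_criteria,
    )
    grid.train(x=x, y=y, training_frame=train)
    return grid.get_grid(sort_by=sort_by, decreasing=False)
```

With these values a Cartesian search would train six models, so capping max_models at four under RandomDiscrete trades coverage for a shorter run. The sort_by metric has to suit the problem type, for example "rmse" for regression or "logloss" for classification.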