H2OFrame is the core data container in H2O. Data is stored on the H2O cluster (which may be remote), and the Python object is a lightweight handle. Operations on an H2OFrame are executed lazily and distributed across the cluster.
H2OFrame is also accessible directly as h2o.H2OFrame for convenience.
Construction
From a Python object
python_obj
The source object to convert. Accepted types:
- None: creates an empty frame
- A flat list or tuple: creates a single-column frame
- A {name: list} dictionary: creates a multi-column frame
- A list of lists: each inner list is one row of a rectangular table
- A Pandas DataFrame or NumPy ndarray
- A SciPy sparse matrix
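As a rough illustration of the accepted types, the sketch below builds three frames from plain Python objects. The variable names, the sample values, and the `build_frames` helper are invented for this example; it assumes the `h2o` package is installed and that `h2o.init()` can start or reach a cluster.

```python
# Plain Python inputs matching three of the accepted types listed above.
single_column = [1.5, 2.5, 3.5]                                  # flat list
by_column = {"id": [1, 2, 3], "city": ["Oslo", "Lima", "Pune"]}  # {name: list} dict
by_row = [["id", "score"], [1, 0.5], [2, 0.9]]                   # list of lists (rows)


def build_frames():
    """Upload the sample inputs; needs the h2o package and a reachable cluster."""
    import h2o  # imported here so the sample data can be inspected without H2O

    h2o.init()  # start or connect to a local cluster
    col_frame = h2o.H2OFrame(single_column)
    dict_frame = h2o.H2OFrame(by_column)
    # header=1 marks the first inner list as the header row.
    row_frame = h2o.H2OFrame(by_row, header=1)
    return col_frame, dict_frame, row_frame
```

Calling `build_frames()` returns handles only; the data itself stays on the cluster.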
destination_frame
Key to assign the frame in H2O’s distributed key-value store. Auto-generated if not provided.
header
Header detection when python_obj is a list of lists. -1 = first row is data, 1 = first row is header, 0 = guess.
column_names
Explicit column names. Overrides any names derived from the source data.
column_types
Explicit column types. Valid values: "unknown", "uuid", "string", "float", "real", "double", "int", "long", "numeric", "categorical", "factor", "enum", "time".
na_strings
Strings to interpret as missing values. Can be specified globally, per-column as a list-of-lists, or as a {column: list} dict.
From an imported file
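A minimal sketch of loading a file into a frame with `h2o.import_file`, which returns an H2OFrame. The `load_csv` helper, the `NA_TOKENS` list, and any path passed to it are assumptions made for illustration; the path must be readable by the H2O cluster.

```python
# Tokens to treat as missing values (illustrative choice).
NA_TOKENS = ["NA", ""]


def load_csv(path):
    """Import a delimited file into the cluster and return its H2OFrame handle."""
    import h2o  # deferred so this module loads without H2O installed

    h2o.init()  # start or connect to a local cluster
    # header=1 says the first line holds column names.
    return h2o.import_file(path=path, header=1, na_strings=NA_TOKENS)
```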
Properties
nrows / nrow
ncols / ncol
shape
Tuple of (nrows, ncols).
columns / names
List of column names. Assigning to frame.columns renames all columns.
types
Mapping of column names to H2O column types (for example "int", "real", "enum", "string", "time").
frame_id
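The properties above can be read together to get a quick picture of a frame. The `frame_overview` helper below is a hypothetical convenience, not part of H2O; it only reads attributes, so it works on any object that exposes the same names.

```python
def frame_overview(frame):
    """Collect basic metadata from an H2OFrame (or a look-alike) into a dict."""
    n_rows, n_cols = frame.shape  # shape is (nrows, ncols)
    return {
        "frame_id": frame.frame_id,
        "rows": n_rows,
        "cols": n_cols,
        "names": list(frame.columns),
        "types": dict(frame.types),
    }
```

Note that reading these properties may force H2O to evaluate any pending lazy operations on the frame.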
Inspection
head / tail
Returns the first (head) or last (tail) rows of the frame as an H2OFrame; the rows argument sets how many.
describe
summary
as_data_frame
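Downloads the frame to the Python client, as a Pandas DataFrame by default or as a list of lists (use_pandas=False).
use_pandas
Return a Pandas DataFrame when True. Returns a list of lists otherwise.
Indexing and slicing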
H2OFrame supports NumPy-style indexing using frame[row_selector, col_selector].
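Two common selector patterns, sketched as small helpers. The function names `top_left` and `rows_above` are made up for this example; the indexing expressions inside them use the frame[row_selector, col_selector] form described above.

```python
def top_left(frame, n_rows, col_names):
    """Return the first n_rows rows of the named columns."""
    return frame[0:n_rows, col_names]


def rows_above(frame, col, threshold):
    """Return all columns for rows where `col` exceeds `threshold`."""
    mask = frame[:, col] > threshold  # boolean column used as the row selector
    return frame[mask, :]
```

For an existing frame with columns "age" and "income", `top_left(frame, 5, ["age", "income"])` would select a 5-row, 2-column slice, and `rows_above(frame, "age", 30)` would keep only rows with age greater than 30.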
Column operations
cbind
Appends the columns of frame2 to frame1. Both frames must have the same number of rows.
rbind
Appends the rows of frame2 below frame1. Both frames must have the same column structure.
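The sketch below wraps `cbind` and `rbind` with explicit checks for the requirements stated above. The `combine` helper and its `how` argument are hypothetical, and comparing column names is only one reading of "same column structure"; H2O performs its own validation as well.

```python
def combine(frame1, frame2, how):
    """Join two frames side by side ("columns") or end to end ("rows")."""
    if how == "columns":
        if frame1.nrows != frame2.nrows:
            raise ValueError("cbind needs frames with the same number of rows")
        return frame1.cbind(frame2)
    if how == "rows":
        if list(frame1.columns) != list(frame2.columns):
            raise ValueError("rbind needs frames with matching columns")
        return frame1.rbind(frame2)
    raise ValueError("how must be 'columns' or 'rows'")
```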
merge
by_x
Column name(s) in frame1 to join on.
by_y
Column name(s) in frame2 to join on. Defaults to by_x.
all_x
Perform a left outer join when True.
Statistical operations
mean
Column-wise (axis=0, the default) or row-wise (axis=1) mean.
var
sd
cor
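A small sketch that calls the four statistics on two numeric columns. The `summarize` helper and its dictionary keys are invented for illustration. Results are passed through unchanged because the return type (scalar, list, or frame) differs between methods and may vary by H2O version.

```python
def summarize(frame, x, y):
    """Compute basic statistics for columns x and y of an H2OFrame."""
    x_col = frame[:, x]
    y_col = frame[:, y]
    return {
        "mean_x": x_col.mean(),                      # column-wise mean
        "var_x": x_col.var(),
        "sd_x": x_col.sd(),
        "cor_xy": x_col.cor(y_col),                  # correlation of x with y
        "row_means": frame[:, [x, y]].mean(axis=1),  # per-row mean across x and y
    }
```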
Type casting
asfactor
asnumeric
ascharacter
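The casting methods return a new column rather than changing the frame in place, so the usual pattern is to assign the result back. The `cast_column` helper and its `target` labels below are assumptions made for this sketch.

```python
def cast_column(frame, col, target):
    """Replace column `col` with a version cast to the requested type."""
    column = frame[col]
    if target == "factor":
        converted = column.asfactor()      # categorical (enum)
    elif target == "numeric":
        converted = column.asnumeric()
    elif target == "character":
        converted = column.ascharacter()   # string
    else:
        raise ValueError("target must be 'factor', 'numeric', or 'character'")
    frame[col] = converted  # assign back so the frame carries the new type
    return frame
```

A typical use is casting an integer-coded label to a factor before training a classifier, for example `cast_column(frame, "label", "factor")` with a hypothetical "label" column.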
Splitting frames
split_frame splits a frame into random subsets. The seed parameter controls reproducibility.
ratios
Fractions for each split. Must sum to less than 1.0.
seed
Random seed for reproducibility.
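A sketch of a three-way split under the rule above. Because the ratios sum to less than 1.0, the remaining fraction goes to one extra, final split. The helper names and the 0.7 / 0.15 defaults are illustrative choices, not H2O defaults.

```python
def remainder_fraction(ratios):
    """Fraction of rows left for the final, implicit split."""
    return 1.0 - sum(ratios)


def split_three_ways(frame, ratios=(0.7, 0.15), seed=42):
    """Split a frame into len(ratios) + 1 parts, e.g. train, validation, test."""
    if not 0 < sum(ratios) < 1:
        raise ValueError("ratios must sum to less than 1.0")
    # split_frame returns one frame per ratio plus one for the remainder.
    return frame.split_frame(ratios=list(ratios), seed=seed)
```

With an existing frame, `train, valid, test = split_three_ways(frame)` unpacks the result. The split sizes are approximate, so row counts will be close to, but not exactly, the requested fractions.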