Introduction

H2O-3 is an open source, in-memory, distributed machine learning and predictive analytics platform built for speed and scale. It lets you build machine learning models on big data and deploy those models to production in an enterprise environment.

H2O-3 is licensed under the Apache License, Version 2.0. Source code, issue tracking, and community discussion are available on GitHub.

What is H2O-3?

H2O-3 is an in-memory platform for distributed, scalable machine learning. Its core code is written in Java. A distributed key-value store is used to access and reference data, models, and objects across all nodes and machines. Algorithms are implemented on top of H2O-3’s distributed map-reduce framework and use the Java fork/join framework for multi-threading. Data is read in parallel, distributed across the cluster, and stored in memory in a compressed columnar format. H2O-3’s data parser infers the schema of incoming datasets and supports data ingestion from multiple sources in various formats.

Supported algorithms

H2O-3 includes production-ready implementations of the following algorithms:

AdaBoost — boosting ensemble for classification
AutoML — fully automatic model training and selection
Cox Proportional Hazards (CoxPH) — survival analysis
Decision Tree — single decision tree learner
Deep Learning — multi-layer neural networks
Distributed Random Forest (DRF) — tree-based ensemble
Distributed Uplift Random Forest — treatment effect estimation
Extended Isolation Forest — anomaly detection
Generalized Additive Models (GAM) — flexible semi-parametric models
Generalized Linear Model (GLM) — linear, logistic, and Poisson regression
Generalized Low Rank Models (GLRM) — matrix factorization
Gradient Boosting Machine (GBM) — tree boosting for regression and classification
Isolation Forest — anomaly detection via random partitioning
Isotonic Regression — monotone regression
K-Means Clustering — unsupervised partitioning
Naïve Bayes Classifier — probabilistic classification
Principal Component Analysis (PCA) — dimensionality reduction
RuleFit — interpretable rule-based model
Stacked Ensembles — meta-learner combining base models
Support Vector Machine (PSVM) — kernel-based classification
Target Encoding — categorical feature preprocessing
Word2Vec — word embedding from text
XGBoost — optimized gradient boosting

Multi-language support

H2O-3 exposes a consistent API across multiple languages and interfaces. All client libraries communicate with the H2O-3 backend through the REST API.

Python

Install via pip install h2o. Full-featured client with estimators, frames, and AutoML.

R

Install via install.packages("h2o"). Mirrors the Python API with idiomatic R conventions.

Flow UI

Browser-based notebook interface available at http://localhost:54321 when the cluster is running.

REST API

JSON over HTTP. All capabilities of H2O-3 are accessible from any language or tool.

Java and Scala users can access H2O-3 through the REST API or by embedding H2O-3 as a Maven artifact in their projects.

Architecture overview

Cluster model

An H2O-3 cluster is a set of JVM processes (nodes) that work together as a single distributed system. Nodes communicate peer-to-peer — there is no designated master node for data distribution.

Cluster formation: New H2O-3 nodes join during launch using multicast or flatfile-based discovery. Once a job starts, the cluster locks and prevents new members from joining.
In-memory storage: Data is stored across all nodes in a columnar compressed format. Each column (Vec) is split into contiguous subsets (Chunk) distributed across the cluster.
Distributed computation: MRTask (Map/Reduce) moves computation to the data rather than moving data to the computation. Results reduce up a tree back to the initiating node.
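
The chunking and map/reduce ideas above can be illustrated with a small single-process sketch. This is not H2O-3 code: the function names (split_into_chunks, map_chunk, tree_reduce, distributed_mean) are hypothetical, and plain Python lists stand in for Vec and Chunk. It only shows the shape of the computation: a partial result is computed per chunk, and the partials are then combined pairwise up a tree.

```python
def split_into_chunks(column, n_chunks):
    """Split a column into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(column), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(column[start:end])
        start = end
    return chunks


def map_chunk(chunk):
    """Map step: compute a partial (sum, count) from one chunk only."""
    return (sum(chunk), len(chunk))


def combine(a, b):
    """Merge two partial (sum, count) results."""
    return (a[0] + b[0], a[1] + b[1])


def tree_reduce(partials):
    """Reduce step: combine partial results pairwise, level by level."""
    level = list(partials)
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                merged.append(combine(level[i], level[i + 1]))
            else:
                merged.append(level[i])  # odd one out moves up unchanged
        level = merged
    return level[0]


def distributed_mean(column, n_chunks=4):
    """Mean of a non-empty column, computed chunk by chunk."""
    chunks = split_into_chunks(column, n_chunks)
    total, count = tree_reduce([map_chunk(c) for c in chunks])
    return total / count
```

In a real cluster each map step would run on the node that holds the chunk, so only the small partial results travel over the network.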

Distributed key-value store (DKV)

Every object — frames, models, chunks — has a home node determined by consistent hashing of its key. The DKV is used to locate and access all distributed objects:

DKV.put(key, value)   // store an object
DKV.get(key)          // retrieve an object from its home node
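
To make the idea of a home node concrete, here is a minimal sketch of a generic consistent-hash ring with a toy key-value store on top. It is an assumption-laden illustration, not H2O-3's actual hashing scheme or DKV implementation: HashRing, ToyDKV, the SHA-256 hash, and the number of ring points per node are all hypothetical choices.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a 64-bit integer position on the ring."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Generic consistent-hash ring (illustrative, not H2O-3's scheme)."""

    def __init__(self, nodes, replicas=64):
        # Each node is placed at several points to even out the key spread.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def home_node(self, key):
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


class ToyDKV:
    """Toy store: every key lives only in its home node's local dict."""

    def __init__(self, nodes):
        self.ring = HashRing(nodes)
        self.stores = {node: {} for node in nodes}

    def put(self, key, value):
        self.stores[self.ring.home_node(key)][key] = value

    def get(self, key):
        return self.stores[self.ring.home_node(key)].get(key)
```

Because the home node is a pure function of the key and the node list, any node can locate an object without consulting a central directory.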

REST API

H2O-3’s REST API allows access to all capabilities from an external program or script through JSON over HTTP. The REST API is used by:

The Flow web UI
The R binding (H2O-R)
The Python binding (H2O-Python)
Any custom integration

The default port for the REST API is 54321. The internal communication port is 54322.
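
As a rough sketch of what "JSON over HTTP" means for a custom integration, the snippet below builds a REST URL for the default port and fetches a JSON document using only the Python standard library. The helper names (api_url, get_json, print_cluster_status) are hypothetical, and the /3/Cloud endpoint with its cloud_name and cloud_size fields is an assumption that should be checked against the REST API reference for your H2O-3 version.

```python
import json
import urllib.request

DEFAULT_API_PORT = 54321  # default REST API port stated above


def api_url(path, host="localhost", port=DEFAULT_API_PORT):
    """Build a full REST URL from an endpoint path."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def get_json(path, host="localhost", port=DEFAULT_API_PORT, timeout=5):
    """GET an endpoint and decode the JSON body (needs a running cluster)."""
    with urllib.request.urlopen(api_url(path, host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def print_cluster_status():
    """Print basic cluster info; endpoint and field names are assumptions."""
    status = get_json("/3/Cloud")
    print(status.get("cloud_name"), status.get("cloud_size"))
```

With a cluster running locally, calling print_cluster_status() should print the cluster name and node count if the assumed endpoint and fields match your version.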

Key sections

Quickstart

Train your first model in Python or R in under 5 minutes.

Installation

Install H2O-3 via pip, conda, CRAN, or download the standalone jar.

Algorithm reference

Detailed documentation for every supported algorithm.

AutoML

Automatically train and rank hundreds of models with a single call.
