H2O-3 is licensed under the Apache License, Version 2.0. Source code, issue tracking, and community discussion are available on GitHub.
What is H2O-3?
H2O-3 is an in-memory platform for distributed, scalable machine learning. Its core code is written in Java. A distributed key-value store is used to access and reference data, models, and objects across all nodes and machines. Algorithms are implemented on top of H2O-3’s distributed map-reduce framework and use the Java fork/join framework for multi-threading. Data is read in parallel, distributed across the cluster, and stored in memory in a compressed columnar format. H2O-3’s data parser infers the schema of incoming datasets and supports data ingest from multiple sources in various formats.
Supported algorithms
H2O-3 includes production-ready implementations of the following algorithms:
- AdaBoost — boosting ensemble for classification
- AutoML — fully automatic model training and selection
- Cox Proportional Hazards (CoxPH) — survival analysis
- Decision Tree — single decision tree learner
- Deep Learning — multi-layer neural networks
- Distributed Random Forest (DRF) — tree-based ensemble
- Distributed Uplift Random Forest — treatment effect estimation
- Extended Isolation Forest — anomaly detection
- Generalized Additive Models (GAM) — flexible semi-parametric models
- Generalized Linear Model (GLM) — linear, logistic, and Poisson regression
- Generalized Low Rank Models (GLRM) — matrix factorization
- Gradient Boosting Machine (GBM) — tree boosting for regression and classification
- Isolation Forest — anomaly detection via random partitioning
- Isotonic Regression — monotone regression
- K-Means Clustering — unsupervised partitioning
- Naïve Bayes Classifier — probabilistic classification
- Principal Component Analysis (PCA) — dimensionality reduction
- RuleFit — interpretable rule-based model
- Stacked Ensembles — meta-learner combining base models
- Support Vector Machine (PSVM) — kernel-based classification
- Target Encoding — categorical feature preprocessing
- Word2Vec — word embedding from text
- XGBoost — optimized gradient boosting
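As an illustration of how one of the algorithms listed above might be used from the Python client, here is a minimal sketch that trains a GBM and scores it on held-out data. It assumes a binary classification target; the function name, the 80/20 split, and the parameter values are illustrative choices rather than recommendations from this page.

```python
def train_gbm(csv_path, target, ntrees=50, seed=42):
    """Train a GBM binary classifier on a CSV file and return (model, test AUC).

    Hypothetical helper: csv_path and target are supplied by the caller.
    """
    # Imported inside the function so the module can be loaded
    # even where the h2o package (and a JVM) is not available.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()                                  # connect to or start a local cluster
    frame = h2o.import_file(csv_path)           # parse the file on the cluster
    frame[target] = frame[target].asfactor()    # treat the target as categorical

    # Assumption: a simple 80/20 train/test split is good enough for a demo.
    train, test = frame.split_frame(ratios=[0.8], seed=seed)

    predictors = [c for c in frame.columns if c != target]
    model = H2OGradientBoostingEstimator(ntrees=ntrees, seed=seed)
    model.train(x=predictors, y=target, training_frame=train)

    perf = model.model_performance(test_data=test)
    return model, perf.auc()
```

Other estimators in the Python client follow the same construct, train, evaluate pattern, so swapping in a different algorithm is mostly a matter of changing the estimator class and its parameters.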
Multi-language support
H2O-3 exposes a consistent API across multiple languages and interfaces. All client libraries communicate with the H2O-3 backend through the REST API.
Python
Install via pip install h2o. Full-featured client with estimators, frames, and AutoML.
R
Install via install.packages("h2o"). Mirrors the Python API with idiomatic R conventions.
Flow UI
Browser-based notebook interface available at http://localhost:54321 when the cluster is running.
REST API
JSON over HTTP. All capabilities of H2O-3 are accessible from any language or tool.
Java and Scala users can access H2O-3 through the REST API or by embedding H2O-3 as a Maven artifact in their projects.
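To make the client workflow concrete, the following sketch connects to a local cluster from Python, parses a file, and returns the column types the parser inferred. The helper name and its default host and port are assumptions for illustration; the port matches the Flow address mentioned above.

```python
def inspect_dataset(path, ip="localhost", port=54321):
    """Parse a file on an H2O-3 cluster and return its inferred column types.

    Hypothetical helper: path is any file the cluster can read.
    """
    # Imported lazily so this module loads without the h2o package installed.
    import h2o

    h2o.init(ip=ip, port=port)         # connect to a running cluster, or start one locally
    frame = h2o.import_file(path)      # the file is parsed on the cluster, not in the client
    return frame.types                 # dict mapping column name to inferred type
```

The client holds only a handle to the frame; the data itself stays in the cluster's memory, which is why the same frame is also visible from Flow.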
Architecture overview
Cluster model
An H2O-3 cluster is a set of JVM processes (nodes) that work together as a single distributed system. Nodes communicate peer-to-peer; there is no designated master node for data distribution.
- Cluster formation: New H2O-3 nodes join during launch using multicast or flatfile-based discovery. Once a job starts, the cluster locks and prevents new members from joining.
- In-memory storage: Data is stored across all nodes in a columnar compressed format. Each column (Vec) is split into contiguous subsets (Chunk) distributed across the cluster.
- Distributed computation: MRTask (Map/Reduce) moves computation to the data rather than moving data to the computation. Results reduce up a tree back to the initiating node.
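The storage and computation model described above can be illustrated with a toy, single-process sketch. This is not H2O-3's Java implementation: the Chunk class, make_vec, and map_reduce are hypothetical names, and the round-robin chunk placement is an assumption made only to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass
class Chunk:
    node: int             # index of the (simulated) node holding this chunk
    values: List[float]   # contiguous slice of the column


def make_vec(values: List[float], n_nodes: int, chunk_size: int) -> List[Chunk]:
    """Split a column into contiguous chunks, placed round-robin on nodes (assumption)."""
    return [
        Chunk(node=i % n_nodes, values=values[start:start + chunk_size])
        for i, start in enumerate(range(0, len(values), chunk_size))
    ]


def map_reduce(chunks: List[Chunk],
               map_fn: Callable[[List[float]], T],
               reduce_fn: Callable[[T, T], T]) -> T:
    """Apply map_fn to each chunk, then merge partial results pairwise, tree-style."""
    if not chunks:
        raise ValueError("cannot reduce an empty vec")
    partials = [map_fn(c.values) for c in chunks]   # computation goes to the data
    while len(partials) > 1:
        merged = [reduce_fn(partials[i], partials[i + 1])
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:                       # odd one out moves up unchanged
            merged.append(partials[-1])
        partials = merged
    return partials[0]


# Distributed mean: each chunk reports (sum, count); pairs are added up the tree.
column = [float(x) for x in range(10)]
vec = make_vec(column, n_nodes=3, chunk_size=4)
total, count = map_reduce(
    vec,
    lambda v: (sum(v), len(v)),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
mean = total / count
```

Only the small (sum, count) pairs travel between chunks here, which is the point of moving computation to the data: the column values themselves never leave the chunk that holds them.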
Distributed key-value store (DKV)
Every object (frames, models, chunks) has a home node determined by consistent hashing of its key. The DKV is used to locate and access all distributed objects.
REST API
H2O-3’s REST API allows access to all capabilities from an external program or script through JSON over HTTP. The REST API is used by:
- The Flow web UI
- The R binding (H2O-R)
- The Python binding (H2O-Python)
- Any custom integration
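A custom integration can be as small as one HTTP request. The sketch below queries cluster status using only the Python standard library. It assumes a cluster listening on the default local address and uses the /3/Cloud status endpoint; the helper names are hypothetical, and the exact response fields should be checked against the REST API reference for your version.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def cloud_url(base_url: str) -> str:
    """Build the cluster-status URL for a given H2O-3 base address."""
    return urljoin(base_url.rstrip("/") + "/", "3/Cloud")


def fetch_cloud_status(base_url: str = "http://localhost:54321",
                       timeout: float = 5.0) -> dict:
    """GET the cluster status and decode the JSON body.

    Raises urllib.error.URLError if no cluster is reachable at base_url.
    """
    with urlopen(cloud_url(base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With a cluster running, a call such as fetch_cloud_status().get("cloud_size") would be one way to read the node count, assuming the response carries that field.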
Key sections
Quickstart
Train your first model in Python or R in under 5 minutes.
Installation
Install H2O-3 via pip, conda, CRAN, or download the standalone jar.
Algorithm reference
Detailed documentation for every supported algorithm.
AutoML
Automatically train and rank hundreds of models with a single call.