H2O-3 provides a broad set of distributed machine learning algorithms covering supervised learning (regression and classification), unsupervised learning (clustering and dimensionality reduction), and meta-learning (ensembles). All algorithms run on H2O's distributed in-memory computing engine, and most trained models export to MOJO/POJO artifacts for production scoring.

Supported Algorithms

| Algorithm | Class (Python) | Task | MOJO |
| --- | --- | --- | --- |
| Gradient Boosting Machine | `H2OGradientBoostingEstimator` | Regression, Classification | Yes |
| XGBoost | `H2OXGBoostEstimator` | Regression, Classification | Yes |
| Distributed Random Forest | `H2ORandomForestEstimator` | Regression, Classification | Yes |
| Deep Learning | `H2ODeepLearningEstimator` | Regression, Classification | Yes |
| Generalized Linear Model | `H2OGeneralizedLinearEstimator` | Regression, Classification | Yes |
| Generalized Additive Model | `H2OGeneralizedAdditiveEstimator` | Regression, Classification | Yes |
| Stacked Ensembles | `H2OStackedEnsembleEstimator` | Regression, Classification | Yes |
| K-Means | `H2OKMeansEstimator` | Clustering | Export only |
| PCA | `H2OPrincipalComponentAnalysisEstimator` | Dimensionality Reduction | Export only |
| Naive Bayes | `H2ONaiveBayesEstimator` | Classification | Yes |
| Isolation Forest | `H2OIsolationForestEstimator` | Anomaly Detection | Yes |
| AutoML | `H2OAutoML` | All supervised | |

Choosing the Right Algorithm

By Problem Type

For predicting a continuous numeric value:
  • GBM / XGBoost — best general-purpose accuracy; XGBoost can be faster on large tabular data with GPU support.
  • GLM — best when you need a linear, interpretable model or a regularized baseline (elastic net with Gaussian family).
  • Deep Learning — useful for very large datasets or complex feature interactions, but requires more tuning.
  • DRF — strong out-of-the-box baseline; naturally handles missing values and mixed types.
  • AutoML — try all of the above automatically and rank by RMSE/deviance.

By Data Characteristics

| Characteristic | Recommended Algorithm(s) |
| --- | --- |
| Small dataset (< 10k rows) | GLM, DRF, GBM |
| Large dataset (> 1M rows) | XGBoost (GPU), GBM, DRF |
| Many categorical features | GBM (native enum encoding), DRF |
| Sparse / high-dimensional data | GLM (lasso), Deep Learning |
| Time-series / sequential | GBM with monotone constraints |
| Need fast inference (MOJO) | GBM, XGBoost, DRF, GLM |
| Need model explainability | GLM, GBM (SHAP contributions), DRF (variable importance) |
| Missing values | GBM, DRF (handle natively) |

By Interpretability Need

H2O-3 supports SHAP (SHapley Additive exPlanations) contributions for tree-based models (GBM, XGBoost, DRF) and variable importance for all supervised models. Use model.explain() on any trained model or AutoML object to generate an automatic explainability report.
| Interpretability Level | Algorithms |
| --- | --- |
| Fully interpretable | GLM, GAM |
| Variable importance | GBM, XGBoost, DRF, Deep Learning |
| SHAP contributions | GBM, XGBoost, DRF |
| Black-box | Deep Learning, Stacked Ensembles |

Algorithm Pages

AutoML

Automatically train and rank multiple models with a single function call.

GBM & XGBoost

Gradient boosted trees — H2O’s native GBM and the XGBoost backend.

Distributed Random Forest

Bagged ensembles of decision trees with column and row subsampling.

Deep Learning

Multi-layer feedforward neural networks with adaptive learning rates.

GLM & GAM

Regularized linear models (elastic net) and generalized additive models.

Stacked Ensembles

Super-learner ensembles that combine cross-validated base model predictions.

Clustering & Dimensionality Reduction

K-Means clustering and PCA for unsupervised learning.
