## Supported Algorithms
| Algorithm | Class (Python) | Task | MOJO |
|---|---|---|---|
| Gradient Boosting Machine | H2OGradientBoostingEstimator | Regression, Classification | Yes |
| XGBoost | H2OXGBoostEstimator | Regression, Classification | Yes |
| Distributed Random Forest | H2ORandomForestEstimator | Regression, Classification | Yes |
| Deep Learning | H2ODeepLearningEstimator | Regression, Classification | Yes |
| Generalized Linear Model | H2OGeneralizedLinearEstimator | Regression, Classification | Yes |
| Generalized Additive Model | H2OGeneralizedAdditiveEstimator | Regression, Classification | Yes |
| Stacked Ensembles | H2OStackedEnsembleEstimator | Regression, Classification | Yes |
| K-Means | H2OKMeansEstimator | Clustering | Export only |
| PCA | H2OPrincipalComponentAnalysisEstimator | Dimensionality Reduction | Export only |
| Naive Bayes | H2ONaiveBayesEstimator | Classification | Yes |
| Isolation Forest | H2OIsolationForestEstimator | Anomaly Detection | Yes |
| AutoML | H2OAutoML | All supervised | — |
## Choosing the Right Algorithm

### By Problem Type

#### Regression

For predicting a continuous numeric value:
- GBM / XGBoost — best general-purpose accuracy; XGBoost can be faster on large tabular data with GPU support.
- GLM — best when you need a linear, interpretable model or a regularized baseline (elastic net with Gaussian family).
- Deep Learning — useful for very large datasets or complex feature interactions, but requires more tuning.
- DRF — strong out-of-the-box baseline; naturally handles missing values and mixed types.
- AutoML — try all of the above automatically and rank by RMSE/deviance.
### By Data Characteristics
| Characteristic | Recommended Algorithm(s) |
|---|---|
| Small dataset (< 10k rows) | GLM, DRF, GBM |
| Large dataset (> 1M rows) | XGBoost (GPU), GBM, DRF |
| Many categorical features | GBM (native enum encoding), DRF |
| Sparse / high-dimensional data | GLM (lasso), Deep Learning |
| Time-series / sequential | GBM with monotone constraints |
| Need fast inference (MOJO) | GBM, XGBoost, DRF, GLM |
| Need model explainability | GLM, GBM (SHAP contributions), DRF (variable importance) |
| Missing values | GBM, DRF (handle natively) |
### By Interpretability Need
H2O-3 supports SHAP (SHapley Additive exPlanations) contributions for tree-based models (GBM, XGBoost, DRF) and variable importance for all supervised models. Use `model.explain()` on any trained model or AutoML object for an automatic explainability report.

| Interpretability Level | Algorithms |
|---|---|
| Fully interpretable | GLM, GAM |
| Variable importance | GBM, XGBoost, DRF, Deep Learning |
| SHAP contributions | GBM, XGBoost, DRF |
| Black-box | Deep Learning, Stacked Ensembles |
## Algorithm Pages

- **AutoML**: Automatically train and rank multiple models with a single function call.
- **GBM & XGBoost**: Gradient boosted trees, covering H2O's native GBM and the XGBoost backend.
- **Distributed Random Forest**: Bagged ensembles of decision trees with column and row subsampling.
- **Deep Learning**: Multi-layer feedforward neural networks with adaptive learning rates.
- **GLM & GAM**: Regularized linear models (elastic net) and generalized additive models.
- **Stacked Ensembles**: Super-learner ensembles that combine cross-validated base model predictions.
- **Clustering & Dimensionality Reduction**: K-Means clustering and PCA for unsupervised learning.