Gradient Boosting Machine (GBM)
H2O’s GBM sequentially builds regression trees on all features of the dataset in a fully distributed way — each tree is built in parallel across the cluster. It supports per-row observation weights, offsets, N-fold cross-validation, and a wide range of distribution functions. MOJO Support: GBM fully supports importing and exporting MOJOs for low-latency production scoring.

XGBoost
H2O’s XGBoost implementation is based on the native XGBoost library via JNI. It provides parallel tree boosting (GBDT/GBM) and is often faster than H2O GBM on large datasets, especially with GPU acceleration. It supports multicore via OpenMP and can use GPU backends (backend="gpu").
MOJO Support: XGBoost supports importing and exporting MOJOs.
Key Parameters
Shared Parameters (GBM & XGBoost)
ntrees: Number of trees to build. More trees generally improve accuracy but increase training time and risk of overfitting. Use early stopping (stopping_rounds) to find the optimal value automatically.

max_depth: Maximum depth of each tree. Higher values increase model complexity. For GBM, 5 is a good default; XGBoost often benefits from shallower trees (3–6) when combined with more rounds.

learn_rate: Step size shrinkage applied after each tree. Smaller values (e.g., 0.01–0.05) require more trees but often yield better generalization. Alias: eta in XGBoost.

sample_rate: Row-wise subsampling rate per tree (without replacement). Values in the range 0.5–0.8 add stochasticity and can improve generalization (Friedman 1999, stochastic GBM). Alias: subsample in XGBoost.

col_sample_rate: Column subsampling rate per tree level (without replacement). Alias: colsample_bylevel in XGBoost.

col_sample_rate_per_tree: Column subsampling rate per tree (without replacement). Multiplicative with col_sample_rate. Alias: colsample_bytree in XGBoost.

min_rows: Minimum number of observations required in a leaf node. Increase to prevent overfitting on small datasets. R parameter name: node_size.

distribution: Loss function / response distribution. "auto" infers from the response column type. Options: "gaussian", "bernoulli", "multinomial", "poisson", "gamma", "tweedie", "laplace", "quantile", "huber".

GBM-Specific Parameters
learn_rate_annealing: Reduce learn_rate by this factor after every tree. E.g., learn_rate=0.05 with learn_rate_annealing=0.99 gives a decaying learning rate that converges faster than a fixed small rate.

histogram_type: How to bin continuous features for split-finding. Options: "AUTO", "UniformAdaptive", "UniformRobust", "Random" (XRT-style), "QuantilesGlobal", "RoundRobin".

XGBoost-Specific Parameters
booster: Booster type: "gbtree" (tree-based), "gblinear" (linear), or "dart" (Dropout Additive Regression Trees).

backend: Compute backend: "auto" uses a GPU if available, otherwise CPU. Set "gpu" to force GPU, "cpu" to force CPU.

reg_lambda: L2 regularization on leaf weights. Larger values reduce overfitting.

reg_alpha: L1 regularization on leaf weights. Promotes sparsity.

tree_method: Tree construction algorithm: "auto", "exact" (small/medium data), "approx", or "hist" (fast histogram method, required for GPU).

Code Examples
GBM vs XGBoost: When to Use Which
| Criterion | H2O GBM | XGBoost |
|---|---|---|
| Default learn_rate | 0.1 | 0.3 |
| Default max_depth | 5 | 6 |
| Default sample_rate | 1.0 | 1.0 |
| GPU acceleration | No | Yes (backend="gpu") |
| Distribution functions | More options (Huber, custom) | Most standard options |
| Categorical encoding | Native enum (no one-hot required) | Requires encoding |
| Custom distribution | Yes (custom_distribution_func) | No |
| Typical use case | General tabular; insurance, credit | Large datasets; competition-level accuracy |
GPU Acceleration for XGBoost
XGBoost in H2O-3 supports GPU-accelerated training via the hist tree method:
H2O-3 automatically loads the most capable XGBoost native library available (GPU+OMP > OMP > single-CPU fallback). Set
backend="auto" to let H2O choose. GPU training requires tree_method="hist" or tree_method="auto".