sample_rate and mtries). The final prediction averages all trees, reducing variance without increasing bias.
DRF supports both classification (majority vote) and regression (average prediction), per-row observation weights, and N-fold cross-validation, and can produce calibrated probabilities via model calibration.
MOJO Support: DRF fully supports importing and exporting MOJOs.
Extremely Randomized Trees (XRT)
H2O-3 supports Extremely Randomized Trees via `histogram_type="Random"`. Unlike standard DRF, where the best threshold is found for each candidate feature, XRT draws split thresholds at random, further reducing variance at the cost of a small increase in bias.
Key Parameters
- `ntrees` — Number of trees in the forest. More trees reduce variance but increase training time. Unlike GBM, DRF does not overfit badly as trees are added; test accuracy plateaus rather than degrading.
- `max_depth` — Maximum tree depth. DRF uses deeper trees than GBM by default (20 vs. 5) because averaging reduces the overfitting risk of individual deep trees.
- `mtries` — Number of columns randomly sampled at each split. `-1` uses the square root of the number of columns for classification and p/3 for regression (the standard random-forest defaults); `-2` uses all features (equivalent to bagging); any value >= 1 sets the exact count.
- `sample_rate` — Row subsampling rate per tree (without replacement). The default 0.632 is the classic bootstrap approximation. Reducing this value adds more randomness and can improve generalization on noisy data.
- `min_rows` — Minimum observations required in a leaf node. Increase to smooth predictions on small datasets. R parameter name: `node_size`.
- `nbins` — Number of histogram bins for numerical columns. More bins capture finer split points but increase memory use and training time.
- `nbins_cats` — Maximum bins for categorical columns. Higher values allow finer splits on high-cardinality categoricals.
- `histogram_type` — Split-finding strategy: `"AUTO"`, `"UniformAdaptive"`, `"UniformRobust"`, `"Random"` (XRT), `"QuantilesGlobal"`, or `"RoundRobin"`.
- `binomial_double_trees` — Binary classification only. Builds twice as many trees (one per class). Can improve accuracy at the cost of doubled training time.
Code Examples
Variable Importance
DRF computes variable importance as the mean decrease in squared error across all splits that use each feature.
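A sketch of retrieving it from a trained model; `model` stands for any trained DRF estimator, and `top_features` is a hypothetical convenience wrapper:

```python
# `model` is assumed to be an already-trained H2ORandomForestEstimator.
def top_features(model, n=5):
    """Return the n most important features as (name, scaled_importance)."""
    # varimp() returns rows of (variable, relative, scaled, percentage)
    return [(row[0], row[2]) for row in model.varimp()[:n]]

# Usage, given a trained model:
# print(top_features(model))
# model.varimp(use_pandas=True)  # full table as a pandas DataFrame
# model.varimp_plot()            # bar chart of scaled importances
```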
DRF vs GBM
| Property | DRF | GBM |
|---|---|---|
| Training strategy | Parallel (bagging) | Sequential (boosting) |
| Default `max_depth` | 20 | 5 |
| Default `sample_rate` | 0.632 | 1.0 |
| Overfitting risk with many trees | Low (variance reduces) | Moderate (use early stopping) |
| Tuning effort | Low | Higher |
| Typical AUC | Good baseline | Slightly higher ceiling |
| Training time | Faster (fully parallel) | Slower (sequential) |