H2O-3 provides h2o.explain() for global explanations and h2o.explain_row() for row-level (local) explanations. Both work with individual models, lists of models, and AutoML objects.
Visualizations use ggplot2 in R and matplotlib in Python.
Overview
- Global explanations
- Local explanations
h2o.explain() generates a suite of explanations for a model or group of models against a holdout dataset.

For a single model:
- Confusion Matrix (classification only)
- Residual Analysis (regression only)
- Variable Importance
- Partial Dependence (PD) Plots
- Individual Conditional Expectation (ICE) Plots
For multiple models (or an AutoML object):
- Leaderboard
- Variable Importance Heatmap
- Model Correlation Heatmap
- SHAP Summary of the top tree-based model
- Partial Dependence Multi-model Plots
Quick start
In R, the H2OExplanation object does not print automatically when saved to a variable; call print() on the object (or type its name at the console) to render it. In Python, the explanations print immediately even when saved to a variable.

Parameters
| Parameter | Description |
|---|---|
| object (R) | An H2O model, AutoML object, list of models, or H2OFrame with a model_id column (e.g., an AutoML leaderboard). |
| newdata (R) / frame (Python) | Holdout H2OFrame used for residual analysis, SHAP, and other explanations. |
| columns | Column names to include in column-based explanations (PDP, ICE, SHAP). When specified, top_n_features is ignored. |
| top_n_features | Number of top features (ranked by variable importance) to include in column-based explanations. Defaults to 5. |
| include_explanations | Generate only the listed explanation types. Mutually exclusive with exclude_explanations. |
| exclude_explanations | Skip the listed explanation types. |
| plot_overrides | Override arguments for individual plots (e.g., list(shap_summary_plot = list(top_n_features = 50))). |
| row_index | Row to explain (0-based in Python, 1-based in R). Used with explain_row(). |
Variable importance
The variable importance plot shows the relative contribution of each feature to the model's predictions.

Variable importance is not currently available for Stacked Ensemble models. When using explain() on an AutoML object with a Stacked Ensemble at the top, H2O-3 automatically shows variable importance for the top non-stacked base model instead.

Variable importance heatmap
The variable importance heatmap compares feature importance across multiple models. Categorical columns that were one-hot encoded (e.g., for Deep Learning or XGBoost) are summarized back to the original column level. Models and variables are ordered by similarity.

SHAP summary plots
SHAP (SHapley Additive exPlanations) summary plots show each feature's contribution for every row in the holdout dataset. The sum of feature contributions and the bias term equals the model's raw prediction before applying the inverse link function. H2O-3 uses TreeSHAP, which is exact for tree-based models. Note that when features are correlated, TreeSHAP may assign higher importance to a feature that had no direct influence on a prediction.

Row-level SHAP
Use shap_explain_row_plot() to see feature contributions for a single prediction.
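The additivity property described above can be checked with toy numbers. This is a minimal sketch with hypothetical contribution values, not output from a real H2O model:

```python
import math

# Hypothetical SHAP contributions for one row of a binary classifier
# (illustrative values, not produced by H2O).
contributions = {"age": 0.8, "income": -0.3, "tenure": 0.1}
bias = -0.2  # BiasTerm: the model's average raw prediction

# Local accuracy: the contributions plus the bias term give the raw
# (link-space) prediction.
raw_prediction = sum(contributions.values()) + bias  # ≈ 0.4

# For a binomial model, applying the inverse link (sigmoid) yields the
# predicted probability.
probability = 1 / (1 + math.exp(-raw_prediction))
print(round(probability, 4))
```

The same check works on real shap_explain_row_plot() output: summing a row's contribution values and the bias term reproduces the raw prediction.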
Partial dependence plots (PDP)
Partial dependence plots show the marginal effect of a single feature on the model's predicted outcome, averaged over all other features. PDP assumes the target feature is independent of the remaining features.

Individual Conditional Expectation (ICE) plots
ICE plots show the marginal effect of a feature for each individual instance, rather than the average across all instances. This reveals interaction effects that PDP can mask.

Residual analysis
Residual analysis plots fitted values vs. residuals on the test dataset. Randomly distributed residuals indicate a well-specified model; patterns may indicate missing non-linear effects, heteroscedasticity, or autocorrelation.

If you see “striped” patterns in the residuals, this is typically caused by an integer-valued (rather than continuous) response variable, not a model issue.
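The striping effect is easy to reproduce: with an integer-valued response, every residual lies on a line of the form residual = k − fitted, so the points stack into parallel diagonal stripes. A minimal sketch with toy numbers (no H2O involved):

```python
# Toy regression output: integer-valued actuals vs. continuous fitted values
# (hypothetical numbers for illustration).
actuals = [3, 5, 4, 7, 2, 5, 6, 3]
fitted = [2.7, 5.4, 4.1, 6.6, 2.3, 4.9, 6.2, 3.5]

residuals = [a - f for a, f in zip(actuals, fitted)]

# Because the response is integer-valued, residual + fitted is always
# (up to floating-point noise) an integer, so in a fitted-vs-residual
# plot the points fall on parallel "stripes".
assert all(abs((r + f) - round(r + f)) < 1e-9 for r, f in zip(residuals, fitted))
print([round(r, 1) for r in residuals])
```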
Model correlation heatmap
The model correlation heatmap shows how similar the predictions of different models are. For classification, it measures the frequency of identical predictions. Models are ordered by hierarchical clustering of their similarity.

Selecting specific explanations
Use include_explanations or exclude_explanations to control which explanations are generated.
Available explanation types:
| Name | Scope |
|---|---|
| "leaderboard" | AutoML and lists of models only |
| "residual_analysis" | Regression only |
| "confusion_matrix" | Classification only |
| "varimp" | Not available for Stacked Ensembles |
| "varimp_heatmap" | Multi-model only |
| "model_correlation_heatmap" | Multi-model only |
| "shap_summary" | Single models only |
| "pdp" | All |
| "ice" | All |
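To make the PDP/ICE relationship described earlier concrete: ICE traces one prediction curve per row, and the PDP is the pointwise average of those curves. A minimal pure-Python sketch with a toy model standing in for an H2O model (all names here are illustrative, not part of the H2O API):

```python
# Toy model with an interaction between x1 and x2: the effect of x1 flips
# sign depending on x2. Stands in for an H2O model's predict().
def predict(row):
    return row["x1"] * (1 if row["x2"] > 0 else -1)

data = [{"x1": 0.0, "x2": 1.0}, {"x1": 0.5, "x2": -1.0}, {"x1": 1.0, "x2": 1.0}]
grid = [0.0, 0.5, 1.0]  # candidate values for the feature of interest, x1

# ICE: one curve per row -- vary x1 over the grid, holding the row's other
# features fixed.
ice = [[predict({**row, "x1": v}) for v in grid] for row in data]

# PDP: the pointwise average of the ICE curves.
pdp = [sum(curve[i] for curve in ice) / len(ice) for i in range(len(grid))]

print(ice)  # the middle row's curve slopes downward; the others slope upward
print(pdp)  # averaging flattens the curves, masking the interaction
```

This is exactly the masking effect noted above: the individual ICE curves reveal the interaction, while the averaged PDP nearly cancels it out.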