## Overview
The `interpret` module provides comprehensive model interpretation capabilities using SHAP values, permutation importance, and other feature importance metrics. It enables both global feature importance analysis and local (instance-level) explanations.
## TreeInterpretation Class

Main class for interpreting tree-based models (e.g., RandomForest, GradientBoosting).

### Constructor

**Parameters**
- Trained scikit-learn model instance (e.g., `RandomForestClassifier`)
- Training dataset as a pandas DataFrame
- List of feature column names to use as independent variables
- Name of the target column to use as the dependent variable
- Random seed for reproducibility
- Proportion of the dataset to use for testing (0.0 to 1.0)
- Optional preprocessing step to add to the model pipeline (e.g., `StandardScaler`)
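The fitting the constructor performs can be approximated with plain scikit-learn. The sketch below is illustrative only: the variable names (`df`, `features`, `target`, etc.) are assumptions, not the class's actual API.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy dataset standing in for the user's training DataFrame.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
features = [f"f{i}" for i in range(5)]
df = pd.DataFrame(X, columns=features)
df["target"] = y

# Split, optionally prepend a preprocessing step, then fit.
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["target"], test_size=0.25, random_state=42
)
model = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # predictions on the test set
y_proba = model.predict_proba(X_test)[:, 1]  # positive-class probabilities
```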
### Attributes

After initialization, the model is automatically fitted. The following attributes are then available:

- The fitted model instance
- Model predictions on the test set
- Predicted probabilities for the positive class
- Training features
- Test features
- Training target values
- Test target values
## Global Feature Importance Methods

The class provides multiple property methods for computing feature importance.

### feature_importances
Standard scikit-learn feature importances (Gini/entropy-based).

- `feature`: Feature name
- `feature_importances_sklearn`: Importance score
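Assuming this property wraps scikit-learn's impurity-based `feature_importances_` attribute, the underlying computation looks roughly like this (a sketch, not the module's actual code):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Impurity-based (Gini) importances, one score per feature; they sum to 1.
imp = pd.DataFrame({
    "feature": [f"f{i}" for i in range(4)],
    "feature_importances_sklearn": clf.feature_importances_,
})
```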
### cv_importances

Cross-validated feature importances using k-fold CV.

- `feature`: Feature name
- `cv_feature_importances`: Cross-validated importance score
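One common way to cross-validate impurity importances is to refit the model on each training fold and average the per-fold scores; a minimal sketch under that assumption:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Fit a fresh model on each training fold and collect its importances.
fold_imps = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    fold_imps.append(clf.feature_importances_)

# Average across folds for a more stable estimate.
cv_imp = pd.DataFrame({
    "feature": [f"f{i}" for i in range(4)],
    "cv_feature_importances": np.mean(fold_imps, axis=0),
})
```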
### dropcol_importances

Importance computed by dropping each feature and measuring the resulting performance decrease.

- `feature`: Feature name
- `dropcol_importances`: Drop-column importance score
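The drop-column idea can be sketched directly: score the full model, then rescore after removing each column; the drop in score is that feature's importance. (Illustrative only; the module's implementation details are not documented here.)

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(4)])

def score(data):
    """Mean 3-fold CV accuracy for a fresh forest on the given columns."""
    return cross_val_score(RandomForestClassifier(random_state=0), data, y, cv=3).mean()

baseline = score(X)
# Importance = how much the score drops when the feature is removed.
drop_imp = pd.DataFrame({
    "feature": X.columns,
    "dropcol_importances": [baseline - score(X.drop(columns=c)) for c in X.columns],
})
```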
### oob_dropcol_importances

Out-of-bag drop-column importances (RandomForest only).

- `feature`: Feature name
- `oob_dropcol_importances`: OOB drop-column importance score
### permutation_importances

Permutation importance using stratified k-fold cross-validation.

- `feature`: Feature name
- `permutation_importance`: Permutation importance score
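A sketch of permutation importance averaged over stratified folds, using scikit-learn's `permutation_importance` on each held-out split (an approximation of the described behavior, not the module's code):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Compute permutation importance on each held-out fold, then average.
fold_means = []
for tr, te in StratifiedKFold(n_splits=3, shuffle=True, random_state=0).split(X, y):
    clf = RandomForestClassifier(random_state=0).fit(X[tr], y[tr])
    result = permutation_importance(clf, X[te], y[te], n_repeats=5, random_state=0)
    fold_means.append(result.importances_mean)

perm_imp = pd.DataFrame({
    "feature": [f"f{i}" for i in range(4)],
    "permutation_importance": np.mean(fold_means, axis=0),
})
```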
### feature_importance_permutation

Permutation importance computed with the mlxtend library.

- `feature`: Feature name
- `feature_importance_permutation`: Importance score averaged over 10 rounds
### shap

SHAP (SHapley Additive exPlanations) values for feature importance.

- `feature`: Feature name
- `shap`: Mean absolute SHAP value
### mutual_information

Mutual information between each feature and the target.

- `feature`: Feature name
- `mutual_information`: MI score (0 = independent; higher = more dependent)
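Mutual information for a classification target is available directly in scikit-learn; a minimal sketch:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# MI is non-negative; 0 means the feature is independent of the target.
mi = pd.DataFrame({
    "feature": [f"f{i}" for i in range(4)],
    "mutual_information": mutual_info_classif(X, y, random_state=0),
})
```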
### target_permutation

Feature importance via target permutation (MDI vs. MDA).

- `feature`: Feature name
- `ratio_mdi-mda`: Ratio of Mean Decrease in Impurity to Mean Decrease in Accuracy
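The exact computation is not documented here; one plausible reading of the `ratio_mdi-mda` column (an assumption, not the module's confirmed behavior) divides impurity-based importance (MDI) by permutation importance (MDA):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

mdi = clf.feature_importances_  # Mean Decrease in Impurity
mda = permutation_importance(clf, X, y, n_repeats=5, random_state=0).importances_mean

# Guard against division by zero: features with zero MDA get NaN.
ratio = pd.DataFrame({
    "feature": [f"f{i}" for i in range(4)],
    "ratio_mdi-mda": mdi / np.where(mda == 0, np.nan, mda),
})
```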
### merge_feature_importances

Combines all of the feature importance methods above into a single DataFrame.

## Local Explanation Methods

Explain individual predictions using SHAP values.

### local_explanation()

Generates local SHAP explanations for a specific transcript or gene.

**Parameters**
- DataFrame containing features for all samples, including `gene_name` and `transcript_id` columns
- Either:
  - An Ensembl transcript ID (e.g., `ENST00000456328.2`)
  - A gene name (e.g., `DDX11L1`)
- If `True`, displays a waterfall plot for the transcript
**Returns**

- For a transcript ID: DataFrame with
  - Index: feature names
  - `shap`: SHAP value for each feature
  - `feature`: feature value
- For a gene name: DataFrame with
  - Rows: feature names
  - Columns: transcript IDs within the gene
  - Values: SHAP values
  - Additional columns: `std` (standard deviation) and `sum` (sum across transcripts)
### waterfall_plot()

Generates a SHAP waterfall plot for a specific sample.

## Complete Example
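Since the module's import path is not documented here, the end-to-end sketch below uses plain scikit-learn to reproduce the workflow the class automates: fit on a train/test split, compute two global importance measures, and merge them into one DataFrame (mirroring `merge_feature_importances`). All names are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Build a toy classification DataFrame.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
features = [f"f{i}" for i in range(5)]
df = pd.DataFrame(X, columns=features)
df["target"] = y

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["target"], test_size=0.25, random_state=42
)
clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Impurity-based importances on the fitted model.
mdi = pd.DataFrame({"feature": features,
                    "feature_importances_sklearn": clf.feature_importances_})

# Permutation importances evaluated on the held-out test set.
perm = permutation_importance(clf, X_test, y_test, n_repeats=5, random_state=42)
mda = pd.DataFrame({"feature": features,
                    "permutation_importance": perm.importances_mean})

# Merge both measures into a single comparison table.
merged = mdi.merge(mda, on="feature")
print(merged.sort_values("feature_importances_sklearn", ascending=False))
```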
## Feature Importance Comparison
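One way to compare importance methods, in the spirit of the best practices below, is to compute several measures side by side and check how well their rankings agree. A sketch using rank (Spearman) correlation; all names are illustrative:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Three importance measures, one column each, indexed by feature.
scores = pd.DataFrame({
    "mdi": clf.feature_importances_,
    "permutation": permutation_importance(clf, X, y, n_repeats=5,
                                          random_state=0).importances_mean,
    "mutual_info": mutual_info_classif(X, y, random_state=0),
}, index=[f"f{i}" for i in range(5)])

# Spearman rank correlation shows how much the methods agree on feature ordering.
agreement = scores.corr(method="spearman")
```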
## Best Practices
- Use multiple importance methods: different methods capture different aspects of feature importance
- Use SHAP for interpretation: SHAP values provide theoretically sound feature attributions
- Use permutation importance when a model-agnostic measure is needed: it works with any model type
- Use local explanations to understand individual predictions and to debug the model
- Prefer cross-validation: CV-based methods provide more robust importance estimates