Overview
The Interpretability page (/interpretability) provides tools to understand how and why your trained model makes predictions. Use Grad-CAM for visual attention, t-SNE for embedding visualization, and detailed misclassification analysis.
Interpretability tools require a completed experiment with a trained model. Complete training first before accessing this page.
Page Structure
Experiment Selection:
- Dropdown at top to select completed experiment
- Shows experiment name and model used

Tabs:
- Architecture - Model structure review
- Misclassifications - Analyze prediction errors
- Embeddings - t-SNE visualization of learned features
- Grad-CAM - Visual attention heatmaps
- Advanced - Additional interpretability techniques
Tab 1: Architecture Review
Review the model architecture used in the selected experiment.

Model Summary

Displays:
- Model Name: From model library
- Model Type: Custom CNN, Transformer, or Transfer Learning
- Total Parameters: Trainable + non-trainable
- Trainable Parameters: Updated during training
- Non-Trainable Parameters: Frozen weights (transfer learning)
Layer-by-Layer Breakdown

Layer details are shown per model type:
- Custom CNN
- Transfer Learning
- Transformer

For a Custom CNN:

Convolutional Blocks:
- Layer name (e.g., “Conv2D_1”)
- Filters, kernel size, activation
- Batch normalization status
- Pooling type and size
- Dropout rate

Dense Layers:
- Units and activation
- Dropout rate

Output Layer:
- Number of classes
- Softmax activation
Use this tab to verify the exact architecture that was trained, especially when comparing multiple experiments.
Tab 2: Misclassifications
Analyze prediction errors to understand model weaknesses.

Error Type Filter

Select which misclassifications to view:
- All Errors: Every incorrect prediction
- By True Class: Filter errors for specific malware family
- By Predicted Class: Filter by what the model predicted
- Confidence Threshold: Show only high-confidence errors (model was confident but wrong)
Misclassification Gallery
Displays misclassified images in a grid. Each image shows:
- Original Image: Misclassified sample
- True Label: Actual malware family (green)
- Predicted Label: Model’s prediction (red)
- Confidence: Softmax probability for predicted class
- Top-3 Predictions: Model’s top 3 choices with probabilities
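The Top-3 display can be reproduced from raw model outputs. A minimal sketch with numpy, assuming you have the model's logits for a sample (the malware family names here are placeholders, not the app's actual class list):

```python
import numpy as np

def top_k_predictions(logits, class_names, k=3):
    """Return the top-k (class, probability) pairs from raw logits."""
    # Softmax converts logits to probabilities (shifted for numerical stability)
    exp = np.exp(logits - np.max(logits))
    probs = exp / exp.sum()
    # Indices of the k highest probabilities, best first
    top = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in top]

# Hypothetical malware families, for illustration only
families = ["Adialer", "Agent", "Allaple", "Autorun"]
logits = np.array([2.0, 0.5, 3.1, -1.0])
preds = top_k_predictions(logits, families)
print(preds[0][0])  # Allaple (the highest logit)
```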
Error Analysis Summary
Metrics displayed:
- Total Misclassifications: Count of errors
- Error Rate: Percentage of test set misclassified
- Most Confused Pair: Which two classes are most often swapped
- Worst Performing Class: Class with lowest recall
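All four summary metrics can be derived from the test-set confusion matrix. A sketch with numpy (the matrix below is illustrative, not real results):

```python
import numpy as np

# Rows = true class, columns = predicted class (illustrative counts)
cm = np.array([
    [50,  3,  0],
    [ 8, 40,  2],
    [ 1,  0, 49],
])

total = cm.sum()
errors = total - np.trace(cm)        # off-diagonal cells are misclassifications
error_rate = errors / total

# Most confused pair: the largest off-diagonal cell
off_diag = cm - np.diag(np.diag(cm))
true_c, pred_c = np.unravel_index(np.argmax(off_diag), cm.shape)

# Worst performing class: lowest recall (diagonal / row sum)
recall = np.diag(cm) / cm.sum(axis=1)
worst = int(np.argmin(recall))

print(errors, (int(true_c), int(pred_c)), worst)  # 14 (1, 0) 1
```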
Use misclassification analysis to guide data collection. If specific pairs are confused, collect more distinguishing samples or increase augmentation for those classes.
Tab 3: Embeddings
Visualize learned feature representations using dimensionality reduction.

t-SNE Visualization

What is t-SNE?
- t-Distributed Stochastic Neighbor Embedding
- Projects high-dimensional features to 2D for visualization
- Preserves local structure (similar samples cluster together)
Reading the plot:
- Each point: One test sample
- Color: True class label
- Position: 2D projection of learned features
- Clusters: Samples from the same class should cluster together
Embeddings are extracted from the second-to-last layer (before softmax), representing the model’s learned feature space.
Interpreting t-SNE Plots
Plots typically fall into three patterns:
- Well-Separated Clusters
- Overlapping Clusters
- Scattered Points

Well-separated clusters are a good sign:
- Each class forms a distinct cluster
- Minimal overlap between classes
- Clear boundaries
Configuration Options
t-SNE Parameters:
- Perplexity: 5-50 (default: 30)
- Higher = considers more neighbors
- Lower = focuses on local structure
- Learning Rate: 10-1000 (default: 200)
- Iterations: 250-5000 (default: 1000)
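These parameters map directly onto scikit-learn's `TSNE`. A hedged sketch, where the random feature array stands in for real penultimate-layer embeddings (the iteration count is controlled by `n_iter` or `max_iter` depending on your scikit-learn version, so it is left at its default here):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 64))  # stand-in for learned embeddings

tsne = TSNE(
    n_components=2,      # project to 2D for plotting
    perplexity=30,       # 5-50; higher considers more neighbors
    learning_rate=200,   # 10-1000
    init="pca",          # more stable layouts than random init
    random_state=0,      # fix the seed for reproducible layouts
)
coords = tsne.fit_transform(features)
print(coords.shape)  # (100, 2)
```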
Tab 4: Grad-CAM
Gradient-weighted Class Activation Mapping - visualize where the model “looks” when making predictions.

What is Grad-CAM?

Grad-CAM uses gradients flowing into the last convolutional layer to produce a heatmap showing which regions of the image contributed most to the prediction.
- Red areas: High importance (model focuses here)
- Blue areas: Low importance (model ignores)
- Overlay on image: Shows attention directly on input
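The heatmap itself is a weighted sum: the gradients are averaged over the spatial dimensions to give one weight per feature map, and negative contributions are clipped. A framework-agnostic numpy sketch, assuming the conv-layer activations and their gradients have already been extracted (e.g. via `tf.GradientTape` or PyTorch hooks):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM weighting step.

    activations: (H, W, C) feature maps from the last conv layer
    gradients:   (H, W, C) gradient of the class score w.r.t. those maps
    Returns an (H, W) heatmap normalized to [0, 1].
    """
    # Global-average-pool the gradients -> one weight per channel
    weights = gradients.mean(axis=(0, 1))                        # shape (C,)
    # Weighted sum of feature maps; ReLU keeps positive evidence only
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()  # normalize for display
    return cam

# Toy example: 4x4 maps, 2 channels, one pushing the score up, one down
acts = np.ones((4, 4, 2))
grads = np.stack([np.ones((4, 4)), -0.5 * np.ones((4, 4))], axis=-1)
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (4, 4)
```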
Interface
Select Sample
- True Class Dropdown: Filter by actual malware family
- Sample Selector: Choose specific image from class
- Prediction/Correct Filter: Show only correct or incorrect predictions
View Visualization
Three-panel display:
- Original Image: Input image
- Grad-CAM Heatmap: Attention heatmap (red = important)
- Overlay: Heatmap superimposed on image
Interpreting Grad-CAM Heatmaps
Review heatmaps for three situations:
- Correct Predictions
- Incorrect Predictions
- Attention on Artifacts

For correct predictions, good attention looks like:
- Heatmap highlights relevant image regions
- Consistent patterns across samples from the same class
- Focus on discriminative features
Grad-CAM Options
Layer Selection:
- Last Conv Layer (default): Broadest semantic understanding
- Earlier Layers: More localized, fine-grained attention

Colormap:
- Jet: Red (important) to blue (unimportant)
- Viridis: Purple to yellow
- Hot: Black to red to white
Tab 5: Advanced
Additional interpretability techniques.

Feature Importance

For Custom CNNs:
- Filter Visualizations: What patterns each convolutional filter detects
- Activation Maximization: Synthetic images that maximally activate specific neurons
Attention Rollout (Transformers Only)
For Vision Transformers:
- Attention Weights: Which patches the model attends to
- Rollout Visualization: Aggregated attention across all layers
- Per-Head Analysis: Different attention heads focus on different features
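Attention rollout aggregates attention by multiplying the head-averaged attention matrices layer by layer, mixing in the identity at each layer to account for residual connections. A numpy sketch, with random row-stochastic matrices standing in for real attention weights:

```python
import numpy as np

def attention_rollout(attentions):
    """attentions: list of (tokens, tokens) head-averaged attention matrices,
    one per layer, rows summing to 1. Returns the rolled-out attention."""
    n = attentions[0].shape[0]
    rollout = np.eye(n)
    for attn in attentions:
        # Residual connection: add the identity, then re-normalize rows
        a = attn + np.eye(n)
        a = a / a.sum(axis=-1, keepdims=True)
        rollout = a @ rollout
    return rollout

rng = np.random.default_rng(0)
layers = []
for _ in range(4):  # pretend we have a 4-layer transformer, 5 tokens
    a = rng.random((5, 5))
    layers.append(a / a.sum(axis=-1, keepdims=True))  # row-stochastic

out = attention_rollout(layers)
print(out.shape)  # (5, 5); each row still sums to 1
```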
Saliency Maps
Gradient-based saliency:
- Compute gradient of output w.r.t. input pixels
- Shows which pixels, if changed, would most affect prediction
- Finer-grained than Grad-CAM
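Once the input gradient is available, the saliency map is just its per-pixel magnitude, reduced over the channel axis. A numpy sketch, assuming the gradient of the class score w.r.t. the input has been computed in your framework of choice:

```python
import numpy as np

def saliency_map(input_grad):
    """input_grad: (H, W, C) gradient of the class score w.r.t. the input.
    Returns an (H, W) saliency map normalized to [0, 1]."""
    # Per-pixel importance: max absolute gradient across channels
    sal = np.abs(input_grad).max(axis=-1)
    if sal.max() > 0:
        sal = sal / sal.max()  # normalize for display
    return sal

grad = np.zeros((8, 8, 3))
grad[2, 3, 1] = -4.0   # one strongly influential pixel
sal = saliency_map(grad)
print(sal[2, 3])  # 1.0 -- the most important pixel
```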
Integrated Gradients
Path-based attribution:
- Computes gradients along the path from a baseline to the input
- More accurate attribution than simple gradients
- Shows pixel-level importance
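Integrated Gradients approximates that path integral with a Riemann sum of gradients at interpolated inputs. A sketch for a linear model (whose gradient is simply its weight vector), which also demonstrates the completeness property: attributions sum to f(input) - f(baseline):

```python
import numpy as np

def integrated_gradients(grad_fn, baseline, x, steps=50):
    """Riemann-sum approximation of Integrated Gradients.

    grad_fn:  function returning d f / d x at a point
    baseline: reference input (e.g. all zeros)
    x:        the input being explained
    """
    alphas = np.linspace(0, 1, steps)
    # Average the gradient along the straight path baseline -> x
    avg_grad = np.mean(
        [grad_fn(baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    return (x - baseline) * avg_grad

# Linear model f(x) = w . x, so the gradient is constant: w
w = np.array([1.0, -2.0, 3.0])
f = lambda x: w @ x
grad_fn = lambda x: w

x = np.array([0.5, 1.0, -1.0])
baseline = np.zeros(3)
attr = integrated_gradients(grad_fn, baseline, x)
# Completeness: attributions sum to f(x) - f(baseline)
print(np.isclose(attr.sum(), f(x) - f(baseline)))  # True
```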
Advanced techniques provide deeper insights but require more computation. Start with Grad-CAM and t-SNE for initial analysis.
Use Cases
Debugging Poor Performance
Validating Model Behavior
Identifying Data Issues
Tips & Best Practices
Limitations
Grad-CAM
- Only works with CNNs (requires convolutional layers)
- Coarse spatial resolution
- May miss fine-grained details
- Alternative: Integrated Gradients for finer detail
t-SNE
- Non-deterministic (different runs produce different layouts)
- Computationally expensive for large datasets
- Hyperparameter-sensitive (perplexity affects structure)
- Alternative: UMAP is faster and scales better to large datasets
Attention for Transformers
- Attention weights ≠ importance (attention is not explanation)
- Multiple heads may attend to different features
- Requires specialized visualization tools
Summary
The Interpretability page provides essential tools for understanding your trained model:

Architecture Review
Verify exact model structure and parameters
Misclassifications
Identify and analyze prediction errors
Embeddings (t-SNE)
Visualize learned feature space clustering
Grad-CAM
See where the model focuses attention
Use these tools to:
- Validate model behavior
- Debug poor performance
- Identify data quality issues
- Understand prediction reasoning
Interpretability is crucial for deploying ML models in production. Always validate that your model makes predictions for the right reasons, not spurious correlations.