Overview

The Interpretability page (/interpretability) provides tools to understand how and why your trained model makes predictions. Use Grad-CAM for visual attention, t-SNE for embedding visualization, and detailed misclassification analysis.
Interpretability tools require a completed experiment with a trained model. Complete training first before accessing this page.

Page Structure

Experiment Selection:
  • Dropdown at top to select completed experiment
  • Shows experiment name and model used
Five Tabs:
  1. Architecture - Model structure review
  2. Misclassifications - Analyze prediction errors
  3. Embeddings - t-SNE visualization of learned features
  4. Grad-CAM - Visual attention heatmaps
  5. Advanced - Additional interpretability techniques

Tab 1: Architecture Review

Review the model architecture used in the selected experiment.

Model Summary

Displays:
  • Model Name: From model library
  • Model Type: Custom CNN, Transformer, or Transfer Learning
  • Total Parameters: Trainable + non-trainable
  • Trainable Parameters: Updated during training
  • Non-Trainable Parameters: Frozen weights (transfer learning)

Layer-by-Layer Breakdown

Convolutional Blocks:
  • Layer name (e.g., “Conv2D_1”)
  • Filters, kernel size, activation
  • Batch normalization status
  • Pooling type and size
  • Dropout rate
Dense Layers:
  • Units and activation
  • Dropout rate
Output Layer:
  • Number of classes
  • Softmax activation
Use this tab to verify the exact architecture that was trained, especially when comparing multiple experiments.
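The parameter counts shown in the summary follow directly from the layer shapes. As a minimal sketch (the helper names below are illustrative, not part of the application), the counts for a Conv2D and a Dense layer can be computed by hand:

```python
def conv2d_params(in_channels, filters, kernel_size, use_bias=True):
    """Parameters in a Conv2D layer: one (kh x kw x in_channels) kernel
    per filter, plus an optional bias per filter."""
    kh, kw = kernel_size
    return filters * (kh * kw * in_channels + (1 if use_bias else 0))

def dense_params(in_units, out_units, use_bias=True):
    """Parameters in a fully connected layer: a weight per input-output
    pair, plus an optional bias per output unit."""
    return out_units * (in_units + (1 if use_bias else 0))

# Example: a 3x3 Conv2D with 32 filters on a grayscale (1-channel) input
print(conv2d_params(1, 32, (3, 3)))   # 32 * (9*1 + 1) = 320
# Example: a 9-class softmax head on a 128-unit dense layer
print(dense_params(128, 9))           # 9 * (128 + 1) = 1161
```

Summing these per-layer counts (and excluding frozen layers) reproduces the Trainable vs. Non-Trainable split shown in the summary.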

Tab 2: Misclassifications

Analyze prediction errors to understand model weaknesses.

Error Type Filter

Select which misclassifications to view:
  • All Errors: Every incorrect prediction
  • By True Class: Filter errors for specific malware family
  • By Predicted Class: Filter by what the model predicted
  • Confidence Threshold: Show only high-confidence errors (model was confident but wrong)
Misclassified images are displayed in a grid. Each image shows:
  • Original Image: Misclassified sample
  • True Label: Actual malware family (green)
  • Predicted Label: Model’s prediction (red)
  • Confidence: Softmax probability for predicted class
  • Top-3 Predictions: Model’s top 3 choices with probabilities
Example:
Image: malware_sample_1234.png
True: Ramnit
Predicted: Lollipop (85% confidence)
Top-3:
  1. Lollipop: 85%
  2. Ramnit: 12%
  3. Kelihos: 3%
High-confidence errors (>80%) are most concerning. They indicate systematic confusion that the model is confident about, suggesting visual similarity or data issues.
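The confidence-threshold filter described above amounts to selecting wrong predictions whose softmax probability exceeds a cutoff. A minimal numpy sketch, using synthetic stand-in probabilities (the `probs` and `y_true` arrays below are hypothetical, not real model output):

```python
import numpy as np

classes = ["Ramnit", "Lollipop", "Kelihos"]
# Hypothetical softmax outputs for 6 test samples over 3 classes
probs = np.array([
    [0.85, 0.12, 0.03],   # confidently wrong (true class is Lollipop)
    [0.40, 0.35, 0.25],
    [0.10, 0.88, 0.02],
    [0.55, 0.30, 0.15],
    [0.05, 0.05, 0.90],
    [0.33, 0.33, 0.34],
])
y_true = np.array([1, 0, 1, 0, 2, 2])  # true class indices

y_pred = probs.argmax(axis=1)
confidence = probs.max(axis=1)

# High-confidence errors: wrong prediction with softmax probability > 0.80
errors = (y_pred != y_true) & (confidence > 0.80)
for i in np.flatnonzero(errors):
    top3 = probs[i].argsort()[::-1][:3]
    print(f"sample {i}: true={classes[y_true[i]]}, "
          f"pred={classes[y_pred[i]]} ({confidence[i]:.0%}), "
          f"top-3={[classes[j] for j in top3]}")
```

Only sample 0 is flagged here: it mirrors the Ramnit-predicted-as-Lollipop example above, where the model is both wrong and confident.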

Error Analysis Summary

Metrics displayed:
  • Total Misclassifications: Count of errors
  • Error Rate: Percentage of test set misclassified
  • Most Confused Pair: Which two classes are most often swapped
  • Worst Performing Class: Class with lowest recall
Use misclassification analysis to guide data collection. If specific pairs are confused, collect more distinguishing samples or increase augmentation for those classes.
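The summary metrics above all derive from the confusion matrix. As a sketch with a hypothetical 3-class matrix (rows are true classes, columns are predictions):

```python
import numpy as np

classes = ["Ramnit", "Lollipop", "Kelihos"]
cm = np.array([
    [50,  8,  2],   # Ramnit is often predicted as Lollipop
    [ 5, 60,  1],
    [ 1,  2, 40],
])

# Most confused pair: the largest off-diagonal entry
off = cm.copy()
np.fill_diagonal(off, 0)
i, j = np.unravel_index(off.argmax(), off.shape)
print(f"Most confused pair: {classes[i]} -> {classes[j]} ({off[i, j]} samples)")

# Worst performing class: lowest recall (diagonal / row sum)
recall = np.diag(cm) / cm.sum(axis=1)
worst = int(recall.argmin())
print(f"Worst class: {classes[worst]} (recall {recall[worst]:.2f})")

# Overall error rate: 1 - accuracy
print(f"Error rate: {1 - np.trace(cm) / cm.sum():.2%}")
```

In this toy matrix, Ramnit → Lollipop is the most confused pair and Ramnit has the lowest recall, which is exactly the kind of signal that should drive targeted data collection or augmentation.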

Tab 3: Embeddings

Visualize learned feature representations using dimensionality reduction.

t-SNE Visualization

What is t-SNE?
  • t-Distributed Stochastic Neighbor Embedding
  • Projects high-dimensional features to 2D for visualization
  • Preserves local structure (similar samples cluster together)
Chart Display:
  • Each point: One test sample
  • Color: True class label
  • Position: 2D projection of learned features
  • Clusters: Samples from same class should cluster
Embeddings are extracted from the second-to-last layer (before softmax), representing the model’s learned feature space.

Interpreting t-SNE Plots

Good sign:
  • Each class forms distinct cluster
  • Minimal overlap between classes
  • Clear boundaries
Indicates: The model learned discriminative features.
Example: The Ramnit cluster sits far from the Lollipop cluster.
Hover over points to see sample details: filename, true label, predicted label, confidence.

Configuration Options

t-SNE Parameters:
  • Perplexity: 5-50 (default: 30)
    • Higher = considers more neighbors
    • Lower = focuses on local structure
  • Learning Rate: 10-1000 (default: 200)
  • Iterations: 250-5000 (default: 1000)
t-SNE is non-deterministic. Running multiple times produces different layouts, but cluster structure should remain consistent.
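The same projection can be reproduced offline with scikit-learn, assuming you have exported the penultimate-layer embeddings. The `embeddings` array below is synthetic stand-in data (three loose 128-dimensional clusters), not real model output, and the parameter values match the UI defaults above:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for penultimate-layer embeddings: 60 samples, 128-dim, 3 clusters
embeddings = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(20, 128)) for c in (0.0, 3.0, 6.0)
])
labels = np.repeat([0, 1, 2], 20)

# Project to 2D; perplexity must stay below the number of samples
tsne = TSNE(n_components=2, perplexity=30, learning_rate=200,
            init="pca", random_state=0)
coords = tsne.fit_transform(embeddings)
print(coords.shape)  # (60, 2)
```

Fixing `random_state` makes a single run reproducible, but different seeds still produce different layouts; judge the plot by cluster structure, not absolute positions.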

Tab 4: Grad-CAM

Gradient-weighted Class Activation Mapping - visualize where the model “looks” when making predictions.

What is Grad-CAM?

Grad-CAM uses gradients flowing into the last convolutional layer to produce a heatmap showing which regions of the image contributed most to the prediction.
  • Red areas: High importance (model focuses here)
  • Blue areas: Low importance (model ignores)
  • Overlay on image: Shows attention directly on input
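The computation itself is compact. A minimal numpy sketch, assuming the last-conv feature maps and the gradient of the target class score with respect to them have already been extracted (that extraction step is framework-specific and omitted here):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap. Both inputs have shape (H, W, K): K feature maps
    and the gradient of the class score w.r.t. each map."""
    # Channel weights: global-average-pool the gradients over spatial dims
    weights = gradients.mean(axis=(0, 1))                         # (K,)
    # Weighted sum of maps, then ReLU to keep positive evidence only
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)  # (H, W)
    # Normalize to [0, 1] for display as a heatmap
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: 4x4 feature maps with 8 channels
rng = np.random.default_rng(0)
A = rng.random((4, 4, 8))      # stand-in for last-conv activations
dYdA = rng.random((4, 4, 8))   # stand-in for gradients of the class score
heatmap = grad_cam(A, dYdA)
print(heatmap.shape)  # (4, 4)
```

The heatmap is then upsampled to the input resolution and blended with the original image to produce the overlay panel.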

Interface

1. Select Sample

  • True Class Dropdown: Filter by actual malware family
  • Sample Selector: Choose specific image from class
  • Prediction/Correct Filter: Show only correct or incorrect predictions
2. View Visualization

Three-panel display:
  1. Original Image: Input image
  2. Grad-CAM Heatmap: Attention heatmap (red = important)
  3. Overlay: Heatmap superimposed on image
3. Analyze Attention

  • Prediction: Model’s predicted class and confidence
  • True Label: Actual class
  • Top-3 Predictions: Alternative predictions with confidences

Interpreting Grad-CAM Heatmaps

Good attention:
  • Heatmap highlights relevant image regions
  • Consistent patterns across samples from same class
  • Focuses on discriminative features
Example: For Ramnit malware, model focuses on characteristic header structure
If Grad-CAM shows consistent attention on non-malware features (e.g., image borders, watermarks), your model may have learned dataset biases instead of malware characteristics.

Grad-CAM Options

Layer Selection:
  • Last Conv Layer (default): Broadest semantic understanding
  • Earlier Layers: More localized, fine-grained attention
Colormap:
  • Jet: Red (important) to blue (unimportant)
  • Viridis: Purple to yellow
  • Hot: Black to red to white
Compare Grad-CAM across correctly and incorrectly classified samples to identify what the model attends to in each case.

Tab 5: Advanced

Additional interpretability techniques.

Feature Importance

For Custom CNNs:
  • Filter Visualizations: What patterns each convolutional filter detects
  • Activation Maximization: Synthetic images that maximally activate specific neurons

Attention Rollout (Transformers Only)

For Vision Transformers:
  • Attention Weights: Which patches the model attends to
  • Rollout Visualization: Aggregated attention across all layers
  • Per-Head Analysis: Different attention heads focus on different features
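Attention rollout can be sketched in a few lines of numpy. This follows the standard recipe (average heads, mix in the residual connection, renormalize, multiply across layers); the toy attention matrices below are random stand-ins, not real transformer outputs:

```python
import numpy as np

def attention_rollout(attn_per_layer):
    """Aggregate attention across layers by matrix-multiplying per-layer
    attention. attn_per_layer: list of (heads, tokens, tokens) arrays."""
    n = attn_per_layer[0].shape[-1]
    rollout = np.eye(n)
    for layer_attn in attn_per_layer:
        a = layer_attn.mean(axis=0)              # average over heads
        a = 0.5 * a + 0.5 * np.eye(n)            # account for the residual stream
        a = a / a.sum(axis=-1, keepdims=True)    # renormalize rows
        rollout = a @ rollout
    return rollout

# Toy example: 3 layers, 2 heads, 5 tokens (CLS + 4 patches)
rng = np.random.default_rng(0)
layers = []
for _ in range(3):
    raw = rng.random((2, 5, 5))
    layers.append(raw / raw.sum(axis=-1, keepdims=True))  # row-stochastic
rollout = attention_rollout(layers)
# Row 0 shows how much the CLS token ultimately attends to each patch
print(rollout[0])
```

Reshaping the CLS row over the patch grid gives a heatmap comparable to Grad-CAM for a ViT.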

Saliency Maps

Gradient-based saliency:
  • Compute gradient of output w.r.t. input pixels
  • Shows which pixels, if changed, would most affect prediction
  • Finer-grained than Grad-CAM
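As a sketch, consider gradient saliency on a toy linear classifier, where the gradient of the class score with respect to the input is available in closed form (it is just the class's weight vector). In practice a framework's autodiff supplies this gradient for a real CNN; everything below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_classes = 16 * 16, 3
W = rng.normal(size=(n_pixels, n_classes))   # toy linear model weights
x = rng.random(n_pixels)                     # flattened 16x16 "image"

scores = x @ W
target = scores.argmax()

# d(score_target)/dx = W[:, target]; saliency is its per-pixel magnitude
saliency = np.abs(W[:, target]).reshape(16, 16)
saliency = saliency / saliency.max()         # normalize for display
print(saliency.shape)  # (16, 16)
```

Because it works at pixel granularity rather than at the resolution of the last conv layer, saliency is noisier but finer-grained than Grad-CAM.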

Integrated Gradients

Path-based attribution:
  • Computes gradient along path from baseline to input
  • More accurate attribution than simple gradients
  • Shows pixel-level importance
Advanced techniques provide deeper insights but require more computation. Start with Grad-CAM and t-SNE for initial analysis.

Use Cases

Debugging Poor Performance

1. Check Misclassifications: Identify which classes are confused
2. View Embeddings: Confirm whether the confused classes overlap in feature space
3. Analyze Grad-CAM: Verify the model focuses on relevant features, not artifacts
4. Review Architecture: Ensure the model has sufficient capacity for the task

Validating Model Behavior

1. Grad-CAM on Correct Predictions: Verify the model attends to malware-specific regions
2. Embeddings Clustering: Confirm classes are well separated
3. Error Analysis: Check that errors make sense (confused classes are actually similar)

Identifying Data Issues

1. Grad-CAM Artifact Detection: Look for consistent attention on watermarks or borders
2. Outlier Detection in t-SNE: Find scattered points far from their class cluster (potential mislabels)
3. High-Confidence Errors: Investigate samples the model is confident about but wrong (possible data quality issues)

Tips & Best Practices

Start with t-SNE: Quick overview of whether model learned separable features.
Use Grad-CAM for Debugging: If performance is poor, check if model focuses on relevant features.
Analyze Errors First: Understanding misclassifications is more valuable than confirming correct predictions.
Grad-CAM highlights correlations, not causation. High attention doesn’t mean that region caused the prediction, only that it correlates.
Cross-Reference Tools: Use multiple techniques together (e.g., t-SNE shows overlap → Grad-CAM shows why → Misclassifications show which samples).

Limitations

Grad-CAM

  • Only works with CNNs (requires convolutional layers)
  • Coarse spatial resolution
  • May miss fine-grained details
  • Alternative: Integrated Gradients for finer detail

t-SNE

  • Non-deterministic (different runs produce different layouts)
  • Computationally expensive for large datasets
  • Hyperparameter-sensitive (perplexity affects structure)
  • Alternative: UMAP for faster projection that better preserves global structure (note it is also stochastic unless a random seed is fixed)

Attention for Transformers

  • Attention weights ≠ importance (attention is not explanation)
  • Multiple heads may attend to different features
  • Requires specialized visualization tools

Summary

The Interpretability page provides essential tools for understanding your trained model:

  • Architecture Review: Verify exact model structure and parameters
  • Misclassifications: Identify and analyze prediction errors
  • Embeddings (t-SNE): Visualize learned feature space clustering
  • Grad-CAM: See where the model focuses attention
Use these tools to:
  • Validate model behavior
  • Debug poor performance
  • Identify data quality issues
  • Understand prediction reasoning
Interpretability is crucial for deploying ML models in production. Always validate that your model makes predictions for the right reasons, not spurious correlations.