Overview

The Interpretability page (/interpretability) provides tools to understand how and why your trained model makes predictions. Use Grad-CAM for visual attention, t-SNE for embedding visualization, and detailed misclassification analysis.
Interpretability tools require a completed experiment with a trained model. Complete training first before accessing this page.

Page Structure

Experiment Selection:
  • Dropdown at top to select completed experiment
  • Shows experiment name and model used
Five Tabs:
  1. Architecture - Model structure review
  2. Misclassifications - Analyze prediction errors
  3. Embeddings - t-SNE visualization of learned features
  4. Grad-CAM - Visual attention heatmaps
  5. Advanced - Additional interpretability techniques

Tab 1: Architecture Review

Review the model architecture used in the selected experiment.

Model Summary

Displays:
  • Model Name: From model library
  • Model Type: Custom CNN, Transformer, or Transfer Learning
  • Total Parameters: Trainable + non-trainable
  • Trainable Parameters: Updated during training
  • Non-Trainable Parameters: Frozen weights (transfer learning)

Layer-by-Layer Breakdown

Convolutional Blocks:
  • Layer name (e.g., “Conv2D_1”)
  • Filters, kernel size, activation
  • Batch normalization status
  • Pooling type and size
  • Dropout rate
Dense Layers:
  • Units and activation
  • Dropout rate
Output Layer:
  • Number of classes
  • Softmax activation
Use this tab to verify the exact architecture that was trained, especially when comparing multiple experiments.
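The parameter counts shown in the summary follow directly from the layer shapes. As a minimal sketch (the helper names below are illustrative, not part of the application), the counts for a Conv2D and a Dense layer can be computed by hand:

```python
def conv2d_params(in_channels, filters, kernel_size, use_bias=True):
    """Parameters in a Conv2D layer: one (kh x kw x in_channels) kernel
    per filter, plus an optional bias per filter."""
    kh, kw = kernel_size
    return filters * (kh * kw * in_channels + (1 if use_bias else 0))

def dense_params(in_units, out_units, use_bias=True):
    """Parameters in a fully connected layer: a weight per input-output
    pair, plus an optional bias per output unit."""
    return out_units * (in_units + (1 if use_bias else 0))

# Example: a 3x3 Conv2D with 32 filters on a grayscale (1-channel) input
print(conv2d_params(1, 32, (3, 3)))   # 32 * (9*1 + 1) = 320
# Example: a 9-class softmax head on a 128-unit dense layer
print(dense_params(128, 9))           # 9 * (128 + 1) = 1161
```

Summing these per-layer counts (and excluding frozen layers) reproduces the Trainable vs. Non-Trainable split shown in the summary.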

Tab 2: Misclassifications

Analyze prediction errors to understand model weaknesses.

Error Type Filter

Select which misclassifications to view:
  • All Errors: Every incorrect prediction
  • By True Class: Filter errors for specific malware family
  • By Predicted Class: Filter by what the model predicted
  • Confidence Threshold: Show only high-confidence errors (model was confident but wrong)
Misclassified images are displayed in a grid. Each image shows:
  • Original Image: Misclassified sample
  • True Label: Actual malware family (green)
  • Predicted Label: Model’s prediction (red)
  • Confidence: Softmax probability for predicted class
  • Top-3 Predictions: Model’s top 3 choices with probabilities
Example:
Image: malware_sample_1234.png
True: Ramnit
Predicted: Lollipop (85% confidence)
Top-3:
  1. Lollipop: 85%
  2. Ramnit: 12%
  3. Kelihos: 3%
High-confidence errors (>80%) are most concerning. They indicate systematic confusion that the model is confident about, suggesting visual similarity or data issues.
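The confidence-threshold filter described above amounts to selecting wrong predictions whose softmax probability exceeds a cutoff. A minimal numpy sketch, using synthetic stand-in probabilities (the `probs` and `y_true` arrays below are hypothetical, not real model output):

```python
import numpy as np

classes = ["Ramnit", "Lollipop", "Kelihos"]
# Hypothetical softmax outputs for 6 test samples over 3 classes
probs = np.array([
    [0.85, 0.12, 0.03],   # confidently wrong (true class is Lollipop)
    [0.40, 0.35, 0.25],
    [0.10, 0.88, 0.02],
    [0.55, 0.30, 0.15],
    [0.05, 0.05, 0.90],
    [0.33, 0.33, 0.34],
])
y_true = np.array([1, 0, 1, 0, 2, 2])  # true class indices

y_pred = probs.argmax(axis=1)
confidence = probs.max(axis=1)

# High-confidence errors: wrong prediction with softmax probability > 0.80
errors = (y_pred != y_true) & (confidence > 0.80)
for i in np.flatnonzero(errors):
    top3 = probs[i].argsort()[::-1][:3]
    print(f"sample {i}: true={classes[y_true[i]]}, "
          f"pred={classes[y_pred[i]]} ({confidence[i]:.0%}), "
          f"top-3={[classes[j] for j in top3]}")
```

Only sample 0 is flagged here: it mirrors the Ramnit-predicted-as-Lollipop example above, where the model is both wrong and confident.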

Error Analysis Summary

Metrics displayed:
  • Total Misclassifications: Count of errors
  • Error Rate: Percentage of test set misclassified
  • Most Confused Pair: Which two classes are most often swapped
  • Worst Performing Class: Class with lowest recall
Use misclassification analysis to guide data collection. If specific pairs are confused, collect more distinguishing samples or increase augmentation for those classes.
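The summary metrics above all derive from the confusion matrix. As a sketch with a hypothetical 3-class matrix (rows are true classes, columns are predictions):

```python
import numpy as np

classes = ["Ramnit", "Lollipop", "Kelihos"]
cm = np.array([
    [50,  8,  2],   # Ramnit is often predicted as Lollipop
    [ 5, 60,  1],
    [ 1,  2, 40],
])

# Most confused pair: the largest off-diagonal entry
off = cm.copy()
np.fill_diagonal(off, 0)
i, j = np.unravel_index(off.argmax(), off.shape)
print(f"Most confused pair: {classes[i]} -> {classes[j]} ({off[i, j]} samples)")

# Worst performing class: lowest recall (diagonal / row sum)
recall = np.diag(cm) / cm.sum(axis=1)
worst = int(recall.argmin())
print(f"Worst class: {classes[worst]} (recall {recall[worst]:.2f})")

# Overall error rate: 1 - accuracy
print(f"Error rate: {1 - np.trace(cm) / cm.sum():.2%}")
```

In this toy matrix, Ramnit → Lollipop is the most confused pair and Ramnit has the lowest recall, which is exactly the kind of signal that should drive targeted data collection or augmentation.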

Tab 3: Embeddings

Visualize learned feature representations using dimensionality reduction.

t-SNE Visualization

What is t-SNE?
  • t-Distributed Stochastic Neighbor Embedding
  • Projects high-dimensional features to 2D for visualization
  • Preserves local structure (similar samples cluster together)
Chart Display:
  • Each point: One test sample
  • Color: True class label
  • Position: 2D projection of learned features
  • Clusters: Samples from same class should cluster
Embeddings are extracted from the second-to-last layer (before softmax), representing the model’s learned feature space.

Interpreting t-SNE Plots

Good sign:
  • Each class forms distinct cluster
  • Minimal overlap between classes
  • Clear boundaries
Indicates: The model learned discriminative features.
Example: The Ramnit cluster sits far from the Lollipop cluster.
Hover over points to see sample details: filename, true label, predicted label, confidence.

Configuration Options

t-SNE Parameters:
  • Perplexity: 5-50 (default: 30)
    • Higher = considers more neighbors
    • Lower = focuses on local structure
  • Learning Rate: 10-1000 (default: 200)
  • Iterations: 250-5000 (default: 1000)
t-SNE is non-deterministic. Running multiple times produces different layouts, but cluster structure should remain consistent.
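The same projection can be reproduced offline with scikit-learn, assuming you have exported the penultimate-layer embeddings. The `embeddings` array below is synthetic stand-in data (three loose 128-dimensional clusters), not real model output, and the parameter values match the UI defaults above:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for penultimate-layer embeddings: 60 samples, 128-dim, 3 clusters
embeddings = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(20, 128)) for c in (0.0, 3.0, 6.0)
])
labels = np.repeat([0, 1, 2], 20)

# Project to 2D; perplexity must stay below the number of samples
tsne = TSNE(n_components=2, perplexity=30, learning_rate=200,
            init="pca", random_state=0)
coords = tsne.fit_transform(embeddings)
print(coords.shape)  # (60, 2)
```

Fixing `random_state` makes a single run reproducible, but different seeds still produce different layouts; judge the plot by cluster structure, not absolute positions.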

Tab 4: Grad-CAM

Gradient-weighted Class Activation Mapping - visualize where the model “looks” when making predictions.

What is Grad-CAM?

Grad-CAM uses gradients flowing into the last convolutional layer to produce a heatmap showing which regions of the image contributed most to the prediction.
  • Red areas: High importance (model focuses here)
  • Blue areas: Low importance (model ignores)
  • Overlay on image: Shows attention directly on input
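The computation itself is compact. A minimal numpy sketch, assuming the last-conv feature maps and the gradient of the target class score with respect to them have already been extracted (that extraction step is framework-specific and omitted here):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap. Both inputs have shape (H, W, K): K feature maps
    and the gradient of the class score w.r.t. each map."""
    # Channel weights: global-average-pool the gradients over spatial dims
    weights = gradients.mean(axis=(0, 1))                         # (K,)
    # Weighted sum of maps, then ReLU to keep positive evidence only
    cam = np.maximum((feature_maps * weights).sum(axis=-1), 0.0)  # (H, W)
    # Normalize to [0, 1] for display as a heatmap
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: 4x4 feature maps with 8 channels
rng = np.random.default_rng(0)
A = rng.random((4, 4, 8))      # stand-in for last-conv activations
dYdA = rng.random((4, 4, 8))   # stand-in for gradients of the class score
heatmap = grad_cam(A, dYdA)
print(heatmap.shape)  # (4, 4)
```

The heatmap is then upsampled to the input resolution and blended with the original image to produce the overlay panel.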

Interface

1. Select Sample

  • True Class Dropdown: Filter by actual malware family
  • Sample Selector: Choose specific image from class
  • Prediction/Correct Filter: Show only correct or incorrect predictions
2. View Visualization

Three-panel display:
  1. Original Image: Input image
  2. Grad-CAM Heatmap: Attention heatmap (red = important)
  3. Overlay: Heatmap superimposed on image
3. Analyze Attention

  • Prediction: Model’s predicted class and confidence
  • True Label: Actual class
  • Top-3 Predictions: Alternative predictions with confidences

Interpreting Grad-CAM Heatmaps

Good attention:
  • Heatmap highlights relevant image regions
  • Consistent patterns across samples from same class
  • Focuses on discriminative features
Example: For Ramnit malware, model focuses on characteristic header structure
If Grad-CAM shows consistent attention on non-malware features (e.g., image borders, watermarks), your model may have learned dataset biases instead of malware characteristics.

Grad-CAM Options

Layer Selection:
  • Last Conv Layer (default): Broadest semantic understanding
  • Earlier Layers: More localized, fine-grained attention
Colormap:
  • Jet: Red (important) to blue (unimportant)
  • Viridis: Purple to yellow
  • Hot: Black to red to white
Compare Grad-CAM across correctly and incorrectly classified samples to identify what the model attends to in each case.

Tab 5: Advanced

Additional interpretability techniques.

Feature Importance

For Custom CNNs:
  • Filter Visualizations: What patterns each convolutional filter detects
  • Activation Maximization: Synthetic images that maximally activate specific neurons

Attention Rollout (Transformers Only)

For Vision Transformers:
  • Attention Weights: Which patches the model attends to
  • Rollout Visualization: Aggregated attention across all layers
  • Per-Head Analysis: Different attention heads focus on different features
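Attention rollout can be sketched in a few lines of numpy. This follows the standard recipe (average heads, mix in the residual connection, renormalize, multiply across layers); the toy attention matrices below are random stand-ins, not real transformer outputs:

```python
import numpy as np

def attention_rollout(attn_per_layer):
    """Aggregate attention across layers by matrix-multiplying per-layer
    attention. attn_per_layer: list of (heads, tokens, tokens) arrays."""
    n = attn_per_layer[0].shape[-1]
    rollout = np.eye(n)
    for layer_attn in attn_per_layer:
        a = layer_attn.mean(axis=0)              # average over heads
        a = 0.5 * a + 0.5 * np.eye(n)            # account for the residual stream
        a = a / a.sum(axis=-1, keepdims=True)    # renormalize rows
        rollout = a @ rollout
    return rollout

# Toy example: 3 layers, 2 heads, 5 tokens (CLS + 4 patches)
rng = np.random.default_rng(0)
layers = []
for _ in range(3):
    raw = rng.random((2, 5, 5))
    layers.append(raw / raw.sum(axis=-1, keepdims=True))  # row-stochastic
rollout = attention_rollout(layers)
# Row 0 shows how much the CLS token ultimately attends to each patch
print(rollout[0])
```

Reshaping the CLS row over the patch grid gives a heatmap comparable to Grad-CAM for a ViT.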

Saliency Maps

Gradient-based saliency:
  • Compute gradient of output w.r.t. input pixels
  • Shows which pixels, if changed, would most affect prediction
  • Finer-grained than Grad-CAM
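As a sketch, consider gradient saliency on a toy linear classifier, where the gradient of the class score with respect to the input is available in closed form (it is just the class's weight vector). In practice a framework's autodiff supplies this gradient for a real CNN; everything below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_classes = 16 * 16, 3
W = rng.normal(size=(n_pixels, n_classes))   # toy linear model weights
x = rng.random(n_pixels)                     # flattened 16x16 "image"

scores = x @ W
target = scores.argmax()

# d(score_target)/dx = W[:, target]; saliency is its per-pixel magnitude
saliency = np.abs(W[:, target]).reshape(16, 16)
saliency = saliency / saliency.max()         # normalize for display
print(saliency.shape)  # (16, 16)
```

Because it works at pixel granularity rather than at the resolution of the last conv layer, saliency is noisier but finer-grained than Grad-CAM.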

Integrated Gradients

Path-based attribution:
  • Computes gradient along path from baseline to input
  • More accurate attribution than simple gradients
  • Shows pixel-level importance
Advanced techniques provide deeper insights but require more computation. Start with Grad-CAM and t-SNE for initial analysis.

Use Cases

Debugging Poor Performance

1. Check Misclassifications: Identify which classes are confused
2. View Embeddings: Confirm whether the confused classes overlap in feature space
3. Analyze Grad-CAM: Verify the model focuses on relevant features, not artifacts
4. Review Architecture: Ensure the model has sufficient capacity for the task

Validating Model Behavior

1. Grad-CAM on Correct Predictions: Verify the model attends to malware-specific regions
2. Embeddings Clustering: Confirm classes are well separated
3. Error Analysis: Check that errors make sense (confused classes are actually similar)

Identifying Data Issues

1. Grad-CAM Artifact Detection: Look for consistent attention on watermarks or borders
2. Outlier Detection in t-SNE: Find scattered points far from their class cluster (potential mislabels)
3. High-Confidence Errors: Investigate samples the model is confident about but wrong (possible data quality issues)

Tips & Best Practices

Start with t-SNE: Quick overview of whether model learned separable features.
Use Grad-CAM for Debugging: If performance is poor, check if model focuses on relevant features.
Analyze Errors First: Understanding misclassifications is more valuable than confirming correct predictions.
Grad-CAM highlights correlations, not causation. High attention doesn’t mean that region caused the prediction, only that it correlates.
Cross-Reference Tools: Use multiple techniques together (e.g., t-SNE shows overlap → Grad-CAM shows why → Misclassifications show which samples).

Limitations

Grad-CAM

  • Only works with CNNs (requires convolutional layers)
  • Coarse spatial resolution
  • May miss fine-grained details
  • Alternative: Integrated Gradients for finer detail

t-SNE

  • Non-deterministic (different runs produce different layouts)
  • Computationally expensive for large datasets
  • Hyperparameter-sensitive (perplexity affects structure)
  • Alternative: UMAP for faster projection that better preserves global structure (note it is also stochastic unless a random seed is fixed)

Attention for Transformers

  • Attention weights ≠ importance (attention is not explanation)
  • Multiple heads may attend to different features
  • Requires specialized visualization tools

Summary

The Interpretability page provides essential tools for understanding your trained model:

  • Architecture Review: Verify exact model structure and parameters
  • Misclassifications: Identify and analyze prediction errors
  • Embeddings (t-SNE): Visualize learned feature space clustering
  • Grad-CAM: See where the model focuses attention
Use these tools to:
  • Validate model behavior
  • Debug poor performance
  • Identify data quality issues
  • Understand prediction reasoning
Interpretability is crucial for deploying ML models in production. Always validate that your model makes predictions for the right reasons, not spurious correlations.