Overview

Activation maps reveal what patterns and features convolutional filters detect in input images. By visualizing filter responses at different layers, you can understand the hierarchical feature learning process — from simple edges in early layers to complex semantic concepts in deeper layers.
Activation maps show the raw filter responses without gradient weighting, providing direct insight into what each filter detects.

How It Works

Activation visualization involves:
  1. Forward Pass: Process an image through the model
  2. Hook Registration: Capture intermediate activations at target layer
  3. Extraction: Retrieve activation maps (num_filters × H × W)
  4. Visualization: Display individual filter responses as grayscale images
Bright regions in activation maps indicate strong filter responses — the filter detected its target pattern.

Feature Hierarchy

CNNs learn features hierarchically:

Early Layers

Layers 1-2 detect low-level features:
  • Edges and contours
  • Colors and gradients
  • Simple textures
  • Corner patterns

Middle Layers

Layers 3-4 detect mid-level features:
  • Textures and patterns
  • Simple shapes
  • Repeated structures
  • Object parts

Deep Layers

Layers 5+ detect high-level features:
  • Object concepts
  • Semantic features
  • Complex patterns
  • Class-specific signatures

Implementation

The UC Intel Final platform provides clean activation extraction:

Activation Extraction

import numpy as np
import torch
import torch.nn as nn


def get_activation_maps(
    model: nn.Module,
    device: torch.device,
    image_tensor: torch.Tensor,
    target_layer: nn.Module,
) -> np.ndarray:
    """
    Get activation maps for a specific layer.

    Args:
        model: The model
        device: Compute device
        image_tensor: (C, H, W) image tensor
        target_layer: Layer to visualize

    Returns:
        Activation maps as (num_filters, H, W) numpy array
    """
    activations = []

    def hook_fn(_module, _inp, output):
        activations.append(output.detach().cpu())

    handle = target_layer.register_forward_hook(hook_fn)

    try:
        model.eval()
        with torch.no_grad():
            input_tensor = image_tensor.unsqueeze(0).to(device)
            _ = model(input_tensor)

        if not activations:
            return np.array([])

        acts = activations[0].squeeze(0).numpy()
        return acts

    finally:
        handle.remove()
The function automatically cleans up hooks to prevent memory leaks and unwanted side effects.
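As a quick end-to-end check, the same hook pattern can be exercised on a toy two-layer CNN (the model below is a hypothetical stand-in, not part of the platform):

```python
import torch
import torch.nn as nn

# Hypothetical toy CNN, just to demonstrate the extraction pattern.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
)

activations = []
target_layer = model[2]  # second conv layer
handle = target_layer.register_forward_hook(
    lambda _m, _i, out: activations.append(out.detach().cpu())
)
try:
    model.eval()
    with torch.no_grad():
        image = torch.rand(3, 32, 32)      # (C, H, W) input
        _ = model(image.unsqueeze(0))      # add batch dim, forward pass
finally:
    handle.remove()                        # always clean up the hook

acts = activations[0].squeeze(0).numpy()   # (num_filters, H, W)
print(acts.shape)                          # (16, 32, 32)
```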

Key Design Choices

Detach and CPU Transfer:
activations.append(output.detach().cpu())
  • detach(): Removes from computation graph, prevents gradient tracking
  • cpu(): Transfers to CPU for visualization and analysis
No Gradient Tracking:
with torch.no_grad():
    input_tensor = image_tensor.unsqueeze(0).to(device)
    _ = model(input_tensor)
  • Faster computation
  • Lower memory usage
  • No gradient accumulation
Guaranteed Cleanup:
finally:
    handle.remove()
  • Hook removed even if errors occur
  • Prevents hooks from accumulating
  • Essential for repeated visualizations

Filter Weight Visualization

Besides activation maps, you can visualize the actual learned filter weights:
def get_filter_weights(layer: nn.Module) -> np.ndarray | None:
    """
    Get convolutional filter weights for visualization.

    Args:
        layer: Conv2d layer

    Returns:
        Filter weights as (out_channels, in_channels, kH, kW) numpy array
    """
    if not isinstance(layer, nn.Conv2d):
        return None

    weights = layer.weight.detach().cpu().numpy()
    return weights

Weight Normalization for Display

def normalize_filter_for_display(filter_weights: np.ndarray) -> np.ndarray:
    """Normalize filter weights to [0, 1] for display."""
    min_val = filter_weights.min()
    max_val = filter_weights.max()
    if max_val - min_val > 0:
        return (filter_weights - min_val) / (max_val - min_val)
    return np.zeros_like(filter_weights)
Filter weights typically have:
  • Different scales: Some filters have large weights, others small
  • Positive and negative values: Can’t directly visualize as images
  • Varying ranges: Makes comparison difficult
Normalization benefits:
  • Brings all weights to [0, 1] range
  • Makes visualization consistent
  • Highlights relative patterns within each filter
  • Enables meaningful visual comparison
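Putting the two helpers together: the key design choice is normalizing each filter independently, rather than over the whole weight tensor, so each filter's internal pattern stays visible. The `conv` layer below is a hypothetical, randomly initialized example:

```python
import numpy as np
import torch.nn as nn

# Hypothetical first conv layer; any trained nn.Conv2d works the same way.
conv = nn.Conv2d(3, 16, kernel_size=5)
weights = conv.weight.detach().cpu().numpy()   # (16, 3, 5, 5)

# Normalize each filter to [0, 1] independently (per-filter min-max).
displayable = np.stack([
    (f - f.min()) / (f.max() - f.min()) if f.max() > f.min() else np.zeros_like(f)
    for f in weights
])
print(displayable.shape)   # (16, 3, 5, 5), each filter in [0, 1]
```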

Understanding Activations

Activation Patterns

Strong Activation (Bright Regions):
  • Filter detected its target feature
  • Pattern present in the image at that location
  • Important for downstream processing
Weak Activation (Dark Regions):
  • Target feature absent or weak
  • Filter not responsive to local pattern
  • Less information passed forward
Sparse Activations:
  • Filter highly selective (good)
  • Responds only to specific patterns
  • Common in well-trained networks
Dense Activations:
  • Filter responds broadly (may need more training)
  • Less discriminative
  • Common in early layers or undertrained networks
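One simple way to quantify sparse vs dense behavior is the fraction of near-zero responses per filter; `activation_sparsity` below is an illustrative helper, not part of the platform API:

```python
import numpy as np

def activation_sparsity(acts: np.ndarray, threshold: float = 1e-3) -> np.ndarray:
    """Fraction of near-zero responses per filter: high = sparse/selective."""
    # acts: (num_filters, H, W), as returned by get_activation_maps
    flat = acts.reshape(acts.shape[0], -1)
    return (np.abs(flat) < threshold).mean(axis=1)

# Synthetic example: filter 0 fires everywhere, filter 1 almost nowhere.
acts = np.zeros((2, 4, 4))
acts[0] = 1.0        # dense filter
acts[1, 0, 0] = 1.0  # sparse filter: one strong response
print(activation_sparsity(acts))  # filter 1 is far sparser than filter 0
```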

Layer-Specific Insights

Early Layers

Characteristics:
  • Input: Raw RGB pixels (3 channels)
  • Typical filters: 32-64 filters, 3×3 or 5×5
  • Patterns detected: Edges, color gradients, simple textures
What to look for:
  • Edge detectors at various orientations (0°, 45°, 90°, 135°)
  • Color-sensitive filters (red edges, blue gradients, etc.)
  • Texture patterns (dots, lines, grids)
Example interpretations:
  • Horizontal edge detector → bright on horizontal boundaries
  • Red channel filter → bright on reddish regions
  • Gabor-like filter → bright on oriented textures
Middle Layers

Characteristics:
  • Input: Feature maps from previous layer (64-256 channels)
  • Typical filters: 128-512 filters
  • Patterns detected: Combinations of low-level features, textures, shapes
What to look for:
  • Repeated texture patterns
  • Corner and junction detectors
  • Simple shape components
  • Directional patterns
Example interpretations:
  • Grid pattern detector → bright on regular structures
  • Blob detector → bright on rounded regions
  • Texture discriminator → bright on specific surface types
Deep Layers

Characteristics:
  • Input: High-level features (256-2048 channels)
  • Typical filters: 512-2048 filters
  • Patterns detected: Semantic concepts, object parts, class-specific features
What to look for:
  • Semantic object detectors
  • Class-discriminative patterns
  • Global structure indicators
  • Invariant feature representations
Example interpretations:
  • Malware signature detector → bright on suspicious patterns
  • Benign indicator → bright on clean code structures
  • Family-specific detector → bright on variant signatures

Visualization Best Practices

Selecting Layers

For general understanding:
  • Visualize first layer (low-level features)
  • Visualize one middle layer (mid-level patterns)
  • Visualize final conv layer (high-level concepts)
For debugging:
  • Compare activations for correct vs incorrect predictions
  • Check if deep layers activate on relevant regions
  • Verify early layers respond to expected patterns
For model comparison:
  • Same layer across different architectures
  • Same layer across training epochs
  • Same layer for different regularization settings
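To pick target layers programmatically, `named_modules()` walks nested submodules and can enumerate every `nn.Conv2d`; the model below is a hypothetical example:

```python
import torch.nn as nn

# Hypothetical model; with a real network, pass your trained model instead.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(),
    nn.Sequential(nn.Conv2d(8, 16, 3), nn.ReLU()),
    nn.Conv2d(16, 32, 3),
)

# named_modules() recurses into nested containers, so this finds every Conv2d.
conv_layers = [(name, m) for name, m in model.named_modules()
               if isinstance(m, nn.Conv2d)]
for name, m in conv_layers:
    print(name, m.out_channels)

# Typical picks for visualization: first, middle, and last conv layer.
first, last = conv_layers[0][1], conv_layers[-1][1]
```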

Interpreting Filter Response

Dead Filters

Problem: Filter always produces near-zero activations.
Causes:
  • Bad initialization
  • Learning rate too high
  • ReLU dying neuron problem
Solutions:
  • Check initialization scheme
  • Reduce learning rate
  • Use LeakyReLU or other activation

Saturated Filters

Problem: Filter always produces maximum activations.
Causes:
  • Weights too large
  • Input not normalized
  • Gradient explosion
Solutions:
  • Apply weight decay
  • Normalize inputs
  • Gradient clipping
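Both failure modes can be screened for numerically. `flag_filters` below is an illustrative sketch; the saturation check assumes a known activation ceiling (`sat_level`), which depends on your activation function:

```python
import numpy as np

def flag_filters(acts: np.ndarray, dead_tol: float = 1e-6,
                 sat_level: float = 1.0) -> tuple[list[int], list[int]]:
    """Flag likely dead (near-zero everywhere) and saturated filters.

    acts: (num_filters, H, W); sat_level is an assumed activation ceiling.
    """
    flat = np.abs(acts.reshape(acts.shape[0], -1))
    dead = [i for i, f in enumerate(flat) if f.max() < dead_tol]
    saturated = [i for i, f in enumerate(flat) if f.min() >= sat_level]
    return dead, saturated

acts = np.random.rand(4, 8, 8) * 0.5
acts[1] = 0.0   # simulate a dead filter
acts[3] = 1.0   # simulate a saturated filter
dead, sat = flag_filters(acts)
print(dead, sat)  # [1] [3]
```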

Advanced Analysis

Comparing Activations

Correct vs Misclassified Samples:
# Pseudocode for comparative analysis
correct_activations = get_activation_maps(model, device, correct_image, layer)
incorrect_activations = get_activation_maps(model, device, incorrect_image, layer)

# Analyze differences
activation_diff = correct_activations - incorrect_activations
Insights from differences:
  • Filters that fire differently → discriminative for the class
  • Filters with similar response → class-agnostic features
  • Large differences in deep layers → high-level feature mismatch
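A runnable version of this comparison might rank filters by mean absolute activation difference; `rank_filters_by_difference` is an illustrative helper, assuming both maps come from the same layer and input size:

```python
import numpy as np

def rank_filters_by_difference(a: np.ndarray, b: np.ndarray, top_k: int = 5):
    """Rank filters by mean absolute activation difference between two samples.

    a, b: (num_filters, H, W) maps for the same layer (e.g. correct vs
    misclassified input). Large scores mark filters that fire differently.
    """
    scores = np.abs(a - b).reshape(a.shape[0], -1).mean(axis=1)
    order = np.argsort(scores)[::-1][:top_k]
    return [int(i) for i in order], scores

# Synthetic maps: filter 2 differs strongly between the two samples.
a = np.zeros((4, 8, 8))
b = np.zeros((4, 8, 8))
b[2] = 1.0
top, scores = rank_filters_by_difference(a, b, top_k=1)
print(top)  # [2]
```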

Receptive Field Understanding

Activation map size decreases with depth:
Layer 1 (e.g., 112×112):
  • Small receptive field (~3×3 pixels)
  • Each activation corresponds to local region
Layer 3 (e.g., 28×28):
  • Medium receptive field (~20×20 pixels)
  • Each activation sees larger context
Layer 5 (e.g., 7×7):
  • Large receptive field (~100×100 pixels)
  • Each activation sees most of the image
The receptive field determines how much context each filter can “see” — deeper layers integrate information from larger regions.
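The growth sketched above follows the standard recurrence: the receptive field increases by `(kernel - 1) * jump` at each layer, where `jump` is the cumulative stride. The layer specs below are illustrative `(kernel_size, stride)` pairs, not a specific architecture:

```python
# Receptive field growth through a stack of conv/pool layers.
layers = [
    ("conv1", 3, 1),
    ("pool1", 2, 2),
    ("conv2", 3, 1),
    ("pool2", 2, 2),
    ("conv3", 3, 1),
]

rf, jump = 1, 1  # receptive field and input-pixel step of one activation
for name, k, s in layers:
    rf += (k - 1) * jump   # kernel widens the field by (k-1) input-space steps
    jump *= s              # stride multiplies the step between activations
    print(f"{name}: receptive field {rf}x{rf}, stride {jump}")
```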

Relationship to Grad-CAM

Activation Maps vs Grad-CAM:
Aspect    | Activation Maps                | Grad-CAM
What      | Raw filter responses           | Gradient-weighted responses
Shows     | What each filter detects       | What influences the prediction
Focus     | Individual filters             | Class-specific importance
Use case  | Understanding learned features | Explaining predictions
Output    | Many maps (one per filter)     | Single heatmap
Use activation maps to understand what the model learned, and Grad-CAM to understand why it made a specific prediction.

Practical Usage Tips

Start Simple

  1. First layer: Verify edge detection and basic feature extraction
  2. Mid layer: Check for reasonable texture and pattern learning
  3. Last conv layer: Ensure semantic feature discrimination

Look for Problems

Warning signs:
  • All filters look identical (convergence failure)
  • No activations on test images (dead network)
  • Noisy, random-looking filters (undertrained)
  • Activations on irrelevant regions (spurious correlations)
Good signs:
  • Diverse filter patterns
  • Interpretable low-level features
  • Sparse but meaningful activations
  • Deep filters respond to class-relevant patterns
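The "all filters look identical" warning sign can be checked numerically via pairwise cosine similarity of the flattened filter weights; `max_filter_similarity` is an illustrative helper, and the `conv` layer is a randomly initialized example:

```python
import numpy as np
import torch.nn as nn

def max_filter_similarity(layer: nn.Conv2d) -> float:
    """Highest pairwise cosine similarity among a layer's filters.

    Values near 1.0 for many pairs suggest redundant (near-identical) filters.
    """
    w = layer.weight.detach().cpu().numpy()
    flat = w.reshape(w.shape[0], -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -1.0)  # ignore self-similarity
    return float(sim.max())

conv = nn.Conv2d(3, 16, kernel_size=3)  # hypothetical example layer
print(round(max_filter_similarity(conv), 3))
```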

Combine with Other Methods

With Grad-CAM

Check if important regions (Grad-CAM) have strong activations

With t-SNE

Understand which activations contribute to embedding space structure

Technical Considerations

Memory Usage

Activation maps can be large:
  • Layer with 512 filters at 28×28 resolution: ~400K values, ~1.6 MB in float32 per image
  • Caching maps for 100 images at that layer: ~160 MB
Mitigation:
  • Visualize subset of filters (e.g., top 50)
  • Downsample large activation maps
  • Process one image at a time
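Selecting a subset by response strength might look like this; `top_k_filters` is an illustrative helper, with "top" meaning largest mean activation:

```python
import numpy as np

def top_k_filters(acts: np.ndarray, k: int = 50) -> np.ndarray:
    """Select the k filters with the strongest mean response for display."""
    means = acts.reshape(acts.shape[0], -1).mean(axis=1)
    idx = np.argsort(means)[::-1][:k]
    return acts[idx]  # (k, H, W) subset, largest mean first

# 512 filters at 28x28 in float32; keep only the 50 strongest for display.
acts = np.random.rand(512, 28, 28).astype(np.float32)
subset = top_k_filters(acts, k=50)
print(subset.shape, subset.nbytes)  # (50, 28, 28) 156800
```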

Computational Cost

Extraction is fast:
  • Forward pass only (no gradients)
  • Minimal overhead from hooks
  • Can process many samples quickly

Hook Management

Important: Always remove hooks after use:
try:
    # ... activation extraction ...
finally:
    handle.remove()  # Always cleanup
Why?
  • Accumulating hooks slows down model
  • Causes memory leaks
  • Can interfere with training

Use Cases in Malware Analysis

Understanding Family Signatures

  • Visualize activations for multiple samples from same family
  • Identify consistently activated filters
  • Discover discriminative patterns for that family
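A sketch of the consistency analysis, assuming per-sample activation maps from the same layer; `consistent_filters` and its mean-minus-std score are illustrative choices, not a platform API:

```python
import numpy as np

def consistent_filters(sample_acts: list, top_k: int = 5) -> list:
    """Find filters that fire strongly and consistently across samples.

    sample_acts: per-sample (num_filters, H, W) maps from the same layer.
    Score = mean response minus its variability across the family's samples.
    """
    # (num_samples, num_filters): mean response per filter per sample
    means = np.stack([a.reshape(a.shape[0], -1).mean(axis=1)
                      for a in sample_acts])
    score = means.mean(axis=0) - means.std(axis=0)
    return [int(i) for i in np.argsort(score)[::-1][:top_k]]

# Synthetic family: filter 1 fires strongly in every sample, filter 0 varies.
samples = []
for strength in (0.1, 0.9, 0.2):
    a = np.zeros((3, 4, 4))
    a[0] = strength  # inconsistent filter
    a[1] = 0.8       # consistent, strong filter
    samples.append(a)
print(consistent_filters(samples, top_k=1))  # [1]
```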

Detecting Adversarial Examples

  • Compare activations of normal vs adversarial samples
  • Adversarial perturbations often cause unusual activation patterns
  • Useful for building robust detectors

Model Debugging

  • Check if early layers learn reasonable features
  • Verify deep layers develop class-specific detectors
  • Identify layers that need better regularization

Example Workflow

Step 1: Select Representative Samples
  • Choose correctly classified sample from each class
  • Choose misclassified samples
  • Choose borderline cases (low confidence)
Step 2: Visualize Early Layer
  • Check first conv layer
  • Verify edge detection and low-level features
  • Look for diverse, interpretable patterns
Step 3: Visualize Middle Layer
  • Examine texture and pattern learning
  • Identify discriminative mid-level features
  • Check activation sparsity
Step 4: Visualize Deep Layer
  • Analyze semantic feature detection
  • Compare activations across classes
  • Identify class-specific filters
Step 5: Cross-Reference with Grad-CAM
  • Ensure high Grad-CAM importance regions have strong activations
  • Verify semantic alignment between activation and attention
Step 6: Document Findings
  • Note which filters are most discriminative
  • Identify potential model weaknesses
  • Guide architecture or training improvements

Summary

Activation maps are essential for understanding hierarchical feature learning in CNNs. Combine with gradient-based methods for comprehensive model interpretability.