Overview

The estimate_layerwise_stats function provides detailed per-layer resource estimates for convolutional neural networks. It calculates activation memory, parameter footprints, and multiply-accumulate (MAC) operations for each layer, enabling you to identify computational and memory bottlenecks before deployment.
This analysis assumes FP32 precision (4 bytes per value). For other precisions, scale the byte estimates accordingly.

Function Signature

from edge_opt.hardware import estimate_layerwise_stats

layerwise_df = estimate_layerwise_stats(
    model=model,
    batch_size=32,
    input_shape=(1, 28, 28)
)

Parameters

model
nn.Module
required
The PyTorch model to analyze. Must have conv1, conv2, and classifier attributes matching the SmallCNN architecture.
batch_size
int
required
Number of samples processed simultaneously. Affects total activation memory and MAC counts.
input_shape
tuple[int, int, int]
default:"(1, 28, 28)"
Input tensor dimensions as (channels, height, width). Default matches MNIST image size.

Returns

Type: pd.DataFrame

A pandas DataFrame with one row per layer, containing:
layer
str
Layer name (conv1, conv2, or classifier)
output_elements
int
Total number of elements in the layer’s output tensor for the given batch size
parameter_bytes
int
Memory required for weights and biases in bytes (FP32 precision)
activation_bytes
int
Memory required to store output activations in bytes (FP32 precision)
macs
int
Multiply-accumulate operations required to compute the layer output

Example Output

import torch
from edge_opt.model import SmallCNN
from edge_opt.hardware import estimate_layerwise_stats

model = SmallCNN(conv1_channels=16, conv2_channels=32)
layerwise_df = estimate_layerwise_stats(model, batch_size=32)
print(layerwise_df)
Sample output:
        layer  output_elements  parameter_bytes  activation_bytes      macs
0       conv1           401408              608           1605632  14528512
1       conv2           200704            18496            802816  57802752
2  classifier              320           627200              1280   4915200
Output Elements: For conv layers, this is batch_size × channels × height × width. For linear layers, it's batch_size × output_features.

Parameter Bytes: Calculated as (weight.numel() + bias.numel()) × 4. The factor of 4 accounts for FP32 storage (32 bits = 4 bytes per parameter).

Activation Bytes: Equals output_elements × 4. This is the memory needed to store the layer's output before the next operation.

MACs: For conv layers: batch_size × out_channels × out_height × out_width × in_channels × kernel_height × kernel_width. For linear layers: batch_size × in_features × out_features.
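The formulas above can be checked by hand. The sketch below applies them to a hypothetical conv layer (shapes chosen for illustration, not read from the library's source); it follows the formulas exactly as written, so the results may not match the sample output above if the implementation uses a different counting convention internally.

```python
# Sketch of the per-layer formulas above for one hypothetical 3x3 conv layer.
# Shapes are illustrative assumptions, not taken from the library's source.
batch_size = 32
in_channels, out_channels = 1, 16
out_height, out_width = 28, 28          # 3x3 kernel with padding=1 preserves 28x28
kernel_h, kernel_w = 3, 3

# Output Elements: batch_size x channels x height x width
output_elements = batch_size * out_channels * out_height * out_width

# Parameter Bytes: (weights + biases) x 4 bytes for FP32 storage
n_params = out_channels * in_channels * kernel_h * kernel_w + out_channels
parameter_bytes = n_params * 4

# Activation Bytes: one FP32 value (4 bytes) per output element
activation_bytes = output_elements * 4

# MACs: one multiply-accumulate per kernel tap per output element
macs = output_elements * in_channels * kernel_h * kernel_w

print(output_elements, parameter_bytes, activation_bytes, macs)
# 401408 640 1605632 3612672
```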

Implementation Details

The function computes spatial dimensions using this formula:
def _conv2d_output_shape(height, width, kernel, padding, stride=1):
    out_h = (height + (2 * padding) - kernel) // stride + 1
    out_w = (width + (2 * padding) - kernel) // stride + 1
    return out_h, out_w
For the default SmallCNN architecture:
  1. conv1: 3×3 kernel, padding=1, stride=1, followed by 2×2 max pooling
  2. conv2: 3×3 kernel, padding=1, stride=1, followed by 2×2 max pooling
  3. classifier: Fully connected layer operating on flattened feature maps
This function is hardcoded for the SmallCNN architecture. For custom models, you’ll need to implement your own layer-wise analysis or extend this function.

Use Cases

Identify Memory Bottlenecks

# Find layers with highest activation memory
top_activation_layers = layerwise_df.nlargest(3, 'activation_bytes')
print("Top memory consumers:")
print(top_activation_layers[['layer', 'activation_bytes']])

Compare Compute Distribution

# Calculate percentage of total MACs per layer
layerwise_df['mac_percentage'] = (
    layerwise_df['macs'] / layerwise_df['macs'].sum() * 100
)
print(layerwise_df[['layer', 'mac_percentage']])

Estimate Batch Size Impact

import matplotlib.pyplot as plt

batch_sizes = [1, 8, 16, 32, 64]
total_memory = []

for bs in batch_sizes:
    df = estimate_layerwise_stats(model, batch_size=bs)
    total_mb = df['activation_bytes'].sum() / (1024**2)
    total_memory.append(total_mb)

plt.plot(batch_sizes, total_memory, marker='o')
plt.xlabel('Batch Size')
plt.ylabel('Total Activation Memory (MB)')
plt.title('Memory Scaling with Batch Size')
plt.show()

Integration with Pipeline

In the main pipeline (scripts/run_pipeline.py:82), layer-wise analysis feeds into hardware summaries:
layerwise_df = estimate_layerwise_stats(baseline_model, batch_size=cfg.batch_size)
hardware_summary = summarize_hardware(
    layerwise_df,
    latency_ms=baseline_metrics.latency_ms,
    memory_bandwidth_gbps=cfg.memory_bandwidth_gbps,
)
The results are saved to outputs/layerwise_breakdown.csv and used to generate visualization plots showing activation memory and MAC distributions.

Bandwidth Utilization

Use layer-wise stats to calculate achieved memory bandwidth
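One way to sketch this calculation (hypothetical figures; `summarize_hardware`'s actual formula may account for additional traffic): divide the bytes moved per forward pass by the measured latency.

```python
# Hypothetical figures for an achieved-bandwidth estimate. The byte totals
# here are sums over the activation_bytes and parameter_bytes columns of a
# layer-wise DataFrame; the latency is an assumed per-batch measurement.
total_activation_bytes = 2_409_728
total_parameter_bytes = 646_304
latency_ms = 4.0

# Assume every parameter and activation is read/written once per batch.
bytes_moved = total_activation_bytes + total_parameter_bytes
achieved_gbps = bytes_moved / (latency_ms / 1000) / 1e9
print(f"Achieved bandwidth: {achieved_gbps:.3f} GB/s")
```

Comparing this figure against the hardware's peak bandwidth indicates whether a layer is memory-bound or compute-bound.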

Precision Tradeoffs

Compare metrics across FP32, FP16, and INT8 modes
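Per the note in the Overview, the FP32 byte estimates can be rescaled for other precisions. A minimal sketch (the helper name and the example value are illustrative, not part of the library's API):

```python
# Rescale FP32 byte estimates to other precisions, per the Overview note
# that all estimates assume 4 bytes per value.
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "int8": 1}

def scale_bytes(fp32_bytes, precision):
    """Rescale an FP32 byte estimate to the given precision."""
    return fp32_bytes * BYTES_PER_VALUE[precision] // BYTES_PER_VALUE["fp32"]

fp32_activations = 1_605_632            # e.g. conv1's activation_bytes at FP32
print(scale_bytes(fp32_activations, "fp16"))  # 802816  (half the FP32 footprint)
print(scale_bytes(fp32_activations, "int8"))  # 401408  (quarter the FP32 footprint)
```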

Source Reference

Implementation: src/edge_opt/hardware.py:27-70
