Overview

The estimate_layerwise_stats function provides detailed per-layer resource estimates for convolutional neural networks. It calculates activation memory, parameter footprints, and multiply-accumulate (MAC) operations for each layer, enabling you to identify computational and memory bottlenecks before deployment.
This analysis assumes FP32 precision (4 bytes per value). For other precisions, scale the byte estimates accordingly.

Function Signature

from edge_opt.hardware import estimate_layerwise_stats

layerwise_df = estimate_layerwise_stats(
    model=model,
    batch_size=32,
    input_shape=(1, 28, 28)
)

Parameters

model
nn.Module
required
The PyTorch model to analyze. Must have conv1, conv2, and classifier attributes matching the SmallCNN architecture.
batch_size
int
required
Number of samples processed simultaneously. Affects total activation memory and MAC counts.
input_shape
tuple[int, int, int]
default:"(1, 28, 28)"
Input tensor dimensions as (channels, height, width). Default matches MNIST image size.

Returns

Type: pd.DataFrame

A pandas DataFrame with one row per layer, containing:
layer
str
Layer name (conv1, conv2, or classifier)
output_elements
int
Total number of elements in the layer’s output tensor for the given batch size
parameter_bytes
int
Memory required for weights and biases in bytes (FP32 precision)
activation_bytes
int
Memory required to store output activations in bytes (FP32 precision)
macs
int
Multiply-accumulate operations required to compute the layer output

Example Output

import torch
from edge_opt.model import SmallCNN
from edge_opt.hardware import estimate_layerwise_stats

model = SmallCNN(conv1_channels=16, conv2_channels=32)
layerwise_df = estimate_layerwise_stats(model, batch_size=32)
print(layerwise_df)
Sample output:
        layer  output_elements  parameter_bytes  activation_bytes      macs
0       conv1           401408              608           1605632  14528512
1       conv2           200704            18496            802816  57802752
2  classifier              320           627200              1280   4915200
Output Elements: For conv layers, this is batch_size × channels × height × width. For linear layers, it's batch_size × output_features.

Parameter Bytes: Calculated as (weight.numel() + bias.numel()) × 4. The factor of 4 accounts for FP32 storage (32 bits = 4 bytes per parameter).

Activation Bytes: Equals output_elements × 4. This is the memory needed to store the layer's output before the next operation.

MACs: For conv layers: batch_size × out_channels × out_height × out_width × in_channels × kernel_height × kernel_width. For linear layers: batch_size × in_features × out_features.
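The formulas above can be checked by hand. The sketch below applies them to a hypothetical conv layer (shapes chosen for illustration, not read from the library's source); it follows the formulas exactly as written, so the results may not match the sample output above if the implementation uses a different counting convention internally.

```python
# Sketch of the per-layer formulas above for one hypothetical 3x3 conv layer.
# Shapes are illustrative assumptions, not taken from the library's source.
batch_size = 32
in_channels, out_channels = 1, 16
out_height, out_width = 28, 28          # 3x3 kernel with padding=1 preserves 28x28
kernel_h, kernel_w = 3, 3

# Output Elements: batch_size x channels x height x width
output_elements = batch_size * out_channels * out_height * out_width

# Parameter Bytes: (weights + biases) x 4 bytes for FP32 storage
n_params = out_channels * in_channels * kernel_h * kernel_w + out_channels
parameter_bytes = n_params * 4

# Activation Bytes: one FP32 value (4 bytes) per output element
activation_bytes = output_elements * 4

# MACs: one multiply-accumulate per kernel tap per output element
macs = output_elements * in_channels * kernel_h * kernel_w

print(output_elements, parameter_bytes, activation_bytes, macs)
# 401408 640 1605632 3612672
```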

Implementation Details

The function computes spatial dimensions using this formula:
def _conv2d_output_shape(height, width, kernel, padding, stride=1):
    out_h = (height + (2 * padding) - kernel) // stride + 1
    out_w = (width + (2 * padding) - kernel) // stride + 1
    return out_h, out_w
For the default SmallCNN architecture:
  1. conv1: 3×3 kernel, padding=1, stride=1, followed by 2×2 max pooling
  2. conv2: 3×3 kernel, padding=1, stride=1, followed by 2×2 max pooling
  3. classifier: Fully connected layer operating on flattened feature maps
This function is hardcoded for the SmallCNN architecture. For custom models, you’ll need to implement your own layer-wise analysis or extend this function.

Use Cases

Identify Memory Bottlenecks

# Find layers with highest activation memory
top_activation_layers = layerwise_df.nlargest(3, 'activation_bytes')
print("Top memory consumers:")
print(top_activation_layers[['layer', 'activation_bytes']])

Compare Compute Distribution

# Calculate percentage of total MACs per layer
layerwise_df['mac_percentage'] = (
    layerwise_df['macs'] / layerwise_df['macs'].sum() * 100
)
print(layerwise_df[['layer', 'mac_percentage']])

Estimate Batch Size Impact

import matplotlib.pyplot as plt

batch_sizes = [1, 8, 16, 32, 64]
total_memory = []

for bs in batch_sizes:
    df = estimate_layerwise_stats(model, batch_size=bs)
    total_mb = df['activation_bytes'].sum() / (1024**2)
    total_memory.append(total_mb)

plt.plot(batch_sizes, total_memory, marker='o')
plt.xlabel('Batch Size')
plt.ylabel('Total Activation Memory (MB)')
plt.title('Memory Scaling with Batch Size')
plt.show()

Integration with Pipeline

In the main pipeline (scripts/run_pipeline.py:82), layer-wise analysis feeds into hardware summaries:
layerwise_df = estimate_layerwise_stats(baseline_model, batch_size=cfg.batch_size)
hardware_summary = summarize_hardware(
    layerwise_df,
    latency_ms=baseline_metrics.latency_ms,
    memory_bandwidth_gbps=cfg.memory_bandwidth_gbps,
)
The results are saved to outputs/layerwise_breakdown.csv and used to generate visualization plots showing activation memory and MAC distributions.

Bandwidth Utilization

Use layer-wise stats to calculate achieved memory bandwidth
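One way to sketch this calculation (hypothetical figures; `summarize_hardware`'s actual formula may account for additional traffic): divide the bytes moved per forward pass by the measured latency.

```python
# Hypothetical figures for an achieved-bandwidth estimate. The byte totals
# here are sums over the activation_bytes and parameter_bytes columns of a
# layer-wise DataFrame; the latency is an assumed per-batch measurement.
total_activation_bytes = 2_409_728
total_parameter_bytes = 646_304
latency_ms = 4.0

# Assume every parameter and activation is read/written once per batch.
bytes_moved = total_activation_bytes + total_parameter_bytes
achieved_gbps = bytes_moved / (latency_ms / 1000) / 1e9
print(f"Achieved bandwidth: {achieved_gbps:.3f} GB/s")
```

Comparing this figure against the hardware's peak bandwidth indicates whether a layer is memory-bound or compute-bound.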

Precision Tradeoffs

Compare metrics across FP32, FP16, and INT8 modes
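Per the note in the Overview, the FP32 byte estimates can be rescaled for other precisions. A minimal sketch (the helper name and the example value are illustrative, not part of the library's API):

```python
# Rescale FP32 byte estimates to other precisions, per the Overview note
# that all estimates assume 4 bytes per value.
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "int8": 1}

def scale_bytes(fp32_bytes, precision):
    """Rescale an FP32 byte estimate to the given precision."""
    return fp32_bytes * BYTES_PER_VALUE[precision] // BYTES_PER_VALUE["fp32"]

fp32_activations = 1_605_632            # e.g. conv1's activation_bytes at FP32
print(scale_bytes(fp32_activations, "fp16"))  # 802816  (half the FP32 footprint)
print(scale_bytes(fp32_activations, "int8"))  # 401408  (quarter the FP32 footprint)
```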

Source Reference

Implementation: src/edge_opt/hardware.py:27-70
