Overview
The `estimate_layerwise_stats` function provides detailed per-layer resource estimates for convolutional neural networks. It calculates activation memory, parameter footprints, and multiply-accumulate (MAC) operations for each layer, enabling you to identify computational and memory bottlenecks before deployment.
This analysis assumes FP32 precision (4 bytes per value). For other precisions, scale the byte estimates accordingly.
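Since every byte figure assumes 4 bytes per value, rescaling to another precision is a one-line calculation. The helper below is illustrative (its name is not part of the library API):

```python
# Bytes per value for common precisions; FP32 (4 bytes) is the baseline
# used throughout this page.
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "int8": 1}

def scale_bytes(fp32_bytes, precision):
    # Rescale an FP32 byte estimate to the target precision.
    return fp32_bytes * BYTES_PER_VALUE[precision] // BYTES_PER_VALUE["fp32"]
```

For example, a 1,605,632-byte FP32 activation buffer would need 802,816 bytes in FP16 and 401,408 bytes in INT8.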
Function Signature
Parameters

- `model`: The PyTorch model to analyze. Must have `conv1`, `conv2`, and `classifier` attributes matching the SmallCNN architecture.
- `batch_size`: Number of samples processed simultaneously. Affects total activation memory and MAC counts.
- `input_shape`: Input tensor dimensions as `(channels, height, width)`. Default matches MNIST image size.

Returns
Type: `pd.DataFrame`
A pandas DataFrame with one row per layer, containing:
- `layer`: Layer name (`conv1`, `conv2`, or `classifier`)
- `output_elements`: Total number of elements in the layer's output tensor for the given batch size
- `parameter_bytes`: Memory required for weights and biases in bytes (FP32 precision)
- `activation_bytes`: Memory required to store output activations in bytes (FP32 precision)
- `macs`: Multiply-accumulate operations required to compute the layer output
Example Output
| layer | output_elements | parameter_bytes | activation_bytes | macs |
|---|---|---|---|---|
| conv1 | 401408 | 608 | 1605632 | 14528512 |
| conv2 | 200704 | 18496 | 802816 | 57802752 |
| classifier | 320 | 627200 | 1280 | 4915200 |
Understanding the Metrics
- Output Elements: For conv layers, this is `batch_size × channels × height × width`. For linear layers, it's `batch_size × output_features`.
- Parameter Bytes: Calculated as `(weight.numel() + bias.numel()) × 4`. The factor of 4 accounts for FP32 storage (32 bits = 4 bytes per parameter).
- Activation Bytes: Equals `output_elements × 4`. This is the memory needed to store the layer's output before the next operation.
- MACs: For conv layers: `batch_size × out_channels × out_height × out_width × in_channels × kernel_height × kernel_width`. For linear layers: `batch_size × in_features × out_features`.

Implementation Details
The function derives spatial dimensions from each layer's configuration:
- conv1: 3×3 kernel, padding=1, stride=1, followed by 2×2 max pooling
- conv2: 3×3 kernel, padding=1, stride=1, followed by 2×2 max pooling
- classifier: Fully connected layer operating on flattened feature maps
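The rules above can be re-derived in plain Python. This is a sketch, not the actual `src/edge_opt/hardware.py` implementation: the channel widths (16 and 32), the 10-class head, and the default batch size of 32 are assumptions not stated on this page.

```python
import pandas as pd

def sketch_layerwise_stats(batch_size=32, input_shape=(1, 28, 28)):
    """Sketch of the documented formulas; channel widths and class count
    are assumed, and may differ from the real SmallCNN."""
    ch, h, w = input_shape
    rows = []
    for name, out_ch in [("conv1", 16), ("conv2", 32)]:  # assumed widths
        # 3x3 conv, padding=1, stride=1 preserves spatial size:
        # out = (in + 2*1 - 3) // 1 + 1 == in
        macs = batch_size * out_ch * h * w * ch * 3 * 3
        output_elements = batch_size * out_ch * h * w
        params = out_ch * ch * 3 * 3 + out_ch            # weights + biases
        rows.append((name, output_elements, params * 4, output_elements * 4, macs))
        ch, h, w = out_ch, h // 2, w // 2                # 2x2 max pool halves H and W
    in_features, n_classes = ch * h * w, 10              # flattened features, assumed 10 classes
    output_elements = batch_size * n_classes
    params = in_features * n_classes + n_classes
    macs = batch_size * in_features * n_classes
    rows.append(("classifier", output_elements, params * 4, output_elements * 4, macs))
    return pd.DataFrame(rows, columns=["layer", "output_elements",
                                       "parameter_bytes", "activation_bytes", "macs"])
```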
Use Cases
Identify Memory Bottlenecks
Compare Compute Distribution
Estimate Batch Size Impact
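The three use cases above reduce to ordinary pandas operations on the returned DataFrame. The snippet below builds a hypothetical frame mirroring the example output rather than calling the real function:

```python
import pandas as pd

# Hypothetical stats mirroring the example output table.
stats = pd.DataFrame({
    "layer": ["conv1", "conv2", "classifier"],
    "parameter_bytes": [608, 18496, 627200],
    "activation_bytes": [1605632, 802816, 1280],
    "macs": [14528512, 57802752, 4915200],
})

# Identify memory bottlenecks: rank layers by activation memory.
by_memory = stats.sort_values("activation_bytes", ascending=False)

# Compare compute distribution: each layer's share of total MACs.
stats["mac_share_pct"] = 100 * stats["macs"] / stats["macs"].sum()

# Estimate batch size impact: activations and MACs scale linearly with
# batch size, while parameter memory does not.
stats["activation_bytes_x2"] = stats["activation_bytes"] * 2
```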
Integration with Pipeline
In the main pipeline (scripts/run_pipeline.py:82), layer-wise analysis feeds into hardware summaries: the results are written to `outputs/layerwise_breakdown.csv` and used to generate visualization plots showing activation memory and MAC distributions.
Related Functions

- Bandwidth Utilization: Use layer-wise stats to calculate achieved memory bandwidth.
- Precision Tradeoffs: Compare metrics across FP32, FP16, and INT8 modes.
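For the bandwidth-utilization case, a rough estimate divides a layer's memory traffic by a measured latency. The helper below is a hypothetical illustration, not part of the library; it also deliberately ignores input reads and cache effects, so it is a lower bound on traffic:

```python
def achieved_bandwidth_gbps(parameter_bytes, activation_bytes, latency_s):
    # Approximate traffic as parameters read plus activations written,
    # then convert bytes/second to GB/s.
    return (parameter_bytes + activation_bytes) / latency_s / 1e9
```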
Source Reference
Implementation: `src/edge_opt/hardware.py:27-70`