Overview

The hardware module provides layer-wise computational analysis, memory bandwidth utilization, precision tradeoff evaluation, and visualization tools for understanding model performance on edge devices.

Data Classes

LayerEstimate

Dataclass representing computational estimates for a single layer.
@dataclass
class LayerEstimate:
    layer: str
    output_elements: int
    parameter_bytes: int
    activation_bytes: int
    macs: int
Fields:
  • layer (str): Layer name identifier
  • output_elements (int): Number of output elements produced by the layer
  • parameter_bytes (int): Total bytes used by layer parameters (weights and biases)
  • activation_bytes (int): Total bytes used by layer activations
  • macs (int): Number of multiply-accumulate operations (computational cost)

Core Functions

estimate_layerwise_stats

Computes layer-by-layer computational and memory statistics for a CNN model.
def estimate_layerwise_stats(
    model: nn.Module,
    batch_size: int,
    input_shape: tuple[int, int, int] = (1, 28, 28)
) -> pd.DataFrame
Parameters:
  • model (nn.Module, required): PyTorch model with conv1, conv2, and classifier attributes (CNN architecture)
  • batch_size (int, required): Batch size for computational estimates
  • input_shape (tuple[int, int, int], default (1, 28, 28)): Input tensor shape as (channels, height, width)

Returns:
  pd.DataFrame with columns: layer, output_elements, parameter_bytes, activation_bytes, macs
from edge_opt.hardware import estimate_layerwise_stats

# Analyze a CNN model
layerwise_df = estimate_layerwise_stats(
    model=my_cnn_model,
    batch_size=32,
    input_shape=(1, 28, 28)
)

print(layerwise_df)
#      layer  output_elements  parameter_bytes  activation_bytes      macs
# 0    conv1           100352           3200            401408  28901376
# 1    conv2            51200          25600            204800  14745600
# 2  classifier          320           5120              1280     40960
This function assumes a CNN architecture with two convolutional layers followed by max pooling and a linear classifier. It calculates output shapes based on 3×3 kernels with padding=1 and stride=1, and 2×2 max pooling.
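Under those assumptions the spatial dimensions are easy to check by hand. A standalone sketch of the shape arithmetic (not part of the library; shown only to make the convention concrete):

```python
def conv_out(h: int, w: int, kernel: int = 3, padding: int = 1, stride: int = 1):
    """Output spatial size of a convolution."""
    return ((h + 2 * padding - kernel) // stride + 1,
            (w + 2 * padding - kernel) // stride + 1)

def pool_out(h: int, w: int, pool: int = 2):
    """Output spatial size of non-overlapping max pooling."""
    return h // pool, w // pool

# A 3x3 conv with padding=1, stride=1 preserves the 28x28 input size,
# and 2x2 max pooling then halves each dimension.
h, w = conv_out(28, 28)   # -> (28, 28)
h, w = pool_out(h, w)     # -> (14, 14)
print(h, w)
```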

summarize_hardware

Generates hardware utilization metrics including bandwidth and compute efficiency.
def summarize_hardware(
    layerwise_df: pd.DataFrame,
    latency_ms: float,
    memory_bandwidth_gbps: float,
) -> dict[str, float]
Parameters:
  • layerwise_df (pd.DataFrame, required): DataFrame from estimate_layerwise_stats containing layer statistics
  • latency_ms (float, required): Measured inference latency in milliseconds
  • memory_bandwidth_gbps (float, required): Hardware memory bandwidth specification in gigabytes per second

Returns:
  dict[str, float] containing:
  • estimated_total_bytes: Total parameter and activation bytes
  • estimated_total_macs: Total multiply-accumulate operations
  • achieved_bandwidth_gbps: Bandwidth achieved during inference (GB/s)
  • configured_memory_bandwidth_gbps: Hardware bandwidth specification
  • bandwidth_utilization: Ratio of achieved to configured bandwidth
  • achieved_gmacs: Achieved GMAC/s (billions of MACs per second)
from edge_opt.hardware import estimate_layerwise_stats, summarize_hardware

layerwise_df = estimate_layerwise_stats(model, batch_size=32)

hw_summary = summarize_hardware(
    layerwise_df=layerwise_df,
    latency_ms=45.2,
    memory_bandwidth_gbps=12.8
)

print(f"Bandwidth utilization: {hw_summary['bandwidth_utilization']:.2%}")
print(f"Achieved GMAC/s: {hw_summary['achieved_gmacs']:.2f}")
Bandwidth utilization well below 100% may indicate compute-bound operations, while values approaching 100% suggest memory-bound performance.
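The returned metrics follow from simple arithmetic: achieved bandwidth is total bytes moved divided by latency, and GMAC/s is total MACs divided by latency. A standalone sketch of the assumed formulas (the library's exact implementation may differ, e.g. in how it counts bytes; the input values here are made up):

```python
def hardware_summary(total_bytes: int, total_macs: int,
                     latency_ms: float, memory_bandwidth_gbps: float) -> dict[str, float]:
    """Derive bandwidth and compute metrics from totals and measured latency."""
    latency_s = latency_ms / 1000.0
    achieved_bw = total_bytes / latency_s / 1e9  # GB/s actually moved
    return {
        "achieved_bandwidth_gbps": achieved_bw,
        "bandwidth_utilization": achieved_bw / memory_bandwidth_gbps,
        "achieved_gmacs": total_macs / latency_s / 1e9,  # billions of MACs per second
    }

# Hypothetical totals for a model run at 45.2 ms on a 12.8 GB/s device.
s = hardware_summary(total_bytes=400_000_000, total_macs=1_300_000_000,
                     latency_ms=45.2, memory_bandwidth_gbps=12.8)
print(f"Bandwidth utilization: {s['bandwidth_utilization']:.2%}")
```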

precision_tradeoff_table

Aggregates sweep results to compare precision configurations.
def precision_tradeoff_table(sweep_df: pd.DataFrame) -> pd.DataFrame
Parameters:
  • sweep_df (pd.DataFrame, required): DataFrame of sweep results with columns: precision, accuracy, latency_ms, memory_mb, energy_proxy_j, accepted

Returns:
  pd.DataFrame grouped by precision with columns:
  • precision: Precision mode (e.g., "fp32", "fp16")
  • accuracy_mean: Mean accuracy across configurations
  • latency_ms_mean: Mean latency in milliseconds
  • memory_mb_mean: Mean memory footprint in MB
  • energy_proxy_j_mean: Mean energy proxy in joules
  • accepted_ratio: Fraction of configurations meeting constraints
  Rows are sorted by latency_ms_mean (ascending).
from edge_opt.hardware import precision_tradeoff_table

# sweep_df contains results from multiple precision configurations
tradeoff_table = precision_tradeoff_table(sweep_df)

print(tradeoff_table)
# precision  accuracy_mean  latency_ms_mean  memory_mb_mean  energy_proxy_j_mean  accepted_ratio
#     fp16          0.985             12.3            2.15                0.062             0.85
#     fp32          0.987             24.1            4.30                0.121             0.45
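The aggregation described above amounts to a pandas groupby with per-column means and an acceptance ratio. A minimal sketch of the assumed behavior (a toy sweep_df, not the library's implementation):

```python
import pandas as pd

# Toy sweep results: two configurations per precision mode.
sweep_df = pd.DataFrame({
    "precision": ["fp32", "fp32", "fp16", "fp16"],
    "accuracy": [0.988, 0.986, 0.986, 0.984],
    "latency_ms": [24.0, 24.2, 12.2, 12.4],
    "memory_mb": [4.3, 4.3, 2.15, 2.15],
    "energy_proxy_j": [0.120, 0.122, 0.061, 0.063],
    "accepted": [True, False, True, True],
})

# Mean of each metric per precision; the mean of a boolean column is the
# fraction of accepted configurations. Sort fastest-first.
table = (
    sweep_df.groupby("precision", as_index=False)
    .agg(
        accuracy_mean=("accuracy", "mean"),
        latency_ms_mean=("latency_ms", "mean"),
        memory_mb_mean=("memory_mb", "mean"),
        energy_proxy_j_mean=("energy_proxy_j", "mean"),
        accepted_ratio=("accepted", "mean"),
    )
    .sort_values("latency_ms_mean")
)
print(table)
```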

save_hardware_artifacts

Saves hardware analysis results and generates visualization plots.
def save_hardware_artifacts(
    output_dir: Path,
    layerwise_df: pd.DataFrame,
    precision_df: pd.DataFrame,
    summary: dict[str, float],
) -> None
Parameters:
  • output_dir (Path, required): Directory to save artifacts to (created if it doesn't exist)
  • layerwise_df (pd.DataFrame, required): Layer-wise statistics from estimate_layerwise_stats
  • precision_df (pd.DataFrame, required): Precision tradeoff table from precision_tradeoff_table
  • summary (dict[str, float], required): Hardware summary from summarize_hardware

Files created:
  • layerwise_breakdown.csv: Layer-by-layer statistics
  • precision_tradeoffs.csv: Precision comparison table
  • hardware_summary.csv: Overall hardware metrics
  • layerwise_activation_memory.png: Bar chart of activation memory by layer
  • layerwise_macs.png: Bar chart of computational cost by layer
from pathlib import Path
from edge_opt.hardware import (
    estimate_layerwise_stats,
    summarize_hardware,
    precision_tradeoff_table,
    save_hardware_artifacts
)

# Generate all hardware analysis data
layerwise_df = estimate_layerwise_stats(model, batch_size=32)
hw_summary = summarize_hardware(layerwise_df, latency_ms=45.2, memory_bandwidth_gbps=12.8)
tradeoff_df = precision_tradeoff_table(sweep_results)

# Save all artifacts
save_hardware_artifacts(
    output_dir=Path("./hardware_analysis"),
    layerwise_df=layerwise_df,
    precision_df=tradeoff_df,
    summary=hw_summary
)
Visualization plots are saved at 180 DPI resolution with tight layout formatting. All CSV files use UTF-8 encoding.
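After the call, a quick sanity check can confirm that every expected artifact was written. A standalone sketch (the file names are those listed above; the helper function is hypothetical, not part of the library):

```python
from pathlib import Path

EXPECTED_ARTIFACTS = {
    "layerwise_breakdown.csv",
    "precision_tradeoffs.csv",
    "hardware_summary.csv",
    "layerwise_activation_memory.png",
    "layerwise_macs.png",
}

def check_artifacts(output_dir: Path) -> set[str]:
    """Return the set of expected artifact files missing from output_dir."""
    present = {p.name for p in output_dir.iterdir()} if output_dir.exists() else set()
    return EXPECTED_ARTIFACTS - present

missing = check_artifacts(Path("./hardware_analysis"))
if missing:
    print(f"Missing artifacts: {sorted(missing)}")
```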
