Overview

The hardware module provides layer-wise computational analysis, memory bandwidth utilization, precision tradeoff evaluation, and visualization tools for understanding model performance on edge devices.

Data Classes
LayerEstimate

Dataclass representing computational estimates for a single layer.

Fields:
Layer name identifier
Number of output elements produced by the layer
Total bytes used by layer parameters (weights and biases)
Total bytes used by layer activations
Multiply-accumulate operations (computational cost)
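Based on the fields above, the dataclass might look like the following minimal sketch. The field names here are assumptions chosen to mirror the DataFrame columns returned by estimate_layerwise_stats; the module's actual attribute names may differ.

```python
from dataclasses import dataclass

@dataclass
class LayerEstimate:
    """Computational estimates for a single layer (field names assumed)."""
    layer: str              # layer name identifier
    output_elements: int    # number of output elements produced by the layer
    parameter_bytes: int    # bytes used by weights and biases
    activation_bytes: int   # bytes used by activations
    macs: int               # multiply-accumulate operations

est = LayerEstimate("conv1", 16384, 1792, 65536, 442368)
```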
Core Functions
estimate_layerwise_stats

Computes layer-by-layer computational and memory statistics for a CNN model.

Parameters:
PyTorch model with conv1, conv2, and classifier attributes (CNN architecture)
Batch size for computational estimates
Input tensor shape as (channels, height, width)

Returns:
DataFrame with columns: layer, output_elements, parameter_bytes, activation_bytes, macs

This function assumes a CNN architecture with two convolutional layers followed by max pooling and a linear classifier. It calculates output shapes based on 3×3 kernels with padding=1 and stride=1, and 2×2 max pooling.
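The per-layer arithmetic under the stated assumptions (3×3 kernel, padding=1, stride=1, so spatial dimensions are unchanged) can be sketched as below. The function and variable names are illustrative, not the module's API, and dtype_bytes=4 assumes fp32 storage.

```python
def conv3x3_estimates(in_ch, out_ch, height, width, batch_size=1, dtype_bytes=4):
    """Estimate stats for a 3x3 conv with padding=1, stride=1
    (output spatial dims equal input spatial dims)."""
    out_elements = batch_size * out_ch * height * width
    # weights: out_ch * in_ch * 3 * 3, plus one bias per output channel
    parameter_bytes = (out_ch * in_ch * 3 * 3 + out_ch) * dtype_bytes
    activation_bytes = out_elements * dtype_bytes
    # one multiply-accumulate per kernel tap per output element
    macs = out_elements * in_ch * 3 * 3
    return out_elements, parameter_bytes, activation_bytes, macs
```

For example, a 3-channel 32×32 input into a 16-channel conv yields 16·32·32 = 16384 output elements and 16384·3·9 = 442368 MACs.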
summarize_hardware

Generates hardware utilization metrics including bandwidth and compute efficiency.

Parameters:
DataFrame from estimate_layerwise_stats containing layer statistics
Measured inference latency in milliseconds
Hardware memory bandwidth specification in gigabytes per second

Returns a dictionary containing:
estimated_total_bytes: Total parameter and activation bytes
estimated_total_macs: Total multiply-accumulate operations
achieved_bandwidth_gbps: Actual bandwidth utilization during inference
configured_memory_bandwidth_gbps: Hardware bandwidth specification
bandwidth_utilization: Ratio of achieved to configured bandwidth
achieved_gmacs: Achieved GMAC/s (billions of MACs per second)

Bandwidth utilization well below 100% may indicate compute-bound operation, while values approaching 100% suggest memory-bound performance.
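The returned metrics can plausibly be derived from totals and measured latency as in this sketch. It is not the module's exact implementation; in particular, treating "total bytes" as one pass over parameters plus activations is an assumption.

```python
def summarize_hardware_metrics(total_bytes, total_macs, latency_ms, memory_bandwidth_gbps):
    """Derive achieved bandwidth and compute throughput from measured latency."""
    latency_s = latency_ms / 1000.0
    achieved_bandwidth_gbps = total_bytes / latency_s / 1e9  # bytes moved per second, in GB/s
    achieved_gmacs = total_macs / latency_s / 1e9            # MACs per second, in GMAC/s
    return {
        "estimated_total_bytes": total_bytes,
        "estimated_total_macs": total_macs,
        "achieved_bandwidth_gbps": achieved_bandwidth_gbps,
        "configured_memory_bandwidth_gbps": memory_bandwidth_gbps,
        "bandwidth_utilization": achieved_bandwidth_gbps / memory_bandwidth_gbps,
        "achieved_gmacs": achieved_gmacs,
    }
```

For instance, moving 1 GB in 1000 ms on a 10 GB/s part gives 1.0 GB/s achieved and a utilization of 0.1, suggesting compute-bound operation.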
precision_tradeoff_table

Aggregates sweep results to compare precision configurations.

Parameters:
DataFrame containing sweep results with columns: precision, accuracy, latency_ms, memory_mb, energy_proxy_j, accepted

Returns an aggregated DataFrame grouped by precision with columns:
precision: Precision mode (e.g., “fp32”, “fp16”)
accuracy_mean: Mean accuracy across configurations
latency_ms_mean: Mean latency in milliseconds
memory_mb_mean: Mean memory footprint in MB
energy_proxy_j_mean: Mean energy consumption in joules
accepted_ratio: Fraction of configurations meeting constraints
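The aggregation can be sketched with a pandas groupby; this is a plausible reconstruction from the documented input and output columns, not the module's exact code.

```python
import pandas as pd

def precision_tradeoff(sweep: pd.DataFrame) -> pd.DataFrame:
    """Aggregate sweep results per precision mode."""
    return (
        sweep.groupby("precision", as_index=False)
        .agg(
            accuracy_mean=("accuracy", "mean"),
            latency_ms_mean=("latency_ms", "mean"),
            memory_mb_mean=("memory_mb", "mean"),
            energy_proxy_j_mean=("energy_proxy_j", "mean"),
            accepted_ratio=("accepted", "mean"),  # mean of booleans = fraction accepted
        )
        .sort_values("latency_ms_mean")           # fastest precision first
    )
```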
Results are sorted by latency_ms_mean (ascending).

save_hardware_artifacts
Saves hardware analysis results and generates visualization plots.

Parameters:
Directory path to save artifacts (created if it doesn’t exist)
Layer-wise statistics from estimate_layerwise_stats
Precision tradeoff table from precision_tradeoff_table
Hardware summary from summarize_hardware

This function creates the following files:
layerwise_breakdown.csv: Layer-by-layer statistics
precision_tradeoffs.csv: Precision comparison table
hardware_summary.csv: Overall hardware metrics
layerwise_activation_memory.png: Bar chart of activation memory by layer
layerwise_macs.png: Bar chart of computational cost by layer
Visualization plots are saved at 180 DPI resolution with tight layout formatting. All CSV files use UTF-8 encoding.