Overview
The hardware profiling module estimates computational costs, memory requirements, and resource utilization for model inference. It provides layer-by-layer operator analysis and quantization tradeoffs to optimize deployment on resource-constrained hardware.
Core Functions
build_hardware_profile_table
evaluation/hardware_profile.py:18
write_hardware_profile_artifacts
evaluation/hardware_profile.py:48
Usage Example
Profile Structure
The hardware profile contains four main sections:
1. operator_profile
List of operator-level metrics. Each operator has:
- operator: Operation name (e.g., input_normalization, linear_projection)
- latency_ms: Estimated latency in milliseconds
- memory_kb: Memory footprint in kilobytes
2. totals
Aggregated metrics across all operators:
- latency_ms: Sum of all operator latencies
- memory_kb: Sum of all operator memory usage
- estimated_bandwidth_mb_s: Memory bandwidth estimate
- stream_utilization: Fraction of stream interval used for compute
hardware_profile.py:24):
hardware_profile.py:25):
3. precision_tradeoffs
Memory savings from quantization:
- fp32_memory_kb: Full precision (baseline)
- fp16_memory_kb: Half precision (50% reduction)
- int8_memory_kb: 8-bit integer (75% reduction)
- note: Warning about deployment-specific latency effects
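The percentages above follow from bytes per value: 4 for fp32, 2 for fp16, 1 for int8. A minimal sketch of that arithmetic, assuming the footprint scales linearly with element size. The helper name `precision_tradeoffs_sketch` is hypothetical and not part of the module.

```python
def precision_tradeoffs_sketch(fp32_memory_kb: float) -> dict:
    """Derive fp16 and int8 footprints from an fp32 baseline (illustrative only)."""
    return {
        "fp32_memory_kb": fp32_memory_kb,
        "fp16_memory_kb": fp32_memory_kb * 0.5,   # 2 bytes per value instead of 4
        "int8_memory_kb": fp32_memory_kb * 0.25,  # 1 byte per value instead of 4
    }


tradeoffs = precision_tradeoffs_sketch(512.0)
print(tradeoffs)
```

These figures cover memory only; as the note field warns, latency effects depend on the deployment target.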
4. edge_constraints
Deployment considerations:
- cache_sensitivity: Impact of small batch sizes on cache efficiency
- bottleneck: Description of primary performance bottleneck
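To make the relationship between operator_profile and totals concrete, the sketch below builds two operator entries and sums them. The numbers are made up for illustration, and `aggregate_totals` is a hypothetical helper. The bandwidth and stream utilization fields are omitted because their formulas are not given here.

```python
# Illustrative operator entries using the documented field names.
operator_profile = [
    {"operator": "input_normalization", "latency_ms": 0.5, "memory_kb": 16.0},
    {"operator": "linear_projection", "latency_ms": 1.5, "memory_kb": 64.0},
]


def aggregate_totals(operators: list) -> dict:
    """Sum per-operator latency and memory into a totals section."""
    return {
        "latency_ms": sum(op["latency_ms"] for op in operators),
        "memory_kb": sum(op["memory_kb"] for op in operators),
    }


profile = {
    "operator_profile": operator_profile,
    "totals": aggregate_totals(operator_profile),
}
print(profile["totals"])
```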
Operator Cost Estimation
Layer costs are estimated using empirical formulas (hardware_profile.py:8-15):
Hardware Utilities
The utils/hardware.py module provides helper functions:
HardwareProfile Dataclass
utils/hardware.py:6
estimate_batch_memory_mb
utils/hardware.py:12
Estimates memory usage for a batch in megabytes. Default assumes 8 bytes per feature (fp64).
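A sketch of the likely arithmetic behind the estimate, under some assumptions: the batch is a dense batch_size x feature_count array, and one megabyte is 1024 * 1024 bytes. The parameter names and the `_sketch` suffix are hypothetical, so the real signature in utils/hardware.py may differ.

```python
def estimate_batch_memory_mb_sketch(
    batch_size: int, feature_count: int, bytes_per_feature: int = 8
) -> float:
    """Estimate the memory of one dense batch in MiB (illustrative only)."""
    total_bytes = batch_size * feature_count * bytes_per_feature
    return total_bytes / (1024 ** 2)


fp64_mb = estimate_batch_memory_mb_sketch(1024, 128)                       # 8 bytes per feature
fp32_mb = estimate_batch_memory_mb_sketch(1024, 128, bytes_per_feature=4)  # 4 bytes per feature
print(fp64_mb, fp32_mb)
```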
auto_adjust_batch_size
utils/hardware.py:16
Automatically reduces batch size to fit memory constraints by repeatedly halving it.
Usage example:
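A minimal sketch of the halving strategy with a sample call. The function name, its parameters, and the MiB convention are assumptions for illustration and may not match the signature in utils/hardware.py.

```python
def auto_adjust_batch_size_sketch(
    batch_size: int,
    feature_count: int,
    memory_limit_mb: float,
    bytes_per_feature: int = 8,
) -> int:
    """Halve the batch size until the estimated batch fits the memory limit."""

    def batch_mb(n: int) -> float:
        return n * feature_count * bytes_per_feature / (1024 ** 2)

    # Stop at 1 so the result is always a usable batch size.
    while batch_size > 1 and batch_mb(batch_size) > memory_limit_mb:
        batch_size //= 2
    return batch_size


adjusted = auto_adjust_batch_size_sketch(4096, 128, memory_limit_mb=1.0)
unchanged = auto_adjust_batch_size_sketch(256, 128, memory_limit_mb=1.0)
print(adjusted, unchanged)
```

Halving needs only a logarithmic number of steps, but the result can be up to half the largest batch size that would actually fit.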
compute_utilization
utils/hardware.py:23
Calculates compute utilization as a fraction (0.0 to 1.0).
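One plausible reading, sketched below, is the ratio of busy compute time to the available interval, clamped to the documented range. The inputs `compute_ms` and `interval_ms` are assumptions, since the actual arguments of compute_utilization are not shown here.

```python
def compute_utilization_sketch(compute_ms: float, interval_ms: float) -> float:
    """Return compute time as a fraction of the interval, clamped to [0.0, 1.0]."""
    if interval_ms <= 0:
        return 0.0  # avoid division by zero; treat an empty interval as idle
    return min(max(compute_ms / interval_ms, 0.0), 1.0)


print(compute_utilization_sketch(5.0, 20.0))
print(compute_utilization_sketch(30.0, 20.0))
```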
CSV Artifacts
The write_hardware_profile_artifacts function generates two CSV files:
operator_profile.csv
hardware_totals.csv
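The sketch below shows one way the two files could be produced with the standard csv module. The column layout (one row per operator, and a single row of totals) is an assumption, and `write_profile_csvs` is a hypothetical stand-in rather than the module's own function.

```python
import csv
import tempfile
from pathlib import Path


def write_profile_csvs(profile: dict, output_dir: str):
    """Write operator_profile.csv and hardware_totals.csv (illustrative layout)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    operator_path = out / "operator_profile.csv"
    with operator_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["operator", "latency_ms", "memory_kb"])
        writer.writeheader()
        writer.writerows(profile["operator_profile"])

    totals_path = out / "hardware_totals.csv"
    with totals_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(profile["totals"]))
        writer.writeheader()
        writer.writerow(profile["totals"])

    return operator_path, totals_path


example_profile = {
    "operator_profile": [
        {"operator": "linear_projection", "latency_ms": 1.5, "memory_kb": 64.0},
    ],
    "totals": {"latency_ms": 1.5, "memory_kb": 64.0},
}
operator_csv, totals_csv = write_profile_csvs(example_profile, tempfile.mkdtemp())
print(operator_csv.read_text())
```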
Optimization Workflow
- Profile baseline with production batch size and feature count
- Identify bottleneck from edge_constraints
- Evaluate quantization using precision_tradeoffs
- Adjust batch size with auto_adjust_batch_size
- Monitor utilization to avoid over/under-provisioning
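The quantization step of this workflow can be sketched as picking the highest precision whose footprint fits a memory budget. The helper `pick_precision` and the budget values are hypothetical. The dictionary keys follow the precision_tradeoffs fields described earlier.

```python
from typing import Optional


def pick_precision(tradeoffs: dict, budget_kb: float) -> Optional[str]:
    """Return the highest precision that fits the budget, or None if none fits."""
    for name in ("fp32", "fp16", "int8"):  # ordered from most to least precise
        if tradeoffs[f"{name}_memory_kb"] <= budget_kb:
            return name
    return None


sample_tradeoffs = {
    "fp32_memory_kb": 512.0,
    "fp16_memory_kb": 256.0,
    "int8_memory_kb": 128.0,
}
print(pick_precision(sample_tradeoffs, budget_kb=300.0))
```

If no precision fits, reducing the batch size is the remaining lever, which is why the workflow adjusts batch size after evaluating quantization.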
Quantization Recommendations
Based on memory constraints:
Edge Deployment
For edge devices (Raspberry Pi, mobile, IoT):
- Small batches: Process 1-4 samples at a time
- FP16 or INT8: Reduce memory footprint
- Monitor cache: Small batches reduce cache reuse
- Bandwidth awareness: Memory movement often dominates latency