Overview
The `precision_tradeoff_table` function aggregates sweep results across all pruning levels to show mean performance metrics for each precision mode. This enables data-driven decisions about quantization strategy by revealing the typical accuracy vs. efficiency tradeoffs.
Precision selection is one of the highest-impact optimizations for edge deployment, often yielding 2-4× speedups with <1% accuracy loss when properly calibrated.
Function Signature
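The signature itself was not preserved in this page; a plausible sketch, assuming the function takes the sweep DataFrame as its only argument (the parameter name `sweep` is an assumption):

```python
import pandas as pd

# Hedged sketch: the parameter name is not documented here and is assumed.
def precision_tradeoff_table(sweep: pd.DataFrame) -> pd.DataFrame:
    """Return one row per precision mode with mean metrics, sorted by ascending latency."""
    ...
```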
Parameters
DataFrame from `run_sweep` containing results for multiple pruning levels and precision modes. Must have columns:

- `precision`: String identifier ("fp32", "fp16", "int8")
- `accuracy`: Float in [0, 1] representing validation accuracy
- `latency_ms`: Inference time in milliseconds
- `memory_mb`: Model memory footprint in megabytes
- `energy_proxy_j`: Energy estimate in joules
- `accepted`: Boolean indicating whether the variant meets the memory budget
Returns
Type: `pd.DataFrame`
A DataFrame with one row per precision mode, sorted by ascending latency:

- `precision`: Precision identifier ("fp32", "fp16", or "int8")
- `accuracy_mean`: Mean validation accuracy across all pruning levels for this precision
- `latency_ms_mean`: Mean inference latency in milliseconds
- `memory_mb_mean`: Mean model memory footprint in megabytes
- `energy_proxy_j_mean`: Mean energy consumption estimate in joules
- `accepted_ratio`: Fraction of variants that passed the active memory budget constraint (range: 0.0 to 1.0)
Example Usage
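The call snippet was not preserved here; a self-contained sketch that builds a toy sweep DataFrame by hand (in practice this would come from `run_sweep`; all values are illustrative) and aggregates it the way `precision_tradeoff_table` is described as doing:

```python
import pandas as pd

# Toy stand-in for run_sweep output; real values come from the sweep.
sweep = pd.DataFrame({
    "precision":      ["fp32", "fp32", "fp16", "fp16", "int8", "int8"],
    "accuracy":       [0.960, 0.958, 0.959, 0.958, 0.952, 0.950],
    "latency_ms":     [7.5, 7.4, 3.9, 3.8, 2.4, 2.3],
    "memory_mb":      [1.64, 1.64, 0.82, 0.82, 0.41, 0.41],
    "energy_proxy_j": [0.015, 0.015, 0.008, 0.008, 0.005, 0.005],
    "accepted":       [True, False, True, True, True, True],
})

# Aggregate per precision mode, as the function is documented to do.
table = (
    sweep.groupby("precision")
    .agg(
        accuracy_mean=("accuracy", "mean"),
        latency_ms_mean=("latency_ms", "mean"),
        memory_mb_mean=("memory_mb", "mean"),
        energy_proxy_j_mean=("energy_proxy_j", "mean"),
        accepted_ratio=("accepted", "mean"),  # mean of booleans = acceptance ratio
    )
    .sort_values("latency_ms_mean")  # fastest first
    .reset_index()
)
print(table)
```

A real sweep produces a table like the one below.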
| precision | accuracy_mean | latency_ms_mean | memory_mb_mean | energy_proxy_j_mean | accepted_ratio |
|---|---|---|---|---|---|
| int8 | 0.9512 | 2.34 | 0.41 | 0.0047 | 1.00 |
| fp16 | 0.9586 | 3.87 | 0.82 | 0.0077 | 1.00 |
| fp32 | 0.9591 | 7.45 | 1.64 | 0.0149 | 0.75 |
Interpreting Results
Typical Tradeoff Patterns
INT8: Maximum efficiency, slight accuracy loss
Characteristics:
- 2-4× faster than FP32
- 4× smaller memory footprint
- 0.5-2% accuracy degradation typical
- Highest `accepted_ratio` due to low memory footprint

Best for:
- Severe resource constraints (MCUs, low-power edge devices)
- Latency-critical applications
- Models with redundant representational capacity

Calibration note: INT8 accuracy depends on representative calibration data; the sweep uses `calibration_batches` samples for this.
FP16: Balanced performance
Characteristics:
- 1.5-2× faster than FP32
- 2× smaller memory footprint
- <0.5% accuracy impact in most cases
- Good `accepted_ratio` for moderate budgets

Best for:
- Raspberry Pi, Jetson Nano class devices
- Models where accuracy is critical
- Gradual optimization from baseline
FP32: Baseline precision
Characteristics:
- Highest accuracy (training precision)
- Largest memory and latency
- Lower `accepted_ratio` with tight budgets

Best for:
- Development and accuracy validation
- Servers or cloud deployments
- Models that fail to calibrate properly
Implementation Details
The function groups sweep results by precision and aggregates key metrics.

Aggregation behavior:
- Uses `.mean()` for continuous metrics (accuracy, latency, memory, energy)
- Uses `.mean()` on the boolean `accepted` column to compute the acceptance ratio
- Sorts by `latency_ms_mean` ascending (fastest first)
- Resets the index to provide clean 0-based row numbers
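The four behaviors above can be sketched as a single pandas pipeline. This is a reconstruction, not the actual source (which lives in `src/edge_opt/hardware.py`); column names follow this page, and the parameter name `sweep` is an assumption:

```python
import pandas as pd

def precision_tradeoff_table(sweep: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the documented aggregation: one row per precision mode."""
    return (
        sweep.groupby("precision")
        .agg(
            accuracy_mean=("accuracy", "mean"),
            latency_ms_mean=("latency_ms", "mean"),
            memory_mb_mean=("memory_mb", "mean"),
            energy_proxy_j_mean=("energy_proxy_j", "mean"),
            accepted_ratio=("accepted", "mean"),  # mean of booleans = ratio in [0, 1]
        )
        .sort_values("latency_ms_mean")  # fastest first
        .reset_index()  # precision back to a column; clean 0-based index
    )
```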
Advanced Analysis
Accuracy-Efficiency Tradeoff Curves
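The plotting code for this subsection was not preserved; a minimal sketch using matplotlib on a toy aggregated table (values taken from the example output above):

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for precision_tradeoff_table output.
table = pd.DataFrame({
    "precision": ["int8", "fp16", "fp32"],
    "accuracy_mean": [0.9512, 0.9586, 0.9591],
    "latency_ms_mean": [2.34, 3.87, 7.45],
})

fig, ax = plt.subplots()
for _, row in table.iterrows():
    # One labeled point per precision mode.
    ax.plot(row["latency_ms_mean"], row["accuracy_mean"], "o", label=row["precision"])
ax.set_xlabel("Mean latency (ms)")
ax.set_ylabel("Mean accuracy")
ax.set_title("Accuracy vs. latency by precision")
ax.legend()
fig.savefig("precision_tradeoff_curve.png")
```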
Budget-Constrained Selection
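The selection snippet was not preserved; a sketch of one reasonable policy, filtering to modes that always fit the memory budget and meet an assumed accuracy floor, then picking the fastest:

```python
import pandas as pd

# Toy stand-in for precision_tradeoff_table output.
table = pd.DataFrame({
    "precision": ["int8", "fp16", "fp32"],
    "accuracy_mean": [0.9512, 0.9586, 0.9591],
    "latency_ms_mean": [2.34, 3.87, 7.45],
    "accepted_ratio": [1.00, 1.00, 0.75],
})

min_accuracy = 0.955  # assumed application requirement

# Keep modes where every variant fit the budget AND accuracy is acceptable.
feasible = table[(table["accepted_ratio"] == 1.0) & (table["accuracy_mean"] >= min_accuracy)]

# Among feasible modes, choose the lowest mean latency.
best = feasible.sort_values("latency_ms_mean").iloc[0]
print(best["precision"])
```

With these toy numbers, int8 fails the accuracy floor and fp32 fails the budget, so fp16 is selected.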
Pipeline Integration
In `scripts/run_pipeline.py:88`, precision tradeoffs are computed and saved to `outputs/precision_tradeoffs.csv` alongside:

- `layerwise_breakdown.csv` (per-layer analysis)
- `hardware_summary.csv` (bandwidth metrics)
- Visualization plots for layer-wise memory and compute
Combining with Pruning
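One way to examine the interaction is to pivot the raw sweep results by pruning level and precision; a sketch on toy data, assuming the sweep output includes a `pruning_level` column (that column name is an assumption):

```python
import pandas as pd

# Toy stand-in for run_sweep output; `pruning_level` is an assumed column name.
sweep = pd.DataFrame({
    "pruning_level": [0.0, 0.0, 0.5, 0.5],
    "precision": ["fp32", "int8", "fp32", "int8"],
    "accuracy": [0.96, 0.95, 0.94, 0.91],
})

# Accuracy by pruning level (rows) and precision (columns); cells show
# whether quantization loss grows as pruning removes redundant capacity.
pivot = sweep.pivot_table(index="pruning_level", columns="precision", values="accuracy")
print(pivot)
```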
Precision and pruning are orthogonal optimizations, and analyzing them jointly reveals their interaction.

Limitations
Important considerations:
- Hardware dependency: Actual speedups vary by device. Some CPUs lack INT8 instructions, negating latency benefits.
- Calibration quality: INT8 accuracy depends heavily on representative calibration data.
- Framework support: ONNX Runtime INT8 performance differs from native PyTorch or TensorRT implementations.
- Batch size effects: Quantization overhead is amortized over larger batches, affecting single-sample latency differently.
Related Functions
- Run Sweep: Generates the input DataFrame with multiple precision variants
- Quantization: Implementation details for FP16 and INT8 conversion
Source Reference
Implementation: `src/edge_opt/hardware.py:94-102`