
Overview

The precision_tradeoff_table function aggregates sweep results across all pruning levels to show mean performance metrics for each precision mode. This enables data-driven decisions about quantization strategies by revealing the typical accuracy vs efficiency tradeoffs.
Precision selection is one of the highest-impact optimizations for edge deployment, often yielding 2-4× speedups with <1% accuracy loss when properly calibrated.

Function Signature

from edge_opt.hardware import precision_tradeoff_table

precision_df = precision_tradeoff_table(sweep_df)

Parameters

sweep_df
pd.DataFrame
required
DataFrame from run_sweep containing results for multiple pruning levels and precision modes. Must have columns:
  • precision: String identifier ("fp32", "fp16", "int8")
  • accuracy: Float in [0, 1] representing validation accuracy
  • latency_ms: Inference time in milliseconds
  • memory_mb: Model memory footprint in megabytes
  • energy_proxy_j: Energy estimate in joules
  • accepted: Boolean indicating whether variant meets memory budget

Returns

Type: pd.DataFrame
A DataFrame with one row per precision mode, sorted by ascending mean latency:
precision
str
Precision identifier: "fp32", "fp16", or "int8"
accuracy_mean
float
Mean validation accuracy across all pruning levels for this precision
latency_ms_mean
float
Mean inference latency in milliseconds
memory_mb_mean
float
Mean model memory footprint in megabytes
energy_proxy_j_mean
float
Mean energy consumption estimate in joules
accepted_ratio
float
Fraction of variants that passed the active memory budget constraint (range: 0.0 to 1.0)
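Passing a sweep DataFrame with a missing column fails inside the aggregation with a less obvious error, so it can help to verify the schema up front. A minimal sketch; the check_sweep_columns helper is illustrative, not part of edge_opt:

```python
import pandas as pd

# Columns precision_tradeoff_table expects in the sweep results
REQUIRED_COLUMNS = {
    "precision", "accuracy", "latency_ms",
    "memory_mb", "energy_proxy_j", "accepted",
}

def check_sweep_columns(sweep_df: pd.DataFrame) -> None:
    """Raise early if the sweep results are missing any expected column."""
    missing = REQUIRED_COLUMNS - set(sweep_df.columns)
    if missing:
        raise ValueError(f"sweep_df is missing columns: {sorted(missing)}")

# A two-row sweep result with all required columns passes silently
df = pd.DataFrame({
    "precision": ["fp32", "int8"],
    "accuracy": [0.96, 0.95],
    "latency_ms": [7.4, 2.3],
    "memory_mb": [1.6, 0.4],
    "energy_proxy_j": [0.015, 0.005],
    "accepted": [True, True],
})
check_sweep_columns(df)  # no exception
```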

Example Usage

import pandas as pd
from edge_opt.experiments import run_sweep
from edge_opt.hardware import precision_tradeoff_table
from edge_opt.model import SmallCNN
import torch

# Run full optimization sweep
baseline_model = SmallCNN()
sweep_df = run_sweep(
    base_model=baseline_model,
    val_loader=val_loader,
    calibration_loader=train_loader,
    device=torch.device("cpu"),
    pruning_levels=[0.0, 0.25, 0.5, 0.7],
    precisions=["fp32", "fp16", "int8"],
    power_watts=2.0,
    calibration_batches=10,
    memory_budgets_mb=[1.0, 2.0, 4.0],
    active_memory_budget_mb=2.0,
    latency_multiplier=1.0,
    benchmark_repeats=5
)

# Aggregate by precision
precision_df = precision_tradeoff_table(sweep_df)
print(precision_df)
Sample output:
  precision  accuracy_mean  latency_ms_mean  memory_mb_mean  energy_proxy_j_mean  accepted_ratio
       int8         0.9512             2.34            0.41               0.0047            1.00
       fp16         0.9586             3.87            0.82               0.0077            1.00
       fp32         0.9591             7.45            1.64               0.0149            0.75

Interpreting Results

Typical Tradeoff Patterns

INT8

Characteristics:
  • 2-4× faster than FP32
  • 4× smaller memory footprint
  • 0.5-2% accuracy degradation typical
  • Highest accepted_ratio due to low memory
Best for:
  • Severe resource constraints (MCUs, low-power edge devices)
  • Latency-critical applications
  • Models with redundant representational capacity
Calibration required: INT8 quantization needs representative data to compute activation ranges. The pipeline uses calibration_batches batches from the calibration loader for this.
FP16

Characteristics:
  • 1.5-2× faster than FP32
  • 2× smaller memory footprint
  • <0.5% accuracy impact in most cases
  • Good accepted_ratio for moderate budgets
Best for:
  • Raspberry Pi, Jetson Nano class devices
  • Models where accuracy is critical
  • Gradual optimization from baseline
Note: Actual speedup depends on hardware SIMD support (NEON, AVX2, etc.)
FP32

Characteristics:
  • Highest accuracy (training precision)
  • Largest memory and latency
  • Lower accepted_ratio with tight budgets
Best for:
  • Development and accuracy validation
  • Servers or cloud deployments
  • Models that fail to calibrate properly
Use as reference: Always include FP32 in sweeps to quantify optimization gains.
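With FP32 in the sweep, the tradeoff table directly yields relative gains. A hedged sketch computing speedup, memory reduction, and accuracy delta against the FP32 row, using a toy table shaped like the sample output above (the values are illustrative):

```python
import pandas as pd

# Toy table shaped like precision_tradeoff_table's output
precision_df = pd.DataFrame({
    "precision": ["int8", "fp16", "fp32"],
    "accuracy_mean": [0.9512, 0.9586, 0.9591],
    "latency_ms_mean": [2.34, 3.87, 7.45],
    "memory_mb_mean": [0.41, 0.82, 1.64],
})

# Use the FP32 row as the reference point for all relative metrics
baseline = precision_df.set_index("precision").loc["fp32"]
gains = precision_df.assign(
    speedup_vs_fp32=baseline["latency_ms_mean"] / precision_df["latency_ms_mean"],
    memory_reduction=baseline["memory_mb_mean"] / precision_df["memory_mb_mean"],
    accuracy_delta=precision_df["accuracy_mean"] - baseline["accuracy_mean"],
)
print(gains[["precision", "speedup_vs_fp32", "memory_reduction", "accuracy_delta"]])
```

With these illustrative numbers, INT8 shows roughly a 3.2× speedup and 4× memory reduction for under 1% accuracy loss.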

Implementation Details

The function groups sweep results by precision and aggregates key metrics:
# From src/edge_opt/hardware.py:94-102
grouped = sweep_df.groupby("precision", as_index=False).agg(
    accuracy_mean=("accuracy", "mean"),
    latency_ms_mean=("latency_ms", "mean"),
    memory_mb_mean=("memory_mb", "mean"),
    energy_proxy_j_mean=("energy_proxy_j", "mean"),
    accepted_ratio=("accepted", "mean"),
)
return grouped.sort_values("latency_ms_mean").reset_index(drop=True)
Aggregation behavior:
  • Uses .mean() for continuous metrics (accuracy, latency, memory, energy)
  • Uses .mean() on boolean accepted to compute acceptance ratio
  • Sorts by latency_ms_mean ascending (fastest first)
  • Resets index to provide clean 0-based row numbers
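The acceptance ratio falls out of pandas treating booleans as 0/1 under .mean(); a minimal illustration of the same grouped aggregation:

```python
import pandas as pd

# Four fp32 variants (one over budget) and two int8 variants (all accepted)
df = pd.DataFrame({
    "precision": ["fp32", "fp32", "fp32", "fp32", "int8", "int8"],
    "accepted":  [True, True, True, False, True, True],
})

# Mean of a boolean column = fraction of True values
ratios = df.groupby("precision")["accepted"].mean()
print(ratios)
# fp32: 3 of 4 accepted -> 0.75; int8: 2 of 2 -> 1.0
```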

Advanced Analysis

Accuracy-Efficiency Tradeoff Curves

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Latency tradeoff
axes[0].scatter(precision_df['latency_ms_mean'], precision_df['accuracy_mean'], s=100)
for _, row in precision_df.iterrows():
    axes[0].annotate(row['precision'], 
                     (row['latency_ms_mean'], row['accuracy_mean']),
                     textcoords="offset points", xytext=(0,10), ha='center')
axes[0].set_xlabel('Latency (ms)')
axes[0].set_ylabel('Accuracy')
axes[0].set_title('Accuracy vs Latency')

# Memory tradeoff
axes[1].scatter(precision_df['memory_mb_mean'], precision_df['accuracy_mean'], s=100, color='orange')
for _, row in precision_df.iterrows():
    axes[1].annotate(row['precision'], 
                     (row['memory_mb_mean'], row['accuracy_mean']),
                     textcoords="offset points", xytext=(0,10), ha='center')
axes[1].set_xlabel('Memory (MB)')
axes[1].set_ylabel('Accuracy')
axes[1].set_title('Accuracy vs Memory')

# Energy tradeoff
axes[2].scatter(precision_df['energy_proxy_j_mean'], precision_df['accuracy_mean'], s=100, color='green')
for _, row in precision_df.iterrows():
    axes[2].annotate(row['precision'], 
                     (row['energy_proxy_j_mean'], row['accuracy_mean']),
                     textcoords="offset points", xytext=(0,10), ha='center')
axes[2].set_xlabel('Energy (J)')
axes[2].set_ylabel('Accuracy')
axes[2].set_title('Accuracy vs Energy')

plt.tight_layout()
plt.savefig('precision_tradeoffs.png', dpi=180)
plt.show()

Budget-Constrained Selection

# Find best precision mode that meets all constraints
latency_budget_ms = 5.0
memory_budget_mb = 1.0
min_accuracy = 0.95

candidates = precision_df[
    (precision_df['latency_ms_mean'] <= latency_budget_ms) &
    (precision_df['memory_mb_mean'] <= memory_budget_mb) &
    (precision_df['accuracy_mean'] >= min_accuracy)
].sort_values('accuracy_mean', ascending=False)

if not candidates.empty:
    best = candidates.iloc[0]
    print(f"Recommended precision: {best['precision']}")
    print(f"  Accuracy: {best['accuracy_mean']:.4f}")
    print(f"  Latency: {best['latency_ms_mean']:.2f} ms")
    print(f"  Memory: {best['memory_mb_mean']:.2f} MB")
else:
    print("No precision mode meets all constraints")

Pipeline Integration

In scripts/run_pipeline.py:88, precision tradeoffs are computed and saved:
precision_df = precision_tradeoff_table(sweep_df)
save_hardware_artifacts(output_dir, layerwise_df, precision_df, hardware_summary)
The resulting table is saved to outputs/precision_tradeoffs.csv alongside:
  • layerwise_breakdown.csv (per-layer analysis)
  • hardware_summary.csv (bandwidth metrics)
  • Visualization plots for layer-wise memory and compute

Combining with Pruning

Precision and pruning are largely independent optimizations, but they can interact at the extremes. Analyze their interaction:
# Create joint tradeoff table
joint_df = sweep_df.groupby(['pruning_level', 'precision'], as_index=False).agg(
    accuracy_mean=('accuracy', 'mean'),
    latency_ms_mean=('latency_ms', 'mean'),
    memory_mb_mean=('memory_mb', 'mean'),
)

# Pivot for easy comparison
pivot = joint_df.pivot(index='pruning_level', 
                       columns='precision', 
                       values='latency_ms_mean')
print("\nLatency (ms) by pruning and precision:")
print(pivot.round(2))
Sample output:
Latency (ms) by pruning and precision:
precision       fp32   fp16   int8
pruning_level                     
0.00            7.45   3.87   2.34
0.25            6.12   3.21   1.92
0.50            4.89   2.58   1.53
0.70            3.21   1.76   1.08
Warning: High pruning levels (>0.7) combined with INT8 can cause accuracy collapse if the model loses critical features. Always validate on held-out data.
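One way to surface such collapse points is to compare each (pruning, precision) cell against the FP32 accuracy at the same pruning level. A sketch with illustrative numbers and a hypothetical 2% drop threshold (real values come from the joint table above):

```python
import pandas as pd

# Illustrative joint results; in practice use the aggregated sweep table
joint_df = pd.DataFrame({
    "pruning_level": [0.0, 0.0, 0.7, 0.7],
    "precision": ["fp32", "int8", "fp32", "int8"],
    "accuracy_mean": [0.959, 0.951, 0.941, 0.902],
})

# FP32 accuracy per pruning level, used as the per-level reference
fp32_acc = (
    joint_df[joint_df["precision"] == "fp32"]
    .set_index("pruning_level")["accuracy_mean"]
)
joint_df["acc_drop"] = (
    joint_df["pruning_level"].map(fp32_acc) - joint_df["accuracy_mean"]
)

# Flag quantized variants dropping more than 2% below same-level FP32
collapsed = joint_df[(joint_df["precision"] != "fp32") & (joint_df["acc_drop"] > 0.02)]
print(collapsed)  # flags int8 at pruning 0.7 (3.9% drop)
```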

Limitations

Important considerations:
  • Hardware dependency: Actual speedups vary by device. Some CPUs lack INT8 instructions, negating latency benefits.
  • Calibration quality: INT8 accuracy depends heavily on representative calibration data.
  • Framework support: ONNXRuntime INT8 performance differs from native PyTorch or TensorRT implementations.
  • Batch size effects: Quantization overhead is amortized over larger batches, affecting single-sample latency differently.
For production deployment, always profile on the target device with realistic inference patterns.
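A minimal on-device profiling loop can look like the sketch below; the helper name and the warmup/repeat counts are illustrative, and on a real device fn would wrap model inference on a representative input batch:

```python
import time
import statistics

def profile_latency_ms(fn, warmup=3, repeats=20):
    """Median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):  # warm caches before timing
        fn()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times)

# Stand-in workload; replace with e.g. lambda: model(sample) on-device
latency = profile_latency_ms(lambda: sum(i * i for i in range(10_000)))
print(f"median latency: {latency:.3f} ms")
```

The median is used rather than the mean so occasional scheduler hiccups do not skew the estimate.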

Run Sweep

Generate the input DataFrame with multiple precision variants

Quantization

Implementation details for FP16 and INT8 conversion

Source Reference

Implementation: src/edge_opt/hardware.py:94-102
