
Overview

The precision_tradeoff_table function aggregates sweep results across all pruning levels to show mean performance metrics for each precision mode. This enables data-driven decisions about quantization strategies by revealing the typical accuracy vs efficiency tradeoffs.
Precision selection is one of the highest-impact optimizations for edge deployment, often yielding 2-4× speedups with <1% accuracy loss when properly calibrated.

Function Signature

from edge_opt.hardware import precision_tradeoff_table

precision_df = precision_tradeoff_table(sweep_df)

Parameters

sweep_df
pd.DataFrame
required
DataFrame from run_sweep containing results for multiple pruning levels and precision modes. Must have columns:
  • precision: String identifier ("fp32", "fp16", "int8")
  • accuracy: Float in [0, 1] representing validation accuracy
  • latency_ms: Inference time in milliseconds
  • memory_mb: Model memory footprint in megabytes
  • energy_proxy_j: Energy estimate in joules
  • accepted: Boolean indicating whether variant meets memory budget

Returns

Type: pd.DataFrame
A DataFrame with one row per precision mode, sorted by ascending mean latency:
precision
str
Precision identifier: "fp32", "fp16", or "int8"
accuracy_mean
float
Mean validation accuracy across all pruning levels for this precision
latency_ms_mean
float
Mean inference latency in milliseconds
memory_mb_mean
float
Mean model memory footprint in megabytes
energy_proxy_j_mean
float
Mean energy consumption estimate in joules
accepted_ratio
float
Fraction of variants that passed the active memory budget constraint (range: 0.0 to 1.0)
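Passing a sweep DataFrame with a missing column fails inside the aggregation with a less obvious error, so it can help to verify the schema up front. A minimal sketch; the check_sweep_columns helper is illustrative, not part of edge_opt:

```python
import pandas as pd

# Columns precision_tradeoff_table expects in the sweep results
REQUIRED_COLUMNS = {
    "precision", "accuracy", "latency_ms",
    "memory_mb", "energy_proxy_j", "accepted",
}

def check_sweep_columns(sweep_df: pd.DataFrame) -> None:
    """Raise early if the sweep results are missing any expected column."""
    missing = REQUIRED_COLUMNS - set(sweep_df.columns)
    if missing:
        raise ValueError(f"sweep_df is missing columns: {sorted(missing)}")

# A two-row sweep result with all required columns passes silently
df = pd.DataFrame({
    "precision": ["fp32", "int8"],
    "accuracy": [0.96, 0.95],
    "latency_ms": [7.4, 2.3],
    "memory_mb": [1.6, 0.4],
    "energy_proxy_j": [0.015, 0.005],
    "accepted": [True, True],
})
check_sweep_columns(df)  # no exception
```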

Example Usage

import pandas as pd
from edge_opt.experiments import run_sweep
from edge_opt.hardware import precision_tradeoff_table
from edge_opt.model import SmallCNN
import torch

# Run full optimization sweep
baseline_model = SmallCNN()
sweep_df = run_sweep(
    base_model=baseline_model,
    val_loader=val_loader,
    calibration_loader=train_loader,
    device=torch.device("cpu"),
    pruning_levels=[0.0, 0.25, 0.5, 0.7],
    precisions=["fp32", "fp16", "int8"],
    power_watts=2.0,
    calibration_batches=10,
    memory_budgets_mb=[1.0, 2.0, 4.0],
    active_memory_budget_mb=2.0,
    latency_multiplier=1.0,
    benchmark_repeats=5
)

# Aggregate by precision
precision_df = precision_tradeoff_table(sweep_df)
print(precision_df)
Sample output:
  precision  accuracy_mean  latency_ms_mean  memory_mb_mean  energy_proxy_j_mean  accepted_ratio
       int8         0.9512             2.34            0.41               0.0047            1.00
       fp16         0.9586             3.87            0.82               0.0077            1.00
       fp32         0.9591             7.45            1.64               0.0149            0.75

Interpreting Results

Typical Tradeoff Patterns

INT8

Characteristics:
  • 2-4× faster than FP32
  • 4× smaller memory footprint
  • 0.5-2% accuracy degradation typical
  • Highest accepted_ratio due to low memory
Best for:
  • Severe resource constraints (MCUs, low-power edge devices)
  • Latency-critical applications
  • Models with redundant representational capacity
Calibration required: INT8 quantization needs representative data to compute activation ranges. The pipeline uses calibration_batches batches from the calibration loader for this.
FP16

Characteristics:
  • 1.5-2× faster than FP32
  • 2× smaller memory footprint
  • <0.5% accuracy impact in most cases
  • Good accepted_ratio for moderate budgets
Best for:
  • Raspberry Pi, Jetson Nano class devices
  • Models where accuracy is critical
  • Gradual optimization from baseline
Note: Actual speedup depends on hardware SIMD support (NEON, AVX2, etc.)
FP32

Characteristics:
  • Highest accuracy (training precision)
  • Largest memory and latency
  • Lower accepted_ratio with tight budgets
Best for:
  • Development and accuracy validation
  • Servers or cloud deployments
  • Models that fail to calibrate properly
Use as reference: Always include FP32 in sweeps to quantify optimization gains.
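With FP32 in the sweep, the tradeoff table directly yields relative gains. A hedged sketch computing speedup, memory reduction, and accuracy delta against the FP32 row, using a toy table shaped like the sample output above (the values are illustrative):

```python
import pandas as pd

# Toy table shaped like precision_tradeoff_table's output
precision_df = pd.DataFrame({
    "precision": ["int8", "fp16", "fp32"],
    "accuracy_mean": [0.9512, 0.9586, 0.9591],
    "latency_ms_mean": [2.34, 3.87, 7.45],
    "memory_mb_mean": [0.41, 0.82, 1.64],
})

# Use the FP32 row as the reference point for all relative metrics
baseline = precision_df.set_index("precision").loc["fp32"]
gains = precision_df.assign(
    speedup_vs_fp32=baseline["latency_ms_mean"] / precision_df["latency_ms_mean"],
    memory_reduction=baseline["memory_mb_mean"] / precision_df["memory_mb_mean"],
    accuracy_delta=precision_df["accuracy_mean"] - baseline["accuracy_mean"],
)
print(gains[["precision", "speedup_vs_fp32", "memory_reduction", "accuracy_delta"]])
```

With these illustrative numbers, INT8 shows roughly a 3.2× speedup and 4× memory reduction for under 1% accuracy loss.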

Implementation Details

The function groups sweep results by precision and aggregates key metrics:
# From src/edge_opt/hardware.py:94-102
grouped = sweep_df.groupby("precision", as_index=False).agg(
    accuracy_mean=("accuracy", "mean"),
    latency_ms_mean=("latency_ms", "mean"),
    memory_mb_mean=("memory_mb", "mean"),
    energy_proxy_j_mean=("energy_proxy_j", "mean"),
    accepted_ratio=("accepted", "mean"),
)
return grouped.sort_values("latency_ms_mean").reset_index(drop=True)
Aggregation behavior:
  • Uses .mean() for continuous metrics (accuracy, latency, memory, energy)
  • Uses .mean() on boolean accepted to compute acceptance ratio
  • Sorts by latency_ms_mean ascending (fastest first)
  • Resets index to provide clean 0-based row numbers
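The acceptance ratio falls out of pandas treating booleans as 0/1 under .mean(); a minimal illustration of the same grouped aggregation:

```python
import pandas as pd

# Four fp32 variants (one over budget) and two int8 variants (all accepted)
df = pd.DataFrame({
    "precision": ["fp32", "fp32", "fp32", "fp32", "int8", "int8"],
    "accepted":  [True, True, True, False, True, True],
})

# Mean of a boolean column = fraction of True values
ratios = df.groupby("precision")["accepted"].mean()
print(ratios)
# fp32: 3 of 4 accepted -> 0.75; int8: 2 of 2 -> 1.0
```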

Advanced Analysis

Accuracy-Efficiency Tradeoff Curves

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Latency tradeoff
axes[0].scatter(precision_df['latency_ms_mean'], precision_df['accuracy_mean'], s=100)
for _, row in precision_df.iterrows():
    axes[0].annotate(row['precision'], 
                     (row['latency_ms_mean'], row['accuracy_mean']),
                     textcoords="offset points", xytext=(0,10), ha='center')
axes[0].set_xlabel('Latency (ms)')
axes[0].set_ylabel('Accuracy')
axes[0].set_title('Accuracy vs Latency')

# Memory tradeoff
axes[1].scatter(precision_df['memory_mb_mean'], precision_df['accuracy_mean'], s=100, color='orange')
for _, row in precision_df.iterrows():
    axes[1].annotate(row['precision'], 
                     (row['memory_mb_mean'], row['accuracy_mean']),
                     textcoords="offset points", xytext=(0,10), ha='center')
axes[1].set_xlabel('Memory (MB)')
axes[1].set_ylabel('Accuracy')
axes[1].set_title('Accuracy vs Memory')

# Energy tradeoff
axes[2].scatter(precision_df['energy_proxy_j_mean'], precision_df['accuracy_mean'], s=100, color='green')
for _, row in precision_df.iterrows():
    axes[2].annotate(row['precision'], 
                     (row['energy_proxy_j_mean'], row['accuracy_mean']),
                     textcoords="offset points", xytext=(0,10), ha='center')
axes[2].set_xlabel('Energy (J)')
axes[2].set_ylabel('Accuracy')
axes[2].set_title('Accuracy vs Energy')

plt.tight_layout()
plt.savefig('precision_tradeoffs.png', dpi=180)
plt.show()

Budget-Constrained Selection

# Find best precision mode that meets all constraints
latency_budget_ms = 5.0
memory_budget_mb = 1.0
min_accuracy = 0.95

candidates = precision_df[
    (precision_df['latency_ms_mean'] <= latency_budget_ms) &
    (precision_df['memory_mb_mean'] <= memory_budget_mb) &
    (precision_df['accuracy_mean'] >= min_accuracy)
].sort_values('accuracy_mean', ascending=False)

if not candidates.empty:
    best = candidates.iloc[0]
    print(f"Recommended precision: {best['precision']}")
    print(f"  Accuracy: {best['accuracy_mean']:.4f}")
    print(f"  Latency: {best['latency_ms_mean']:.2f} ms")
    print(f"  Memory: {best['memory_mb_mean']:.2f} MB")
else:
    print("No precision mode meets all constraints")

Pipeline Integration

In scripts/run_pipeline.py:88, precision tradeoffs are computed and saved:
precision_df = precision_tradeoff_table(sweep_df)
save_hardware_artifacts(output_dir, layerwise_df, precision_df, hardware_summary)
The resulting table is saved to outputs/precision_tradeoffs.csv alongside:
  • layerwise_breakdown.csv (per-layer analysis)
  • hardware_summary.csv (bandwidth metrics)
  • Visualization plots for layer-wise memory and compute

Combining with Pruning

Precision and pruning are largely independent optimizations, but they can interact at the extremes. Analyze their interaction:
# Create joint tradeoff table
joint_df = sweep_df.groupby(['pruning_level', 'precision'], as_index=False).agg(
    accuracy_mean=('accuracy', 'mean'),
    latency_ms_mean=('latency_ms', 'mean'),
    memory_mb_mean=('memory_mb', 'mean'),
)

# Pivot for easy comparison
pivot = joint_df.pivot(index='pruning_level', 
                       columns='precision', 
                       values='latency_ms_mean')
print("\nLatency (ms) by pruning and precision:")
print(pivot.round(2))
Sample output:
Latency (ms) by pruning and precision:
precision       fp32   fp16   int8
pruning_level                     
0.00            7.45   3.87   2.34
0.25            6.12   3.21   1.92
0.50            4.89   2.58   1.53
0.70            3.21   1.76   1.08
Warning: High pruning levels (>0.7) combined with INT8 can cause accuracy collapse if the model loses critical features. Always validate on held-out data.
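One way to surface such collapse points is to compare each (pruning, precision) cell against the FP32 accuracy at the same pruning level. A sketch with illustrative numbers and a hypothetical 2% drop threshold (real values come from the joint table above):

```python
import pandas as pd

# Illustrative joint results; in practice use the aggregated sweep table
joint_df = pd.DataFrame({
    "pruning_level": [0.0, 0.0, 0.7, 0.7],
    "precision": ["fp32", "int8", "fp32", "int8"],
    "accuracy_mean": [0.959, 0.951, 0.941, 0.902],
})

# FP32 accuracy per pruning level, used as the per-level reference
fp32_acc = (
    joint_df[joint_df["precision"] == "fp32"]
    .set_index("pruning_level")["accuracy_mean"]
)
joint_df["acc_drop"] = (
    joint_df["pruning_level"].map(fp32_acc) - joint_df["accuracy_mean"]
)

# Flag quantized variants dropping more than 2% below same-level FP32
collapsed = joint_df[(joint_df["precision"] != "fp32") & (joint_df["acc_drop"] > 0.02)]
print(collapsed)  # flags int8 at pruning 0.7 (3.9% drop)
```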

Limitations

Important considerations:
  • Hardware dependency: Actual speedups vary by device. Some CPUs lack INT8 instructions, negating latency benefits.
  • Calibration quality: INT8 accuracy depends heavily on representative calibration data.
  • Framework support: ONNXRuntime INT8 performance differs from native PyTorch or TensorRT implementations.
  • Batch size effects: Quantization overhead is amortized over larger batches, affecting single-sample latency differently.
For production deployment, always profile on the target device with realistic inference patterns.
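A minimal on-device profiling loop can look like the sketch below; the helper name and the warmup/repeat counts are illustrative, and on a real device fn would wrap model inference on a representative input batch:

```python
import time
import statistics

def profile_latency_ms(fn, warmup=3, repeats=20):
    """Median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):  # warm caches before timing
        fn()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times)

# Stand-in workload; replace with e.g. lambda: model(sample) on-device
latency = profile_latency_ms(lambda: sum(i * i for i in range(10_000)))
print(f"median latency: {latency:.3f} ms")
```

The median is used rather than the mean so occasional scheduler hiccups do not skew the estimate.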

Run Sweep

Generate the input DataFrame with multiple precision variants

Quantization

Implementation details for FP16 and INT8 conversion

Source Reference

Implementation: src/edge_opt/hardware.py:94-102
