Overview

The metrics module provides comprehensive performance measurement capabilities for edge AI models, including accuracy evaluation, latency benchmarking, memory profiling, and energy estimation.

Data Classes

PerfMetrics

Dataclass that encapsulates all performance metrics for a model configuration.
from dataclasses import dataclass

@dataclass
class PerfMetrics:
    accuracy: float
    latency_ms: float
    latency_std_ms: float
    latency_p95_ms: float
    throughput_sps: float
    memory_mb: float
    energy_proxy_j: float
Fields:

  accuracy (float): Model accuracy on the evaluation dataset (range: 0.0 to 1.0)
  latency_ms (float): Mean inference latency in milliseconds
  latency_std_ms (float): Standard deviation of latency measurements in milliseconds
  latency_p95_ms (float): 95th percentile latency in milliseconds
  throughput_sps (float): Throughput in samples per second
  memory_mb (float): Model memory footprint in megabytes
  energy_proxy_j (float): Estimated energy consumption in joules (latency × power)
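The throughput and energy fields follow directly from the latency and power figures. A small illustrative construction (all numeric values below are made up, and the one-sample-per-inference assumption for throughput is ours, not the library's):

from edge_opt.metrics import PerfMetrics

latency_ms = 12.5   # hypothetical mean latency
power_watts = 5.0   # hypothetical device power draw

metrics = PerfMetrics(
    accuracy=0.91,
    latency_ms=latency_ms,
    latency_std_ms=0.8,
    latency_p95_ms=14.1,
    throughput_sps=1000.0 / latency_ms,                  # assumes one sample per inference
    memory_mb=4.5,
    energy_proxy_j=(latency_ms / 1000.0) * power_watts,  # seconds × watts = joules
)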

Core Functions

collect_metrics

Collects comprehensive performance metrics for a PyTorch model.
def collect_metrics(
    model: nn.Module,
    loader: DataLoader,
    device: torch.device,
    power_watts: float,
    precision: str,
    latency_multiplier: float = 1.0,
    benchmark_repeats: int = 5,
) -> PerfMetrics
Parameters:

  model (nn.Module, required): PyTorch model to benchmark
  loader (DataLoader, required): DataLoader for the evaluation dataset
  device (torch.device, required): Device to run inference on (CPU or CUDA)
  power_watts (float, required): Power consumption of the target hardware in watts (used for energy estimation)
  precision (str, required): Numeric precision for inference. Supported values: "fp32", "fp16"
  latency_multiplier (float, default: 1.0): Multiplier to adjust latency measurements for hardware differences
  benchmark_repeats (int, default: 5): Number of times to repeat latency measurements for statistical significance

Returns:

  PerfMetrics: Complete performance metrics including accuracy, latency statistics, throughput, memory, and energy

Example:
from edge_opt.metrics import collect_metrics
import torch
from torch.utils.data import DataLoader

# Collect metrics for FP32 model
metrics = collect_metrics(
    model=my_model,
    loader=test_loader,
    device=torch.device("cpu"),
    power_watts=5.0,
    precision="fp32",
    latency_multiplier=1.0,
    benchmark_repeats=10
)

print(f"Accuracy: {metrics.accuracy:.4f}")
print(f"Latency: {metrics.latency_ms:.2f} ms")
print(f"Memory: {metrics.memory_mb:.2f} MB")
The function automatically handles precision conversion for FP16 mode and includes warmup runs before latency measurement.
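A minimal sketch of what that precision handling might look like, assuming it uses PyTorch's built-in half-precision cast (the helper name _to_fp16 is hypothetical, not part of the library):

import torch
import torch.nn as nn

def _to_fp16(model: nn.Module, batch: torch.Tensor) -> tuple[nn.Module, torch.Tensor]:
    # Hypothetical helper: cast parameters, buffers, and inputs to float16
    # so that model and input dtypes match during inference.
    return model.half(), batch.half()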

memory_violations

Checks whether a model's memory footprint violates the specified budget constraints.
def memory_violations(memory_mb: float, budgets_mb: list[float]) -> dict[str, bool]
Parameters:

  memory_mb (float, required): Model memory footprint in megabytes
  budgets_mb (list[float], required): List of memory budget thresholds to check against

Returns:

  dict[str, bool]: Dictionary with keys in the format "violates_{budget}mb" and boolean values indicating violations

Example:
from edge_opt.metrics import memory_violations

model_memory = 4.5  # MB
budgets = [2.0, 4.0, 8.0]

violations = memory_violations(model_memory, budgets)
# Returns: {
#   "violates_2.0mb": True,
#   "violates_4.0mb": True,
#   "violates_8.0mb": False
# }
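Given the key format and the exceeds-threshold semantics shown above, the whole check plausibly reduces to a single dict comprehension (a sketch, not necessarily the shipped code):

def memory_violations(memory_mb: float, budgets_mb: list[float]) -> dict[str, bool]:
    # A budget is violated when the footprint exceeds the threshold
    # (exact boundary behavior is assumed, not confirmed by the docs).
    return {f"violates_{budget}mb": memory_mb > budget for budget in budgets_mb}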

Utility Functions

evaluate_accuracy

Evaluates model accuracy on a dataset.
def evaluate_accuracy(
    model: nn.Module,
    loader: DataLoader,
    device: torch.device,
    precision: str = "fp32"
) -> float
Parameters:

  model (nn.Module, required): Model to evaluate
  loader (DataLoader, required): DataLoader containing evaluation data
  device (torch.device, required): Device for inference
  precision (str, default: "fp32"): Numeric precision: "fp32" or "fp16"

Returns:

  float: Accuracy as a float between 0.0 and 1.0
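For reference, a standard classification accuracy loop matching this signature might look like the following sketch (it assumes the loader yields (inputs, labels) batches and the model returns class logits; the function name marks it as illustrative):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def evaluate_accuracy_sketch(model: nn.Module, loader: DataLoader,
                             device: torch.device, precision: str = "fp32") -> float:
    model.eval().to(device)
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            if precision == "fp16":
                inputs = inputs.half()  # assumes the model was converted with .half()
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total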

measure_latency

Measures average inference latency for a single input.
def measure_latency(
    model: nn.Module,
    sample_input: torch.Tensor,
    num_runs: int = 100,
    warmup: int = 10
) -> float
Parameters:

  model (nn.Module, required): Model to benchmark
  sample_input (torch.Tensor, required): Sample input tensor for inference
  num_runs (int, default: 100): Number of inference runs to average
  warmup (int, default: 10): Number of warmup runs before measurement

Returns:

  float: Average latency in milliseconds
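A typical implementation pattern for this kind of timer, shown as a sketch (a real CUDA measurement would additionally need torch.cuda.synchronize() around the timed region):

import time
import torch
import torch.nn as nn

def measure_latency_sketch(model: nn.Module, sample_input: torch.Tensor,
                           num_runs: int = 100, warmup: int = 10) -> float:
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):           # warmup: let caches and allocators settle
            model(sample_input)
        start = time.perf_counter()
        for _ in range(num_runs):
            model(sample_input)
        elapsed_s = time.perf_counter() - start
    return elapsed_s / num_runs * 1000.0  # mean latency in milliseconds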

measure_latency_distribution

Measures latency statistics including mean, standard deviation, and 95th percentile.
def measure_latency_distribution(
    model: nn.Module,
    sample_input: torch.Tensor,
    repeats: int = 5,
    num_runs: int = 100,
    warmup: int = 10
) -> tuple[float, float, float]
Parameters:

  model (nn.Module, required): Model to benchmark
  sample_input (torch.Tensor, required): Sample input tensor
  repeats (int, default: 5): Number of measurement repetitions for statistical analysis
  num_runs (int, default: 100): Number of runs per repetition
  warmup (int, default: 10): Warmup runs before each measurement

Returns:

  tuple[float, float, float]: Tuple of (mean_latency_ms, std_latency_ms, p95_latency_ms)
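One plausible reading of the repeats parameter is that the mean-latency measurement is repeated and the statistics are computed across those repeats; the real implementation may instead aggregate per-run timings. A sketch under the first reading, reusing the documented measure_latency:

import statistics
from edge_opt.metrics import measure_latency

def latency_distribution_sketch(model, sample_input, repeats=5,
                                num_runs=100, warmup=10):
    samples = sorted(measure_latency(model, sample_input, num_runs, warmup)
                     for _ in range(repeats))
    mean = statistics.mean(samples)
    std = statistics.stdev(samples) if repeats > 1 else 0.0
    p95 = samples[round(0.95 * (repeats - 1))]  # nearest-rank 95th percentile
    return mean, std, p95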

model_memory_mb

Calculates the model's memory footprint from its state dict.
def model_memory_mb(model: nn.Module) -> float
Parameters:

  model (nn.Module, required): PyTorch model to analyze

Returns:

  float: Memory footprint in megabytes
Memory calculation includes all model parameters (weights and biases) based on their actual tensor sizes and data types.
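That description implies the footprint is just the byte count of every tensor in the state dict, which would be equivalent to this sketch:

import torch.nn as nn

def model_memory_mb_sketch(model: nn.Module) -> float:
    # numel() × element_size() gives each tensor's size in bytes,
    # so dtype (fp32 vs. fp16, etc.) is accounted for automatically.
    total_bytes = sum(t.numel() * t.element_size()
                      for t in model.state_dict().values())
    return total_bytes / (1024 ** 2)  # bytes → megabytes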
