
Overview

The experiments module provides functions for training models, running optimization sweeps across pruning levels and quantization precisions, computing Pareto frontiers, and generating visualization plots.

Functions

train_model

Trains a PyTorch model using the Adam optimizer and cross-entropy loss.
from edge_opt.experiments import train_model

trained_model = train_model(
    model=model,
    train_loader=train_loader,
    epochs=10,
    learning_rate=0.001,
    device=torch.device("cuda")
)

Parameters

model
nn.Module
required
The PyTorch model to train. Can be any torch.nn.Module instance.
train_loader
DataLoader
required
PyTorch DataLoader providing training batches. Must yield (inputs, targets) tuples.
epochs
int
required
Number of complete passes through the training dataset.
learning_rate
float
required
Learning rate for the Adam optimizer. Typical values: 0.001, 0.0001.
device
torch.device
required
Device to train on (e.g., torch.device("cuda") or torch.device("cpu")).

Returns

trained_model
nn.Module
The trained model with updated weights. The same model instance that was passed in (modified in-place).

Implementation Details

Training loop implementation:
  1. Moves model to specified device
  2. Sets model to training mode
  3. Creates Adam optimizer with specified learning rate
  4. Uses CrossEntropyLoss criterion
  5. For each epoch, iterates through all batches:
    • Moves data to device
    • Forward pass
    • Computes loss
    • Backpropagation
    • Optimizer step
model = model.to(device)
model.train()

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

for _ in range(epochs):
    for inputs, targets in train_loader:
        inputs = inputs.to(device)
        targets = targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
This function modifies the model in-place. If you need to preserve the original model, create a copy before training.
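A minimal sketch of preserving the original weights with copy.deepcopy, using a stand-in nn.Linear for the model and a simulated in-place weight update in place of train_model:

```python
import copy

import torch
import torch.nn as nn

model = nn.Linear(4, 2)          # stand-in for any nn.Module
original = copy.deepcopy(model)  # snapshot taken before training

# Training mutates `model` in place; `original` keeps the old weights.
with torch.no_grad():
    model.weight.add_(1.0)       # simulate an in-place weight update

print(torch.equal(model.weight, original.weight))  # False
```

The deep copy duplicates all parameters and buffers, so it costs one extra copy of the model in memory.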

Example

import torch
from torch.utils.data import DataLoader
from edge_opt.model import SmallCNN
from edge_opt.experiments import train_model

# Setup
model = SmallCNN()
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Train
trained_model = train_model(
    model=model,
    train_loader=train_loader,
    epochs=20,
    learning_rate=0.001,
    device=device
)

run_sweep

Runs a comprehensive hyperparameter sweep across pruning levels and quantization precisions, collecting performance metrics for each configuration.
from edge_opt.experiments import run_sweep

results_df = run_sweep(
    base_model=model,
    val_loader=val_loader,
    calibration_loader=calib_loader,
    device=device,
    pruning_levels=[0.0, 0.2, 0.4, 0.6],
    precisions=["fp32", "fp16", "int8"],
    power_watts=2.0,
    calibration_batches=10,
    memory_budgets_mb=[1.0, 2.0, 5.0],
    active_memory_budget_mb=2.0,
    latency_multiplier=1.5,
    benchmark_repeats=5
)

Parameters

base_model
nn.Module
required
The trained base model to optimize. This model will be pruned and quantized in various configurations.
val_loader
DataLoader
required
Validation DataLoader for accuracy evaluation. Must yield (inputs, targets) tuples.
calibration_loader
DataLoader
required
DataLoader providing calibration data for INT8 quantization. Should contain representative samples.
device
torch.device
required
Device to run benchmarks on (CPU or CUDA).
pruning_levels
list[float]
required
List of pruning levels to sweep. Each value should be in [0.0, 1.0). Example: [0.0, 0.2, 0.4, 0.6, 0.8].
precisions
list[str]
required
List of precision formats to test. Valid values: "fp32", "fp16", "int8".
power_watts
float
required
Device power consumption in watts for energy proxy calculation. Typical values: 1.0-5.0 for edge devices.
calibration_batches
int
required
Number of calibration batches to use for INT8 quantization.
memory_budgets_mb
list[float]
required
List of memory budget thresholds in MB to check for violations. Example: [1.0, 2.0, 5.0].
active_memory_budget_mb
float
required
The active memory budget threshold in MB. Configurations exceeding this are marked as rejected (accepted=False).
latency_multiplier
float
required
Multiplier to scale measured latency (e.g., to simulate different hardware). Use 1.0 for no scaling.
benchmark_repeats
int
default:5
Number of times to repeat latency benchmarks for statistical robustness.

Returns

results_df
pd.DataFrame
A pandas DataFrame containing metrics for each configuration. Each row represents one configuration with the following columns:
Configuration:
  • pruning_level: Pruning level applied (0.0 to <1.0)
  • precision: Precision format used ("fp32", "fp16", "int8")
  • accepted: Boolean indicating if configuration meets active memory budget
  • active_budget_mb: The active memory budget threshold used
Performance Metrics (from PerfMetrics):
  • accuracy: Model accuracy on validation set (0.0 to 1.0)
  • latency_ms: Average inference latency in milliseconds
  • latency_std_ms: Standard deviation of latency
  • latency_p95_ms: 95th percentile latency
  • throughput_sps: Throughput in samples per second
  • memory_mb: Model memory footprint in megabytes
  • energy_proxy_j: Energy proxy in joules (latency_ms × power_watts / 1000)
Memory Budget Violations:
  • violates_{budget}mb: Boolean for each budget in memory_budgets_mb
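The energy proxy column is a simple product: for example, a 12 ms average latency on a 2 W device gives 12 × 2 / 1000 = 0.024 J per inference. A quick sanity check of the formula in plain Python, with hypothetical values:

```python
def energy_proxy_j(latency_ms: float, power_watts: float) -> float:
    """Energy proxy in joules: latency_ms * power_watts / 1000."""
    return latency_ms * power_watts / 1000.0

print(energy_proxy_j(12.0, 2.0))  # 0.024
```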

Implementation Details

The sweep process:
  1. Iterate Configurations: For each combination of pruning level and precision
  2. Apply Pruning: Use structured_channel_prune with the pruning level
  3. Apply Quantization: Convert to specified precision (fp32/fp16/int8)
  4. Collect Metrics: Run comprehensive benchmarks using collect_metrics
  5. Check Budgets: Determine if configuration is accepted and check violations
  6. Aggregate Results: Compile all results into a pandas DataFrame
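Step 5, the budget check, can be sketched in plain Python. This assumes acceptance means memory_mb ≤ active_memory_budget_mb, and the exact rendering of the violates_{budget}mb column names is illustrative:

```python
def budget_flags(memory_mb: float, active_budget_mb: float, budgets_mb: list[float]) -> dict:
    """Acceptance flag plus one violation flag per memory budget (sketch)."""
    flags = {"accepted": memory_mb <= active_budget_mb}
    for budget in budgets_mb:
        flags[f"violates_{budget}mb"] = memory_mb > budget
    return flags

print(budget_flags(1.8, 2.0, [1.0, 2.0, 5.0]))
# {'accepted': True, 'violates_1.0mb': True, 'violates_2.0mb': False, 'violates_5.0mb': False}
```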
The sweep can generate a large number of configurations. For pruning_levels=[0.0, 0.2, 0.4, 0.6] and precisions=["fp32", "fp16", "int8"], you’ll get 4 × 3 = 12 configurations.
Sweep time increases linearly with the number of configurations and benchmark_repeats. Use fewer repeats for faster experimentation.
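The configuration count is the Cartesian product of the two lists, which itertools.product enumerates directly. Using the same sweep values as above:

```python
from itertools import product

pruning_levels = [0.0, 0.2, 0.4, 0.6]
precisions = ["fp32", "fp16", "int8"]

# One (pruning_level, precision) pair per configuration.
configs = list(product(pruning_levels, precisions))
print(len(configs))  # 12
```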

Example

import torch
import pandas as pd
from edge_opt.experiments import run_sweep

# Run comprehensive sweep
df = run_sweep(
    base_model=trained_model,
    val_loader=val_loader,
    calibration_loader=calib_loader,
    device=torch.device("cpu"),
    pruning_levels=[0.0, 0.2, 0.4, 0.6],
    precisions=["fp32", "fp16", "int8"],
    power_watts=2.0,
    calibration_batches=10,
    memory_budgets_mb=[1.0, 2.0, 5.0],
    active_memory_budget_mb=2.0,
    latency_multiplier=1.0,
    benchmark_repeats=5
)

# Analyze results
print(f"Total configurations: {len(df)}")
print(f"Accepted configurations: {df['accepted'].sum()}")
print(f"Best accuracy: {df['accuracy'].max():.4f}")
print(f"Lowest latency: {df['latency_ms'].min():.2f} ms")

# Filter by constraints
accepted = df[df['accepted']]
best_config = accepted.loc[accepted['accuracy'].idxmax()]
print(f"\nBest accepted config:")
print(f"  Pruning: {best_config['pruning_level']}")
print(f"  Precision: {best_config['precision']}")
print(f"  Accuracy: {best_config['accuracy']:.4f}")
print(f"  Memory: {best_config['memory_mb']:.2f} MB")

pareto_frontier

Computes the Pareto frontier of accepted configurations by selecting models that achieve the best accuracy for progressively increasing values of a constraint metric (latency or energy).
from edge_opt.experiments import pareto_frontier

latency_frontier = pareto_frontier(results_df, x_col="latency_ms")
energy_frontier = pareto_frontier(results_df, x_col="energy_proxy_j")

Parameters

df
pd.DataFrame
required
DataFrame of sweep results from run_sweep. Must contain columns: accepted, accuracy, and the column specified in x_col.
x_col
str
required
The constraint column name to optimize along (e.g., "latency_ms", "energy_proxy_j", "memory_mb"). Lower values of this metric are preferred.

Returns

frontier_df
pd.DataFrame
A DataFrame containing only the Pareto-optimal configurations. These are configurations where no other configuration achieves both better accuracy AND a better constraint metric value.
Properties:
  • Only includes accepted configurations (where accepted=True)
  • Sorted by increasing constraint metric (x_col)
  • Each row represents a non-dominated solution
  • Accuracy is strictly increasing along the frontier

Implementation Details

Pareto frontier algorithm:
  1. Filter Accepted: Only consider configurations meeting the active memory budget
  2. Sort: Sort by constraint metric (ascending) and accuracy (descending)
  3. Select Non-Dominated: Iterate through sorted configurations:
    • Keep configuration if it has better accuracy than all previous
    • Track best accuracy seen so far
    • Skip dominated configurations
ranked = df[df["accepted"]].sort_values(
    [x_col, "accuracy"],
    ascending=[True, False]
).reset_index(drop=True)

frontier = []
best_accuracy = -1.0
for _, row in ranked.iterrows():
    if row["accuracy"] > best_accuracy:
        frontier.append(row)
        best_accuracy = row["accuracy"]

frontier_df = pd.DataFrame(frontier)  # assemble kept rows back into a DataFrame
A configuration is Pareto-optimal if there’s no other configuration that is strictly better in all objectives. This function implements a simple greedy algorithm for the accuracy-vs-constraint trade-off.
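To see the greedy selection in action, here is the same logic on toy (latency_ms, accuracy) pairs in plain Python, with pandas replaced by a sorted list (values are hypothetical):

```python
# Toy accepted configurations as (latency_ms, accuracy) pairs.
configs = [(10.0, 0.90), (12.0, 0.88), (15.0, 0.93), (20.0, 0.95), (25.0, 0.94)]

# Sort by latency ascending, accuracy descending (mirrors the DataFrame sort).
ranked = sorted(configs, key=lambda c: (c[0], -c[1]))

frontier = []
best_accuracy = -1.0
for latency, accuracy in ranked:
    if accuracy > best_accuracy:  # keep only strict accuracy improvements
        frontier.append((latency, accuracy))
        best_accuracy = accuracy

print(frontier)  # [(10.0, 0.9), (15.0, 0.93), (20.0, 0.95)]
```

The (12.0, 0.88) and (25.0, 0.94) points are dominated: each is both slower and less accurate than some kept point, so they never enter the frontier.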

Example

import pandas as pd
from edge_opt.experiments import run_sweep, pareto_frontier

# Run sweep
df = run_sweep(...)  # See run_sweep example

# Compute Pareto frontiers
latency_frontier = pareto_frontier(df, x_col="latency_ms")
energy_frontier = pareto_frontier(df, x_col="energy_proxy_j")
memory_frontier = pareto_frontier(df, x_col="memory_mb")

# Analyze latency frontier
print("Latency-Accuracy Pareto Frontier:")
for _, row in latency_frontier.iterrows():
    print(f"  {row['latency_ms']:.2f}ms @ {row['accuracy']:.4f} accuracy")
    print(f"    (pruning={row['pruning_level']}, precision={row['precision']})")

# Find configuration with best accuracy under 50ms latency
under_50ms = latency_frontier[latency_frontier['latency_ms'] < 50]
if not under_50ms.empty:
    best = under_50ms.loc[under_50ms['accuracy'].idxmax()]
    print(f"\nBest under 50ms: {best['accuracy']:.4f} @ {best['latency_ms']:.2f}ms")

save_plots

Generates and saves three visualization plots showing the trade-offs between accuracy and optimization metrics (latency, energy, memory).
from pathlib import Path
from edge_opt.experiments import save_plots

save_plots(
    df=results_df,
    latency_frontier=latency_frontier,
    energy_frontier=energy_frontier,
    output_dir=Path("./output/plots")
)

Parameters

df
pd.DataFrame
required
Complete DataFrame of sweep results from run_sweep. Must contain columns: accepted, accuracy, latency_ms, energy_proxy_j, memory_mb.
latency_frontier
pd.DataFrame
required
Pareto frontier DataFrame for latency (from pareto_frontier(df, "latency_ms")).
energy_frontier
pd.DataFrame
required
Pareto frontier DataFrame for energy (from pareto_frontier(df, "energy_proxy_j")).
output_dir
Path
required
Directory path where plots will be saved. Will be created if it doesn’t exist.

Returns

No return value. Creates three PNG files in the output directory:
accuracy_vs_latency.png
Scatter plot of accuracy vs latency with Pareto frontier overlay.
  • Blue points: Accepted configurations
  • Gray X markers: Rejected configurations (exceed memory budget)
  • Red line: Pareto frontier
  • Resolution: 180 DPI
accuracy_vs_energy.png
Scatter plot of accuracy vs energy proxy with Pareto frontier overlay.
  • Green points: Accepted configurations
  • Gray X markers: Rejected configurations
  • Red line: Pareto frontier
  • Resolution: 180 DPI
accuracy_vs_memory.png
Scatter plot of accuracy vs memory footprint.
  • Purple points: Accepted configurations
  • Gray X markers: Rejected configurations
  • No Pareto frontier (memory is a hard constraint)
  • Resolution: 180 DPI

Implementation Details

For each plot:
  1. Split Data: Separate accepted and rejected configurations
  2. Create Figure: 7×5 inch figure with matplotlib
  3. Plot Points:
    • Accepted: Colored circles with alpha=0.8
    • Rejected: Gray X markers with alpha=0.5
  4. Plot Frontier: Red line connecting Pareto-optimal points (latency and energy plots only)
  5. Formatting: Labels, title, legend, tight layout
  6. Save: 180 DPI PNG file
plt.figure(figsize=(7, 5))
plt.scatter(accepted["latency_ms"], accepted["accuracy"], 
            c="tab:blue", alpha=0.8, label="Accepted")
if not rejected.empty:
    plt.scatter(rejected["latency_ms"], rejected["accuracy"], 
                c="tab:gray", alpha=0.5, marker="x", label="Rejected")
plt.plot(latency_frontier["latency_ms"], latency_frontier["accuracy"], 
         color="red", linewidth=2, label="Pareto")
plt.xlabel("Latency (ms)")
plt.ylabel("Accuracy")
plt.title("Accuracy vs Latency")
plt.legend()
plt.tight_layout()
plt.savefig(output_dir / "accuracy_vs_latency.png", dpi=180)
plt.close()
The function automatically creates the output directory if it doesn’t exist using output_dir.mkdir(parents=True, exist_ok=True).

Example

from pathlib import Path
import torch
from edge_opt.experiments import run_sweep, pareto_frontier, save_plots

# Run sweep and compute frontiers
df = run_sweep(
    base_model=model,
    val_loader=val_loader,
    calibration_loader=calib_loader,
    device=torch.device("cpu"),
    pruning_levels=[0.0, 0.2, 0.4, 0.6],
    precisions=["fp32", "fp16", "int8"],
    power_watts=2.0,
    calibration_batches=10,
    memory_budgets_mb=[1.0, 2.0, 5.0],
    active_memory_budget_mb=2.0,
    latency_multiplier=1.0,
    benchmark_repeats=5
)

latency_frontier = pareto_frontier(df, "latency_ms")
energy_frontier = pareto_frontier(df, "energy_proxy_j")

# Generate plots
output_path = Path("./experiment_results")
save_plots(df, latency_frontier, energy_frontier, output_path)

print(f"Plots saved to {output_path.absolute()}")
print(f"  - accuracy_vs_latency.png")
print(f"  - accuracy_vs_energy.png")
print(f"  - accuracy_vs_memory.png")

Complete Workflow Example

import torch
from pathlib import Path
from torch.utils.data import DataLoader
from edge_opt.model import SmallCNN
from edge_opt.experiments import train_model, run_sweep, pareto_frontier, save_plots

# 1. Train base model
model = SmallCNN()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

trained = train_model(
    model=model,
    train_loader=train_loader,
    epochs=20,
    learning_rate=0.001,
    device=device
)

# 2. Run optimization sweep
results = run_sweep(
    base_model=trained,
    val_loader=val_loader,
    calibration_loader=calib_loader,
    device=device,
    pruning_levels=[0.0, 0.2, 0.4, 0.6],
    precisions=["fp32", "fp16", "int8"],
    power_watts=2.0,
    calibration_batches=10,
    memory_budgets_mb=[1.0, 2.0, 5.0],
    active_memory_budget_mb=2.0,
    latency_multiplier=1.0,
    benchmark_repeats=5
)

# 3. Compute Pareto frontiers
latency_pareto = pareto_frontier(results, "latency_ms")
energy_pareto = pareto_frontier(results, "energy_proxy_j")

# 4. Save results and visualizations
results.to_csv("sweep_results.csv", index=False)
save_plots(results, latency_pareto, energy_pareto, Path("./plots"))

# 5. Select deployment configuration
best_config = latency_pareto.loc[latency_pareto['accuracy'].idxmax()]
print(f"Selected configuration:")
print(f"  Accuracy: {best_config['accuracy']:.4f}")
print(f"  Latency: {best_config['latency_ms']:.2f} ms")
print(f"  Memory: {best_config['memory_mb']:.2f} MB")
print(f"  Energy: {best_config['energy_proxy_j']:.4f} J")
