
Overview

The experiments module provides functions for training models, running optimization sweeps across pruning levels and quantization precisions, computing Pareto frontiers, and generating visualization plots.

Functions

train_model

Trains a PyTorch model using the Adam optimizer and cross-entropy loss.
from edge_opt.experiments import train_model

trained_model = train_model(
    model=model,
    train_loader=train_loader,
    epochs=10,
    learning_rate=0.001,
    device=torch.device("cuda")
)

Parameters

model
nn.Module
required
The PyTorch model to train. Can be any torch.nn.Module instance.
train_loader
DataLoader
required
PyTorch DataLoader providing training batches. Must yield (inputs, targets) tuples.
epochs
int
required
Number of complete passes through the training dataset.
learning_rate
float
required
Learning rate for the Adam optimizer. Typical values: 0.001, 0.0001.
device
torch.device
required
Device to train on (e.g., torch.device("cuda") or torch.device("cpu")).

Returns

trained_model
nn.Module
The trained model with updated weights. The same model instance that was passed in (modified in-place).

Implementation Details

Training loop implementation:
  1. Moves model to specified device
  2. Sets model to training mode
  3. Creates Adam optimizer with specified learning rate
  4. Uses CrossEntropyLoss criterion
  5. For each epoch, iterates through all batches:
    • Moves data to device
    • Forward pass
    • Computes loss
    • Backpropagation
    • Optimizer step
model = model.to(device)
model.train()

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

for _ in range(epochs):
    for inputs, targets in train_loader:
        inputs = inputs.to(device)
        targets = targets.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
This function modifies the model in-place. If you need to preserve the original model, create a copy before training.
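A minimal sketch of preserving the original weights with copy.deepcopy, using a stand-in nn.Linear for the model and a simulated in-place weight update in place of train_model:

```python
import copy

import torch
import torch.nn as nn

model = nn.Linear(4, 2)          # stand-in for any nn.Module
original = copy.deepcopy(model)  # snapshot taken before training

# Training mutates `model` in place; `original` keeps the old weights.
with torch.no_grad():
    model.weight.add_(1.0)       # simulate an in-place weight update

print(torch.equal(model.weight, original.weight))  # False
```

The deep copy duplicates all parameters and buffers, so it costs one extra copy of the model in memory.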

Example

import torch
from torch.utils.data import DataLoader
from edge_opt.model import SmallCNN
from edge_opt.experiments import train_model

# Setup
model = SmallCNN()
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Train
trained_model = train_model(
    model=model,
    train_loader=train_loader,
    epochs=20,
    learning_rate=0.001,
    device=device
)

run_sweep

Runs a comprehensive hyperparameter sweep across pruning levels and quantization precisions, collecting performance metrics for each configuration.
from edge_opt.experiments import run_sweep

results_df = run_sweep(
    base_model=model,
    val_loader=val_loader,
    calibration_loader=calib_loader,
    device=device,
    pruning_levels=[0.0, 0.2, 0.4, 0.6],
    precisions=["fp32", "fp16", "int8"],
    power_watts=2.0,
    calibration_batches=10,
    memory_budgets_mb=[1.0, 2.0, 5.0],
    active_memory_budget_mb=2.0,
    latency_multiplier=1.5,
    benchmark_repeats=5
)

Parameters

base_model
nn.Module
required
The trained base model to optimize. This model will be pruned and quantized in various configurations.
val_loader
DataLoader
required
Validation DataLoader for accuracy evaluation. Must yield (inputs, targets) tuples.
calibration_loader
DataLoader
required
DataLoader providing calibration data for INT8 quantization. Should contain representative samples.
device
torch.device
required
Device to run benchmarks on (CPU or CUDA).
pruning_levels
list[float]
required
List of pruning levels to sweep. Each value should be in [0.0, 1.0). Example: [0.0, 0.2, 0.4, 0.6, 0.8].
precisions
list[str]
required
List of precision formats to test. Valid values: "fp32", "fp16", "int8".
power_watts
float
required
Device power consumption in watts for energy proxy calculation. Typical values: 1.0-5.0 for edge devices.
calibration_batches
int
required
Number of calibration batches to use for INT8 quantization.
memory_budgets_mb
list[float]
required
List of memory budget thresholds in MB to check for violations. Example: [1.0, 2.0, 5.0].
active_memory_budget_mb
float
required
The active memory budget threshold in MB. Configurations exceeding this are marked as rejected (accepted=False).
latency_multiplier
float
required
Multiplier to scale measured latency (e.g., to simulate different hardware). Use 1.0 for no scaling.
benchmark_repeats
int
default:5
Number of times to repeat latency benchmarks for statistical robustness.

Returns

results_df
pd.DataFrame
A pandas DataFrame containing metrics for each configuration. Each row represents one configuration with the following columns:
Configuration:
  • pruning_level: Pruning level applied (0.0 to <1.0)
  • precision: Precision format used ("fp32", "fp16", "int8")
  • accepted: Boolean indicating if configuration meets active memory budget
  • active_budget_mb: The active memory budget threshold used
Performance Metrics (from PerfMetrics):
  • accuracy: Model accuracy on validation set (0.0 to 1.0)
  • latency_ms: Average inference latency in milliseconds
  • latency_std_ms: Standard deviation of latency
  • latency_p95_ms: 95th percentile latency
  • throughput_sps: Throughput in samples per second
  • memory_mb: Model memory footprint in megabytes
  • energy_proxy_j: Energy proxy in joules (latency_ms × power_watts / 1000)
Memory Budget Violations:
  • violates_{budget}mb: Boolean for each budget in memory_budgets_mb
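The energy proxy column is a simple product: for example, a 12 ms average latency on a 2 W device gives 12 × 2 / 1000 = 0.024 J per inference. A quick sanity check of the formula in plain Python, with hypothetical values:

```python
def energy_proxy_j(latency_ms: float, power_watts: float) -> float:
    """Energy proxy in joules: latency_ms * power_watts / 1000."""
    return latency_ms * power_watts / 1000.0

print(energy_proxy_j(12.0, 2.0))  # 0.024
```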

Implementation Details

The sweep process:
  1. Iterate Configurations: For each combination of pruning level and precision
  2. Apply Pruning: Use structured_channel_prune with the pruning level
  3. Apply Quantization: Convert to specified precision (fp32/fp16/int8)
  4. Collect Metrics: Run comprehensive benchmarks using collect_metrics
  5. Check Budgets: Determine if configuration is accepted and check violations
  6. Aggregate Results: Compile all results into a pandas DataFrame
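Step 5, the budget check, can be sketched in plain Python. This assumes acceptance means memory_mb ≤ active_memory_budget_mb, and the exact rendering of the violates_{budget}mb column names is illustrative:

```python
def budget_flags(memory_mb: float, active_budget_mb: float, budgets_mb: list[float]) -> dict:
    """Acceptance flag plus one violation flag per memory budget (sketch)."""
    flags = {"accepted": memory_mb <= active_budget_mb}
    for budget in budgets_mb:
        flags[f"violates_{budget}mb"] = memory_mb > budget
    return flags

print(budget_flags(1.8, 2.0, [1.0, 2.0, 5.0]))
# {'accepted': True, 'violates_1.0mb': True, 'violates_2.0mb': False, 'violates_5.0mb': False}
```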
The sweep can generate a large number of configurations. For pruning_levels=[0.0, 0.2, 0.4, 0.6] and precisions=["fp32", "fp16", "int8"], you’ll get 4 × 3 = 12 configurations.
Sweep time increases linearly with the number of configurations and benchmark_repeats. Use fewer repeats for faster experimentation.
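The configuration count is the Cartesian product of the two lists, which itertools.product enumerates directly. Using the same sweep values as above:

```python
from itertools import product

pruning_levels = [0.0, 0.2, 0.4, 0.6]
precisions = ["fp32", "fp16", "int8"]

# One (pruning_level, precision) pair per configuration.
configs = list(product(pruning_levels, precisions))
print(len(configs))  # 12
```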

Example

import torch
import pandas as pd
from edge_opt.experiments import run_sweep

# Run comprehensive sweep
df = run_sweep(
    base_model=trained_model,
    val_loader=val_loader,
    calibration_loader=calib_loader,
    device=torch.device("cpu"),
    pruning_levels=[0.0, 0.2, 0.4, 0.6],
    precisions=["fp32", "fp16", "int8"],
    power_watts=2.0,
    calibration_batches=10,
    memory_budgets_mb=[1.0, 2.0, 5.0],
    active_memory_budget_mb=2.0,
    latency_multiplier=1.0,
    benchmark_repeats=5
)

# Analyze results
print(f"Total configurations: {len(df)}")
print(f"Accepted configurations: {df['accepted'].sum()}")
print(f"Best accuracy: {df['accuracy'].max():.4f}")
print(f"Lowest latency: {df['latency_ms'].min():.2f} ms")

# Filter by constraints
accepted = df[df['accepted']]
best_config = accepted.loc[accepted['accuracy'].idxmax()]
print(f"\nBest accepted config:")
print(f"  Pruning: {best_config['pruning_level']}")
print(f"  Precision: {best_config['precision']}")
print(f"  Accuracy: {best_config['accuracy']:.4f}")
print(f"  Memory: {best_config['memory_mb']:.2f} MB")

pareto_frontier

Computes the Pareto frontier of accepted configurations by selecting models that achieve the best accuracy for progressively increasing values of a constraint metric (latency or energy).
from edge_opt.experiments import pareto_frontier

latency_frontier = pareto_frontier(results_df, x_col="latency_ms")
energy_frontier = pareto_frontier(results_df, x_col="energy_proxy_j")

Parameters

df
pd.DataFrame
required
DataFrame of sweep results from run_sweep. Must contain columns: accepted, accuracy, and the column specified in x_col.
x_col
str
required
The constraint column name to optimize along (e.g., "latency_ms", "energy_proxy_j", "memory_mb"). Lower values of this metric are preferred.

Returns

frontier_df
pd.DataFrame
A DataFrame containing only the Pareto-optimal configurations. These are configurations where no other configuration achieves both better accuracy AND a better constraint metric value.
Properties:
  • Only includes accepted configurations (where accepted=True)
  • Sorted by increasing constraint metric (x_col)
  • Each row represents a non-dominated solution
  • Accuracy is strictly increasing along the frontier

Implementation Details

Pareto frontier algorithm:
  1. Filter Accepted: Only consider configurations meeting the active memory budget
  2. Sort: Sort by constraint metric (ascending) and accuracy (descending)
  3. Select Non-Dominated: Iterate through sorted configurations:
    • Keep configuration if it has better accuracy than all previous
    • Track best accuracy seen so far
    • Skip dominated configurations
ranked = df[df["accepted"]].sort_values(
    [x_col, "accuracy"],
    ascending=[True, False]
).reset_index(drop=True)

frontier = []
best_accuracy = -1.0
for _, row in ranked.iterrows():
    if row["accuracy"] > best_accuracy:
        frontier.append(row)
        best_accuracy = row["accuracy"]

frontier_df = pd.DataFrame(frontier)  # assemble kept rows back into a DataFrame
A configuration is Pareto-optimal if there’s no other configuration that is strictly better in all objectives. This function implements a simple greedy algorithm for the accuracy-vs-constraint trade-off.
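To see the greedy selection in action, here is the same logic on toy (latency_ms, accuracy) pairs in plain Python, with pandas replaced by a sorted list (values are hypothetical):

```python
# Toy accepted configurations as (latency_ms, accuracy) pairs.
configs = [(10.0, 0.90), (12.0, 0.88), (15.0, 0.93), (20.0, 0.95), (25.0, 0.94)]

# Sort by latency ascending, accuracy descending (mirrors the DataFrame sort).
ranked = sorted(configs, key=lambda c: (c[0], -c[1]))

frontier = []
best_accuracy = -1.0
for latency, accuracy in ranked:
    if accuracy > best_accuracy:  # keep only strict accuracy improvements
        frontier.append((latency, accuracy))
        best_accuracy = accuracy

print(frontier)  # [(10.0, 0.9), (15.0, 0.93), (20.0, 0.95)]
```

The (12.0, 0.88) and (25.0, 0.94) points are dominated: each is both slower and less accurate than some kept point, so they never enter the frontier.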

Example

import pandas as pd
from edge_opt.experiments import run_sweep, pareto_frontier

# Run sweep
df = run_sweep(...)  # See run_sweep example

# Compute Pareto frontiers
latency_frontier = pareto_frontier(df, x_col="latency_ms")
energy_frontier = pareto_frontier(df, x_col="energy_proxy_j")
memory_frontier = pareto_frontier(df, x_col="memory_mb")

# Analyze latency frontier
print("Latency-Accuracy Pareto Frontier:")
for _, row in latency_frontier.iterrows():
    print(f"  {row['latency_ms']:.2f}ms @ {row['accuracy']:.4f} accuracy")
    print(f"    (pruning={row['pruning_level']}, precision={row['precision']})")

# Find configuration with best accuracy under 50ms latency
under_50ms = latency_frontier[latency_frontier['latency_ms'] < 50]
if not under_50ms.empty:
    best = under_50ms.loc[under_50ms['accuracy'].idxmax()]
    print(f"\nBest under 50ms: {best['accuracy']:.4f} @ {best['latency_ms']:.2f}ms")

save_plots

Generates and saves three visualization plots showing the trade-offs between accuracy and optimization metrics (latency, energy, memory).
from pathlib import Path
from edge_opt.experiments import save_plots

save_plots(
    df=results_df,
    latency_frontier=latency_frontier,
    energy_frontier=energy_frontier,
    output_dir=Path("./output/plots")
)

Parameters

df
pd.DataFrame
required
Complete DataFrame of sweep results from run_sweep. Must contain columns: accepted, accuracy, latency_ms, energy_proxy_j, memory_mb.
latency_frontier
pd.DataFrame
required
Pareto frontier DataFrame for latency (from pareto_frontier(df, "latency_ms")).
energy_frontier
pd.DataFrame
required
Pareto frontier DataFrame for energy (from pareto_frontier(df, "energy_proxy_j")).
output_dir
Path
required
Directory path where plots will be saved. Will be created if it doesn’t exist.

Returns

No return value. Creates three PNG files in the output directory:
accuracy_vs_latency.png
Scatter plot of accuracy vs latency with Pareto frontier overlay.
  • Blue points: Accepted configurations
  • Gray X markers: Rejected configurations (exceed memory budget)
  • Red line: Pareto frontier
  • Resolution: 180 DPI
accuracy_vs_energy.png
Scatter plot of accuracy vs energy proxy with Pareto frontier overlay.
  • Green points: Accepted configurations
  • Gray X markers: Rejected configurations
  • Red line: Pareto frontier
  • Resolution: 180 DPI
accuracy_vs_memory.png
Scatter plot of accuracy vs memory footprint.
  • Purple points: Accepted configurations
  • Gray X markers: Rejected configurations
  • No Pareto frontier (memory is a hard constraint)
  • Resolution: 180 DPI

Implementation Details

For each plot:
  1. Split Data: Separate accepted and rejected configurations
  2. Create Figure: 7×5 inch figure with matplotlib
  3. Plot Points:
    • Accepted: Colored circles with alpha=0.8
    • Rejected: Gray X markers with alpha=0.5
  4. Plot Frontier: Red line connecting Pareto-optimal points (latency and energy plots only)
  5. Formatting: Labels, title, legend, tight layout
  6. Save: 180 DPI PNG file
plt.figure(figsize=(7, 5))
plt.scatter(accepted["latency_ms"], accepted["accuracy"], 
            c="tab:blue", alpha=0.8, label="Accepted")
if not rejected.empty:
    plt.scatter(rejected["latency_ms"], rejected["accuracy"], 
                c="tab:gray", alpha=0.5, marker="x", label="Rejected")
plt.plot(latency_frontier["latency_ms"], latency_frontier["accuracy"], 
         color="red", linewidth=2, label="Pareto")
plt.xlabel("Latency (ms)")
plt.ylabel("Accuracy")
plt.title("Accuracy vs Latency")
plt.legend()
plt.tight_layout()
plt.savefig(output_dir / "accuracy_vs_latency.png", dpi=180)
plt.close()
The function automatically creates the output directory if it doesn’t exist using output_dir.mkdir(parents=True, exist_ok=True).

Example

from pathlib import Path
import torch
from edge_opt.experiments import run_sweep, pareto_frontier, save_plots

# Run sweep and compute frontiers
df = run_sweep(
    base_model=model,
    val_loader=val_loader,
    calibration_loader=calib_loader,
    device=torch.device("cpu"),
    pruning_levels=[0.0, 0.2, 0.4, 0.6],
    precisions=["fp32", "fp16", "int8"],
    power_watts=2.0,
    calibration_batches=10,
    memory_budgets_mb=[1.0, 2.0, 5.0],
    active_memory_budget_mb=2.0,
    latency_multiplier=1.0,
    benchmark_repeats=5
)

latency_frontier = pareto_frontier(df, "latency_ms")
energy_frontier = pareto_frontier(df, "energy_proxy_j")

# Generate plots
output_path = Path("./experiment_results")
save_plots(df, latency_frontier, energy_frontier, output_path)

print(f"Plots saved to {output_path.absolute()}")
print(f"  - accuracy_vs_latency.png")
print(f"  - accuracy_vs_energy.png")
print(f"  - accuracy_vs_memory.png")

Complete Workflow Example

import torch
from pathlib import Path
from torch.utils.data import DataLoader
from edge_opt.model import SmallCNN
from edge_opt.experiments import train_model, run_sweep, pareto_frontier, save_plots

# 1. Train base model
model = SmallCNN()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

trained = train_model(
    model=model,
    train_loader=train_loader,
    epochs=20,
    learning_rate=0.001,
    device=device
)

# 2. Run optimization sweep
results = run_sweep(
    base_model=trained,
    val_loader=val_loader,
    calibration_loader=calib_loader,
    device=device,
    pruning_levels=[0.0, 0.2, 0.4, 0.6],
    precisions=["fp32", "fp16", "int8"],
    power_watts=2.0,
    calibration_batches=10,
    memory_budgets_mb=[1.0, 2.0, 5.0],
    active_memory_budget_mb=2.0,
    latency_multiplier=1.0,
    benchmark_repeats=5
)

# 3. Compute Pareto frontiers
latency_pareto = pareto_frontier(results, "latency_ms")
energy_pareto = pareto_frontier(results, "energy_proxy_j")

# 4. Save results and visualizations
results.to_csv("sweep_results.csv", index=False)
save_plots(results, latency_pareto, energy_pareto, Path("./plots"))

# 5. Select deployment configuration
best_config = latency_pareto.loc[latency_pareto['accuracy'].idxmax()]
print(f"Selected configuration:")
print(f"  Accuracy: {best_config['accuracy']:.4f}")
print(f"  Latency: {best_config['latency_ms']:.2f} ms")
print(f"  Memory: {best_config['memory_mb']:.2f} MB")
print(f"  Energy: {best_config['energy_proxy_j']:.4f} J")
