Memory budgets are critical constraints for edge deployment. The memory_violations function helps you track which model variants violate specific memory thresholds, while the sweep pipeline uses active_memory_budget_mb to filter out configurations that exceed your target device’s available memory.

Understanding Memory Budget Parameters

The memory budget system uses two related but distinct concepts:
  • memory_budgets_mb - A list of memory thresholds to track violations against (for reporting purposes)
  • active_memory_budget_mb - The single hard constraint used to accept or reject model variants

memory_budgets_mb (Tracking)

This parameter allows you to track violations across multiple memory thresholds simultaneously. For example, you might want to know which models exceed 10MB, 20MB, and 50MB limits:
memory_budgets_mb = [10.0, 20.0, 50.0]
The system generates a boolean flag for each threshold; these appear as columns in your results DataFrame.

active_memory_budget_mb (Filtering)

This is the enforcement mechanism - the single memory limit that determines whether a model variant is accepted or rejected:
active_memory_budget_mb = 20.0  # Hard limit for your target device
Any model variant exceeding this limit will be marked as accepted: False in the results.

Memory Violations Function

The memory_violations function is defined in src/edge_opt/metrics.py:66-67:
def memory_violations(memory_mb: float, budgets_mb: list[float]) -> dict[str, bool]:
    return {f"violates_{budget}mb": memory_mb > budget for budget in budgets_mb}

Function Signature

1. memory_mb: float - The measured memory footprint of the model in megabytes, typically obtained from model_memory_mb(model).
2. budgets_mb: list[float] - A list of memory budget thresholds to check against.
3. Returns: dict[str, bool] - A dictionary with keys like violates_10mb, violates_20mb, etc., mapping each threshold to its boolean violation status.

How It Works

For each budget threshold in budgets_mb, the function creates a dictionary entry with:
  • Key: f"violates_{budget}mb" (e.g., "violates_20mb")
  • Value: True if memory_mb > budget, False otherwise
This produces a flat dictionary structure that integrates seamlessly into your results DataFrame.
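For instance, here is the function applied to a hypothetical 17.7 MB variant. The footprint and budget values are illustrative; note that integer budgets are used so the generated keys render as violates_10mb rather than violates_10.0mb:

```python
def memory_violations(memory_mb: float, budgets_mb: list[float]) -> dict[str, bool]:
    # Same comprehension as shown above: one flag per budget threshold.
    return {f"violates_{budget}mb": memory_mb > budget for budget in budgets_mb}

# Illustrative 17.7 MB variant checked against three budgets.
flags = memory_violations(17.7, [10, 20, 50])
# {'violates_10mb': True, 'violates_20mb': False, 'violates_50mb': False}
```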

Constraint Filtering in run_sweep

The run_sweep function in src/edge_opt/experiments.py:47-88 demonstrates how both memory parameters work together:
def run_sweep(
    base_model: nn.Module,
    val_loader: DataLoader,
    calibration_loader: DataLoader,
    device: torch.device,
    pruning_levels: list[float],
    precisions: list[str],
    power_watts: float,
    calibration_batches: int,
    memory_budgets_mb: list[float],      # For tracking violations
    active_memory_budget_mb: float,       # For filtering variants
    latency_multiplier: float,
    benchmark_repeats: int = 5,
) -> pd.DataFrame:

Filtering Logic

The key filtering logic appears in lines 76-82:
violations = memory_violations(metrics.memory_mb, memory_budgets_mb)
rejected = metrics.memory_mb > active_memory_budget_mb
row = {
    "pruning_level": pruning,
    "precision": precision,
    "accepted": not rejected,  # Boolean flag for filtering
    "active_budget_mb": active_memory_budget_mb,
    **asdict(metrics),
    **violations,  # Adds violates_Xmb columns
}
1. Compute violation flags - Call memory_violations() to generate tracking flags for every threshold in memory_budgets_mb.
2. Determine rejection status - Compare the measured memory against active_memory_budget_mb to set the rejected boolean.
3. Build the result row - Merge all metrics, violation flags, and the accepted status into a single dictionary.
4. Add to the DataFrame - Append the row to the results, enabling downstream filtering and analysis.

Practical Example

Here’s how the memory budget system works in practice:
from edge_opt.experiments import run_sweep
import torch

# Define memory budgets for tracking
memory_budgets_mb = [5.0, 10.0, 15.0, 20.0]

# Set the hard constraint for your device (e.g., Raspberry Pi with 512MB RAM)
active_memory_budget_mb = 12.0

# Run the optimization sweep
results_df = run_sweep(
    base_model=model,
    val_loader=val_loader,
    calibration_loader=calib_loader,
    device=torch.device("cpu"),
    pruning_levels=[0.0, 0.3, 0.5, 0.7],
    precisions=["fp32", "fp16", "int8"],
    power_watts=2.5,
    calibration_batches=10,
    memory_budgets_mb=memory_budgets_mb,
    active_memory_budget_mb=active_memory_budget_mb,
    latency_multiplier=1.0,
    benchmark_repeats=5,
)

# Filter to accepted variants only
accepted_models = results_df[results_df["accepted"]]
print(f"Accepted: {len(accepted_models)} / {len(results_df)} variants")

# Check specific violations
violates_10mb = results_df[results_df["violates_10mb"]]
print(f"{len(violates_10mb)} variants exceed 10MB")

Example Output

pruning_level | precision | memory_mb | accepted | violates_10mb | violates_20mb
0.0           | fp32      | 25.3      | False    | True          | True
0.3           | fp32      | 17.7      | False    | True          | False
0.5           | fp16      | 8.8       | True     | False         | False
0.7           | int8      | 6.2       | True     | False         | False
In this example, only variants with accepted=True will be considered for Pareto frontier analysis. The violates_Xmb columns provide additional insight into which specific thresholds are exceeded.

Integration with Pareto Frontier Analysis

The pareto_frontier function (defined in src/edge_opt/experiments.py:91-99) automatically filters to accepted variants:
def pareto_frontier(df: pd.DataFrame, x_col: str) -> pd.DataFrame:
    ranked = df[df["accepted"]].sort_values([x_col, "accuracy"], ascending=[True, False]).reset_index(drop=True)
    frontier = []
    best_accuracy = -1.0
    for _, row in ranked.iterrows():
        if row["accuracy"] > best_accuracy:
            frontier.append(row)
            best_accuracy = row["accuracy"]
    return pd.DataFrame(frontier)
Notice line 92, df[df["accepted"]]: this filter ensures that only variants within the active memory budget enter the frontier computation.
If active_memory_budget_mb is too restrictive, you may end up with zero accepted variants, causing the Pareto frontier computation to return an empty DataFrame.
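The same frontier scan can be reproduced without pandas. The rows below are illustrative values, sorted by memory and scanned for strictly improving accuracy exactly as in the function above:

```python
rows = [
    {"memory_mb": 6.2, "accuracy": 0.85, "accepted": True},
    {"memory_mb": 8.8, "accuracy": 0.88, "accepted": True},
    {"memory_mb": 17.7, "accuracy": 0.90, "accepted": False},  # over budget: excluded
]

# Keep accepted rows, sort by memory ascending (accuracy descending breaks ties).
ranked = sorted((r for r in rows if r["accepted"]),
                key=lambda r: (r["memory_mb"], -r["accuracy"]))

frontier, best_accuracy = [], -1.0
for r in ranked:
    if r["accuracy"] > best_accuracy:  # only keep strict accuracy improvements
        frontier.append(r)
        best_accuracy = r["accuracy"]
# frontier holds the smallest model plus every larger one that improves accuracy
```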

Best Practices

Begin with a tight active_memory_budget_mb based on your device specs, then relax if needed. It’s easier to increase the budget than to discover memory issues in production.
Set memory_budgets_mb to include your target device and nearby alternatives. This helps when you need to port to different hardware later.
The model_memory_mb function (defined in src/edge_opt/metrics.py:58-63) measures only model parameters. Add 20-30% headroom for activations, framework overhead, and application memory.
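As a rough sketch of that headroom calculation (the parameter figure, overhead factor, and 12 MB budget are illustrative):

```python
param_memory_mb = 8.8     # illustrative parameter-only figure from model_memory_mb(model)
overhead_factor = 1.25    # 25% headroom for activations, framework, and app memory
estimated_footprint_mb = param_memory_mb * overhead_factor

# Compare the padded estimate, not the raw parameter count, against the budget.
fits = estimated_footprint_mb <= 12.0
```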
Memory budgets work best when combined with aggressive optimization:
  • Pruning: Reduces parameter count linearly
  • Quantization: Reduces memory by 2-4x (fp16, int8)
  • Combined: Can achieve 8-10x memory reduction
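The combined figure follows from simple arithmetic; the fp32 baseline below is illustrative:

```python
base_fp32_mb = 25.3                        # illustrative fp32 baseline
after_pruning = base_fp32_mb * (1 - 0.5)   # 50% pruning roughly halves parameter count
after_int8 = after_pruning / 4.0           # int8 stores 1 byte per weight vs 4 for fp32
reduction = base_fp32_mb / after_int8      # combined reduction factor
```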

Related Functions

  • model_memory_mb() - Computes memory footprint (src/edge_opt/metrics.py:58)
  • collect_metrics() - Gathers memory alongside other metrics (src/edge_opt/metrics.py:70)
  • pareto_frontier() - Filters accepted variants for optimal tradeoffs (src/edge_opt/experiments.py:91)
