Memory budgets are critical constraints for edge deployment. The memory_violations function helps you track which model variants violate specific memory thresholds, while the sweep pipeline uses active_memory_budget_mb to filter out configurations that exceed your target device’s available memory.

Understanding Memory Budget Parameters

The memory budget system uses two related but distinct concepts:
  • memory_budgets_mb - A list of memory thresholds to track violations against (for reporting purposes)
  • active_memory_budget_mb - The single hard constraint used to accept or reject model variants

memory_budgets_mb (Tracking)

This parameter allows you to track violations across multiple memory thresholds simultaneously. For example, you might want to know which models exceed 10MB, 20MB, and 50MB limits:
memory_budgets_mb = [10.0, 20.0, 50.0]
The system generates a boolean flag for each threshold; these appear as columns in your results DataFrame.

active_memory_budget_mb (Filtering)

This is the enforcement mechanism - the single memory limit that determines whether a model variant is accepted or rejected:
active_memory_budget_mb = 20.0  # Hard limit for your target device
Any model variant exceeding this limit will be marked as accepted: False in the results.

Memory Violations Function

The memory_violations function is defined in src/edge_opt/metrics.py:66-67:
def memory_violations(memory_mb: float, budgets_mb: list[float]) -> dict[str, bool]:
    return {f"violates_{budget}mb": memory_mb > budget for budget in budgets_mb}

Function Signature

1. memory_mb: float - The measured memory footprint of the model in megabytes, typically obtained from model_memory_mb(model).
2. budgets_mb: list[float] - A list of memory budget thresholds to check against.
3. Returns: dict[str, bool] - A dictionary with keys like violates_10mb, violates_20mb, etc., mapping each threshold to its boolean violation status.

How It Works

For each budget threshold in budgets_mb, the function creates a dictionary entry with:
  • Key: f"violates_{budget}mb" (e.g., "violates_20mb")
  • Value: True if memory_mb > budget, False otherwise
This produces a flat dictionary structure that integrates seamlessly into your results DataFrame.
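For instance, here is the function applied to a hypothetical 17.7 MB variant. The footprint and budget values are illustrative; note that integer budgets are used so the generated keys render as violates_10mb rather than violates_10.0mb:

```python
def memory_violations(memory_mb: float, budgets_mb: list[float]) -> dict[str, bool]:
    # Same comprehension as shown above: one flag per budget threshold.
    return {f"violates_{budget}mb": memory_mb > budget for budget in budgets_mb}

# Illustrative 17.7 MB variant checked against three budgets.
flags = memory_violations(17.7, [10, 20, 50])
# {'violates_10mb': True, 'violates_20mb': False, 'violates_50mb': False}
```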

Constraint Filtering in run_sweep

The run_sweep function in src/edge_opt/experiments.py:47-88 demonstrates how both memory parameters work together:
def run_sweep(
    base_model: nn.Module,
    val_loader: DataLoader,
    calibration_loader: DataLoader,
    device: torch.device,
    pruning_levels: list[float],
    precisions: list[str],
    power_watts: float,
    calibration_batches: int,
    memory_budgets_mb: list[float],      # For tracking violations
    active_memory_budget_mb: float,       # For filtering variants
    latency_multiplier: float,
    benchmark_repeats: int = 5,
) -> pd.DataFrame:

Filtering Logic

The key filtering logic appears in lines 76-82:
violations = memory_violations(metrics.memory_mb, memory_budgets_mb)
rejected = metrics.memory_mb > active_memory_budget_mb
row = {
    "pruning_level": pruning,
    "precision": precision,
    "accepted": not rejected,  # Boolean flag for filtering
    "active_budget_mb": active_memory_budget_mb,
    **asdict(metrics),
    **violations,  # Adds violates_Xmb columns
}
1. Compute violation flags - Call memory_violations() to generate tracking flags for every threshold in memory_budgets_mb.
2. Determine rejection status - Compare the measured memory against active_memory_budget_mb to set the rejected boolean.
3. Build the result row - Merge all metrics, violation flags, and the accepted status into a single dictionary.
4. Add to the DataFrame - Append the row to the results, enabling downstream filtering and analysis.

Practical Example

Here’s how the memory budget system works in practice:
from edge_opt.experiments import run_sweep
import torch

# Define memory budgets for tracking
memory_budgets_mb = [5.0, 10.0, 15.0, 20.0]

# Set the hard constraint for your device (e.g., Raspberry Pi with 512MB RAM)
active_memory_budget_mb = 12.0

# Run the optimization sweep
results_df = run_sweep(
    base_model=model,
    val_loader=val_loader,
    calibration_loader=calib_loader,
    device=torch.device("cpu"),
    pruning_levels=[0.0, 0.3, 0.5, 0.7],
    precisions=["fp32", "fp16", "int8"],
    power_watts=2.5,
    calibration_batches=10,
    memory_budgets_mb=memory_budgets_mb,
    active_memory_budget_mb=active_memory_budget_mb,
    latency_multiplier=1.0,
    benchmark_repeats=5,
)

# Filter to accepted variants only
accepted_models = results_df[results_df["accepted"]]
print(f"Accepted: {len(accepted_models)} / {len(results_df)} variants")

# Check specific violations
violates_10mb = results_df[results_df["violates_10mb"]]
print(f"{len(violates_10mb)} variants exceed 10MB")

Example Output

pruning_level | precision | memory_mb | accepted | violates_10mb | violates_20mb
0.0           | fp32      | 25.3      | False    | True          | True
0.3           | fp32      | 17.7      | False    | True          | False
0.5           | fp16      | 8.8       | True     | False         | False
0.7           | int8      | 6.2       | True     | False         | False
In this example, only variants with accepted=True will be considered for Pareto frontier analysis. The violates_Xmb columns provide additional insight into which specific thresholds are exceeded.

Integration with Pareto Frontier Analysis

The pareto_frontier function (defined in src/edge_opt/experiments.py:91-99) automatically filters to accepted variants:
def pareto_frontier(df: pd.DataFrame, x_col: str) -> pd.DataFrame:
    ranked = df[df["accepted"]].sort_values([x_col, "accuracy"], ascending=[True, False]).reset_index(drop=True)
    frontier = []
    best_accuracy = -1.0
    for _, row in ranked.iterrows():
        if row["accuracy"] > best_accuracy:
            frontier.append(row)
            best_accuracy = row["accuracy"]
    return pd.DataFrame(frontier)
Notice line 92, df[df["accepted"]]: this filter ensures that only variants within the active memory budget enter the frontier computation.
If active_memory_budget_mb is too restrictive, you may end up with zero accepted variants, causing the Pareto frontier computation to return an empty DataFrame.
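The same frontier scan can be reproduced without pandas. The rows below are illustrative values, sorted by memory and scanned for strictly improving accuracy exactly as in the function above:

```python
rows = [
    {"memory_mb": 6.2, "accuracy": 0.85, "accepted": True},
    {"memory_mb": 8.8, "accuracy": 0.88, "accepted": True},
    {"memory_mb": 17.7, "accuracy": 0.90, "accepted": False},  # over budget: excluded
]

# Keep accepted rows, sort by memory ascending (accuracy descending breaks ties).
ranked = sorted((r for r in rows if r["accepted"]),
                key=lambda r: (r["memory_mb"], -r["accuracy"]))

frontier, best_accuracy = [], -1.0
for r in ranked:
    if r["accuracy"] > best_accuracy:  # only keep strict accuracy improvements
        frontier.append(r)
        best_accuracy = r["accuracy"]
# frontier holds the smallest model plus every larger one that improves accuracy
```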

Best Practices

Begin with a tight active_memory_budget_mb based on your device specs, then relax if needed. It’s easier to increase the budget than to discover memory issues in production.
Set memory_budgets_mb to include your target device and nearby alternatives. This helps when you need to port to different hardware later.
The model_memory_mb function (defined in src/edge_opt/metrics.py:58-63) measures only model parameters. Add 20-30% headroom for activations, framework overhead, and application memory.
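As a rough sketch of that headroom calculation (the parameter figure, overhead factor, and 12 MB budget are illustrative):

```python
param_memory_mb = 8.8     # illustrative parameter-only figure from model_memory_mb(model)
overhead_factor = 1.25    # 25% headroom for activations, framework, and app memory
estimated_footprint_mb = param_memory_mb * overhead_factor

# Compare the padded estimate, not the raw parameter count, against the budget.
fits = estimated_footprint_mb <= 12.0
```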
Memory budgets work best when combined with aggressive optimization:
  • Pruning: Reduces parameter count linearly
  • Quantization: Reduces memory by 2-4x (fp16, int8)
  • Combined: Can achieve 8-10x memory reduction
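The combined figure follows from simple arithmetic; the fp32 baseline below is illustrative:

```python
base_fp32_mb = 25.3                        # illustrative fp32 baseline
after_pruning = base_fp32_mb * (1 - 0.5)   # 50% pruning roughly halves parameter count
after_int8 = after_pruning / 4.0           # int8 stores 1 byte per weight vs 4 for fp32
reduction = base_fp32_mb / after_int8      # combined reduction factor
```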

Related Functions

  • model_memory_mb() - Computes memory footprint (src/edge_opt/metrics.py:58)
  • collect_metrics() - Gathers memory alongside other metrics (src/edge_opt/metrics.py:70)
  • pareto_frontier() - Filters accepted variants for optimal tradeoffs (src/edge_opt/experiments.py:91)
