The memory_violations function helps you track which model variants violate specific memory thresholds, while the sweep pipeline uses active_memory_budget_mb to filter out configurations that exceed your target device’s available memory.
Understanding Memory Budget Parameters
The memory budget system uses two related but distinct concepts:
- memory_budgets_mb: A list of memory thresholds to track violations against (for reporting purposes)
- active_memory_budget_mb: The single hard constraint used to accept or reject model variants
memory_budgets_mb (Tracking)
This parameter allows you to track violations across multiple memory thresholds simultaneously. For example, you might want to know which models exceed 10MB, 20MB, and 50MB limits.
active_memory_budget_mb (Filtering)
This is the enforcement mechanism: the single memory limit that determines whether a model variant is accepted or rejected. Variants that exceed this limit are marked accepted: False in the results.
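As a rough illustration of how the two parameters differ, the sketch below holds them in a hypothetical MemoryBudgetConfig dataclass. Only the two field names come from this page; the class name, the default values, and the assumption that rejection means "strictly greater than the active budget" are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryBudgetConfig:
    """Hypothetical container; only the two field names come from this page."""

    # Thresholds reported as violation flags; they never reject a variant.
    memory_budgets_mb: list[int] = field(default_factory=lambda: [10, 20, 50])
    # The single hard limit; variants above it are marked accepted: False.
    active_memory_budget_mb: float = 20.0


config = MemoryBudgetConfig()
measured_mb = 25.3  # illustrative measurement, not real sweep output

# Tracking: which thresholds does this measurement exceed?
exceeded = [b for b in config.memory_budgets_mb if measured_mb > b]
# Filtering: assumed rule, reject when strictly above the active budget.
accepted = measured_mb <= config.active_memory_budget_mb

print(exceeded)  # [10, 20]
print(accepted)  # False
```

Changing memory_budgets_mb only changes what is reported; changing active_memory_budget_mb changes which variants survive.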
Memory Violations Function
The memory_violations function is defined in src/edge_opt/metrics.py:66-67.
Function Signature
memory_mb: float
The measured memory footprint of the model in megabytes, typically obtained from model_memory_mb(model).
How It Works
For each budget threshold in budgets_mb, the function creates a dictionary entry with:
- Key: f"violates_{budget}mb" (e.g., "violates_20mb")
- Value: True if memory_mb > budget, False otherwise
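The behavior described above can be sketched in a few lines. This is a reconstruction from the description, not a copy of the code in metrics.py; the type hints for budgets_mb and the return value are assumptions.

```python
def memory_violations(memory_mb: float, budgets_mb: list[int]) -> dict[str, bool]:
    """Sketch of the described behavior; not copied from metrics.py."""
    # One flag per threshold: True when the measurement exceeds that budget.
    return {f"violates_{budget}mb": memory_mb > budget for budget in budgets_mb}


flags = memory_violations(17.7, [10, 20])
print(flags)  # {'violates_10mb': True, 'violates_20mb': False}
```

In this sketch the key is built directly from the budget value, so a float threshold such as 20.0 would produce the key "violates_20.0mb" rather than "violates_20mb".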
Constraint Filtering in run_sweep
The run_sweep function in src/edge_opt/experiments.py:47-88 demonstrates how both memory parameters work together.
Filtering Logic
The key filtering logic appears in lines 76-82 and proceeds in three steps:
- Compute violations flags: Call memory_violations() to generate tracking flags for all thresholds in memory_budgets_mb
- Determine rejection status: Compare measured memory against active_memory_budget_mb to set the rejected boolean
- Build result row: Merge all metrics, violations flags, and the accepted status into a single dictionary
Practical Example
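The three filtering steps can be sketched as a small standalone function. This is not the code in run_sweep: the helper name build_result_row, the inline violation flags, and the assumption that accepted is simply the negation of rejected are illustrative choices. The two variants reuse memory values from the example output table, and the 10 MB active budget is assumed for the example.

```python
def build_result_row(
    metrics: dict,
    memory_budgets_mb: list[int],
    active_memory_budget_mb: float,
) -> dict:
    """Hypothetical helper mirroring the three filtering steps."""
    memory_mb = metrics["memory_mb"]

    # Step 1: tracking flags for every threshold (reporting only).
    violations = {
        f"violates_{budget}mb": memory_mb > budget for budget in memory_budgets_mb
    }

    # Step 2: the hard constraint decides rejection.
    rejected = memory_mb > active_memory_budget_mb

    # Step 3: merge metrics, flags, and the accepted status into one row.
    # Assumption: accepted is the negation of rejected.
    return {**metrics, **violations, "accepted": not rejected}


variants = [
    {"pruning_level": 0.0, "precision": "fp32", "memory_mb": 25.3},
    {"pruning_level": 0.5, "precision": "fp16", "memory_mb": 8.8},
]
rows = [build_result_row(v, [10, 20], active_memory_budget_mb=10) for v in variants]
for row in rows:
    print(row)
```

Keeping the tracking flags separate from the accept/reject decision means a row can report several exceeded thresholds while the decision itself depends on one number only.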
The sample results below show how the memory budget system works in practice.
Example Output
| pruning_level | precision | memory_mb | accepted | violates_10mb | violates_20mb |
|---|---|---|---|---|---|
| 0.0 | fp32 | 25.3 | False | True | True |
| 0.3 | fp32 | 17.7 | False | True | False |
| 0.5 | fp16 | 8.8 | True | False | False |
| 0.7 | int8 | 6.2 | True | False | False |
In this example, only variants with accepted=True will be considered for Pareto frontier analysis. The violates_Xmb columns provide additional insight into which specific thresholds are exceeded.
Integration with Pareto Frontier Analysis
The pareto_frontier function (defined in src/edge_opt/experiments.py:91-99) automatically filters to accepted variants. It selects rows with df[df["accepted"]], which ensures only variants within the active memory budget are considered.
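To illustrate the effect of that filter, here is a minimal pure-Python sketch that works on a list of row dictionaries rather than a DataFrame. The accuracy column, the choice of objectives (lower memory_mb, higher accuracy), and the function name are assumptions made for the example; this page does not say which metrics pareto_frontier actually trades off.

```python
def pareto_frontier_sketch(rows: list[dict]) -> list[dict]:
    """Keep accepted rows that no other accepted row dominates."""
    # Same idea as df[df["accepted"]] on a DataFrame.
    accepted = [row for row in rows if row["accepted"]]

    def dominates(a: dict, b: dict) -> bool:
        # a dominates b if it is no worse on both objectives and better on one.
        no_worse = a["memory_mb"] <= b["memory_mb"] and a["accuracy"] >= b["accuracy"]
        better = a["memory_mb"] < b["memory_mb"] or a["accuracy"] > b["accuracy"]
        return no_worse and better

    return [
        row
        for row in accepted
        if not any(dominates(other, row) for other in accepted)
    ]


rows = [
    # accuracy values are invented for illustration
    {"name": "fp32", "memory_mb": 25.3, "accuracy": 0.95, "accepted": False},
    {"name": "fp16", "memory_mb": 8.8, "accuracy": 0.93, "accepted": True},
    {"name": "int8", "memory_mb": 6.2, "accuracy": 0.90, "accepted": True},
    {"name": "int8-b", "memory_mb": 7.0, "accuracy": 0.88, "accepted": True},
]
frontier = pareto_frontier_sketch(rows)
print([row["name"] for row in frontier])  # ['fp16', 'int8']
```

The fp32 row never reaches the frontier despite having the highest accuracy, because it was rejected by the active memory budget before the comparison started.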
Best Practices
Start with Conservative Budgets
Begin with a tight active_memory_budget_mb based on your device specs, then relax if needed. It’s easier to increase the budget than to discover memory issues in production.
Use Multiple Tracking Thresholds
Set memory_budgets_mb to include the limit for your target device and for nearby alternatives. This helps when you need to port to different hardware later.
Account for Runtime Overhead
The model_memory_mb function (defined in src/edge_opt/metrics.py:58-63) measures only model parameters. Add 20-30% headroom for activations, framework overhead, and application memory.
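One way to turn the headroom advice into a number is to work backwards from the memory available on the device. The helper below is a hypothetical sketch, not part of edge_opt: it assumes the headroom is expressed as a fraction on top of the parameter footprint, so the parameter budget is the available memory divided by (1 + headroom).

```python
def parameter_budget_mb(available_mb: float, headroom: float = 0.3) -> float:
    """Largest parameter footprint that still leaves `headroom` on top.

    Solves params_mb * (1 + headroom) <= available_mb for params_mb.
    """
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    return available_mb / (1.0 + headroom)


# Illustrative numbers: 26 MB free on the device, 30% headroom.
print(round(parameter_budget_mb(26.0, headroom=0.3), 1))  # 20.0
```

The result is a candidate value for active_memory_budget_mb, since that budget is compared against the parameter-only measurement.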
Combine with Pruning and Quantization
Memory budgets work best when combined with aggressive optimization:
- Pruning: Reduces parameter count linearly with the pruning level
- Quantization: Reduces memory by 2-4x relative to fp32 (about 2x for fp16, 4x for int8)
- Combined: Can achieve 8-10x memory reduction
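A back-of-the-envelope estimate shows where a combined figure in the 8-10x range can come from. The sketch assumes an fp32 baseline, assumes pruned weights are physically removed from storage, and ignores metadata such as quantization scales; the function name and the model itself are illustrative, not part of edge_opt.

```python
# Bytes per stored weight for each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def estimated_reduction(pruning_level: float, precision: str) -> float:
    """Estimated memory reduction factor relative to an unpruned fp32 model.

    Assumes pruning removes that fraction of parameters from storage.
    """
    if not 0.0 <= pruning_level < 1.0:
        raise ValueError("pruning_level must be in [0, 1)")
    quantization_factor = BYTES_PER_PARAM["fp32"] / BYTES_PER_PARAM[precision]
    return quantization_factor / (1.0 - pruning_level)


print(estimated_reduction(0.5, "int8"))  # 8.0
print(round(estimated_reduction(0.6, "int8"), 1))  # 10.0
```

Measured results can differ from this estimate, for example when pruned weights are masked rather than removed, so the measured footprint should remain the basis for the budget check.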
Related Functions
- model_memory_mb(): Computes memory footprint (src/edge_opt/metrics.py:58)
- collect_metrics(): Gathers memory alongside other metrics (src/edge_opt/metrics.py:70)
- pareto_frontier(): Filters accepted variants for optimal tradeoffs (src/edge_opt/experiments.py:91)