The deployment_simulation function models CPU frequency scaling by applying a frequency-dependent latency multiplier to benchmark results.
Overview
The deployment_simulation function is defined in src/edge_opt/deploy.py:10-33 and provides a realistic simulation of how your model performs under different CPU frequency scaling scenarios.
This function benchmarks both batch inference (processing multiple samples at once) and streaming inference (processing samples one at a time), then applies a latency multiplier based on CPU frequency scaling.
CPU Frequency Scale Parameter
The cpu_frequency_scale parameter represents the ratio of the current CPU frequency to the maximum frequency:
Example Values
| Scenario | Max Freq | Current Freq | cpu_frequency_scale |
|---|---|---|---|
| Full performance | 2.0 GHz | 2.0 GHz | 1.0 |
| Power saving mode | 2.0 GHz | 1.0 GHz | 0.5 |
| Ultra-low power | 2.0 GHz | 0.4 GHz | 0.2 |
| Overclocked | 2.0 GHz | 2.4 GHz | 1.2 |
Latency Multiplier Calculation
The key insight is that latency is inversely proportional to CPU frequency. This relationship is captured in line 13.
Mathematical Relationship
Compute Multiplier
The multiplier converts benchmark latency (measured at full speed) to scaled latency.
Safety Check
The max(cpu_frequency_scale, 1e-6) term prevents division by zero if the frequency scale is accidentally set to 0.
Example Calculations
| cpu_frequency_scale | latency_multiplier | Effect on Latency |
|---|---|---|
| 1.0 (full speed) | 1.0 | No change (100%) |
| 0.5 (half speed) | 2.0 | 2x slower (200%) |
| 0.25 (quarter speed) | 4.0 | 4x slower (400%) |
| 0.1 (extreme throttle) | 10.0 | 10x slower (1000%) |
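The table above can be reproduced with the formula from the safety check. This is a sketch of the relationship described on this page, not the verbatim code from deploy.py:

```python
def latency_multiplier(cpu_frequency_scale: float) -> float:
    # Latency is inversely proportional to frequency; the max() guard
    # prevents division by zero when the scale is accidentally 0.
    return 1.0 / max(cpu_frequency_scale, 1e-6)

for scale in (1.0, 0.5, 0.25, 0.1):
    print(f"scale={scale}: latency x{latency_multiplier(scale):.1f}")
```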
Function Implementation
Here’s the complete implementation with detailed breakdown.
Key Operations
Batch vs Streaming Inference
Batch inference (lines 16-18): Processes all samples in the batch simultaneously, leveraging vectorization and parallel computation.
Streaming inference (lines 20-24): Processes samples one at a time in a loop, simulating real-time edge scenarios where data arrives sequentially.
Batch inference is typically 5-20x faster per sample due to hardware parallelism.
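The gap between the two modes can be seen with a stand-in model. The 5-20x figure depends heavily on the model and hardware, so this only illustrates the direction of the difference:

```python
import time

def model_fn(batch):
    # Stand-in "model": fixed per-call overhead plus per-sample work.
    return [x * 2 for x in batch]

data = list(range(4096))

start = time.perf_counter()
model_fn(data)                    # batch: one call covers every sample
batch_s = time.perf_counter() - start

start = time.perf_counter()
for x in data:                    # streaming: one call per sample
    model_fn([x])
stream_s = time.perf_counter() - start

print(f"batch:     {len(data) / batch_s:,.0f} samples/s")
print(f"streaming: {len(data) / stream_s:,.0f} samples/s")
```

Per-call overhead dominates the streaming loop, which is why batch throughput comes out higher.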
Time Measurement
Uses time.perf_counter() for high-resolution timing. The measured wall-clock time is then multiplied by latency_multiplier to simulate the slower CPU.
Throughput Calculation
Throughput is measured in samples per second (sps). As latency increases (due to frequency scaling), throughput decreases proportionally.
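A worked example of the proportionality, assuming a hypothetical 64-sample benchmark that took 50 ms at full speed:

```python
def throughput_sps(n_samples: int, measured_s: float, cpu_frequency_scale: float) -> float:
    # Scale the measured wall-clock time by the latency multiplier,
    # then convert to samples per second.
    multiplier = 1.0 / max(cpu_frequency_scale, 1e-6)
    return n_samples / (measured_s * multiplier)

print(throughput_sps(64, 0.050, 1.0))   # full speed
print(throughput_sps(64, 0.050, 0.5))   # half speed -> half the throughput
```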
Return Value Structure
The function returns a dictionary with six key metrics. All latency values are in milliseconds, and all throughput values are in samples per second.
Practical Usage Example
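A minimal end-to-end sketch. The body below is an illustrative reimplementation following the description on this page; the metric key names and exact structure are assumptions, not the verbatim contents of deploy.py:

```python
import time

def deployment_simulation(model_fn, batch, cpu_frequency_scale=1.0, stream_items=128):
    multiplier = 1.0 / max(cpu_frequency_scale, 1e-6)

    # Batch inference: all samples at once.
    start = time.perf_counter()
    model_fn(batch)
    batch_s = max((time.perf_counter() - start) * multiplier, 1e-9)

    # Streaming inference: one sample at a time.
    items = batch[:stream_items]
    start = time.perf_counter()
    for sample in items:
        model_fn([sample])
    stream_s = max((time.perf_counter() - start) * multiplier, 1e-9)

    # Six metrics; the key names here are illustrative assumptions.
    return {
        "batch_latency_ms": batch_s * 1000.0,
        "batch_throughput_sps": len(batch) / batch_s,
        "stream_latency_ms": stream_s * 1000.0 / len(items),
        "stream_throughput_sps": len(items) / stream_s,
        "latency_multiplier": multiplier,
        "cpu_frequency_scale": cpu_frequency_scale,
    }

metrics = deployment_simulation(lambda xs: [x * 2 for x in xs],
                                list(range(256)), cpu_frequency_scale=0.5)
for name, value in metrics.items():
    print(f"{name}: {value}")
```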
Example Output
Integration with collect_metrics
The latency_multiplier concept is also used in the main metrics collection function (src/edge_opt/metrics.py:70-99). Use the same cpu_frequency_scale value in deployment_simulation and pass it to collect_metrics for consistent frequency scaling across your entire evaluation pipeline.
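One way to keep both call sites consistent is a small helper that applies the same multiplier formula to any metrics dictionary. This helper is hypothetical glue code, not part of edge_opt:

```python
def apply_frequency_scale(metrics: dict, cpu_frequency_scale: float) -> dict:
    # Hypothetical helper: rescale already-measured metrics with the
    # same multiplier formula deployment_simulation uses.
    multiplier = 1.0 / max(cpu_frequency_scale, 1e-6)
    scaled = dict(metrics)
    for key, value in metrics.items():
        if key.endswith("_latency_ms"):
            scaled[key] = value * multiplier   # slower CPU -> higher latency
        elif key.endswith("_throughput_sps"):
            scaled[key] = value / multiplier   # slower CPU -> lower throughput
    return scaled

print(apply_frequency_scale({"batch_latency_ms": 4.0, "batch_throughput_sps": 1000.0}, 0.5))
```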
Use Cases
Raspberry Pi Power Modes
Raspberry Pi 4 supports multiple CPU governors (performance, powersave, ondemand). Use cpu_frequency_scale to model each mode:
- performance: scale = 1.0 (1.5 GHz)
- ondemand: scale = 0.8 (1.2 GHz)
- powersave: scale = 0.4 (600 MHz)
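The governor list above translates directly into scale values (frequencies are the Raspberry Pi 4 numbers from the list):

```python
MAX_FREQ_GHZ = 1.5  # Raspberry Pi 4 maximum

governors = {"performance": 1.5, "ondemand": 1.2, "powersave": 0.6}  # GHz

scales = {name: freq / MAX_FREQ_GHZ for name, freq in governors.items()}
for name, scale in scales.items():
    print(f"{name}: cpu_frequency_scale={scale:.2f}")
```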
Battery-Constrained Devices
On battery-powered devices, CPU frequency dynamically adjusts based on remaining charge. Simulate different battery levels:
- 100-80% battery: scale = 1.0
- 80-40% battery: scale = 0.7
- 40-20% battery: scale = 0.5
- <20% battery: scale = 0.3
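The battery bands above can be encoded as a small lookup. The boundary handling (which band an exact 80% falls into) is an assumption:

```python
def battery_scale(percent: float) -> float:
    # Map remaining charge to a cpu_frequency_scale using the bands
    # listed above; the exact boundary behavior is an assumption.
    if percent >= 80:
        return 1.0
    if percent >= 40:
        return 0.7
    if percent >= 20:
        return 0.5
    return 0.3

print(battery_scale(90), battery_scale(50), battery_scale(25), battery_scale(10))
```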
Thermal Throttling
When devices overheat, CPUs automatically reduce frequency. Model thermal scenarios:
- Normal temperature: scale = 1.0
- Warm (60°C): scale = 0.85
- Hot (70°C): scale = 0.6
- Critical (80°C): scale = 0.4
Multi-Device Deployment
Different edge devices have different base frequencies. Normalize comparisons:
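One normalization approach is to pick a reference frequency and derive cpu_frequency_scale per device. The device names and frequencies below are illustrative:

```python
devices_ghz = {
    # name: max frequency in GHz -- illustrative values
    "raspberry-pi-4": 1.5,
    "mid-range-arm-board": 1.8,
    "x86-edge-box": 2.4,
}
REFERENCE_GHZ = 2.4  # normalize to the fastest device in the fleet

scales = {name: ghz / REFERENCE_GHZ for name, ghz in devices_ghz.items()}
for name, scale in scales.items():
    print(f"{name}: cpu_frequency_scale={scale:.3f}")
```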
Limitations and Considerations
The stream_items parameter (default=128) controls how many samples are processed in streaming mode. Increase this for more stable timing measurements, but be aware that it will slow down the simulation.
Related Functions
- measure_latency() - Core latency measurement (src/edge_opt/metrics.py:39)
- measure_latency_distribution() - Latency with statistics (src/edge_opt/metrics.py:53)
- collect_metrics() - Full metrics with latency multiplier (src/edge_opt/metrics.py:70)