The Edge AI Hardware Optimization framework uses a YAML configuration file to control all aspects of model training, optimization, and benchmarking. The default configuration is located at configs/default.yaml.

Configuration File Structure

All configuration parameters are defined at the root level of the YAML file. Below is a complete reference of available options.

Random Seed and Reproducibility

seed
integer
default:"7"
Master seed for PyTorch random number generation. Controls model initialization, training randomness, and ensures reproducibility across runs.
dataloader_seed
integer
default:"7"
Seed for the dataloader worker processes. Ensures consistent data shuffling and augmentation across training runs.
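A minimal sketch of how these two seeds might be applied (the helper name set_seeds is illustrative, not part of the framework's API; the returned generator is the one you would pass to a DataLoader):

```python
import random

import numpy as np
import torch


def set_seeds(seed: int, dataloader_seed: int) -> torch.Generator:
    """Seed all RNG sources, then build a dedicated generator for data loading."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # covers CPU and (if present) CUDA initialization
    gen = torch.Generator()
    gen.manual_seed(dataloader_seed)
    return gen
```

Keeping the dataloader generator separate from the global seed means changing training randomness does not reshuffle the data order, and vice versa.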

System Configuration

num_workers
integer
default:"2"
Number of worker processes for data loading. Adjust based on available CPU cores and I/O capabilities.
memory_bandwidth_gbps
float
default:"12.8"
Memory bandwidth in gigabytes per second. Used for hardware-aware optimization decisions and performance modeling. Typical values:
  • Raspberry Pi 4: 12.8 GB/s
  • NVIDIA Jetson Nano: 25.6 GB/s
  • Mobile devices: 8-16 GB/s
cpu_frequency_scale
float
default:"0.7"
CPU frequency scaling factor (0.0 to 1.0). Simulates throttled CPU performance common in edge devices:
  • 1.0: Full performance
  • 0.7: 70% performance (power saving mode)
  • 0.5: 50% performance (aggressive power saving)
power_watts
float
default:"5.0"
Power consumption in watts for energy proxy calculations. Used to estimate energy consumption during inference:
  • Raspberry Pi 3B: 3-5W
  • Raspberry Pi 4: 5-7W
  • Mobile devices: 2-4W
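As a sketch of how cpu_frequency_scale and power_watts might feed a simple performance model (the helper names scaled_latency_ms and energy_proxy_mj are illustrative, not the framework's actual API):

```python
def scaled_latency_ms(measured_ms: float, cpu_frequency_scale: float = 0.7) -> float:
    """Model a throttled CPU: a lower scale factor means proportionally higher latency."""
    return measured_ms / cpu_frequency_scale


def energy_proxy_mj(latency_ms: float, power_watts: float = 5.0) -> float:
    """Energy proxy per inference in millijoules: E = P * t (watts * ms = mJ)."""
    return power_watts * latency_ms
```

For example, a 10 ms inference on a 5 W device yields a 50 mJ energy proxy; halving the frequency scale doubles the modeled latency.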

Dataset and Training

dataset
string
default:"fashion-mnist"
Dataset to use for training and evaluation. Currently supports Fashion-MNIST, a 28x28 grayscale image classification dataset with 10 classes.
batch_size
integer
default:"128"
Batch size for training and inference. Larger batches improve throughput but increase memory usage. For edge devices, keep this value moderate (64-256).
epochs
integer
default:"2"
Number of training epochs. The framework focuses on optimization rather than full training, so this is typically kept small.
learning_rate
float
default:"0.001"
Learning rate for the Adam optimizer during training.
train_subset
integer
default:"12000"
Number of samples to use from the training set. Speeds up experimentation by using a subset of the full dataset.
val_subset
integer
default:"3000"
Number of samples to use from the validation set for accuracy evaluation.
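One way the subset and seeding parameters could combine into a deterministic loader (make_loader is a hypothetical helper; the framework's internals may differ, e.g. in how the subset indices are chosen):

```python
import torch
from torch.utils.data import DataLoader, Subset


def make_loader(dataset, subset_size: int, batch_size: int,
                num_workers: int, dataloader_seed: int) -> DataLoader:
    """Take the first `subset_size` samples and shuffle them reproducibly."""
    subset = Subset(dataset, range(min(subset_size, len(dataset))))
    gen = torch.Generator()
    gen.manual_seed(dataloader_seed)
    return DataLoader(subset, batch_size=batch_size, shuffle=True,
                      num_workers=num_workers, generator=gen)
```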

Optimization Parameters

pruning_levels
list
default:"[0.0, 0.25, 0.5, 0.7]"
List of pruning levels to evaluate. Each value represents the fraction of channels to remove:
  • 0.0: No pruning (baseline)
  • 0.25: Remove 25% of channels
  • 0.5: Remove 50% of channels
  • 0.7: Remove 70% of channels (aggressive pruning)
Higher pruning levels reduce model size and latency but may impact accuracy.
precisions
list
default:"[fp32, fp16, int8]"
List of numeric precisions to evaluate:
  • fp32: Full precision (32-bit floating point)
  • fp16: Half precision (16-bit floating point)
  • int8: 8-bit integer quantization
Lower precisions reduce memory footprint and improve inference speed with minimal accuracy loss.
calibration_batches
integer
default:"8"
Number of batches to use for INT8 quantization calibration. The calibration process collects activation statistics to determine optimal quantization parameters. More batches improve quantization accuracy but increase calibration time. Typical range: 8-32 batches.
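The pruning levels and precisions above define a sweep grid: every (pruning level, precision) pair is evaluated independently. A minimal sketch of that grid:

```python
from itertools import product

pruning_levels = [0.0, 0.25, 0.5, 0.7]
precisions = ["fp32", "fp16", "int8"]

# Cartesian product: 4 pruning levels x 3 precisions = 12 configurations.
sweep = list(product(pruning_levels, precisions))
```

With the defaults this produces 12 model variants, from the unpruned fp32 baseline to a 70%-pruned int8 model.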

Memory Budgets

memory_budgets_mb
list
default:"[1.0, 2.0, 4.0]"
List of memory budget thresholds in megabytes. The framework reports which configurations violate each budget:
  • 1.0 MB: Ultra-constrained devices (microcontrollers)
  • 2.0 MB: Constrained edge devices
  • 4.0 MB: Standard edge devices
active_memory_budget_mb
float
default:"2.0"
Primary memory budget constraint in megabytes. Used for filtering and highlighting optimal configurations.
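A budget check of this kind can be sketched in a few lines (budget_report is a hypothetical helper, shown only to make the semantics concrete):

```python
def budget_report(model_size_mb: float, budgets_mb: list[float]) -> dict[float, bool]:
    """Map each memory budget to whether the model violates (exceeds) it."""
    return {budget: model_size_mb > budget for budget in budgets_mb}
```

For instance, a 1.5 MB model violates the 1.0 MB budget but fits within the 2.0 MB and 4.0 MB budgets.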

Benchmarking

benchmark_repeats
integer
default:"5"
Number of times to repeat latency measurements for statistical analysis. Higher values provide more reliable statistics but increase benchmarking time. The framework reports mean, standard deviation, and 95th percentile latency across all repeats.
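The reported statistics could be computed roughly as follows (a sketch: latency_stats is a hypothetical helper, and the nearest-rank 95th percentile shown here is one of several common definitions):

```python
import statistics


def latency_stats(samples_ms: list[float]) -> dict[str, float]:
    """Mean, standard deviation, and nearest-rank p95 over repeated measurements."""
    s = sorted(samples_ms)
    p95_index = min(len(s) - 1, round(0.95 * (len(s) - 1)))
    return {
        "mean_ms": statistics.mean(s),
        "std_ms": statistics.stdev(s) if len(s) > 1 else 0.0,
        "p95_ms": s[p95_index],
    }
```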

Output

output_dir
string
default:"outputs"
Directory for saving outputs including:
  • Trained model checkpoints
  • Optimized model variants
  • Performance metrics CSV files
  • Pareto frontier visualizations

Example Configuration

seed: 7
dataloader_seed: 7
num_workers: 2
benchmark_repeats: 5
memory_bandwidth_gbps: 12.8
dataset: fashion-mnist
batch_size: 128
epochs: 2
learning_rate: 0.001
train_subset: 12000
val_subset: 3000
power_watts: 5.0
pruning_levels: [0.0, 0.25, 0.5, 0.7]
precisions: [fp32, fp16, int8]
calibration_batches: 8
memory_budgets_mb: [1.0, 2.0, 4.0]
active_memory_budget_mb: 2.0
cpu_frequency_scale: 0.7
output_dir: outputs

Loading Configuration

The framework loads configuration using standard YAML parsing:
import yaml
from pathlib import Path

def load_config(config_path: str) -> dict:
    """Parse a YAML configuration file into a plain dict."""
    with Path(config_path).open('r') as f:
        return yaml.safe_load(f)

# Load default configuration
config = load_config('configs/default.yaml')

# Access parameters
batch_size = config['batch_size']
pruning_levels = config['pruning_levels']
device_power = config['power_watts']

Best Practices

Always set both seed and dataloader_seed to the same value for full reproducibility. This ensures consistent results across multiple runs.
When changing memory_bandwidth_gbps or cpu_frequency_scale, make sure these values match your target hardware specifications. Incorrect values can lead to misleading optimization results.
Low-power devices (< 3W)
  • Use cpu_frequency_scale: 0.5
  • Set aggressive pruning: [0.6, 0.7, 0.8, 0.9]
  • Prefer INT8: precisions: [int8]
  • Lower memory budgets: [0.5, 1.0]
Standard edge devices (3-7W)
  • Use cpu_frequency_scale: 0.7
  • Balanced pruning: [0.0, 0.25, 0.5, 0.7]
  • All precisions: [fp32, fp16, int8]
  • Standard budgets: [1.0, 2.0, 4.0]
High-performance edge (> 7W)
  • Use cpu_frequency_scale: 1.0
  • Conservative pruning: [0.0, 0.25, 0.5]
  • FP32 and FP16: [fp32, fp16]
  • Higher budgets: [4.0, 8.0, 16.0]
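The low-power profile above could be captured as a standalone configuration file (the path and file are hypothetical; only the keys shown need to differ from the defaults):

```yaml
# Hypothetical low-power override, mirroring the "< 3W" profile above
cpu_frequency_scale: 0.5
pruning_levels: [0.6, 0.7, 0.8, 0.9]
precisions: [int8]
memory_budgets_mb: [0.5, 1.0]
```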
