The Edge AI Hardware Optimization framework uses a YAML configuration file to control all aspects of model training, optimization, and benchmarking. The default configuration is located at configs/default.yaml.

Configuration File Structure

All configuration parameters are defined at the root level of the YAML file. Below is a complete reference of available options.

Random Seed and Reproducibility

seed
integer
default:"7"
Master seed for PyTorch random number generation. Controls model initialization, training randomness, and ensures reproducibility across runs.
dataloader_seed
integer
default:"7"
Seed for the dataloader worker processes. Ensures consistent data shuffling and augmentation across training runs.
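A minimal sketch of how these two seeds might be applied (the helper name set_seeds is illustrative, not part of the framework's API; the returned generator is the one you would pass to a DataLoader):

```python
import random

import numpy as np
import torch


def set_seeds(seed: int, dataloader_seed: int) -> torch.Generator:
    """Seed all RNG sources, then build a dedicated generator for data loading."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # covers CPU and (if present) CUDA initialization
    gen = torch.Generator()
    gen.manual_seed(dataloader_seed)
    return gen
```

Keeping the dataloader generator separate from the global seed means changing training randomness does not reshuffle the data order, and vice versa.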

System Configuration

num_workers
integer
default:"2"
Number of worker processes for data loading. Adjust based on available CPU cores and I/O capabilities.
memory_bandwidth_gbps
float
default:"12.8"
Memory bandwidth in gigabytes per second. Used for hardware-aware optimization decisions and performance modeling. Typical values:
  • Raspberry Pi 4: 12.8 GB/s
  • NVIDIA Jetson Nano: 25.6 GB/s
  • Mobile devices: 8-16 GB/s
cpu_frequency_scale
float
default:"0.7"
CPU frequency scaling factor (0.0 to 1.0). Simulates throttled CPU performance common in edge devices:
  • 1.0: Full performance
  • 0.7: 70% performance (power saving mode)
  • 0.5: 50% performance (aggressive power saving)
power_watts
float
default:"5.0"
Power consumption in watts for energy proxy calculations. Used to estimate energy consumption during inference:
  • Raspberry Pi 3B: 3-5W
  • Raspberry Pi 4: 5-7W
  • Mobile devices: 2-4W
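As a sketch of how cpu_frequency_scale and power_watts might feed a simple performance model (the helper names scaled_latency_ms and energy_proxy_mj are illustrative, not the framework's actual API):

```python
def scaled_latency_ms(measured_ms: float, cpu_frequency_scale: float = 0.7) -> float:
    """Model a throttled CPU: a lower scale factor means proportionally higher latency."""
    return measured_ms / cpu_frequency_scale


def energy_proxy_mj(latency_ms: float, power_watts: float = 5.0) -> float:
    """Energy proxy per inference in millijoules: E = P * t (watts * ms = mJ)."""
    return power_watts * latency_ms
```

For example, a 10 ms inference on a 5 W device yields a 50 mJ energy proxy; halving the frequency scale doubles the modeled latency.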

Dataset and Training

dataset
string
default:"fashion-mnist"
Dataset to use for training and evaluation. Currently supports Fashion-MNIST, a 28x28 grayscale image classification dataset with 10 classes.
batch_size
integer
default:"128"
Batch size for training and inference. Larger batches improve throughput but increase memory usage. For edge devices, keep this value moderate (64-256).
epochs
integer
default:"2"
Number of training epochs. The framework focuses on optimization rather than full training, so this is typically kept small.
learning_rate
float
default:"0.001"
Learning rate for the Adam optimizer during training.
train_subset
integer
default:"12000"
Number of samples to use from the training set. Speeds up experimentation by using a subset of the full dataset.
val_subset
integer
default:"3000"
Number of samples to use from the validation set for accuracy evaluation.
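One way the subset and seeding parameters could combine into a deterministic loader (make_loader is a hypothetical helper; the framework's internals may differ, e.g. in how the subset indices are chosen):

```python
import torch
from torch.utils.data import DataLoader, Subset


def make_loader(dataset, subset_size: int, batch_size: int,
                num_workers: int, dataloader_seed: int) -> DataLoader:
    """Take the first `subset_size` samples and shuffle them reproducibly."""
    subset = Subset(dataset, range(min(subset_size, len(dataset))))
    gen = torch.Generator()
    gen.manual_seed(dataloader_seed)
    return DataLoader(subset, batch_size=batch_size, shuffle=True,
                      num_workers=num_workers, generator=gen)
```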

Optimization Parameters

pruning_levels
list
default:"[0.0, 0.25, 0.5, 0.7]"
List of pruning levels to evaluate. Each value represents the fraction of channels to remove:
  • 0.0: No pruning (baseline)
  • 0.25: Remove 25% of channels
  • 0.5: Remove 50% of channels
  • 0.7: Remove 70% of channels (aggressive pruning)
Higher pruning levels reduce model size and latency but may impact accuracy.
precisions
list
default:"[fp32, fp16, int8]"
List of numeric precisions to evaluate:
  • fp32: Full precision (32-bit floating point)
  • fp16: Half precision (16-bit floating point)
  • int8: 8-bit integer quantization
Lower precisions reduce memory footprint and improve inference speed with minimal accuracy loss.
calibration_batches
integer
default:"8"
Number of batches to use for INT8 quantization calibration. The calibration process collects activation statistics to determine optimal quantization parameters. More batches improve quantization accuracy but increase calibration time. Typical range: 8-32 batches.
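The pruning levels and precisions above define a sweep grid: every (pruning level, precision) pair is evaluated independently. A minimal sketch of that grid:

```python
from itertools import product

pruning_levels = [0.0, 0.25, 0.5, 0.7]
precisions = ["fp32", "fp16", "int8"]

# Cartesian product: 4 pruning levels x 3 precisions = 12 configurations.
sweep = list(product(pruning_levels, precisions))
```

With the defaults this produces 12 model variants, from the unpruned fp32 baseline to a 70%-pruned int8 model.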

Memory Budgets

memory_budgets_mb
list
default:"[1.0, 2.0, 4.0]"
List of memory budget thresholds in megabytes. The framework reports which configurations violate each budget:
  • 1.0 MB: Ultra-constrained devices (microcontrollers)
  • 2.0 MB: Constrained edge devices
  • 4.0 MB: Standard edge devices
active_memory_budget_mb
float
default:"2.0"
Primary memory budget constraint in megabytes. Used for filtering and highlighting optimal configurations.
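A budget check of this kind can be sketched in a few lines (budget_report is a hypothetical helper, shown only to make the semantics concrete):

```python
def budget_report(model_size_mb: float, budgets_mb: list[float]) -> dict[float, bool]:
    """Map each memory budget to whether the model violates (exceeds) it."""
    return {budget: model_size_mb > budget for budget in budgets_mb}
```

For instance, a 1.5 MB model violates the 1.0 MB budget but fits within the 2.0 MB and 4.0 MB budgets.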

Benchmarking

benchmark_repeats
integer
default:"5"
Number of times to repeat latency measurements for statistical analysis. Higher values provide more reliable statistics but increase benchmarking time. The framework reports mean, standard deviation, and 95th percentile latency across all repeats.
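The reported statistics could be computed roughly as follows (a sketch: latency_stats is a hypothetical helper, and the nearest-rank 95th percentile shown here is one of several common definitions):

```python
import statistics


def latency_stats(samples_ms: list[float]) -> dict[str, float]:
    """Mean, standard deviation, and nearest-rank p95 over repeated measurements."""
    s = sorted(samples_ms)
    p95_index = min(len(s) - 1, round(0.95 * (len(s) - 1)))
    return {
        "mean_ms": statistics.mean(s),
        "std_ms": statistics.stdev(s) if len(s) > 1 else 0.0,
        "p95_ms": s[p95_index],
    }
```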

Output

output_dir
string
default:"outputs"
Directory for saving outputs including:
  • Trained model checkpoints
  • Optimized model variants
  • Performance metrics CSV files
  • Pareto frontier visualizations

Example Configuration

seed: 7
dataloader_seed: 7
num_workers: 2
benchmark_repeats: 5
memory_bandwidth_gbps: 12.8
dataset: fashion-mnist
batch_size: 128
epochs: 2
learning_rate: 0.001
train_subset: 12000
val_subset: 3000
power_watts: 5.0
pruning_levels: [0.0, 0.25, 0.5, 0.7]
precisions: [fp32, fp16, int8]
calibration_batches: 8
memory_budgets_mb: [1.0, 2.0, 4.0]
active_memory_budget_mb: 2.0
cpu_frequency_scale: 0.7
output_dir: outputs

Loading Configuration

The framework loads configuration using standard YAML parsing:
import yaml
from pathlib import Path

def load_config(config_path: str) -> dict:
    """Parse a YAML configuration file into a plain dict."""
    with Path(config_path).open('r') as f:
        return yaml.safe_load(f)

# Load default configuration
config = load_config('configs/default.yaml')

# Access parameters
batch_size = config['batch_size']
pruning_levels = config['pruning_levels']
device_power = config['power_watts']

Best Practices

Always set both seed and dataloader_seed to the same value for full reproducibility. This ensures consistent results across multiple runs.
When changing memory_bandwidth_gbps or cpu_frequency_scale, make sure these values match your target hardware specifications. Incorrect values can lead to misleading optimization results.
Low-power devices (< 3W)
  • Use cpu_frequency_scale: 0.5
  • Set aggressive pruning: [0.6, 0.7, 0.8, 0.9]
  • Prefer INT8: precisions: [int8]
  • Lower memory budgets: [0.5, 1.0]
Standard edge devices (3-7W)
  • Use cpu_frequency_scale: 0.7
  • Balanced pruning: [0.0, 0.25, 0.5, 0.7]
  • All precisions: [fp32, fp16, int8]
  • Standard budgets: [1.0, 2.0, 4.0]
High-performance edge (> 7W)
  • Use cpu_frequency_scale: 1.0
  • Conservative pruning: [0.0, 0.25, 0.5]
  • FP32 and FP16: [fp32, fp16]
  • Higher budgets: [4.0, 8.0, 16.0]
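The low-power profile above could be captured as a standalone configuration file (the path and file are hypothetical; only the keys shown need to differ from the defaults):

```yaml
# Hypothetical low-power override, mirroring the "< 3W" profile above
cpu_frequency_scale: 0.5
pruning_levels: [0.6, 0.7, 0.8, 0.9]
precisions: [int8]
memory_budgets_mb: [0.5, 1.0]
```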
