## Overview

The Edge AI Hardware Optimization framework implements a constraint-first optimization pipeline for evaluating compact CNN deployments under edge-device constraints. The architecture prioritizes deterministic execution, measurable trade-offs, and a low-complexity implementation suitable for iterative experimentation.
## Pipeline Stages

The optimization pipeline consists of seven sequential stages that transform a baseline model into deployment-ready candidates.

### Configuration Load

The `edge_opt.config` module parses YAML configuration into a typed `ExperimentConfig` dataclass. The configuration loader supports scalar parsing with automatic type inference for integers, floats, booleans, lists, and None values.
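The inference rules can be sketched with a small helper. `parse_scalar` below is a hypothetical illustration of the precedence among None, boolean, list, integer, float, and string parsing; the actual logic lives in `edge_opt.config`:

```python
from typing import Any

def parse_scalar(raw: str) -> Any:
    """Infer a typed value from a raw scalar string (illustrative rules)."""
    text = raw.strip()
    if text.lower() in ("null", "none", "~"):
        return None
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1].strip()
        return [parse_scalar(item) for item in inner.split(",")] if inner else []
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        pass
    return text            # fall back to a plain string

assert parse_scalar("42") == 42
assert parse_scalar("0.5") == 0.5
assert parse_scalar("[1, 2, 3]") == [1, 2, 3]
assert parse_scalar("fp16") == "fp16"
```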
### Dataset and Loader Setup

The `edge_opt.data` module builds deterministic train/validation loaders with reproducible shuffling. Supported datasets: `mnist` and `fashion-mnist`. Loaders use fixed `Generator` seeds to ensure reproducible batch ordering across runs.
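The seeding pattern can be sketched as below, assuming standard PyTorch `DataLoader` usage; `make_loader` is an illustrative name, not the exact `edge_opt.data` API:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_loader(dataset, batch_size: int, seed: int) -> DataLoader:
    """Build a shuffled loader whose batch order is fixed by an explicit seed."""
    gen = torch.Generator()
    gen.manual_seed(seed)
    return DataLoader(dataset, batch_size=batch_size, shuffle=True, generator=gen)

# Two loaders built with the same seed yield identical batch orderings.
data = TensorDataset(torch.arange(16).float().unsqueeze(1), torch.arange(16))
a = [y.tolist() for _, y in make_loader(data, batch_size=4, seed=0)]
b = [y.tolist() for _, y in make_loader(data, batch_size=4, seed=0)]
assert a == b
```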
### Baseline Training

The `edge_opt.experiments.train_model` function trains the compact CNN using the Adam optimizer and cross-entropy loss. The `SmallCNN` architecture consists of:

- Conv1: 1 → 16 channels (3x3 kernel, padding=1)
- MaxPool + ReLU
- Conv2: 16 → 32 channels (3x3 kernel, padding=1)
- MaxPool + ReLU
- Flatten + Linear classifier: 32×7×7 → 10 classes
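The layer list above translates into roughly the following PyTorch module. This is a sketch reconstructed from the description (assuming 28x28 grayscale input), not the exact `edge_opt.model.SmallCNN` source:

```python
import torch
from torch import nn

class SmallCNN(nn.Module):
    """Compact CNN matching the layer list above (28x28 grayscale input)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # Conv1: 1 -> 16
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # Conv2: 16 -> 32
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
            nn.ReLU(),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

logits = SmallCNN()(torch.randn(2, 1, 28, 28))
assert logits.shape == (2, 10)
```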
### Optimization Sweep

The `edge_opt.experiments.run_sweep` function applies pruning and precision variants across the configuration space. Sweep cardinality scales as `len(pruning_levels) × len(precisions)`. Each candidate is evaluated independently.
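The candidate enumeration amounts to a Cartesian product over the two axes; the pruning levels below are example values, not taken from the source:

```python
from itertools import product

pruning_levels = [0.0, 0.25, 0.5]       # example fractions, not from the config
precisions = ["fp32", "fp16", "int8"]

# Each (pruning_level, precision) pair is an independent candidate, so the
# sweep evaluates len(pruning_levels) * len(precisions) configurations.
candidates = list(product(pruning_levels, precisions))
assert len(candidates) == len(pruning_levels) * len(precisions)
```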
### Metric Collection

The `edge_opt.metrics` module computes comprehensive performance metrics for each candidate. Collected metrics:

- Accuracy: Validation set classification accuracy
- Latency: Mean, standard deviation, and P95 inference time (ms)
- Throughput: Samples per second
- Memory: Model footprint from state dict (MB)
- Energy Proxy: `latency_seconds × power_watts` (J)
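The latency and energy-proxy metrics can be sketched as follows. `latency_stats` and `energy_proxy_joules` are hypothetical helpers, not the `edge_opt.metrics` API, and the P95 handling is a simplification:

```python
import statistics
import time

def latency_stats(fn, warmup: int = 3, iters: int = 20) -> dict:
    """Time fn() repeatedly and summarize latency in milliseconds."""
    for _ in range(warmup):          # discard cold-start iterations
        fn()
    times_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    times_ms.sort()
    p95_index = min(iters - 1, max(0, round(0.95 * iters) - 1))
    return {
        "mean_ms": statistics.fmean(times_ms),
        "std_ms": statistics.stdev(times_ms),
        "p95_ms": times_ms[p95_index],
    }

def energy_proxy_joules(mean_latency_ms: float, power_watts: float) -> float:
    """Energy proxy from the list above: latency_seconds x power_watts."""
    return (mean_latency_ms / 1000.0) * power_watts
```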
### Constraint Filtering

Candidates are classified against the active memory budget before Pareto frontier generation. This constraint-first filtering ensures that infeasible candidates do not distort operating-point selection.
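A minimal sketch of the acceptance check, with made-up candidate numbers; the real classification lives in the metrics and experiment modules:

```python
def within_budget(model_mb: float, active_memory_budget_mb: float) -> bool:
    """Hard acceptance check applied before Pareto analysis."""
    return model_mb <= active_memory_budget_mb

# Illustrative candidates (memory and accuracy values are invented).
candidates = [
    {"name": "fp32", "memory_mb": 6.3, "accuracy": 0.991},
    {"name": "int8", "memory_mb": 1.6, "accuracy": 0.986},
]
feasible = [c for c in candidates if within_budget(c["memory_mb"], 4.0)]
assert [c["name"] for c in feasible] == ["int8"]
```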
### Reporting

The pipeline generates:

- Sweep tables: `sweep_results.csv`
- Pareto frontiers: `pareto_frontier_latency.csv`, `pareto_frontier_energy.csv`
- Summary JSON: `summary.json`
- Hardware analysis: `layerwise_breakdown.csv`, `precision_tradeoffs.csv`, `hardware_summary.csv`
- Visualizations: accuracy vs latency/energy/memory plots, layer-wise activation memory and MACs
## Design Decisions

### Compact CNN Architecture

A fixed network topology isolates pruning and precision effects from architecture-search noise. The compact CNN keeps iteration cycle times short while retaining realistic convolutional operator behavior.

Trade-off: simplicity vs. representational capacity. The small architecture enables rapid experimentation but may not capture all real-world deployment complexities.
### Structured Channel Pruning

Structured pruning removes whole channels to preserve dense kernels and straightforward deployment compatibility. Unlike unstructured pruning, this approach:
- Maintains dense tensor operations (no sparse kernel support needed)
- Reduces actual runtime memory and compute (not just parameter count)
- Simplifies hardware deployment (no specialized sparse accelerators required)
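A simplified sketch of L1-norm structured pruning on a single conv layer. The selection criterion, `keep_fraction` parameter, and function name are assumptions, not the actual `structured_channel_prune` signature, and a full implementation must also shrink the next layer's input channels:

```python
import torch
from torch import nn

def prune_conv_channels(conv: nn.Conv2d, keep_fraction: float) -> nn.Conv2d:
    """Keep the output channels with the largest L1 weight norms (dense result)."""
    n_keep = max(1, int(conv.out_channels * keep_fraction))
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))    # score per out-channel
    keep = torch.argsort(norms, descending=True)[:n_keep].sort().values
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       padding=conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

conv = nn.Conv2d(1, 16, 3, padding=1)
pruned = prune_conv_channels(conv, keep_fraction=0.5)
assert pruned.weight.shape == (8, 1, 3, 3)
```

Because whole channels are removed, the resulting layer is an ordinary dense `Conv2d`: no sparse kernels or masks are carried into deployment.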
See `edge_opt.pruning.structured_channel_prune` in Model Optimization for implementation details.

### Explicit Precision Conversion

Precision modes (`fp32`, `fp16`, `int8`) are explicit to keep evaluation paths auditable:

- FP32: Baseline floating-point (no conversion)
- FP16: Half-precision using `.half()` conversion
- INT8: Post-training static quantization with the `fbgemm` backend
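The three paths might dispatch roughly as below. `convert_precision` is a hypothetical helper using PyTorch's eager-mode quantization API; a production INT8 path additionally needs Quant/DeQuant stubs and representative calibration batches:

```python
import torch
from torch import nn

def convert_precision(model: nn.Module, mode: str) -> nn.Module:
    """Dispatch on an explicit precision mode so each path stays auditable."""
    if mode == "fp32":
        return model                        # baseline: leave weights untouched
    if mode == "fp16":
        return model.half()                 # convert parameters to float16
    if mode == "int8":
        # Post-training static quantization (sketch only): a real pipeline
        # calibrates `prepared` on representative batches before converting.
        model.eval()
        model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
        prepared = torch.ao.quantization.prepare(model)
        return torch.ao.quantization.convert(prepared)
    raise ValueError(f"unknown precision mode: {mode}")

lin = nn.Linear(4, 2)
assert convert_precision(lin, "fp32") is lin
assert next(convert_precision(nn.Linear(4, 2), "fp16").parameters()).dtype == torch.float16
```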
### Constraint-First Filtering

Memory budget checks run before Pareto frontier analysis. This design ensures:
- Infeasible candidates are explicitly marked as rejected
- Pareto frontiers only include deployable configurations
- Operating-point selection respects hard constraints
The `active_memory_budget_mb` parameter acts as the hard acceptance threshold, while `memory_budgets_mb` provides additional violation flags for reporting.

### Pareto Frontier Generation

Pareto frontiers are computed after constraint filtering so that infeasible configurations are excluded. Separate frontiers are generated for the latency-accuracy and energy-accuracy trade-offs.
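Non-dominated filtering for the latency-accuracy frontier can be sketched as follows (candidate numbers are illustrative, and the real implementation lives in `edge_opt.experiments`):

```python
def pareto_frontier(points: list[dict]) -> list[dict]:
    """Keep candidates not dominated on (latency, accuracy):
    lower latency and higher accuracy are both better."""
    frontier = []
    for p in points:
        dominated = any(
            q["latency_ms"] <= p["latency_ms"]
            and q["accuracy"] >= p["accuracy"]
            and (q["latency_ms"] < p["latency_ms"] or q["accuracy"] > p["accuracy"])
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier

points = [
    {"name": "a", "latency_ms": 5.0, "accuracy": 0.98},   # dominated by b
    {"name": "b", "latency_ms": 4.0, "accuracy": 0.99},
    {"name": "c", "latency_ms": 2.0, "accuracy": 0.95},
]
assert sorted(p["name"] for p in pareto_frontier(points)) == ["b", "c"]
```

An energy-accuracy frontier follows the same dominance rule with `latency_ms` replaced by the energy proxy.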
## Operational Constraints

### CPU Execution Only

The default pipeline runs CPU-only to reflect common edge integration constraints where accelerator access may be limited.

Rationale: Many edge devices lack GPUs or specialized accelerators. CPU-focused benchmarking provides realistic baseline estimates.
### No Distributed Training

Single-node training only; there is no multi-GPU or distributed training support.

Rationale: Edge deployment targets are typically single-device inference scenarios, so training infrastructure is simplified to match.
### FBGEMM Quantization Backend

INT8 quantization defaults to PyTorch's `fbgemm` backend for x86 CPU targets.

Rationale: FBGEMM provides optimized INT8 kernels for server and edge x86 processors. Alternative backends (e.g., QNNPACK for ARM) require configuration changes.

### Config-Driven Workflow

Most experiment knobs are externalized in YAML to support repeatable benchmark sweeps.

Rationale: Configuration files enable version control, reproducibility, and systematic hyperparameter exploration without code changes.
## Module Reference

The pipeline architecture is implemented across these core modules:

| Module | Location | Responsibility |
|---|---|---|
| Config | src/edge_opt/config.py | YAML parsing, configuration validation |
| Data | src/edge_opt/data.py | Dataset loading, deterministic loaders |
| Model | src/edge_opt/model.py | SmallCNN architecture, deterministic seeding |
| Pruning | src/edge_opt/pruning.py | Structured channel pruning |
| Quantization | src/edge_opt/quantization.py | FP16 and INT8 conversion |
| Metrics | src/edge_opt/metrics.py | Performance measurement, constraint checking |
| Experiments | src/edge_opt/experiments.py | Training, sweep orchestration, Pareto frontiers |
| Hardware | src/edge_opt/hardware.py | Layer-wise analysis, bandwidth utilization |
## Recommended Extensions

- Multi-Seed Orchestration: Add multi-seed experiment aggregation and confidence intervals to improve statistical rigor.
- Hardware Counters: Integrate performance monitoring unit (PMU) counters for cache, bandwidth, and instruction-level profiling to replace software-level estimates.
- Artifact Manifests: Introduce model checksums and dataset version metadata to ensure full reproducibility and artifact traceability.
## Next Steps

- Model Optimization: Learn about pruning and quantization techniques.
- Hardware Constraints: Understand memory budgets and performance modeling.