Model Conversion & Export
You’ve trained a YOLO model - now it’s time to deploy it to your Raspberry Pi! This lesson covers converting PyTorch models to edge-optimized formats that run faster on resource-constrained devices.
Learning Objectives
By the end of this lesson, you will be able to:
- Understand different model formats (PyTorch, ONNX, MNN, NCNN)
- Export YOLO models to multiple formats
- Compare format performance characteristics
- Choose the right format for your hardware
- Optimize models for Raspberry Pi deployment
This lesson uses code from course/vision_class/export/export_model.py and related inference implementations.
Why Convert Models?
The Problem with PyTorch
Your trained model is saved as a PyTorch .pt file:
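For reference, a trained run's weights typically land in a path like the one below; the exact run directory name is an assumption based on Ultralytics defaults.

```python
from pathlib import Path

# Assumed default Ultralytics output location; adjust the run name as needed.
weights = Path("runs/detect/train/weights/best.pt")

if weights.exists():
    # A YOLO nano/small checkpoint is usually in the tens of megabytes.
    print(f"{weights}: {weights.stat().st_size / 1e6:.1f} MB")
```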
Characteristics:
- Full Python runtime required
- Large dependency footprint (torch, torchvision, etc.)
- Optimized for GPUs, not edge CPUs
- ~20MB model + ~500MB framework
- Slow inference (200-500ms per frame)
- High memory usage (1-2GB)
- Battery drain on mobile robots
PyTorch is excellent for training, but suboptimal for embedded deployment. Specialized formats can achieve 2-5x speedup!
Model Format Options
Format Comparison
| Format | Runtime | Speed | Size | Use Case |
|---|---|---|---|---|
| PyTorch | Full Python | Baseline | Large | Training, development |
| ONNX | ONNX Runtime | 1.5-2x | Medium | Cross-platform deployment |
| MNN | MNN | 2-3x | Small | Mobile/embedded (ARM) |
| NCNN | NCNN | 2-4x | Small | Mobile/embedded (ARM/Vulkan) |
| TensorRT | TensorRT | 3-5x | Medium | NVIDIA GPUs only |
| TFLite | TFLite | 2-3x | Small | Mobile (Android/iOS) |
ONNX (Open Neural Network Exchange)
What is ONNX?
- Open standard for ML models
- Hardware and framework agnostic
- Intermediate representation for further optimization
- Wide compatibility (most frameworks can load ONNX)
- Optimization tools available (quantization, pruning)
- Good for cloud deployment
Use ONNX when:
- Deploying to servers or cloud
- Need cross-platform compatibility
- Intermediate step to other formats
MNN (Mobile Neural Network)
What is MNN?
- Developed by Alibaba for mobile devices
- Optimized for ARM CPUs (Raspberry Pi’s architecture)
- Lightweight runtime (~1MB)
- Fast on ARM CPUs (2-3x faster than PyTorch)
- Small binary size
- Low memory footprint
- Good for mobile robots
Use MNN when:
- Raspberry Pi deployment
- Battery-powered robots
- Limited computational resources
NCNN
What is NCNN?
- Developed by Tencent for mobile inference
- Highly optimized for ARM + Vulkan GPU acceleration
- Used in production apps (WeChat, etc.)
- Fastest on ARM devices (2-4x faster than PyTorch)
- Can use Vulkan GPU if available
- Excellent community support
- Best for real-time robotics
Use NCNN when:
- Real-time inference requirements
- Raspberry Pi 4/5 (better ARM cores)
- Need maximum FPS
This course focuses on MNN (see inference/model_loader.py:10): it provides the best balance of speed, ease of use, and Raspberry Pi compatibility.
Exporting Models with Ultralytics
Basic Export
From export_model.py:1-5:
Supported Export Formats
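The format strings below are the subset relevant to this lesson; each is a value passed to model.export(format=...). See the Ultralytics docs for the full list.

```python
# Ultralytics export format strings covered in this lesson.
EXPORT_FORMATS = {
    "onnx": "ONNX (cross-platform)",
    "mnn": "MNN (ARM mobile/embedded)",
    "ncnn": "NCNN (ARM, optional Vulkan GPU)",
    "engine": "TensorRT (NVIDIA GPUs)",
    "tflite": "TensorFlow Lite (mobile)",
}

for fmt, desc in EXPORT_FORMATS.items():
    print(f"{fmt:8s} {desc}")
```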
Export with Options
Important: imgsz must match your training configuration! If you trained with imgsz=640, export with imgsz=640.
Format-Specific Export
Exporting to ONNX
Exporting to MNN
(the exported model is later loaded by inference/model_loader.py:1-15):
Exporting to NCNN
(tested in export/models/ncnn/model_ncnn.py:1-27):
NCNN uses a two-file format:
- .param: Text file describing network architecture
- .bin: Binary file containing weights
Model Optimization Techniques
Half Precision (FP16)
Reduce model size and increase speed by using 16-bit floats instead of 32-bit:
- 50% smaller file size
- ~1.5x faster inference on supported hardware
- Minimal accuracy loss (less than 1% mAP drop)
Use FP16 when you:
- Have FP16 hardware support (newer ARM, GPUs)
- File size is a constraint
- Need extra speed
Compatibility: Not all formats support FP16. MNN typically doesn’t; ONNX and NCNN do. Check the documentation for your target runtime.
Integer Quantization (INT8)
Use 8-bit integers instead of floating point:
- 75% smaller file size (vs FP32)
- 2-4x faster on CPUs
- Lower power consumption
Tradeoffs:
- Accuracy loss (1-3% mAP typically)
- Requires calibration dataset
- More complex deployment
Use INT8 when:
- Extreme resource constraints
- Mobile/embedded deployment
- Willing to trade accuracy for speed
Dynamic Shapes vs Fixed
Fixed Input Shape (recommended):
- Faster inference (optimized for specific size)
- More optimization opportunities
- Simpler deployment
Dynamic Input Shape:
- Flexibility (can process different image sizes)
- Single model for multiple use cases
Verifying Exported Models
Check Output Consistency
Ensure the exported model produces the same results as the original:
- Same number of detections (or ±1)
- Confidence scores within 0.01
- Bounding boxes within 2-3 pixels
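One way to check consistency is to run both models on the same image and compare detections. The helper below is a pure-Python sketch using the tolerances listed above; the detection tuples are hypothetical example values.

```python
def detections_match(a, b, conf_tol=0.01, box_tol=3.0):
    """Compare two detection lists of (x1, y1, x2, y2, conf) tuples."""
    if abs(len(a) - len(b)) > 1:  # allow ±1 detection
        return False
    for det_a, det_b in zip(sorted(a), sorted(b)):
        *box_a, conf_a = det_a
        *box_b, conf_b = det_b
        if abs(conf_a - conf_b) > conf_tol:
            return False
        if any(abs(pa - pb) > box_tol for pa, pb in zip(box_a, box_b)):
            return False
    return True

# Hypothetical detections from the original and exported models:
orig = [(10.0, 20.0, 110.0, 220.0, 0.91)]
exported = [(11.5, 20.5, 111.0, 219.0, 0.905)]
print(detections_match(orig, exported))  # True
```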
Benchmark Inference Speed
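A simple timing harness works for any format: warm up, then average over many runs. The harness below is a sketch; pass it a closure over your loaded model and a test frame.

```python
import time

def benchmark(infer_fn, n_warmup=5, n_runs=50):
    """Time an inference callable; return average milliseconds per call."""
    for _ in range(n_warmup):      # warm-up runs stabilize caches/JIT
        infer_fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer_fn()
    return (time.perf_counter() - start) / n_runs * 1000.0

# Usage (hypothetical): benchmark(lambda: model(frame))
ms = benchmark(lambda: sum(range(1000)))
print(f"{ms:.3f} ms/iter")
```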
Deployment Workflow
Complete Export Pipeline
Directory Structure
Organize exported models:
Format Selection Guide
Decision Tree
Recommendations by Use Case
Real-Time Robotics (Raspberry Pi):
- Best: MNN or NCNN
- Alternative: ONNX
- Avoid: PyTorch (too slow)
Mobile Apps (Android/iOS):
- Best: TFLite or Core ML
- Alternative: NCNN (for cross-platform)
Server/Cloud Deployment:
- Best: ONNX or TensorRT (if NVIDIA)
- Alternative: PyTorch (simplicity)
Specialized Hardware:
- Best: Format specific to hardware (EdgeTPU, OpenVINO)
Course Project: We use MNN for Raspberry Pi deployment. It provides excellent ARM performance and integrates seamlessly with Ultralytics.
Practice Exercise
Export and Compare Formats
Task: Export your trained model to multiple formats and benchmark them.
Steps:
- Export to ONNX, MNN, and NCNN
- Verify outputs match original model
- Benchmark inference speed on Raspberry Pi
- Measure file sizes
- Document results in a table
Success criteria:
- All exports complete without errors
- Detection results match within 2% confidence
- MNN/NCNN show 2x+ speedup over PyTorch
- Can run inference at >5 FPS on Raspberry Pi
Extension: Quantization Experiment
Export with different precision levels:
- FP32 (baseline)
- FP16 (half precision)
- INT8 (quantized)
Measure:
- File size reduction
- Speed improvement
- Accuracy impact (mAP change)
Summary
You’ve learned:
- ✓ Why model conversion is necessary for edge deployment
- ✓ Different format options (ONNX, MNN, NCNN) and their tradeoffs
- ✓ How to export YOLO models with Ultralytics
- ✓ Optimization techniques (FP16, INT8)
- ✓ Verification and benchmarking methods
- ✓ Format selection based on deployment target
Next Steps
With optimized models ready, the final lesson covers running real-time inference on Raspberry Pi and integrating with your robot control system.
Inference Optimization
Build real-time vision processing pipelines for robotics
Reference Code:
- course/vision_class/export/export_model.py:1-5: Basic export example
- export/models/ncnn/model_ncnn.py:1-27: NCNN inference testing
- inference/model_loader.py:10: Loading MNN models