
Model Conversion & Export

You’ve trained a YOLO model - now it’s time to deploy it to your Raspberry Pi! This lesson covers converting PyTorch models to edge-optimized formats that run faster on resource-constrained devices.

Learning Objectives

By the end of this lesson, you will be able to:
  • Understand different model formats (PyTorch, ONNX, MNN, NCNN)
  • Export YOLO models to multiple formats
  • Compare format performance characteristics
  • Choose the right format for your hardware
  • Optimize models for Raspberry Pi deployment
This lesson uses code from course/vision_class/export/export_model.py and related inference implementations.

Why Convert Models?

The Problem with PyTorch

Your trained model is saved as a PyTorch .pt file.
Characteristics:
  • Full Python runtime required
  • Large dependency footprint (torch, torchvision, etc.)
  • Optimized for GPUs, not edge CPUs
  • ~20MB model + ~500MB framework
On Raspberry Pi:
  • Slow inference (200-500ms per frame)
  • High memory usage (1-2GB)
  • Battery drain on mobile robots
PyTorch is excellent for training, but suboptimal for embedded deployment. Specialized formats can achieve 2-5x speedup!

Model Format Options

Format Comparison

| Format   | Runtime      | Speed    | Size   | Use Case                     |
| -------- | ------------ | -------- | ------ | ---------------------------- |
| PyTorch  | Full Python  | Baseline | Large  | Training, development        |
| ONNX     | ONNX Runtime | 1.5-2x   | Medium | Cross-platform deployment    |
| MNN      | MNN          | 2-3x     | Small  | Mobile/embedded (ARM)        |
| NCNN     | NCNN         | 2-4x     | Small  | Mobile/embedded (ARM/Vulkan) |
| TensorRT | TensorRT     | 3-5x     | Medium | NVIDIA GPUs only             |
| TFLite   | TFLite       | 2-3x     | Small  | Mobile (Android/iOS)         |
For Raspberry Pi: NCNN and MNN are optimal choices. Both are designed for ARM CPUs and provide significant speedups over PyTorch.

ONNX (Open Neural Network Exchange)

What is ONNX?
  • Open standard for ML models
  • Hardware and framework agnostic
  • Intermediate representation for further optimization
Advantages:
  • Wide compatibility (most frameworks can load ONNX)
  • Optimization tools available (quantization, pruning)
  • Good for cloud deployment
When to Use:
  • Deploying to servers or cloud
  • Need cross-platform compatibility
  • Intermediate step to other formats

MNN (Mobile Neural Network)

What is MNN?
  • Developed by Alibaba for mobile devices
  • Optimized for ARM CPUs (Raspberry Pi’s architecture)
  • Lightweight runtime (~1MB)
Advantages:
  • Fast on ARM CPUs (2-3x faster than PyTorch)
  • Small binary size
  • Low memory footprint
  • Good for mobile robots
When to Use:
  • Raspberry Pi deployment
  • Battery-powered robots
  • Limited computational resources

NCNN

What is NCNN?
  • Developed by Tencent for mobile inference
  • Highly optimized for ARM + Vulkan GPU acceleration
  • Used in production apps (WeChat, etc.)
Advantages:
  • Fastest on ARM devices (2-4x faster than PyTorch)
  • Can use Vulkan GPU if available
  • Excellent community support
  • Best for real-time robotics
When to Use:
  • Real-time inference requirements
  • Raspberry Pi 4/5 (better ARM cores)
  • Need maximum FPS
This course focuses on MNN as seen in inference/model_loader.py:10:
object_model_path: str = current_path + '/models/mnn/yolo11s.mnn'
MNN provides the best balance of speed, ease of use, and Raspberry Pi compatibility.

Exporting Models with Ultralytics

Basic Export

From export_model.py:1-5:
from ultralytics import YOLO

# Load trained model
model = YOLO('models/torch/yolo11s.pt')

# Export to MNN format
model.export(format="mnn")
That’s it! Ultralytics handles all the conversion complexity. Output:
Export complete (15.2s)
Results saved to models/torch/yolo11s.mnn

Supported Export Formats

from ultralytics import YOLO

model = YOLO('yolo11s.pt')

# Export to different formats
model.export(format="onnx")       # ONNX
model.export(format="mnn")        # MNN
model.export(format="ncnn")       # NCNN
model.export(format="tflite")     # TensorFlow Lite
model.export(format="edgetpu")    # Google Coral Edge TPU
model.export(format="coreml")     # Apple Core ML
model.export(format="torchscript")# TorchScript
model.export(format="engine")     # TensorRT

Export with Options

from ultralytics import YOLO

model = YOLO('yolo11s.pt')

# Export MNN with options
model.export(
    format="mnn",
    imgsz=640,           # Input size (must match training)
    half=False,          # FP16 quantization (not supported by MNN)
    int8=False,          # INT8 quantization
    dynamic=False,       # Dynamic input shapes
    simplify=True,       # Simplify ONNX graph
    opset=12,            # ONNX opset version
)
Important: imgsz must match your training configuration! If you trained with imgsz=640, export with imgsz=640.

Format-Specific Export

Exporting to ONNX

from ultralytics import YOLO

model = YOLO('yolo11s.pt')

model.export(
    format="onnx",
    imgsz=640,
    dynamic=False,      # Fixed input size (faster)
    simplify=True,      # Simplify graph for optimization
    opset=12            # ONNX operator set version
)
Output Files:
yolo11s.onnx          # ONNX model
Testing ONNX:
import onnxruntime as ort
import numpy as np

# Load ONNX model
session = ort.InferenceSession('yolo11s.onnx')

# Prepare input
input_name = session.get_inputs()[0].name
image = np.random.randn(1, 3, 640, 640).astype(np.float32)

# Run inference
outputs = session.run(None, {input_name: image})
print(f'Output shape: {outputs[0].shape}')

Exporting to MNN

from ultralytics import YOLO

model = YOLO('yolo11s.pt')

model.export(
    format="mnn",
    imgsz=640
)
Output Files:
yolo11s.mnn           # MNN model
Using MNN Model (from inference/model_loader.py:1-15):
import os
from ultralytics import YOLO

class ModelLoader:
    def __init__(self):
        current_path = os.path.dirname(os.path.abspath(__file__))
        object_model_path: str = current_path + '/models/mnn/yolo11s.mnn'
        
        # Ultralytics can load MNN models directly!
        self.model: YOLO = YOLO(object_model_path, task='detect')
        
    def get_model(self) -> YOLO:
        return self.model
Ultralytics makes it easy: You can load MNN models with the same YOLO() class used for PyTorch models. The API is identical!

Exporting to NCNN

from ultralytics import YOLO

model = YOLO('yolo11s.pt')

model.export(
    format="ncnn",
    imgsz=640,
    half=True  # FP16 for faster inference
)
Output Files:
yolo11s_ncnn_model/
├── model.ncnn.param    # Network structure
└── model.ncnn.bin      # Weights
Testing NCNN (from export/models/ncnn/model_ncnn.py:1-27):
import numpy as np
import ncnn
import torch

def test_inference():
    # Create test input
    torch.manual_seed(0)
    in0 = torch.rand(1, 3, 640, 640, dtype=torch.float)
    out = []

    # Load NCNN model
    with ncnn.Net() as net:
        net.load_param("models/torch/yolo11s_ncnn_model/model.ncnn.param")
        net.load_model("models/torch/yolo11s_ncnn_model/model.ncnn.bin")

        # Run inference
        with net.create_extractor() as ex:
            ex.input("in0", ncnn.Mat(in0.squeeze(0).numpy()).clone())
            
            _, out0 = ex.extract("out0")
            out.append(torch.from_numpy(np.array(out0)).unsqueeze(0))

    if len(out) == 1:
        return out[0]
    else:
        return tuple(out)

if __name__ == "__main__":
    print(test_inference())
NCNN uses a two-file format:
  • .param: Text file describing network architecture
  • .bin: Binary file containing weights
Both files must be present for inference.
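Because inference silently fails when either file is missing, it can help to check the pair up front. A minimal sketch (ncnn_files_ok is our own helper name, not an NCNN API):

```python
from pathlib import Path

def ncnn_files_ok(model_dir):
    # Return the list of required NCNN files missing from model_dir.
    d = Path(model_dir)
    required = ('model.ncnn.param', 'model.ncnn.bin')
    return [name for name in required if not (d / name).exists()]

# Usage: missing = ncnn_files_ok('models/torch/yolo11s_ncnn_model')
#        if missing: raise FileNotFoundError(missing)
```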

Model Optimization Techniques

Half Precision (FP16)

Reduce model size and increase speed by using 16-bit floats instead of 32-bit:
model.export(
    format="onnx",
    half=True  # Use FP16
)
Benefits:
  • 50% smaller file size
  • ~1.5x faster inference on supported hardware
  • Minimal accuracy loss (less than 1% mAP drop)
When to Use:
  • Have FP16 hardware support (newer ARM, GPUs)
  • File size is a constraint
  • Need extra speed
Compatibility: Not all formats support FP16: MNN typically doesn't, while ONNX and NCNN do. Check the documentation for your target runtime.
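The size and accuracy claims above are easy to illustrate in isolation. This sketch (illustrative only, random weights standing in for a real model) casts a tensor to FP16 and measures the cost:

```python
import numpy as np

# Casting weights to FP16 halves storage while introducing only
# a small numeric error.
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal(10_000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

size_ratio = weights_fp16.nbytes / weights_fp32.nbytes  # 0.5
max_err = float(np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32))))
print(f'size ratio: {size_ratio}, max abs error: {max_err:.1e}')
```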

Integer Quantization (INT8)

Use 8-bit integers instead of floating point:
model.export(
    format="tflite",
    int8=True  # INT8 quantization
)
Benefits:
  • 75% smaller file size (vs FP32)
  • 2-4x faster on CPUs
  • Lower power consumption
Drawbacks:
  • Accuracy loss (1-3% mAP typically)
  • Requires calibration dataset
  • More complex deployment
When to Use:
  • Extreme resource constraints
  • Mobile/embedded deployment
  • Willing to trade accuracy for speed
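To make the tradeoff concrete, here is a minimal sketch of affine (scale / zero-point) INT8 quantization, the scheme most toolchains use. It is a simplification: real exporters derive the scale from a calibration dataset rather than from the tensor itself.

```python
import numpy as np

def quantize_int8(x):
    # Map the float range [min, max] onto the int8 range [-128, 127].
    scale = (x.max() - x.min()) / 255.0
    zero_point = int(np.round(-x.min() / scale)) - 128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    # Recover approximate float values from the int8 codes.
    return (q.astype(np.float32) - zero_point) * scale

activations = np.linspace(-1.0, 1.0, 256).astype(np.float32)
q, scale, zp = quantize_int8(activations)
restored = dequantize_int8(q, scale, zp)
print('size ratio:', q.nbytes / activations.nbytes)  # 4x smaller than FP32
print('max abs error:', float(np.max(np.abs(activations - restored))))
```

The maximum error is bounded by roughly one quantization step (the scale), which is where the 1-3% mAP loss comes from.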

Dynamic Shapes vs Fixed

Fixed Input Shape (recommended):
model.export(
    format="onnx",
    imgsz=640,
    dynamic=False  # Fixed 640x640 input
)
Benefits:
  • Faster inference (optimized for specific size)
  • More optimization opportunities
  • Simpler deployment
Dynamic Input Shape:
model.export(
    format="onnx",
    dynamic=True  # Any input size
)
Benefits:
  • Flexibility (can process different image sizes)
  • Single model for multiple use cases
For robotics with fixed camera resolution, use fixed shapes for maximum performance.

Verifying Exported Models

Check Output Consistency

Ensure exported model produces same results as original:
import numpy as np
from ultralytics import YOLO

# Load both models
original = YOLO('yolo11s.pt')
exported = YOLO('yolo11s.mnn')

# Test image
test_image = 'test.jpg'

# Run inference
original_results = original.predict(test_image, conf=0.5)
exported_results = exported.predict(test_image, conf=0.5)

# Compare detections
print(f'Original detections: {len(original_results[0].boxes)}')
print(f'Exported detections: {len(exported_results[0].boxes)}')

# Check boxes
for orig_box, exp_box in zip(original_results[0].boxes, exported_results[0].boxes):
    orig_conf = float(orig_box.conf)
    exp_conf = float(exp_box.conf)
    conf_diff = abs(orig_conf - exp_conf)
    print(f'Confidence diff: {conf_diff:.4f}')
    
    if conf_diff > 0.01:
        print('Warning: Significant confidence difference!')
Expected:
  • Same number of detections (or ±1)
  • Confidence scores within 0.01
  • Bounding boxes within 2-3 pixels
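The pixel tolerance on boxes can be checked the same way as confidences. A hedged sketch (max_box_deviation is our own helper; boxes.xyxy is the Ultralytics accessor for corner coordinates):

```python
import numpy as np

def max_box_deviation(boxes_a, boxes_b):
    # Largest per-coordinate difference, in pixels, between two
    # matched sets of (x1, y1, x2, y2) boxes.
    a = np.asarray(boxes_a, dtype=np.float32)
    b = np.asarray(boxes_b, dtype=np.float32)
    assert a.shape == b.shape, 'detection counts differ'
    return float(np.max(np.abs(a - b))) if a.size else 0.0

# With Ultralytics results:
# dev = max_box_deviation(original_results[0].boxes.xyxy.numpy(),
#                         exported_results[0].boxes.xyxy.numpy())
# A deviation above ~3 pixels suggests an export problem.
```

Note this assumes the two models return detections in the same order; sort by confidence first if they don't.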

Benchmark Inference Speed

import time
import cv2
from ultralytics import YOLO

def benchmark_model(model_path, num_runs=100):
    model = YOLO(model_path)
    image = cv2.imread('test.jpg')
    
    # Warmup
    for _ in range(10):
        model.predict(image, verbose=False)
    
    # Benchmark
    start = time.time()
    for _ in range(num_runs):
        results = model.predict(image, verbose=False)
    elapsed = time.time() - start
    
    avg_time = elapsed / num_runs
    fps = 1.0 / avg_time
    
    print(f'{model_path}:')
    print(f'  Average time: {avg_time*1000:.1f}ms')
    print(f'  FPS: {fps:.1f}')
    return avg_time

# Compare formats
print('Benchmarking models...')
pt_time = benchmark_model('yolo11s.pt')
mnn_time = benchmark_model('yolo11s.mnn')

speedup = pt_time / mnn_time
print(f'\nSpeedup: {speedup:.2f}x')
Expected Results (Raspberry Pi 4):
yolo11s.pt:
  Average time: 312.4ms
  FPS: 3.2

yolo11s.mnn:
  Average time: 128.6ms
  FPS: 7.8

Speedup: 2.43x

Deployment Workflow

Complete Export Pipeline

from ultralytics import YOLO
import os
import shutil

def export_all_formats(model_path, output_dir):
    """
    Export model to all supported formats for testing
    """
    model = YOLO(model_path)
    
    formats = {
        'onnx': {'half': False, 'simplify': True},
        'mnn': {},
        'ncnn': {'half': True},
    }
    
    os.makedirs(output_dir, exist_ok=True)
    for fmt, kwargs in formats.items():
        print(f'Exporting to {fmt.upper()}...')
        try:
            # export() returns the path of the exported model
            exported_path = model.export(format=fmt, **kwargs)
            shutil.move(exported_path, output_dir)
            print(f'  Success!')
        except Exception as e:
            print(f'  Failed: {e}')
    
    print('\nExport complete!')

if __name__ == '__main__':
    export_all_formats(
        model_path='models/torch/yolo11s.pt',
        output_dir='models/exported/'
    )

Directory Structure

Organize exported models:
models/
├── torch/
│   └── yolo11s.pt          # Original trained model
├── onnx/
│   └── yolo11s.onnx        # ONNX export
├── mnn/
│   └── yolo11s.mnn         # MNN export (for Raspberry Pi)
└── ncnn/
    └── yolo11s_ncnn_model/ # NCNN export
        ├── model.ncnn.param
        └── model.ncnn.bin
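Since Ultralytics writes exports next to the source .pt file, a small helper can sort them into the layout above. This is a sketch under that assumption (organize_exports is our own function; file names are examples):

```python
import shutil
from pathlib import Path

def organize_exports(torch_dir, models_root):
    # Move exported artifacts from the torch/ directory into
    # per-format folders under models_root.
    layout = {
        'yolo11s.onnx': 'onnx',
        'yolo11s.mnn': 'mnn',
        'yolo11s_ncnn_model': 'ncnn',  # NCNN exports are a directory
    }
    for name, subdir in layout.items():
        src = Path(torch_dir) / name
        if not src.exists():
            continue
        dst = Path(models_root) / subdir
        dst.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst / name))

# organize_exports('models/torch', 'models')
```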

Format Selection Guide

Decision Tree

A quick way to decide: deploying on Raspberry Pi or another ARM board → MNN or NCNN; NVIDIA GPU available → TensorRT; building a mobile app → TFLite or Core ML; anything else (servers, cross-platform) → ONNX.

Recommendations by Use Case

Real-Time Robotics (Raspberry Pi):
  • Best: MNN or NCNN
  • Alternative: ONNX
  • Avoid: PyTorch (too slow)
Mobile App Development:
  • Best: TFLite or Core ML
  • Alternative: NCNN (for cross-platform)
Cloud Inference:
  • Best: ONNX or TensorRT (if NVIDIA)
  • Alternative: PyTorch (simplicity)
Edge Device (Coral TPU, Intel NCS):
  • Best: Format specific to hardware (EdgeTPU, OpenVINO)
Course Project: We use MNN for Raspberry Pi deployment. It provides excellent ARM performance and integrates seamlessly with Ultralytics.

Practice Exercise

Export and Compare Formats

Task: Export your trained model to multiple formats and benchmark them.
Steps:
  1. Export to ONNX, MNN, and NCNN
  2. Verify outputs match original model
  3. Benchmark inference speed on Raspberry Pi
  4. Measure file sizes
  5. Document results in a table
Success Criteria:
  • All exports complete without errors
  • Detection results match within 2% confidence
  • MNN/NCNN show 2x+ speedup over PyTorch
  • Can run inference at >5 FPS on Raspberry Pi
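For step 4 (measuring file sizes), a small helper covers both single-file exports and the NCNN directory. A sketch with example paths (file_size_mb is our own name):

```python
import os

def file_size_mb(path):
    # Single files are measured directly; directory exports
    # (e.g. NCNN) are summed recursively.
    if os.path.isdir(path):
        total = sum(
            os.path.getsize(os.path.join(d, f))
            for d, _, files in os.walk(path)
            for f in files
        )
    else:
        total = os.path.getsize(path)
    return total / 1e6

# for p in ('models/torch/yolo11s.pt', 'models/mnn/yolo11s.mnn'):
#     print(p, f'{file_size_mb(p):.1f} MB')
```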

Extension: Quantization Experiment

Export with different precision levels:
  • FP32 (baseline)
  • FP16 (half precision)
  • INT8 (quantized)
Measure:
  • File size reduction
  • Speed improvement
  • Accuracy impact (mAP change)

Summary

You’ve learned:
  • ✓ Why model conversion is necessary for edge deployment
  • ✓ Different format options (ONNX, MNN, NCNN) and their tradeoffs
  • ✓ How to export YOLO models with Ultralytics
  • ✓ Optimization techniques (FP16, INT8)
  • ✓ Verification and benchmarking methods
  • ✓ Format selection based on deployment target

Next Steps

With optimized models ready, the final lesson covers running real-time inference on Raspberry Pi and integrating with your robot control system.

Inference Optimization

Build real-time vision processing pipelines for robotics
Reference Code: course/vision_class/
  • export/export_model.py:1-5: Basic export example
  • export/models/ncnn/model_ncnn.py:1-27: NCNN inference testing
  • inference/model_loader.py:10: Loading MNN models
