
Model Conversion & Export

You’ve trained a YOLO model - now it’s time to deploy it to your Raspberry Pi! This lesson covers converting PyTorch models to edge-optimized formats that run faster on resource-constrained devices.

Learning Objectives

By the end of this lesson, you will be able to:
  • Understand different model formats (PyTorch, ONNX, MNN, NCNN)
  • Export YOLO models to multiple formats
  • Compare format performance characteristics
  • Choose the right format for your hardware
  • Optimize models for Raspberry Pi deployment
This lesson uses code from course/vision_class/export/export_model.py and related inference implementations.

Why Convert Models?

The Problem with PyTorch

Your trained model is saved as a PyTorch .pt file.
Characteristics:
  • Full Python runtime required
  • Large dependency footprint (torch, torchvision, etc.)
  • Optimized for GPUs, not edge CPUs
  • ~20MB model + ~500MB framework
On Raspberry Pi:
  • Slow inference (200-500ms per frame)
  • High memory usage (1-2GB)
  • Battery drain on mobile robots
PyTorch is excellent for training, but suboptimal for embedded deployment. Specialized formats can achieve 2-5x speedup!

Model Format Options

Format Comparison

| Format   | Runtime      | Speed    | Size   | Use Case                     |
| -------- | ------------ | -------- | ------ | ---------------------------- |
| PyTorch  | Full Python  | Baseline | Large  | Training, development        |
| ONNX     | ONNX Runtime | 1.5-2x   | Medium | Cross-platform deployment    |
| MNN      | MNN          | 2-3x     | Small  | Mobile/embedded (ARM)        |
| NCNN     | NCNN         | 2-4x     | Small  | Mobile/embedded (ARM/Vulkan) |
| TensorRT | TensorRT     | 3-5x     | Medium | NVIDIA GPUs only             |
| TFLite   | TFLite       | 2-3x     | Small  | Mobile (Android/iOS)         |
For Raspberry Pi: NCNN and MNN are optimal choices. Both are designed for ARM CPUs and provide significant speedups over PyTorch.

ONNX (Open Neural Network Exchange)

What is ONNX?
  • Open standard for ML models
  • Hardware and framework agnostic
  • Intermediate representation for further optimization
Advantages:
  • Wide compatibility (most frameworks can load ONNX)
  • Optimization tools available (quantization, pruning)
  • Good for cloud deployment
When to Use:
  • Deploying to servers or cloud
  • Need cross-platform compatibility
  • Intermediate step to other formats

MNN (Mobile Neural Network)

What is MNN?
  • Developed by Alibaba for mobile devices
  • Optimized for ARM CPUs (Raspberry Pi’s architecture)
  • Lightweight runtime (~1MB)
Advantages:
  • Fast on ARM CPUs (2-3x faster than PyTorch)
  • Small binary size
  • Low memory footprint
  • Good for mobile robots
When to Use:
  • Raspberry Pi deployment
  • Battery-powered robots
  • Limited computational resources

NCNN

What is NCNN?
  • Developed by Tencent for mobile inference
  • Highly optimized for ARM + Vulkan GPU acceleration
  • Used in production apps (WeChat, etc.)
Advantages:
  • Fastest on ARM devices (2-4x faster than PyTorch)
  • Can use Vulkan GPU if available
  • Excellent community support
  • Best for real-time robotics
When to Use:
  • Real-time inference requirements
  • Raspberry Pi 4/5 (better ARM cores)
  • Need maximum FPS
This course focuses on MNN as seen in inference/model_loader.py:10:
object_model_path: str = current_path + '/models/mnn/yolo11s.mnn'
MNN provides the best balance of speed, ease of use, and Raspberry Pi compatibility.

Exporting Models with Ultralytics

Basic Export

From export_model.py:1-5:
from ultralytics import YOLO

# Load trained model
model = YOLO('models/torch/yolo11s.pt')

# Export to MNN format
model.export(format="mnn")
That’s it! Ultralytics handles all the conversion complexity. Output:
Export complete (15.2s)
Results saved to models/torch/yolo11s.mnn

Supported Export Formats

from ultralytics import YOLO

model = YOLO('yolo11s.pt')

# Export to different formats
model.export(format="onnx")       # ONNX
model.export(format="mnn")        # MNN
model.export(format="ncnn")       # NCNN
model.export(format="tflite")     # TensorFlow Lite
model.export(format="edgetpu")    # Google Coral Edge TPU
model.export(format="coreml")     # Apple Core ML
model.export(format="torchscript")# TorchScript
model.export(format="engine")     # TensorRT

Export with Options

from ultralytics import YOLO

model = YOLO('yolo11s.pt')

# Export MNN with options
model.export(
    format="mnn",
    imgsz=640,           # Input size (must match training)
    half=False,          # FP16 quantization (not supported by MNN)
    int8=False,          # INT8 quantization
    dynamic=False,       # Dynamic input shapes
    simplify=True,       # Simplify ONNX graph
    opset=12,            # ONNX opset version
)
Important: imgsz must match your training configuration! If you trained with imgsz=640, export with imgsz=640.

Format-Specific Export

Exporting to ONNX

from ultralytics import YOLO

model = YOLO('yolo11s.pt')

model.export(
    format="onnx",
    imgsz=640,
    dynamic=False,      # Fixed input size (faster)
    simplify=True,      # Simplify graph for optimization
    opset=12            # ONNX operator set version
)
Output Files:
yolo11s.onnx          # ONNX model
Testing ONNX:
import onnxruntime as ort
import numpy as np

# Load ONNX model
session = ort.InferenceSession('yolo11s.onnx')

# Prepare input
input_name = session.get_inputs()[0].name
image = np.random.randn(1, 3, 640, 640).astype(np.float32)

# Run inference
outputs = session.run(None, {input_name: image})
print(f'Output shape: {outputs[0].shape}')

Exporting to MNN

from ultralytics import YOLO

model = YOLO('yolo11s.pt')

model.export(
    format="mnn",
    imgsz=640
)
Output Files:
yolo11s.mnn           # MNN model
Using MNN Model (from inference/model_loader.py:1-15):
import os
from ultralytics import YOLO

class ModelLoader:
    def __init__(self):
        current_path = os.path.dirname(os.path.abspath(__file__))
        object_model_path: str = current_path + '/models/mnn/yolo11s.mnn'
        
        # Ultralytics can load MNN models directly!
        self.model: YOLO = YOLO(object_model_path, task='detect')
        
    def get_model(self) -> YOLO:
        return self.model
Ultralytics makes it easy: You can load MNN models with the same YOLO() class used for PyTorch models. The API is identical!

Exporting to NCNN

from ultralytics import YOLO

model = YOLO('yolo11s.pt')

model.export(
    format="ncnn",
    imgsz=640,
    half=True  # FP16 for faster inference
)
Output Files:
yolo11s_ncnn_model/
├── model.ncnn.param    # Network structure
└── model.ncnn.bin      # Weights
Testing NCNN (from export/models/ncnn/model_ncnn.py:1-27):
import numpy as np
import ncnn
import torch

def test_inference():
    # Create test input
    torch.manual_seed(0)
    in0 = torch.rand(1, 3, 640, 640, dtype=torch.float)
    out = []

    # Load NCNN model
    with ncnn.Net() as net:
        net.load_param("models/torch/yolo11s_ncnn_model/model.ncnn.param")
        net.load_model("models/torch/yolo11s_ncnn_model/model.ncnn.bin")

        # Run inference
        with net.create_extractor() as ex:
            ex.input("in0", ncnn.Mat(in0.squeeze(0).numpy()).clone())
            
            _, out0 = ex.extract("out0")
            out.append(torch.from_numpy(np.array(out0)).unsqueeze(0))

    if len(out) == 1:
        return out[0]
    else:
        return tuple(out)

if __name__ == "__main__":
    print(test_inference())
NCNN uses a two-file format:
  • .param: Text file describing network architecture
  • .bin: Binary file containing weights
Both files must be present for inference.
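Because inference silently fails when either file is missing, it can help to check the pair up front. A minimal sketch (ncnn_files_ok is our own helper name, not an NCNN API):

```python
from pathlib import Path

def ncnn_files_ok(model_dir):
    # Return the list of required NCNN files missing from model_dir.
    d = Path(model_dir)
    required = ('model.ncnn.param', 'model.ncnn.bin')
    return [name for name in required if not (d / name).exists()]

# Usage: missing = ncnn_files_ok('models/torch/yolo11s_ncnn_model')
#        if missing: raise FileNotFoundError(missing)
```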

Model Optimization Techniques

Half Precision (FP16)

Reduce model size and increase speed by using 16-bit floats instead of 32-bit:
model.export(
    format="onnx",
    half=True  # Use FP16
)
Benefits:
  • 50% smaller file size
  • ~1.5x faster inference on supported hardware
  • Minimal accuracy loss (less than 1% mAP drop)
When to Use:
  • Have FP16 hardware support (newer ARM, GPUs)
  • File size is a constraint
  • Need extra speed
Compatibility: Not all formats support FP16: MNN typically doesn't, while ONNX and NCNN do. Check the documentation for your target runtime.
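The size and accuracy claims above are easy to illustrate in isolation. This sketch (illustrative only, random weights standing in for a real model) casts a tensor to FP16 and measures the cost:

```python
import numpy as np

# Casting weights to FP16 halves storage while introducing only
# a small numeric error.
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal(10_000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

size_ratio = weights_fp16.nbytes / weights_fp32.nbytes  # 0.5
max_err = float(np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32))))
print(f'size ratio: {size_ratio}, max abs error: {max_err:.1e}')
```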

Integer Quantization (INT8)

Use 8-bit integers instead of floating point:
model.export(
    format="tflite",
    int8=True  # INT8 quantization
)
Benefits:
  • 75% smaller file size (vs FP32)
  • 2-4x faster on CPUs
  • Lower power consumption
Drawbacks:
  • Accuracy loss (1-3% mAP typically)
  • Requires calibration dataset
  • More complex deployment
When to Use:
  • Extreme resource constraints
  • Mobile/embedded deployment
  • Willing to trade accuracy for speed
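To make the tradeoff concrete, here is a minimal sketch of affine (scale / zero-point) INT8 quantization, the scheme most toolchains use. It is a simplification: real exporters derive the scale from a calibration dataset rather than from the tensor itself.

```python
import numpy as np

def quantize_int8(x):
    # Map the float range [min, max] onto the int8 range [-128, 127].
    scale = (x.max() - x.min()) / 255.0
    zero_point = int(np.round(-x.min() / scale)) - 128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    # Recover approximate float values from the int8 codes.
    return (q.astype(np.float32) - zero_point) * scale

activations = np.linspace(-1.0, 1.0, 256).astype(np.float32)
q, scale, zp = quantize_int8(activations)
restored = dequantize_int8(q, scale, zp)
print('size ratio:', q.nbytes / activations.nbytes)  # 4x smaller than FP32
print('max abs error:', float(np.max(np.abs(activations - restored))))
```

The maximum error is bounded by roughly one quantization step (the scale), which is where the 1-3% mAP loss comes from.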

Dynamic Shapes vs Fixed

Fixed Input Shape (recommended):
model.export(
    format="onnx",
    imgsz=640,
    dynamic=False  # Fixed 640x640 input
)
Benefits:
  • Faster inference (optimized for specific size)
  • More optimization opportunities
  • Simpler deployment
Dynamic Input Shape:
model.export(
    format="onnx",
    dynamic=True  # Any input size
)
Benefits:
  • Flexibility (can process different image sizes)
  • Single model for multiple use cases
For robotics with fixed camera resolution, use fixed shapes for maximum performance.

Verifying Exported Models

Check Output Consistency

Ensure exported model produces same results as original:
import numpy as np
from ultralytics import YOLO

# Load both models
original = YOLO('yolo11s.pt')
exported = YOLO('yolo11s.mnn')

# Test image
test_image = 'test.jpg'

# Run inference
original_results = original.predict(test_image, conf=0.5)
exported_results = exported.predict(test_image, conf=0.5)

# Compare detections
print(f'Original detections: {len(original_results[0].boxes)}')
print(f'Exported detections: {len(exported_results[0].boxes)}')

# Check boxes
for orig_box, exp_box in zip(original_results[0].boxes, exported_results[0].boxes):
    orig_conf = float(orig_box.conf)
    exp_conf = float(exp_box.conf)
    conf_diff = abs(orig_conf - exp_conf)
    print(f'Confidence diff: {conf_diff:.4f}')
    
    if conf_diff > 0.01:
        print('Warning: Significant confidence difference!')
Expected:
  • Same number of detections (or ±1)
  • Confidence scores within 0.01
  • Bounding boxes within 2-3 pixels
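The pixel tolerance on boxes can be checked the same way as confidences. A hedged sketch (max_box_deviation is our own helper; boxes.xyxy is the Ultralytics accessor for corner coordinates):

```python
import numpy as np

def max_box_deviation(boxes_a, boxes_b):
    # Largest per-coordinate difference, in pixels, between two
    # matched sets of (x1, y1, x2, y2) boxes.
    a = np.asarray(boxes_a, dtype=np.float32)
    b = np.asarray(boxes_b, dtype=np.float32)
    assert a.shape == b.shape, 'detection counts differ'
    return float(np.max(np.abs(a - b))) if a.size else 0.0

# With Ultralytics results:
# dev = max_box_deviation(original_results[0].boxes.xyxy.numpy(),
#                         exported_results[0].boxes.xyxy.numpy())
# A deviation above ~3 pixels suggests an export problem.
```

Note this assumes the two models return detections in the same order; sort by confidence first if they don't.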

Benchmark Inference Speed

import time
import cv2
from ultralytics import YOLO

def benchmark_model(model_path, num_runs=100):
    model = YOLO(model_path)
    image = cv2.imread('test.jpg')
    
    # Warmup
    for _ in range(10):
        model.predict(image, verbose=False)
    
    # Benchmark
    start = time.time()
    for _ in range(num_runs):
        results = model.predict(image, verbose=False)
    elapsed = time.time() - start
    
    avg_time = elapsed / num_runs
    fps = 1.0 / avg_time
    
    print(f'{model_path}:')
    print(f'  Average time: {avg_time*1000:.1f}ms')
    print(f'  FPS: {fps:.1f}')
    return avg_time

# Compare formats
print('Benchmarking models...')
pt_time = benchmark_model('yolo11s.pt')
mnn_time = benchmark_model('yolo11s.mnn')

speedup = pt_time / mnn_time
print(f'\nSpeedup: {speedup:.2f}x')
Expected Results (Raspberry Pi 4):
yolo11s.pt:
  Average time: 312.4ms
  FPS: 3.2

yolo11s.mnn:
  Average time: 128.6ms
  FPS: 7.8

Speedup: 2.43x

Deployment Workflow

Complete Export Pipeline

from ultralytics import YOLO
import os
import shutil

def export_all_formats(model_path, output_dir):
    """
    Export model to all supported formats for testing
    """
    model = YOLO(model_path)
    
    formats = {
        'onnx': {'half': False, 'simplify': True},
        'mnn': {},
        'ncnn': {'half': True},
    }
    
    os.makedirs(output_dir, exist_ok=True)
    for fmt, kwargs in formats.items():
        print(f'Exporting to {fmt.upper()}...')
        try:
            # export() returns the path of the exported model
            exported_path = model.export(format=fmt, **kwargs)
            shutil.move(exported_path, output_dir)
            print(f'  Success!')
        except Exception as e:
            print(f'  Failed: {e}')
    
    print('\nExport complete!')

if __name__ == '__main__':
    export_all_formats(
        model_path='models/torch/yolo11s.pt',
        output_dir='models/exported/'
    )

Directory Structure

Organize exported models:
models/
├── torch/
│   └── yolo11s.pt          # Original trained model
├── onnx/
│   └── yolo11s.onnx        # ONNX export
├── mnn/
│   └── yolo11s.mnn         # MNN export (for Raspberry Pi)
└── ncnn/
    └── yolo11s_ncnn_model/ # NCNN export
        ├── model.ncnn.param
        └── model.ncnn.bin
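Since Ultralytics writes exports next to the source .pt file, a small helper can sort them into the layout above. This is a sketch under that assumption (organize_exports is our own function; file names are examples):

```python
import shutil
from pathlib import Path

def organize_exports(torch_dir, models_root):
    # Move exported artifacts from the torch/ directory into
    # per-format folders under models_root.
    layout = {
        'yolo11s.onnx': 'onnx',
        'yolo11s.mnn': 'mnn',
        'yolo11s_ncnn_model': 'ncnn',  # NCNN exports are a directory
    }
    for name, subdir in layout.items():
        src = Path(torch_dir) / name
        if not src.exists():
            continue
        dst = Path(models_root) / subdir
        dst.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst / name))

# organize_exports('models/torch', 'models')
```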

Format Selection Guide

Decision Tree

A quick way to decide: deploying on Raspberry Pi or another ARM board → MNN or NCNN; NVIDIA GPU available → TensorRT; building a mobile app → TFLite or Core ML; anything else (servers, cross-platform) → ONNX.

Recommendations by Use Case

Real-Time Robotics (Raspberry Pi):
  • Best: MNN or NCNN
  • Alternative: ONNX
  • Avoid: PyTorch (too slow)
Mobile App Development:
  • Best: TFLite or Core ML
  • Alternative: NCNN (for cross-platform)
Cloud Inference:
  • Best: ONNX or TensorRT (if NVIDIA)
  • Alternative: PyTorch (simplicity)
Edge Device (Coral TPU, Intel NCS):
  • Best: Format specific to hardware (EdgeTPU, OpenVINO)
Course Project: We use MNN for Raspberry Pi deployment. It provides excellent ARM performance and integrates seamlessly with Ultralytics.

Practice Exercise

Export and Compare Formats

Task: Export your trained model to multiple formats and benchmark them.
Steps:
  1. Export to ONNX, MNN, and NCNN
  2. Verify outputs match original model
  3. Benchmark inference speed on Raspberry Pi
  4. Measure file sizes
  5. Document results in a table
Success Criteria:
  • All exports complete without errors
  • Detection results match within 2% confidence
  • MNN/NCNN show 2x+ speedup over PyTorch
  • Can run inference at >5 FPS on Raspberry Pi
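For step 4 (measuring file sizes), a small helper covers both single-file exports and the NCNN directory. A sketch with example paths (file_size_mb is our own name):

```python
import os

def file_size_mb(path):
    # Single files are measured directly; directory exports
    # (e.g. NCNN) are summed recursively.
    if os.path.isdir(path):
        total = sum(
            os.path.getsize(os.path.join(d, f))
            for d, _, files in os.walk(path)
            for f in files
        )
    else:
        total = os.path.getsize(path)
    return total / 1e6

# for p in ('models/torch/yolo11s.pt', 'models/mnn/yolo11s.mnn'):
#     print(p, f'{file_size_mb(p):.1f} MB')
```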

Extension: Quantization Experiment

Export with different precision levels:
  • FP32 (baseline)
  • FP16 (half precision)
  • INT8 (quantized)
Measure:
  • File size reduction
  • Speed improvement
  • Accuracy impact (mAP change)

Summary

You’ve learned:
  • ✓ Why model conversion is necessary for edge deployment
  • ✓ Different format options (ONNX, MNN, NCNN) and their tradeoffs
  • ✓ How to export YOLO models with Ultralytics
  • ✓ Optimization techniques (FP16, INT8)
  • ✓ Verification and benchmarking methods
  • ✓ Format selection based on deployment target

Next Steps

With optimized models ready, the final lesson covers running real-time inference on Raspberry Pi and integrating with your robot control system.

Inference Optimization

Build real-time vision processing pipelines for robotics
Reference Code: course/vision_class/
  • export/export_model.py:1-5: Basic export example
  • export/models/ncnn/model_ncnn.py:1-27: NCNN inference testing
  • inference/model_loader.py:10: Loading MNN models
