YOLO models can be exported to various formats optimized for different deployment scenarios. The course materials include export scripts and examples for ONNX, NCNN, and MNN formats.

Export Script

The basic export script is located in course/vision_class/export/export_model.py:1:
from ultralytics import YOLO

model = YOLO('models/torch/yolo11s.pt')

model.export(format="mnn")
This script loads a PyTorch model and exports it to the specified format.

Model Export Process

The export process follows these steps:
  1. Load PyTorch Model: Load the trained .pt model file
  2. Configure Export: Specify target format and parameters
  3. Run Export: Ultralytics handles format conversion
  4. Verify Output: Test exported model with sample inference
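The steps above can be sketched as a small driver. `SUPPORTED_FORMATS` and `build_export_args` below are hypothetical helpers, not part of the Ultralytics API; the actual conversion is still the single `model.export(...)` call.

```python
# Hypothetical helper: validate the target format and collect the keyword
# arguments that would be passed to Ultralytics' model.export().
SUPPORTED_FORMATS = {"onnx", "ncnn", "mnn"}

def build_export_args(fmt: str, imgsz: int = 640, half: bool = False) -> dict:
    """Return export kwargs after checking the format is one the course uses."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported export format: {fmt}")
    return {"format": fmt, "imgsz": imgsz, "half": half}

# Usage (requires ultralytics and a trained .pt file):
#   from ultralytics import YOLO
#   model = YOLO("models/torch/yolo11s.pt")             # 1. load
#   args = build_export_args("ncnn")                    # 2. configure
#   model.export(**args)                                # 3. run export
#   YOLO("models/torch/yolo11s_ncnn_model")("bus.jpg")  # 4. verify
```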

Export Command

Using the Ultralytics API:
from ultralytics import YOLO

# Load trained model
model = YOLO('yolo11s.pt')

# Export to specific format
model.export(format="ncnn")  # or "onnx", "mnn", etc.

Supported Export Formats

The course materials include examples for three optimized formats:

ONNX Format

Location: course/vision_class/export/models/onnx/
model.export(format="onnx")
Output:
  • yolo11s.onnx (38 MB)
Characteristics:
  • Cross-platform compatibility
  • Hardware acceleration support
  • Standard inference engines (ONNX Runtime)
Use Cases:
  • Cloud deployment
  • Server inference
  • Cross-platform applications
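A minimal sketch of running the ONNX export with ONNX Runtime. The preprocessing helper is generic; the input name `"images"` matches what Ultralytics exports typically use, but verify it on your own file with `session.get_inputs()`.

```python
import numpy as np

def to_model_input(img_hwc: np.ndarray) -> np.ndarray:
    """HWC uint8 image -> NCHW float32 in [0, 1], the layout the export expects."""
    x = img_hwc.astype(np.float32) / 255.0
    x = np.transpose(x, (2, 0, 1))  # HWC -> CHW
    return x[None, ...]             # add batch dimension

# Inference (requires onnxruntime and the exported file):
#   import onnxruntime as ort
#   session = ort.InferenceSession("models/onnx/yolo11s.onnx")
#   outputs = session.run(None, {"images": to_model_input(img)})

dummy = np.zeros((640, 640, 3), dtype=np.uint8)
print(to_model_input(dummy).shape)  # (1, 3, 640, 640)
```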

NCNN Format

Location: course/vision_class/export/models/ncnn/
model.export(format="ncnn")
Output:
  • model.ncnn.param (23 KB) - Architecture definition
  • model.ncnn.bin (38 MB) - Model weights
  • metadata.yaml - Model metadata
Characteristics:
  • Optimized for ARM processors
  • No external dependencies
  • Efficient CPU inference
Use Cases:
  • Mobile devices (Android/iOS)
  • Embedded systems
  • Edge computing (Raspberry Pi, Jetson)

MNN Format

Location: course/vision_class/export/models/mnn/
model.export(format="mnn")
Output:
  • yolo11s.mnn (38 MB)
Characteristics:
  • Alibaba’s lightweight framework
  • Mobile-optimized inference
  • Low power consumption
Use Cases:
  • Mobile applications
  • IoT devices
  • Resource-constrained environments

NCNN Model Details

The production system uses the NCNN format. The metadata file (course/vision_class/export/models/ncnn/metadata.yaml:1) contains:
description: Ultralytics YOLO11s model trained on coco.yaml
author: Ultralytics
date: '2025-04-06T11:00:06.430757'
version: 8.3.77
license: AGPL-3.0 License
stride: 32
task: detect
batch: 1
imgsz:
  - 640
  - 640
names:
  0: person
  1: bicycle
  # ... 78 more classes
  39: bottle
  47: apple
  49: orange
  # ...
args:
  batch: 1
  half: false
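The flat top-level keys in this file can be read without a YAML library. This is a sketch that handles only the simple `key: value` lines shown above (not nested blocks like `names`); in practice you would use PyYAML if it is available.

```python
# Minimal sketch: pull flat fields out of an NCNN metadata.yaml without PyYAML.
SAMPLE = """\
stride: 32
task: detect
batch: 1
version: 8.3.77
"""

def parse_flat_yaml(text: str) -> dict:
    """Parse only top-level 'key: value' lines; skips nested/list/comment lines."""
    meta = {}
    for line in text.splitlines():
        if ":" in line and not line.startswith((" ", "-", "#")):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

meta = parse_flat_yaml(SAMPLE)
print(meta["task"], meta["stride"])  # detect 32
```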

NCNN Inference Example

The course includes an NCNN inference test (course/vision_class/export/models/ncnn/model_ncnn.py:5):
import numpy as np
import ncnn
import torch

def test_inference():
    torch.manual_seed(0)
    in0 = torch.rand(1, 3, 640, 640, dtype=torch.float)
    out = []

    with ncnn.Net() as net:
        # Load NCNN model files
        net.load_param("models/torch/yolo11s_ncnn_model/model.ncnn.param")
        net.load_model("models/torch/yolo11s_ncnn_model/model.ncnn.bin")

        with net.create_extractor() as ex:
            # Input tensor
            ex.input("in0", ncnn.Mat(in0.squeeze(0).numpy()).clone())

            # Extract output
            _, out0 = ex.extract("out0")
            out.append(torch.from_numpy(np.array(out0)).unsqueeze(0))

    if len(out) == 1:
        return out[0]
    else:
        return tuple(out)
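For a 640x640 detect model with 80 classes, the raw output returned above has shape (1, 84, 8400): rows 0-3 are box coordinates (cx, cy, w, h) and the remaining 80 rows are per-class scores for each candidate. A minimal decode sketch (confidence filtering only, no NMS; the threshold value is illustrative):

```python
import numpy as np

def decode_predictions(out: np.ndarray, conf_thres: float = 0.25):
    """Filter raw YOLO detect output of shape (1, 4 + num_classes, num_anchors).

    Returns a list of (cx, cy, w, h, class_id, score). No NMS is applied.
    """
    preds = out[0]                       # (84, 8400)
    boxes, scores = preds[:4], preds[4:]
    class_ids = scores.argmax(axis=0)    # best class per candidate
    best = scores.max(axis=0)            # its score
    keep = best >= conf_thres
    return [
        (*boxes[:, i], int(class_ids[i]), float(best[i]))
        for i in np.flatnonzero(keep)
    ]

# Synthetic check: one candidate above threshold
dummy = np.zeros((1, 84, 10), dtype=np.float32)
dummy[0, 4 + 39, 3] = 0.9  # class 39 ("bottle") at candidate 3
dets = decode_predictions(dummy)
```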

Optimization for Edge Devices

Exported models are optimized for edge deployment through:

Model Quantization

NCNN and MNN support quantization for:
  • Reduced model size (4x smaller with INT8)
  • Faster inference (2-4x speedup)
  • Lower power consumption
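The 4x size claim is simple arithmetic: FP32 stores 4 bytes per weight, INT8 stores 1. With YOLO11s at roughly 9.4 million parameters:

```python
# Back-of-the-envelope: why INT8 quantization shrinks the model ~4x.
params = 9.4e6               # approximate YOLO11s parameter count
fp32_mb = params * 4 / 1e6   # 4 bytes per FP32 weight
int8_mb = params * 1 / 1e6   # 1 byte per INT8 weight
print(f"FP32: {fp32_mb:.1f} MB, INT8: {int8_mb:.1f} MB")
```

The FP32 figure (~37.6 MB) matches the ~38 MB exports listed above; an INT8 build would land near 9.4 MB before any overhead.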

Hardware Acceleration

  • NCNN: Vulkan GPU acceleration, ARM NEON optimization
  • MNN: Metal (iOS), OpenCL, Vulkan support
  • ONNX: CUDA, TensorRT, DirectML acceleration

Memory Optimization

Edge-optimized formats provide:
  • In-place operations to reduce memory
  • Operator fusion for efficiency
  • Optimized memory allocation

Export Best Practices

1. Test Inference

Always verify exported model accuracy:
from ultralytics import YOLO

# Original model
original = YOLO('yolo11s.pt')
original_results = original('test_image.jpg')

# Export model
original.export(format='ncnn')

# Load exported model
exported = YOLO('yolo11s_ncnn_model')
exported_results = exported('test_image.jpg')

# Compare results

2. Choose Appropriate Format

  Deployment     Recommended Format
  Raspberry Pi   NCNN
  Android/iOS    NCNN or MNN
  Cloud/Server   ONNX
  Jetson Nano    ONNX (TensorRT)
  Web Browser    ONNX (ONNX.js)
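The recommendations above can be encoded as a small lookup. This is a hypothetical helper for illustration, not part of any library:

```python
# Hypothetical helper mirroring the recommendations above: pick an export
# format for a given deployment target.
RECOMMENDED = {
    "raspberry_pi": "ncnn",
    "android": "ncnn",      # or "mnn"
    "ios": "ncnn",          # or "mnn"
    "cloud": "onnx",
    "jetson_nano": "onnx",  # then convert to TensorRT
    "web": "onnx",          # via ONNX.js / onnxruntime-web
}

def recommended_format(target: str) -> str:
    try:
        return RECOMMENDED[target]
    except KeyError:
        raise ValueError(f"Unknown deployment target: {target}") from None

print(recommended_format("raspberry_pi"))  # ncnn
```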

3. Consider Trade-offs

  • ONNX: Broadest compatibility, but a heavier runtime than the mobile-first frameworks
  • NCNN: Best performance on ARM CPUs; the NCNN runtime must be built for the target platform
  • MNN: Good mobile performance, but a smaller ecosystem than ONNX

Integration with System

The robotic arm system loads the NCNN model in model_loader.py:10:
object_model_path: str = current_path + '/models/yolo11s_ncnn_model'
self.model: YOLO = YOLO(object_model_path, task='detect')
Ultralytics automatically detects the NCNN format and uses the appropriate backend for inference.
