The robotic arm system uses an Ultralytics YOLO11s model for object detection. The system supports multiple model formats optimized for edge-device deployment.

Ultralytics YOLO

The system uses the Ultralytics library for YOLO inference:
ultralytics==8.3.77
This is specified in requirements.txt:279.

ModelLoader Class

The ModelLoader class handles YOLO model initialization (arm_system/perception/vision/detection/model_loader.py:7):
import os

from ultralytics import YOLO


class ModelLoader:
    def __init__(self):
        # Resolve the model directory relative to this file
        current_path = os.path.dirname(os.path.abspath(__file__))
        object_model_path: str = current_path + '/models/yolo11s_ncnn_model'
        self.model: YOLO = YOLO(object_model_path, task='detect')

    def get_model(self) -> YOLO:
        return self.model

Model Format

The system uses the NCNN format for optimized edge inference:
  • Model Path: arm_system/perception/vision/detection/models/yolo11s_ncnn_model
  • Format: NCNN (Tencent’s neural network inference framework)
  • Task: Object detection

NCNN Model Structure

The NCNN model consists of two files:
yolo11s_ncnn_model/
├── model.ncnn.param  # Model architecture (22 KB)
└── model.ncnn.bin    # Model weights (37 MB)
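A quick way to sanity-check this two-file layout before handing the directory to YOLO is a helper like the one below. This is a hypothetical utility, not part of the arm_system codebase:

```python
from pathlib import Path


def is_valid_ncnn_model(model_dir: str) -> bool:
    """Check that an NCNN model directory contains both required files:
    model.ncnn.param (architecture) and model.ncnn.bin (weights)."""
    d = Path(model_dir)
    return (d / "model.ncnn.param").is_file() and (d / "model.ncnn.bin").is_file()
```

Running this check at startup gives a clearer error than a failed model load deep inside the inference pipeline.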

Supported Model Formats

While the production system uses NCNN, YOLO models support multiple formats:
Format         | Description             | Use Case
PyTorch (.pt)  | Native training format  | Training and development
ONNX (.onnx)   | Cross-platform format   | General deployment
NCNN           | Tencent framework       | Mobile and edge devices
MNN            | Alibaba framework       | Edge computing
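These formats are produced with Ultralytics' export API (`YOLO.export()` or the `yolo export` CLI). The sketch below collects the export arguments the table implies; the checkpoint name and default values are assumptions, not taken from the arm_system codebase:

```python
# Format strings accepted by Ultralytics YOLO.export() for the formats above
# (per the Ultralytics export documentation).
FORMAT_ARGS = {"onnx": "onnx", "ncnn": "ncnn", "mnn": "mnn"}


def export_args(fmt: str, imgsz: int = 640, half: bool = True) -> dict:
    """Build keyword arguments for YOLO.export() targeting the given format."""
    return {"format": FORMAT_ARGS[fmt], "imgsz": imgsz, "half": half}


# Usage (requires the ultralytics package and a trained .pt checkpoint):
#   from ultralytics import YOLO
#   YOLO("yolo11s.pt").export(**export_args("ncnn"))
```

Exporting with the same `imgsz` and `half` settings used at inference time keeps the deployed model consistent with the predict call shown below.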

Inference Parameters

The model runs inference with optimized parameters (arm_system/perception/vision/detection/main.py:20):
results = self.object_model.predict(
    image, 
    conf=0.55,      # Confidence threshold
    verbose=False,  # Suppress output
    imgsz=640,      # Input image size
    stream=True,    # Streaming mode
    task='detect',  # Detection task
    half=True       # FP16 precision
)

Parameter Details

conf=0.55

Minimum confidence threshold for detections. Only objects with confidence >= 0.55 are returned.

imgsz=640

Input image size for the model. Images are resized to 640x640 pixels before inference. This is the standard YOLO11s input size.

half=True

Enables FP16 (half-precision floating point) inference for:
  • Faster inference speed (approximately 2x)
  • Reduced memory usage
  • Minimal accuracy loss

Model Specifications

Based on the NCNN metadata (course/vision_class/export/models/ncnn/metadata.yaml:1):
  • Model: YOLO11s (small variant)
  • Training Dataset: COCO
  • Ultralytics Version: 8.3.77
  • Stride: 32
  • Batch Size: 1
  • Input Size: 640x640
  • Classes: 80 COCO classes
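The stride of 32 constrains the input size: YOLO input dimensions must be multiples of the stride, which is why 640 works directly (Ultralytics rounds non-conforming sizes up). A minimal sketch of that rounding rule:

```python
def round_to_stride(imgsz: int, stride: int = 32) -> int:
    """Round an input size up to the nearest multiple of the model stride
    (32 for YOLO11s), as required for valid YOLO input dimensions."""
    return ((imgsz + stride - 1) // stride) * stride
```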

Detected Classes

The model is trained on 80 COCO classes including:
names = {
    0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle',
    # ...
    39: 'bottle',
    47: 'apple',
    49: 'orange',
    # ... (80 classes total)
}
The system specifically filters for: apple (47), orange (49), and bottle (39).
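That class filter can be sketched as follows. This is a hypothetical helper for illustration; the actual filtering code lives elsewhere in the detection module:

```python
# COCO class IDs targeted by the system: bottle, apple, orange
TARGET_CLASS_IDS = {39, 47, 49}


def filter_detections(class_ids, confidences, targets=TARGET_CLASS_IDS):
    """Keep only (class_id, confidence) pairs belonging to target classes."""
    return [(c, s) for c, s in zip(class_ids, confidences) if c in targets]
```

In practice, `class_ids` and `confidences` would come from the `boxes.cls` and `boxes.conf` fields of a Results object.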

Real-Time Inference

The DetectionModel.inference() method integrates with the YOLO model:
import numpy as np
from typing import Dict

from ultralytics import YOLO
from ultralytics.engine.results import Results


# DetectionModelInterface is the detection interface defined elsewhere in the package
class DetectionModel(DetectionModelInterface):
    def __init__(self):
        self.object_model = ModelLoader().get_model()

    def inference(self, image: np.ndarray) -> tuple[list[Results], Dict[int, str]]:
        results = self.object_model.predict(
            image, 
            conf=0.55, 
            verbose=False, 
            imgsz=640, 
            stream=True, 
            task='detect', 
            half=True
        )
        return results, self.object_model.names

Results Object

The Ultralytics Results object contains:
  • boxes: Bounding box coordinates and confidences
  • boxes.xyxy: Box coordinates in (x1, y1, x2, y2) format
  • boxes.conf: Confidence scores
  • boxes.cls: Class IDs
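Unpacking those fields typically looks like the sketch below; a plain tuple stands in for the tensor row so the (x1, y1, x2, y2) layout is explicit:

```python
def box_center(xyxy):
    """Center (cx, cy) of a single box in (x1, y1, x2, y2) format,
    the layout used by Results.boxes.xyxy."""
    x1, y1, x2, y2 = xyxy
    return ((x1 + x2) / 2, (y1 + y2) / 2)


# In real code the rows come from the Results object, e.g.:
#   for box, score, cls_id in zip(r.boxes.xyxy, r.boxes.conf, r.boxes.cls): ...
```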

Performance Optimization

NCNN Backend

NCNN provides:
  • Optimized inference for ARM processors
  • Low memory footprint
  • No GPU dependency
  • Fast CPU inference

Half Precision

FP16 inference (half=True) provides:
  • 2x faster inference on compatible hardware
  • 50% reduction in memory usage
  • Minimal accuracy impact (less than 1% mAP loss)

Streaming Mode

Streaming mode (stream=True) enables:
  • Memory-efficient batch processing
  • Lower latency for single images
  • Better resource management
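With `stream=True`, `predict()` returns a generator rather than a list, so each frame's Results object is produced lazily on demand. A plain generator stands in for the Ultralytics call in this sketch:

```python
def predict_stream(frames):
    """Stand-in for model.predict(..., stream=True): yields one result at a
    time instead of building a list of all results in memory."""
    for frame in frames:
        yield f"result:{frame}"  # one Results object per frame in real code


stream = predict_stream(["frame0", "frame1"])
first = next(stream)  # only the first result has been computed so far
```

This is why the `DetectionModel.inference()` caller must iterate over the returned results rather than index into them.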
