The robotic arm system uses an Ultralytics YOLO11s model for object detection. The system supports multiple model formats optimized for edge-device deployment.

Ultralytics YOLO

The system uses the Ultralytics library for YOLO inference:
ultralytics==8.3.77
This is specified in requirements.txt:279.

ModelLoader Class

The ModelLoader class handles YOLO model initialization (arm_system/perception/vision/detection/model_loader.py:7):
import os

from ultralytics import YOLO


class ModelLoader:
    def __init__(self):
        # Resolve the model directory relative to this file
        current_path = os.path.dirname(os.path.abspath(__file__))
        object_model_path: str = current_path + '/models/yolo11s_ncnn_model'
        self.model: YOLO = YOLO(object_model_path, task='detect')

    def get_model(self) -> YOLO:
        return self.model

Model Format

The system uses the NCNN format for optimized edge inference:
  • Model Path: arm_system/perception/vision/detection/models/yolo11s_ncnn_model
  • Format: NCNN (Tencent’s neural network inference framework)
  • Task: Object detection

NCNN Model Structure

The NCNN model consists of two files:
yolo11s_ncnn_model/
├── model.ncnn.param  # Model architecture (22 KB)
└── model.ncnn.bin    # Model weights (37 MB)
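A quick way to sanity-check this two-file layout before handing the directory to YOLO is a helper like the one below. This is a hypothetical utility, not part of the arm_system codebase:

```python
from pathlib import Path


def is_valid_ncnn_model(model_dir: str) -> bool:
    """Check that an NCNN model directory contains both required files:
    model.ncnn.param (architecture) and model.ncnn.bin (weights)."""
    d = Path(model_dir)
    return (d / "model.ncnn.param").is_file() and (d / "model.ncnn.bin").is_file()
```

Running this check at startup gives a clearer error than a failed model load deep inside the inference pipeline.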

Supported Model Formats

While the production system uses NCNN, YOLO models support multiple formats:
Format         | Description             | Use Case
PyTorch (.pt)  | Native training format  | Training and development
ONNX (.onnx)   | Cross-platform format   | General deployment
NCNN           | Tencent framework       | Mobile and edge devices
MNN            | Alibaba framework       | Edge computing
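These formats are produced with Ultralytics' export API (`YOLO.export()` or the `yolo export` CLI). The sketch below collects the export arguments the table implies; the checkpoint name and default values are assumptions, not taken from the arm_system codebase:

```python
# Format strings accepted by Ultralytics YOLO.export() for the formats above
# (per the Ultralytics export documentation).
FORMAT_ARGS = {"onnx": "onnx", "ncnn": "ncnn", "mnn": "mnn"}


def export_args(fmt: str, imgsz: int = 640, half: bool = True) -> dict:
    """Build keyword arguments for YOLO.export() targeting the given format."""
    return {"format": FORMAT_ARGS[fmt], "imgsz": imgsz, "half": half}


# Usage (requires the ultralytics package and a trained .pt checkpoint):
#   from ultralytics import YOLO
#   YOLO("yolo11s.pt").export(**export_args("ncnn"))
```

Exporting with the same `imgsz` and `half` settings used at inference time keeps the deployed model consistent with the predict call shown below.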

Inference Parameters

The model runs inference with optimized parameters (arm_system/perception/vision/detection/main.py:20):
results = self.object_model.predict(
    image, 
    conf=0.55,      # Confidence threshold
    verbose=False,  # Suppress output
    imgsz=640,      # Input image size
    stream=True,    # Streaming mode
    task='detect',  # Detection task
    half=True       # FP16 precision
)

Parameter Details

conf=0.55

Minimum confidence threshold for detections. Only objects with confidence >= 0.55 are returned.

imgsz=640

Input image size for the model. Images are resized to 640x640 pixels before inference. This is the standard YOLO11s input size.

half=True

Enables FP16 (half-precision floating point) inference for:
  • Faster inference speed (approximately 2x)
  • Reduced memory usage
  • Minimal accuracy loss

Model Specifications

Based on the NCNN metadata (course/vision_class/export/models/ncnn/metadata.yaml:1):
  • Model: YOLO11s (small variant)
  • Training Dataset: COCO
  • Ultralytics Version: 8.3.77
  • Stride: 32
  • Batch Size: 1
  • Input Size: 640x640
  • Classes: 80 COCO classes
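The stride of 32 constrains the input size: YOLO input dimensions must be multiples of the stride, which is why 640 works directly (Ultralytics rounds non-conforming sizes up). A minimal sketch of that rounding rule:

```python
def round_to_stride(imgsz: int, stride: int = 32) -> int:
    """Round an input size up to the nearest multiple of the model stride
    (32 for YOLO11s), as required for valid YOLO input dimensions."""
    return ((imgsz + stride - 1) // stride) * stride
```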

Detected Classes

The model is trained on 80 COCO classes including:
names = {
    0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle',
    # ...
    39: 'bottle',
    47: 'apple',
    49: 'orange',
    # ... (80 classes total)
}
The system specifically filters for: apple (47), orange (49), and bottle (39).
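That class filter can be sketched as follows. This is a hypothetical helper for illustration; the actual filtering code lives elsewhere in the detection module:

```python
# COCO class IDs targeted by the system: bottle, apple, orange
TARGET_CLASS_IDS = {39, 47, 49}


def filter_detections(class_ids, confidences, targets=TARGET_CLASS_IDS):
    """Keep only (class_id, confidence) pairs belonging to target classes."""
    return [(c, s) for c, s in zip(class_ids, confidences) if c in targets]
```

In practice, `class_ids` and `confidences` would come from the `boxes.cls` and `boxes.conf` fields of a Results object.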

Real-Time Inference

The DetectionModel.inference() method integrates with the YOLO model:
import numpy as np
from typing import Dict

from ultralytics import YOLO
from ultralytics.engine.results import Results


# DetectionModelInterface is the detection interface defined elsewhere in the package
class DetectionModel(DetectionModelInterface):
    def __init__(self):
        self.object_model = ModelLoader().get_model()

    def inference(self, image: np.ndarray) -> tuple[list[Results], Dict[int, str]]:
        results = self.object_model.predict(
            image, 
            conf=0.55, 
            verbose=False, 
            imgsz=640, 
            stream=True, 
            task='detect', 
            half=True
        )
        return results, self.object_model.names

Results Object

The Ultralytics Results object contains:
  • boxes: Bounding box coordinates and confidences
  • boxes.xyxy: Box coordinates in (x1, y1, x2, y2) format
  • boxes.conf: Confidence scores
  • boxes.cls: Class IDs
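Unpacking those fields typically looks like the sketch below; a plain tuple stands in for the tensor row so the (x1, y1, x2, y2) layout is explicit:

```python
def box_center(xyxy):
    """Center (cx, cy) of a single box in (x1, y1, x2, y2) format,
    the layout used by Results.boxes.xyxy."""
    x1, y1, x2, y2 = xyxy
    return ((x1 + x2) / 2, (y1 + y2) / 2)


# In real code the rows come from the Results object, e.g.:
#   for box, score, cls_id in zip(r.boxes.xyxy, r.boxes.conf, r.boxes.cls): ...
```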

Performance Optimization

NCNN Backend

NCNN provides:
  • Optimized inference for ARM processors
  • Low memory footprint
  • No GPU dependency
  • Fast CPU inference

Half Precision

FP16 inference (half=True) provides:
  • 2x faster inference on compatible hardware
  • 50% reduction in memory usage
  • Minimal accuracy impact (less than 1% mAP loss)

Streaming Mode

Streaming mode (stream=True) enables:
  • Memory-efficient batch processing
  • Lower latency for single images
  • Better resource management
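With `stream=True`, `predict()` returns a generator rather than a list, so each frame's Results object is produced lazily on demand. A plain generator stands in for the Ultralytics call in this sketch:

```python
def predict_stream(frames):
    """Stand-in for model.predict(..., stream=True): yields one result at a
    time instead of building a list of all results in memory."""
    for frame in frames:
        yield f"result:{frame}"  # one Results object per frame in real code


stream = predict_stream(["frame0", "frame1"])
first = next(stream)  # only the first result has been computed so far
```

This is why the `DetectionModel.inference()` caller must iterate over the returned results rather than index into them.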
