Overview

The DetectionModel class provides an interface to YOLO object detection models for real-time inference on robotic arm camera images. Source: arm_system/perception/vision/detection/main.py:15

Interface Definition

DetectionModelInterface

class DetectionModelInterface(ABC):
    @abstractmethod
    def inference(self, image: np.ndarray) -> Tuple[Results, Dict[int, str]]:
        pass
Abstract base class defining the detection model interface. Source: arm_system/perception/vision/detection/main.py:9

Class Definition

DetectionModel

class DetectionModel(DetectionModelInterface):
    def __init__(self)
Initializes the detection model by loading the YOLO model via ModelLoader.

Attributes

  • object_model (YOLO) - Loaded YOLO model instance from ModelLoader

Methods

inference

def inference(self, image: np.ndarray) -> tuple[list[Results], Dict[int, str]]
Runs object detection inference on the input image.
Parameters:
  • image (np.ndarray, required) - Input image as NumPy array in BGR format (OpenCV format)
Returns:
  • results (list[Results]) - List of Ultralytics Results objects containing detection data
  • names (Dict[int, str]) - Dictionary mapping class IDs to class names
Inference Configuration:
  • conf: 0.55 (confidence threshold)
  • verbose: False (suppress output)
  • imgsz: 640 (input image size)
  • stream: True (generator mode for memory efficiency)
  • task: 'detect' (object detection)
  • half: True (FP16 precision for faster inference)
Source: arm_system/perception/vision/detection/main.py:19

Results Object Structure

The Ultralytics Results object contains:
  • boxes: Bounding box data
    • boxes.xyxy - Box coordinates [x1, y1, x2, y2]
    • boxes.conf - Confidence scores
    • boxes.cls - Class IDs
  • names: Class name dictionary

Example Usage

import cv2
import numpy as np
from arm_system.perception.vision.detection.main import DetectionModel

# Initialize model
model = DetectionModel()

# Load image
image = cv2.imread('test_image.jpg')

# Run inference
results, class_names = model.inference(image)

# Process results
for result in results:
    boxes = result.boxes
    
    if boxes.shape[0] > 0:
        for i in range(boxes.shape[0]):
            # Get detection data
            confidence = boxes.conf.cpu().numpy()[i]
            class_id = int(boxes.cls[i])
            box = boxes.xyxy.cpu().numpy()[i]
            class_name = class_names[class_id]
            
            print(f"Detected: {class_name}")
            print(f"Confidence: {confidence:.2f}")
            print(f"Box: {box}")

Integration Example

Used by ImageProcessor for object detection:
# In ImageProcessor.__init__
self.detection: DetectionModelInterface = DetectionModel()

# In ImageProcessor.process_image
object_results, object_classes = self.detection.inference(copy_image)

# Process each result
for res in object_results:
    boxes = res.boxes
    if boxes.shape[0] == 0:
        continue
    
    confidence = boxes.conf.cpu().numpy()[0]
    class_id = int(boxes.cls[0])
    box_data = boxes.xyxy.cpu().numpy()[0]
    detected_class = object_classes[class_id]

Model Configuration

The model is loaded with the following specifications:
Parameter | Value  | Description
conf      | 0.55   | Minimum confidence threshold
imgsz     | 640    | Input image size (pixels)
half      | True   | FP16 precision mode
stream    | True   | Generator mode
task      | detect | Object detection task

Performance Optimization

  • Half Precision (FP16): Enabled for faster inference on compatible hardware (GPUs, some CPUs).
  • Streaming Mode: Results are returned as a generator to reduce memory usage when processing multiple images.
  • Image Size: Fixed at 640x640 for an optimal balance between speed and accuracy.
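The benefit of streaming mode can be illustrated with plain Python: a generator produces one result per iteration instead of materializing the whole list up front. The `fake_stream` function below is purely illustrative, not part of the library.

```python
def fake_stream(n):
    """Yield results lazily, one per iteration, like stream=True."""
    for i in range(n):
        yield f"result-{i}"  # produced on demand, not stored up front


gen = fake_stream(3)
print(next(gen))  # -> result-0
```

This is why streaming results should be iterated exactly once; a second pass over the same generator yields nothing.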

Supported Classes

The model supports all YOLO11s classes, with specific filtering for:
  • apple
  • orange
  • bottle
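A class filter along these lines could be applied after inference. This is a hedged sketch with pure NumPy: the arrays mimic `boxes.cls` and `boxes.conf` pulled off a Results object with `.cpu().numpy()`, and the class IDs follow the standard COCO mapping (apple=47, orange=49, bottle=39) that YOLO11s uses.

```python
import numpy as np

# Keep only detections whose class name is in the supported set
# and whose confidence clears the documented 0.55 threshold.
SUPPORTED = {"apple", "orange", "bottle"}
names = {0: "person", 39: "bottle", 47: "apple", 49: "orange"}

cls_ids = np.array([0, 47, 39])        # mimics boxes.cls
confs = np.array([0.91, 0.78, 0.62])   # mimics boxes.conf

keep = [
    i for i in range(len(cls_ids))
    if names[int(cls_ids[i])] in SUPPORTED and confs[i] >= 0.55
]
print([names[int(cls_ids[i])] for i in keep])  # -> ['apple', 'bottle']
```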

Architecture

DetectionModel
    |
    └── ModelLoader (loads YOLO11s NCNN model)
            |
            └── YOLO (Ultralytics)

Error Handling

Errors during inference are propagated to the caller (ImageProcessor), which handles them gracefully by returning None.