The robotic arm system uses YOLO (You Only Look Once) models for real-time object detection. The detection pipeline processes captured images and identifies objects with confidence scores.

DetectionModel Class

The main detection interface is defined in arm_system/perception/vision/detection/main.py:15:
class DetectionModel(DetectionModelInterface):
    def __init__(self):
        self.object_model = ModelLoader().get_model()

    def inference(self, image: np.ndarray) -> tuple[list[Results], Dict[int, str]]:
        results = self.object_model.predict(
            image, 
            conf=0.55, 
            verbose=False, 
            imgsz=640, 
            stream=True, 
            task='detect', 
            half=True
        )
        return results, self.object_model.names

Inference Method

The inference() method performs object detection with the following parameters (arm_system/perception/vision/detection/main.py:20):
  • conf: 0.55 - Minimum confidence threshold for detections
  • verbose: False - Suppress detailed output
  • imgsz: 640 - Input image size for the model
  • stream: True - Enable streaming mode for efficiency
  • task: 'detect' - Object detection task
  • half: True - Use FP16 half-precision for faster inference
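Since half=True runs inference in FP16, note that half precision keeps only about three significant decimal digits, which is coarse but more than enough for thresholding confidence scores. A quick standalone illustration of the rounding involved (not part of the arm_system code):

```python
import numpy as np

# FP16 values near 0.5 are spaced roughly 0.0005 apart, so a confidence
# score is rounded to the nearest representable step when half=True.
conf_fp32 = np.float32(0.554999)
conf_fp16 = np.float16(conf_fp32)
print(conf_fp32, conf_fp16)  # the FP16 value is rounded to a nearby step
```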

Return Values

The method returns a tuple containing:
  1. results: Results objects from Ultralytics; because stream=True, this is a lazy generator rather than a materialized list, despite the list[Results] annotation
  2. names: Dictionary mapping class IDs to class names
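As a rough sketch, the returned tuple can be consumed like this. The FakeModel and FakeResults classes below are hypothetical stand-ins for the real Ultralytics objects, used only so the call pattern runs in isolation:

```python
import numpy as np

class FakeResults:
    """Hypothetical stand-in for an Ultralytics Results object."""
    def __init__(self, boxes):
        self.boxes = boxes

class FakeModel:
    """Hypothetical stand-in for the model returned by ModelLoader."""
    names = {0: 'apple', 1: 'orange', 2: 'bottle'}

    def predict(self, image, **kwargs):
        # With stream=True, predict yields one Results object per image
        yield FakeResults(boxes=[])

def inference(model, image):
    results = model.predict(image, conf=0.55, verbose=False, imgsz=640,
                            stream=True, task='detect', half=True)
    return results, model.names

results, names = inference(FakeModel(), np.zeros((480, 640, 3), dtype=np.uint8))
for res in results:            # lazily consume the generator
    print(len(res.boxes), "boxes in this frame")
print(names[2])                # class IDs map to human-readable names
```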

Confidence Threshold Settings

The system uses two confidence thresholds:

Model Inference Threshold

Set during inference at 0.55 (arm_system/perception/vision/detection/main.py:20):
results = self.object_model.predict(image, conf=0.55, ...)

Processing Threshold

Configurable in ImageProcessor (default 0.45) for filtering detections (arm_system/perception/vision/image_processing.py:9):
class ImageProcessor:
    def __init__(self, confidence_threshold: float = 0.45):
        self.detection: DetectionModelInterface = DetectionModel()
        self.conf_threshold = confidence_threshold
The serial manager uses 0.45 as the threshold (arm_system/communication/serial_manager.py:50):
self.object_detect_model = ImageProcessor(confidence_threshold=0.45)
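Because the inference threshold (0.55) is higher than the processing threshold (0.45), the second filter is effectively a no-op with the default values: everything YOLO returns has already cleared 0.55. It only takes effect if the inference threshold is lowered. A small illustrative helper (not part of the arm_system codebase) makes the combined effect explicit:

```python
def passes_both(confidence: float,
                inference_thr: float = 0.55,
                processing_thr: float = 0.45) -> bool:
    """Combined effect of the two thresholds (illustrative only)."""
    # YOLO drops everything below inference_thr before results are returned
    if confidence < inference_thr:
        return False
    # ImageProcessor then re-filters against processing_thr
    return confidence >= processing_thr

print(passes_both(0.50))  # False: cut at inference time, 0.45 never sees it
print(passes_both(0.60))  # True: clears both thresholds
```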

Integration with Main System

The detection workflow is triggered during object scanning in serial_manager.py:174:
def _handle_object_detection(self, data: dict):
    """object detect in real time"""
    try:
        # 1. capture image
        img_path = self.camera.capture_image()
        if not img_path:
            log.error("camera could not be captured")
            return
        
        # 2. YOLO detection
        image, yolo_result = self.object_detect_model.read_image_path(
            img_path, 
            draw_results=True, 
            save_drawn_img=True
        )
        if yolo_result is None:
            log.info("no detections.")
            return
        
        # 3. update data
        data.update({
            'class': yolo_result['class'],
            'confidence': yolo_result['confidence'],
            'timestamp': time.time(),
            'image_path': img_path
        })
        
        # 4. notify the central system
        if self.callbacks.get('scan_service'):
            self.callbacks['scan_service'](data)
            
    except Exception as e:
        log.error(f"error in object detection: {str(e)}")
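The handler's control flow can be exercised in isolation with stubbed dependencies. The StubCamera and StubProcessor classes below are hypothetical stand-ins for the real camera and ImageProcessor, and the fixed return values are fabricated for illustration:

```python
import time

class StubCamera:
    """Hypothetical stand-in for the camera interface."""
    def capture_image(self):
        return "/tmp/frame_0001.jpg"

class StubProcessor:
    """Hypothetical stand-in for ImageProcessor."""
    def read_image_path(self, path, draw_results=True, save_drawn_img=True):
        return None, {'class': 'apple', 'confidence': 0.91}

received = []

def scan_service(data: dict):
    received.append(data)   # the central system would act on the detection here

def handle_object_detection(camera, processor, callbacks, data: dict):
    # Mirrors the capture -> detect -> update -> callback flow shown above
    img_path = camera.capture_image()
    if not img_path:
        return
    _, yolo_result = processor.read_image_path(img_path)
    if yolo_result is None:
        return
    data.update({'class': yolo_result['class'],
                 'confidence': yolo_result['confidence'],
                 'timestamp': time.time(),
                 'image_path': img_path})
    if callbacks.get('scan_service'):
        callbacks['scan_service'](data)

handle_object_detection(StubCamera(), StubProcessor(),
                        {'scan_service': scan_service}, {'sensor_id': 3})
print(received[0]['class'])  # the callback received the enriched payload
```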

Detection Workflow

The complete detection workflow consists of:
  1. Image Capture: Camera captures image when object is detected by sensors
  2. YOLO Inference: Model runs inference with conf=0.55
  3. Result Processing: Best detection above threshold is selected
  4. Filtering: Only specific classes are recognized (apple, orange, bottle)
  5. Callback: Detection data is sent to the central system

Processing Details

The ImageProcessor class handles detection processing (arm_system/perception/vision/image_processing.py:24):
def process_image(self, image: np.ndarray, confidence_threshold: float = 0.45):
    # 1. inference
    copy_image = image.copy()
    object_results, object_classes = self.detection.inference(copy_image)
    
    # 2. init variables
    best_detection = {'class': '', 'confidence': 0.0, 'box': [], 'class_id': -1}
    
    # 3. process results
    for res in object_results:
        boxes = res.boxes
        
        if boxes.shape[0] == 0:
            continue
            
        confidence = boxes.conf.cpu().numpy()[0]
        class_id = int(boxes.cls[0])
        box_data = boxes.xyxy.cpu().numpy()[0]
            
        if confidence < confidence_threshold:
            continue
            
        detected_class = object_classes[class_id]
        clss_object = 'default'
            
        # Filter for specific objects
        if detected_class in ['apple', 'orange', 'bottle']:
            clss_object = detected_class
            
        if confidence > best_detection['confidence']:
            best_detection.update({
                'class': str(clss_object),
                'confidence': float(confidence),
                'box': box_data,
                'class_id': class_id
            })
    
    return image, best_detection
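The selection logic above can be exercised with plain Python data. The function below is an illustrative re-sketch of the loop, not the arm_system implementation, and the detections are fabricated for the example:

```python
RECOGNIZED = ('apple', 'orange', 'bottle')

def select_best(detections, classes, confidence_threshold=0.45):
    """Pick the highest-confidence detection above the threshold,
    relabeling unrecognized classes as 'default'."""
    best = {'class': '', 'confidence': 0.0, 'box': [], 'class_id': -1}
    for confidence, class_id, box in detections:
        if confidence < confidence_threshold:
            continue
        detected = classes[class_id]
        clss = detected if detected in RECOGNIZED else 'default'
        if confidence > best['confidence']:
            best.update({'class': clss, 'confidence': float(confidence),
                         'box': box, 'class_id': class_id})
    return best

classes = {0: 'apple', 1: 'person', 2: 'bottle'}
dets = [(0.62, 1, [0, 0, 50, 50]),    # 'person' is relabeled 'default'
        (0.71, 2, [10, 10, 90, 90]),  # highest confidence wins
        (0.40, 0, [5, 5, 20, 20])]    # below threshold, skipped
print(select_best(dets, classes))
```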

Class Filtering

Only three object classes are recognized (arm_system/perception/vision/image_processing.py:50):
  • apple
  • orange
  • bottle
All other detections are labeled as 'default'.

Visualization

Detections are automatically drawn on images with bounding boxes and labels (arm_system/perception/vision/image_processing.py:74):
def _draw_detection(self, image: np.ndarray, detection: dict):
    box = detection['box']
    class_name = detection['class']
    confidence = detection['confidence']

    x1, y1, x2, y2 = map(int, box)
    color = (0, 255, 0)  # Green in BGR
    cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)

    label = f"{class_name} {confidence:.2f}"
    cv2.putText(image, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
Annotated images are saved with the _detected.jpg suffix (arm_system/perception/vision/image_processing.py:93).
