The robotic arm system uses YOLO (You Only Look Once) models for real-time object detection. The detection pipeline processes captured images and identifies objects with confidence scores.

DetectionModel Class

The main detection interface is defined in arm_system/perception/vision/detection/main.py:15:
class DetectionModel(DetectionModelInterface):
    def __init__(self):
        self.object_model = ModelLoader().get_model()

    def inference(self, image: np.ndarray) -> tuple[list[Results], Dict[int, str]]:
        results = self.object_model.predict(
            image, 
            conf=0.55, 
            verbose=False, 
            imgsz=640, 
            stream=True, 
            task='detect', 
            half=True
        )
        return results, self.object_model.names

Inference Method

The inference() method performs object detection with the following parameters (arm_system/perception/vision/detection/main.py:20):
  • conf: 0.55 - Minimum confidence threshold for detections
  • verbose: False - Suppress detailed output
  • imgsz: 640 - Input image size for the model
  • stream: True - Enable streaming mode for efficiency
  • task: 'detect' - Object detection task
  • half: True - Use FP16 half-precision for faster inference
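Since half=True runs inference in FP16, note that half precision keeps only about three significant decimal digits, which is coarse but more than enough for thresholding confidence scores. A quick standalone illustration of the rounding involved (not part of the arm_system code):

```python
import numpy as np

# FP16 values near 0.5 are spaced roughly 0.0005 apart, so a confidence
# score is rounded to the nearest representable step when half=True.
conf_fp32 = np.float32(0.554999)
conf_fp16 = np.float16(conf_fp32)
print(conf_fp32, conf_fp16)  # the FP16 value is rounded to a nearby step
```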

Return Values

The method returns a tuple containing:
  1. results: Results objects from Ultralytics; because stream=True, this is a lazy generator rather than a materialized list, despite the list[Results] annotation
  2. names: Dictionary mapping class IDs to class names
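As a rough sketch, the returned tuple can be consumed like this. The FakeModel and FakeResults classes below are hypothetical stand-ins for the real Ultralytics objects, used only so the call pattern runs in isolation:

```python
import numpy as np

class FakeResults:
    """Hypothetical stand-in for an Ultralytics Results object."""
    def __init__(self, boxes):
        self.boxes = boxes

class FakeModel:
    """Hypothetical stand-in for the model returned by ModelLoader."""
    names = {0: 'apple', 1: 'orange', 2: 'bottle'}

    def predict(self, image, **kwargs):
        # With stream=True, predict yields one Results object per image
        yield FakeResults(boxes=[])

def inference(model, image):
    results = model.predict(image, conf=0.55, verbose=False, imgsz=640,
                            stream=True, task='detect', half=True)
    return results, model.names

results, names = inference(FakeModel(), np.zeros((480, 640, 3), dtype=np.uint8))
for res in results:            # lazily consume the generator
    print(len(res.boxes), "boxes in this frame")
print(names[2])                # class IDs map to human-readable names
```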

Confidence Threshold Settings

The system uses two confidence thresholds:

Model Inference Threshold

Set during inference at 0.55 (arm_system/perception/vision/detection/main.py:20):
results = self.object_model.predict(image, conf=0.55, ...)

Processing Threshold

Configurable in ImageProcessor (default 0.45) for filtering detections (arm_system/perception/vision/image_processing.py:9):
class ImageProcessor:
    def __init__(self, confidence_threshold: float = 0.45):
        self.detection: DetectionModelInterface = DetectionModel()
        self.conf_threshold = confidence_threshold
The serial manager uses 0.45 as the threshold (arm_system/communication/serial_manager.py:50):
self.object_detect_model = ImageProcessor(confidence_threshold=0.45)
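Because the inference threshold (0.55) is higher than the processing threshold (0.45), the second filter is effectively a no-op with the default values: everything YOLO returns has already cleared 0.55. It only takes effect if the inference threshold is lowered. A small illustrative helper (not part of the arm_system codebase) makes the combined effect explicit:

```python
def passes_both(confidence: float,
                inference_thr: float = 0.55,
                processing_thr: float = 0.45) -> bool:
    """Combined effect of the two thresholds (illustrative only)."""
    # YOLO drops everything below inference_thr before results are returned
    if confidence < inference_thr:
        return False
    # ImageProcessor then re-filters against processing_thr
    return confidence >= processing_thr

print(passes_both(0.50))  # False: cut at inference time, 0.45 never sees it
print(passes_both(0.60))  # True: clears both thresholds
```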

Integration with Main System

The detection workflow is triggered during object scanning in serial_manager.py:174:
def _handle_object_detection(self, data: dict):
    """object detect in real time"""
    try:
        # 1. capture image
        img_path = self.camera.capture_image()
        if not img_path:
            log.error("camera could not be captured")
            return
        
        # 2. YOLO detection
        image, yolo_result = self.object_detect_model.read_image_path(
            img_path, 
            draw_results=True, 
            save_drawn_img=True
        )
        if yolo_result is None:
            log.info("no detections.")
            return
        
        # 3. update data
        data.update({
            'class': yolo_result['class'],
            'confidence': yolo_result['confidence'],
            'timestamp': time.time(),
            'image_path': img_path
        })
        
        # 4. notify the central system
        if self.callbacks.get('scan_service'):
            self.callbacks['scan_service'](data)
            
    except Exception as e:
        log.error(f"error in object detection: {str(e)}")
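The handler's control flow can be exercised in isolation with stubbed dependencies. The StubCamera and StubProcessor classes below are hypothetical stand-ins for the real camera and ImageProcessor, and the fixed return values are fabricated for illustration:

```python
import time

class StubCamera:
    """Hypothetical stand-in for the camera interface."""
    def capture_image(self):
        return "/tmp/frame_0001.jpg"

class StubProcessor:
    """Hypothetical stand-in for ImageProcessor."""
    def read_image_path(self, path, draw_results=True, save_drawn_img=True):
        return None, {'class': 'apple', 'confidence': 0.91}

received = []

def scan_service(data: dict):
    received.append(data)   # the central system would act on the detection here

def handle_object_detection(camera, processor, callbacks, data: dict):
    # Mirrors the capture -> detect -> update -> callback flow shown above
    img_path = camera.capture_image()
    if not img_path:
        return
    _, yolo_result = processor.read_image_path(img_path)
    if yolo_result is None:
        return
    data.update({'class': yolo_result['class'],
                 'confidence': yolo_result['confidence'],
                 'timestamp': time.time(),
                 'image_path': img_path})
    if callbacks.get('scan_service'):
        callbacks['scan_service'](data)

handle_object_detection(StubCamera(), StubProcessor(),
                        {'scan_service': scan_service}, {'sensor_id': 3})
print(received[0]['class'])  # the callback received the enriched payload
```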

Detection Workflow

The complete detection workflow consists of:
  1. Image Capture: Camera captures image when object is detected by sensors
  2. YOLO Inference: Model runs inference with conf=0.55
  3. Result Processing: Best detection above threshold is selected
  4. Filtering: Only specific classes are recognized (apple, orange, bottle)
  5. Callback: Detection data is sent to the central system

Processing Details

The ImageProcessor class handles detection processing (arm_system/perception/vision/image_processing.py:24):
def process_image(self, image: np.ndarray, confidence_threshold: float = 0.45):
    # 1. inference
    copy_image = image.copy()
    object_results, object_classes = self.detection.inference(copy_image)
    
    # 2. init variables
    best_detection = {'class': '', 'confidence': 0.0, 'box': [], 'class_id': -1}
    
    # 3. process results
    for res in object_results:
        boxes = res.boxes
        
        if boxes.shape[0] == 0:
            continue
            
        confidence = boxes.conf.cpu().numpy()[0]
        class_id = int(boxes.cls[0])
        box_data = boxes.xyxy.cpu().numpy()[0]
            
        if confidence < confidence_threshold:
            continue
            
        detected_class = object_classes[class_id]
        clss_object = 'default'
            
        # Filter for specific objects
        if detected_class in ['apple', 'orange', 'bottle']:
            clss_object = detected_class
            
        if confidence > best_detection['confidence']:
            best_detection.update({
                'class': str(clss_object),
                'confidence': float(confidence),
                'box': box_data,
                'class_id': class_id
            })
    
    return image, best_detection
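The selection logic above can be exercised with plain Python data. The function below is an illustrative re-sketch of the loop, not the arm_system implementation, and the detections are fabricated for the example:

```python
RECOGNIZED = ('apple', 'orange', 'bottle')

def select_best(detections, classes, confidence_threshold=0.45):
    """Pick the highest-confidence detection above the threshold,
    relabeling unrecognized classes as 'default'."""
    best = {'class': '', 'confidence': 0.0, 'box': [], 'class_id': -1}
    for confidence, class_id, box in detections:
        if confidence < confidence_threshold:
            continue
        detected = classes[class_id]
        clss = detected if detected in RECOGNIZED else 'default'
        if confidence > best['confidence']:
            best.update({'class': clss, 'confidence': float(confidence),
                         'box': box, 'class_id': class_id})
    return best

classes = {0: 'apple', 1: 'person', 2: 'bottle'}
dets = [(0.62, 1, [0, 0, 50, 50]),    # 'person' is relabeled 'default'
        (0.71, 2, [10, 10, 90, 90]),  # highest confidence wins
        (0.40, 0, [5, 5, 20, 20])]    # below threshold, skipped
print(select_best(dets, classes))
```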

Class Filtering

Only three object classes are recognized (arm_system/perception/vision/image_processing.py:50):
  • apple
  • orange
  • bottle
All other detections are labeled as 'default'.

Visualization

Detections are automatically drawn on images with bounding boxes and labels (arm_system/perception/vision/image_processing.py:74):
def _draw_detection(self, image: np.ndarray, detection: dict):
    box = detection['box']
    class_name = detection['class']
    confidence = detection['confidence']

    x1, y1, x2, y2 = map(int, box)
    color = (0, 255, 0)  # Green in BGR
    cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)

    label = f"{class_name} {confidence:.2f}"
    cv2.putText(image, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2)
Annotated images are saved with the _detected.jpg suffix (arm_system/perception/vision/image_processing.py:93).
