Overview
The Trash Classification AI System uses YOLO (You Only Look Once) for real-time object detection and instance segmentation. The model is specifically trained to detect and segment three categories of waste materials.

Model Architecture
YOLO for Instance Segmentation
YOLO is a state-of-the-art deep learning model that performs object detection and segmentation in a single forward pass:

Single-Stage Detector
Unlike two-stage detectors, YOLO processes the entire image in one pass, enabling real-time performance.
Instance Segmentation
Beyond bounding boxes, the model generates pixel-precise masks for each detected object.
Multi-Class Detection
Simultaneously detects and classifies multiple objects across three waste categories.
Object Tracking
Built-in tracking maintains consistent object IDs across video frames.
Model Loading
The model is loaded using the Ultralytics YOLO library. The model file trash_segmentation_model_v2.pt is a trained PyTorch model located in the trash_classificator/segmentation/models/ directory.

Input Specifications
Image Format
| Parameter | Value | Description |
|---|---|---|
| Input Type | np.ndarray | NumPy array representing the image |
| Shape | (H, W, 3) | Height × Width × Channels (RGB) |
| Color Format | RGB/BGR | Compatible with OpenCV and standard formats |
| Processing Size | 640×640 | Image is resized to 640×640 for inference |
The model automatically handles image resizing and preprocessing. Input images of any size are resized to 640×640 while maintaining aspect ratio with padding.
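Ultralytics performs this letterboxing internally, so no manual preprocessing is needed. Purely as an illustration of the idea, here is a NumPy-only sketch of resize-with-padding (nearest-neighbor resampling for brevity, where the real pipeline uses proper interpolation; the 114 padding gray is a common YOLO convention):

```python
import numpy as np

def letterbox(img: np.ndarray, size: int = 640, pad_value: int = 114) -> np.ndarray:
    """Resize to fit inside size x size, preserving aspect ratio, padding the rest."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor resize via index sampling (stand-in for a real resize call).
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows[:, None], cols]
    # Center the resized image on a padded square canvas.
    out = np.full((size, size, 3), pad_value, dtype=img.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out

# A 480x640 frame fits at full width, leaving 80-pixel pad bands top and bottom.
out = letterbox(np.zeros((480, 640, 3), dtype=np.uint8))
```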
Output Specifications
Results Object
The model returns a Results object (the Ultralytics results type) for each frame, containing:
Output Components
Masks (trash_track.masks.xy)
- Format: List of NumPy arrays
- Content: Polygon coordinates defining object boundaries
- Shape: Variable - depends on object complexity
- Usage: Used by MaskDrawer to create colored fill regions

Bounding Boxes (trash_track.boxes.xyxy)
- Format: Tensor (N × 4)
- Content: Box coordinates in [x1, y1, x2, y2] format
- Coordinates: Top-left (x1, y1) to bottom-right (x2, y2)
- Usage: Used by BoundingBoxDrawer and TrackDrawer

Tracking IDs (trash_track.boxes.id)
- Format: Integer tensor
- Content: Unique ID for each tracked object
- Persistence: IDs remain consistent across frames
- Usage: Used by TrackDrawer to maintain movement history

Class IDs (trash_track.boxes.cls)
- Format: Integer tensor
- Content: Class ID for each detection [0, 1, 2]
- Mapping: 0=cardboard/paper, 1=metal, 2=plastic
- Usage: Used to determine color and label for each object
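To make these fields concrete, the sketch below walks mock detections shaped like the outputs above. The values are invented for illustration (plain lists standing in for the tensors); in the real pipeline they come from trash_track:

```python
CLASS_NAMES = {0: "cardboard/paper", 1: "metal", 2: "plastic"}

# Invented values shaped like trash_track.boxes.xyxy / .id / .cls for two objects.
boxes_xyxy = [[12.0, 30.0, 200.0, 180.0],
              [220.0, 40.0, 400.0, 300.0]]
track_ids = [7, 8]
class_ids = [0, 2]

labels = []
for (x1, y1, x2, y2), tid, cls in zip(boxes_xyxy, track_ids, class_ids):
    # Combine class name and persistent track ID into a display label.
    labels.append(f"{CLASS_NAMES[int(cls)]} #{int(tid)}")

print(labels)  # → ['cardboard/paper #7', 'plastic #8']
```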
Confidence Thresholds
Detection Confidence
Default Threshold: 0.55 (55%)

Only detections with confidence scores ≥ 0.55 are returned. This threshold is tuned to balance detection sensitivity with false positive reduction.
Adjusting Confidence
You can modify the threshold based on your use case:

| Use Case | Recommended Threshold | Trade-off |
|---|---|---|
| High Precision | 0.70 - 0.80 | Fewer false positives, may miss some objects |
| Balanced | 0.50 - 0.60 | Good balance (current setting: 0.55) |
| High Recall | 0.30 - 0.45 | Detect more objects, more false positives |
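The trade-off in the table can be seen on a handful of hypothetical confidence scores:

```python
scores = [0.92, 0.74, 0.58, 0.41, 0.33]  # hypothetical detection confidences

for name, threshold in [("High Precision", 0.75),
                        ("Balanced", 0.55),
                        ("High Recall", 0.35)]:
    # Count detections that survive this threshold.
    kept = sum(1 for s in scores if s >= threshold)
    print(f"{name} (conf={threshold}): keeps {kept} of {len(scores)} detections")
```

Raising the threshold drops borderline detections first; lowering it admits them along with potential false positives.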
Tracking Parameters
Persistent Tracking
Tracking Features
Track History
The system maintains a movement history for each tracked object.

Model Parameters Summary
Inference Configuration
| Parameter | Value | Purpose |
|---|---|---|
| conf | 0.55 | Minimum detection confidence |
| imgsz | 640 | Input image size (640×640) |
| persist | True | Enable cross-frame tracking |
| stream | True | Stream results for memory efficiency |
| verbose | False | Disable logging output |
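Assuming the standard Ultralytics API, loading the model file from the Model Loading section and running tracking with exactly these parameters might look like the sketch below; run_tracker is a hypothetical helper, not part of the codebase, and in practice the model would be loaded once rather than per call:

```python
from pathlib import Path

MODEL_PATH = Path("trash_classificator/segmentation/models/trash_segmentation_model_v2.pt")

# Inference parameters from the table above.
TRACK_KWARGS = dict(conf=0.55, imgsz=640, persist=True, stream=True, verbose=False)

def run_tracker(frame, model_path=MODEL_PATH):
    # Imported here so the sketch can be read without ultralytics installed.
    from ultralytics import YOLO
    model = YOLO(str(model_path))
    # With stream=True this returns a generator of per-frame Results.
    return model.track(frame, **TRACK_KWARGS)
```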
Hardware Acceleration
CUDA (NVIDIA GPU)
Best performance for real-time processing

MPS (Apple Silicon)
Optimized for M1/M2/M3 Macs

CPU (Fallback)
Works on any system, slower inference
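The fallback order above (CUDA, then MPS, then CPU) can be expressed as a small selector. In practice the availability flags come from torch.cuda.is_available() and torch.backends.mps.is_available(); they are passed as parameters here so the sketch stays dependency-free:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Mirror the CUDA → MPS → CPU fallback order described above."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

print(pick_device(False, True))  # → mps
```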
Performance Characteristics
Inference Speed
Inference speed depends on hardware:
- NVIDIA GPU (CUDA): ~30-60 FPS (real-time)
- Apple Silicon (MPS): ~20-40 FPS
- CPU: ~5-15 FPS (below real-time)
Memory Usage
| Component | Typical Memory Usage |
|---|---|
| Model weights | ~50-100 MB |
| Input frame (640×640) | ~1.2 MB |
| Results per frame | ~1-5 MB (depends on detections) |
| Track history | ~100-500 KB (50 points × objects) |
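The track-history estimate in the table (50 points per object) suggests a bounded buffer keyed by track ID. A sketch using a deque, with update_history as a hypothetical helper:

```python
from collections import defaultdict, deque

MAX_HISTORY = 50  # points kept per object, matching the memory estimate above

# One bounded buffer per track ID; old points fall off automatically once full.
track_history = defaultdict(lambda: deque(maxlen=MAX_HISTORY))

def update_history(track_id, box_xyxy):
    # Store the box center as the object's position for this frame.
    x1, y1, x2, y2 = box_xyxy
    track_history[track_id].append(((x1 + x2) / 2, (y1 + y2) / 2))

# Feeding 60 frames still leaves only the 50 most recent points.
for i in range(60):
    update_history(7, (i, i, i + 10, i + 10))
print(len(track_history[7]))  # → 50
```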
Streaming Mode
Stream Mode Benefits:
- Reduced memory footprint for batch processing
- Results are generated on-demand
- Better performance when processing video streams
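The lazy, one-result-at-a-time behavior can be illustrated with a stand-in for the model (FakeModel is purely illustrative; the real object is an Ultralytics YOLO instance):

```python
class FakeModel:
    """Stand-in for the YOLO model, showing only the streaming pattern."""
    def track(self, source, stream=True, persist=True, verbose=False):
        for frame in source:
            # Each result is yielded as soon as its frame is processed, so
            # only one frame's results live in memory at a time.
            yield {"frame": frame}

processed = []
for result in FakeModel().track(range(3), stream=True):
    processed.append(result["frame"])  # draw/record, then discard
print(processed)  # → [0, 1, 2]
```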
Model Training
While this documentation focuses on inference, the model was trained with the resources below.

Training Script
training/model_train.py contains the training pipeline.

Model Version
Current version: trash_segmentation_model_v2.pt

Error Handling
No Detections
If no objects meet the confidence threshold, the frame's results contain no detections and trash_track.boxes.id = None; check for this before indexing into the tracking output.
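A defensive check before using tracking output might look like this; has_tracks is a hypothetical helper, and the stub classes merely stand in for a real Results object:

```python
def has_tracks(result):
    # boxes can be None or empty, and boxes.id stays None until tracking starts.
    boxes = result.boxes
    return boxes is not None and len(boxes) > 0 and boxes.id is not None

class _Stub:  # minimal stand-in for a Results object
    def __init__(self, boxes):
        self.boxes = boxes

class _Boxes(list):  # list subclass so len() works; .id set per instance
    id = None

empty = _Stub(None)
tracked = _Stub(_Boxes([object()]))
tracked.boxes.id = [3]
print(has_tracks(empty), has_tracks(tracked))  # → False True
```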
Device Errors
If GPU acceleration fails, the system automatically falls back to CPU.

Integration Reference
For complete integration examples, see:
- Architecture Overview - Full system pipeline
- Waste Categories - Class definitions and color coding
- API Reference - Detailed method documentation
The YOLO model is seamlessly integrated into the pipeline and requires no manual configuration for standard use cases.