
Overview

The Trash Classification AI System is built on a modular architecture with three core components that work together to detect, classify, and visualize waste materials in real-time video streams.

Architecture Components

The system follows a pipeline architecture where each module has a specific responsibility:

1. Segmentation Module

The segmentation module detects and tracks trash objects in video frames.

Location: trash_classificator/segmentation/main.py

```python
class SegmentationModel:
    def __init__(self):
        self.device = DeviceManager.get_device()
        self.trash_segmentation_model = ModelLoader(self.device).get_model()

    def inference(self, image: np.ndarray):
        results = self.trash_segmentation_model.track(
            image,
            conf=0.55,
            verbose=False,
            persist=True,
            imgsz=640,
            stream=True
        )
        # trash_classes is defined at module scope (the model's class-name mapping)
        return results, trash_classes, self.device
```
Key Features:
  • YOLO-based segmentation model
  • Hardware-accelerated device management (CUDA, MPS, or CPU)
  • Object tracking with persistence
  • Confidence-based filtering (threshold: 0.55)
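Conceptually, the 0.55 confidence threshold keeps only detections the model is reasonably sure about. A minimal NumPy sketch of that filtering step (the variable names here are illustrative, not from the codebase; in practice YOLO applies the threshold internally via the conf argument):

```python
import numpy as np

# Hypothetical per-detection confidence scores and class IDs for one frame
confidences = np.array([0.92, 0.40, 0.61, 0.55, 0.12])
class_ids = np.array([0, 2, 1, 0, 2])

CONF_THRESHOLD = 0.55  # same threshold the segmentation module passes to track()

keep = confidences >= CONF_THRESHOLD  # boolean mask of detections to keep
kept_classes = class_ids[keep]

print(keep.tolist())          # [True, False, True, True, False]
print(kept_classes.tolist())  # [0, 1, 0]
```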

2. Drawing Module

The drawing module visualizes detected trash objects with masks, bounding boxes, and tracking trails.

Location: trash_classificator/drawing/main.py

```python
class Drawing:
    def __init__(self):
        self.mask_drawer = MaskDrawer()          # Colored masks
        self.bbox_drawer = BoundingBoxDrawer()   # Bounding boxes
        self.track_drawer = TrackDrawer()        # Movement trails

    def draw(self, image, trash_track, trash_classes, device):
        masks = trash_track.masks.xy
        boxes = trash_track.boxes.xyxy.cpu()
        tracks_ids = trash_track.boxes.id.int().cpu().tolist()
        clss = trash_track.boxes.cls.cpu().tolist()

        image = self.mask_drawer.draw(image, masks, clss)
        image = self.bbox_drawer.draw(image, boxes, trash_classes, clss)
        image = self.track_drawer.draw(image, tracks_ids, boxes)
        return image
```
Key Features:
  • Three-layer visualization: masks, bounding boxes, and tracking trails
  • Class-specific color coding
  • 50-point tracking history for movement visualization
  • Semi-transparent overlay (50% alpha blending)
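The semi-transparent overlay amounts to 50/50 alpha blending between the frame and a colored mask layer. A minimal NumPy sketch of that blend (illustrative only; the real MaskDrawer presumably rasterizes the polygon masks returned by YOLO before blending):

```python
import numpy as np

ALPHA = 0.5  # 50% alpha blending, as described above

def blend_mask(image: np.ndarray, color_layer: np.ndarray,
               region: np.ndarray) -> np.ndarray:
    """Blend color_layer into image at 50% opacity wherever region is True."""
    out = image.astype(np.float32)
    out[region] = (1 - ALPHA) * out[region] + ALPHA * color_layer[region]
    return out.astype(np.uint8)

# Tiny 2x2 example: paint the left column with a pure-green mask
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
green = np.zeros_like(frame)
green[..., 1] = 255
mask = np.array([[True, False], [True, False]])

blended = blend_mask(frame, green, mask)
print(blended[0, 0])  # masked pixel:   [100 227 100]
print(blended[0, 1])  # unmasked pixel: [200 200 200]
```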

3. Processing Module

The main processor orchestrates the entire pipeline, coordinating between segmentation and drawing.

Location: trash_classificator/processor.py

```python
class TrashClassificator:
    def __init__(self):
        self.segmentation = SegmentationModel()
        self.draw_detections = Drawing()

    def frame_processing(self, image: np.ndarray):
        # Step 1: Trash segmentation
        trash_image = image.copy()
        trash_track, trash_classes, device = self.segmentation.inference(trash_image)

        for trash in trash_track:
            if trash.boxes.id is None:
                return image, 'No trash detected'

            # Step 2: Draw detections
            image_draw = image.copy()
            image_draw = self.draw_detections.draw(
                image_draw, trash, trash_classes, device
            )

            return image_draw, 'Trash detected'

        # The streamed results were empty: nothing was detected
        return image, 'No trash detected'
```
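The control flow above (empty stream, missing tracking IDs, successful detection) can be exercised with lightweight stand-ins for the segmentation and drawing components. All stub names below are hypothetical, for illustration only:

```python
import numpy as np

class StubSegmentation:
    """Stand-in for SegmentationModel: yields pre-baked results."""
    def __init__(self, results):
        self._results = results

    def inference(self, image):
        return iter(self._results), {0: "plastic"}, "cpu"

class StubDrawing:
    """Stand-in for Drawing: returns the frame unchanged."""
    def draw(self, image, trash, trash_classes, device):
        return image

class Result:
    """Minimal object exposing the .boxes.id attribute the processor checks."""
    def __init__(self, ids):
        self.boxes = type("Boxes", (), {"id": ids})()

def frame_processing(image, segmentation, drawing):
    trash_track, trash_classes, device = segmentation.inference(image.copy())
    for trash in trash_track:
        if trash.boxes.id is None:
            return image, "No trash detected"
        drawn = drawing.draw(image.copy(), trash, trash_classes, device)
        return drawn, "Trash detected"
    return image, "No trash detected"  # empty stream: nothing found

frame = np.zeros((4, 4, 3), dtype=np.uint8)
_, status = frame_processing(frame, StubSegmentation([Result(ids=[1])]), StubDrawing())
print(status)  # Trash detected
_, status = frame_processing(frame, StubSegmentation([Result(ids=None)]), StubDrawing())
print(status)  # No trash detected
```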

Data Flow

The system processes each video frame through the following pipeline:
1. Frame Input: the raw video frame (NumPy array) is passed to the processor.
2. Segmentation: the YOLO model detects and segments trash objects with tracking IDs.
   • Returns: Results object with masks, boxes, classes, and tracking IDs
   • Confidence threshold: 0.55
   • Image size: 640x640
3. Visualization: the drawing module creates a multi-layer visualization:
   1. Mask layer: semi-transparent colored regions
   2. Bounding box layer: labeled boxes with class names
   3. Tracking layer: movement trails showing object paths
4. Frame Output: the annotated frame with a detection status message, either "Trash detected" or "No trash detected".

Module Interactions

Device Management

The system automatically selects the best available hardware:

```python
class DeviceManager:
    @staticmethod
    def get_device() -> torch.device:
        if torch.backends.mps.is_available():
            device = torch.device("mps")      # Apple Silicon
        elif torch.cuda.is_available():
            device = torch.device("cuda")     # NVIDIA GPU
        else:
            device = torch.device("cpu")      # CPU fallback
        return device
```
The system prioritizes GPU acceleration (MPS for Apple Silicon, CUDA for NVIDIA) and falls back to CPU if no GPU is available.

Model Loading

The YOLO model is loaded once during initialization and reused across frames:

```python
class ModelLoader:
    def __init__(self, device: torch.device):
        self.model = YOLO(trash_model_path).to(device)

    def get_model(self) -> YOLO:
        return self.model
```

Performance Considerations

Several choices keep per-frame processing efficient:
  • Image copying: frames are copied before processing, so the original frame data is never mutated.
  • Streaming results: YOLO inference uses stream=True, which returns results as a generator instead of accumulating them in memory.
  • GPU acceleration: automatic device selection ensures the best performance the available hardware allows.
  • Persistent tracking: object IDs persist across frames (persist=True), giving consistent tracking visualization.
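The 50-point tracking history behind the movement trails can be kept per object ID with a bounded deque, so old points fall off automatically. A sketch under the assumption that one trail of box centers is stored per tracking ID (names are illustrative):

```python
from collections import defaultdict, deque

TRAIL_LENGTH = 50  # matches the 50-point history described above

# One bounded trail per tracking ID
trails = defaultdict(lambda: deque(maxlen=TRAIL_LENGTH))

def update_trail(track_id: int, box_xyxy: tuple) -> None:
    """Append the box center to this object's trail."""
    x1, y1, x2, y2 = box_xyxy
    trails[track_id].append(((x1 + x2) / 2, (y1 + y2) / 2))

# Simulate 60 frames for object 7: only the most recent 50 centers survive
for frame_idx in range(60):
    update_trail(7, (frame_idx, 0, frame_idx + 10, 20))

print(len(trails[7]))  # 50
print(trails[7][0])    # oldest surviving center: (15.0, 10.0)
```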

Design Patterns

The codebase follows several software engineering best practices.

Interface-based design: each module defines an abstract interface (ABC) that concrete implementations must follow:
  • SegmentationModelInterface
  • DrawingInterface
  • MaskDrawerInterface
  • BoundingBoxDrawerInterface
  • TrackDrawerInterface
Dependency injection: the TrashClassificator receives initialized components, making it easy to swap implementations or mock components for testing.
Single responsibility: each class has one clear purpose:
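That injection-friendly shape can be sketched as a constructor that accepts its collaborators instead of building them itself. This is a hypothetical variant for illustration; the class shown earlier constructs its own components:

```python
class FakeSegmentation:
    """Test double standing in for SegmentationModel."""
    def inference(self, image):
        return iter([]), {}, "cpu"

class FakeDrawing:
    """Test double standing in for Drawing."""
    def draw(self, image, *args):
        return image

class TrashClassificator:
    # Accepting components as parameters makes them swappable in tests
    def __init__(self, segmentation, drawing):
        self.segmentation = segmentation
        self.draw_detections = drawing

clf = TrashClassificator(FakeSegmentation(), FakeDrawing())
print(type(clf.segmentation).__name__)  # FakeSegmentation
```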
  • DeviceManager: Hardware selection
  • ModelLoader: Model initialization
  • SegmentationModel: Inference
  • MaskDrawer, BoundingBoxDrawer, TrackDrawer: Specific visualization tasks
This modular architecture makes it easy to extend the system with new visualization methods, different segmentation models, or additional processing steps.
