
Overview

The Trash Classification AI System is built on a modular architecture with three core components that work together to detect, classify, and visualize waste materials in real-time video streams.

Architecture Components

The system follows a pipeline architecture where each module has a specific responsibility:

1. Segmentation Module

The segmentation module detects and tracks trash objects in video frames.

Location: trash_classificator/segmentation/main.py

```python
class SegmentationModel:
    def __init__(self):
        self.device = DeviceManager.get_device()
        self.trash_segmentation_model = ModelLoader(self.device).get_model()

    def inference(self, image: np.ndarray):
        results = self.trash_segmentation_model.track(
            image,
            conf=0.55,
            verbose=False,
            persist=True,
            imgsz=640,
            stream=True
        )
        # trash_classes is defined at module scope (the model's class-name mapping)
        return results, trash_classes, self.device
```
Key Features:
  • YOLO-based segmentation model
  • Hardware-accelerated device management (CUDA, MPS, or CPU)
  • Object tracking with persistence
  • Confidence-based filtering (threshold: 0.55)
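Conceptually, the 0.55 confidence threshold keeps only detections the model is reasonably sure about. A minimal NumPy sketch of that filtering step (the variable names here are illustrative, not from the codebase; in practice YOLO applies the threshold internally via the conf argument):

```python
import numpy as np

# Hypothetical per-detection confidence scores and class IDs for one frame
confidences = np.array([0.92, 0.40, 0.61, 0.55, 0.12])
class_ids = np.array([0, 2, 1, 0, 2])

CONF_THRESHOLD = 0.55  # same threshold the segmentation module passes to track()

keep = confidences >= CONF_THRESHOLD  # boolean mask of detections to keep
kept_classes = class_ids[keep]

print(keep.tolist())          # [True, False, True, True, False]
print(kept_classes.tolist())  # [0, 1, 0]
```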

2. Drawing Module

The drawing module visualizes detected trash objects with masks, bounding boxes, and tracking trails.

Location: trash_classificator/drawing/main.py

```python
class Drawing:
    def __init__(self):
        self.mask_drawer = MaskDrawer()          # Colored masks
        self.bbox_drawer = BoundingBoxDrawer()   # Bounding boxes
        self.track_drawer = TrackDrawer()        # Movement trails

    def draw(self, image, trash_track, trash_classes, device):
        masks = trash_track.masks.xy
        boxes = trash_track.boxes.xyxy.cpu()
        tracks_ids = trash_track.boxes.id.int().cpu().tolist()
        clss = trash_track.boxes.cls.cpu().tolist()

        image = self.mask_drawer.draw(image, masks, clss)
        image = self.bbox_drawer.draw(image, boxes, trash_classes, clss)
        image = self.track_drawer.draw(image, tracks_ids, boxes)
        return image
```
Key Features:
  • Three-layer visualization: masks, bounding boxes, and tracking trails
  • Class-specific color coding
  • 50-point tracking history for movement visualization
  • Semi-transparent overlay (50% alpha blending)
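The semi-transparent overlay amounts to 50/50 alpha blending between the frame and a colored mask layer. A minimal NumPy sketch of that blend (illustrative only; the real MaskDrawer presumably rasterizes the polygon masks returned by YOLO before blending):

```python
import numpy as np

ALPHA = 0.5  # 50% alpha blending, as described above

def blend_mask(image: np.ndarray, color_layer: np.ndarray,
               region: np.ndarray) -> np.ndarray:
    """Blend color_layer into image at 50% opacity wherever region is True."""
    out = image.astype(np.float32)
    out[region] = (1 - ALPHA) * out[region] + ALPHA * color_layer[region]
    return out.astype(np.uint8)

# Tiny 2x2 example: paint the left column with a pure-green mask
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
green = np.zeros_like(frame)
green[..., 1] = 255
mask = np.array([[True, False], [True, False]])

blended = blend_mask(frame, green, mask)
print(blended[0, 0])  # masked pixel:   [100 227 100]
print(blended[0, 1])  # unmasked pixel: [200 200 200]
```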

3. Processing Module

The main processor orchestrates the entire pipeline, coordinating between segmentation and drawing.

Location: trash_classificator/processor.py

```python
class TrashClassificator:
    def __init__(self):
        self.segmentation = SegmentationModel()
        self.draw_detections = Drawing()

    def frame_processing(self, image: np.ndarray):
        # Step 1: Trash segmentation
        trash_image = image.copy()
        trash_track, trash_classes, device = self.segmentation.inference(trash_image)

        for trash in trash_track:
            if trash.boxes.id is None:
                return image, 'No trash detected'

            # Step 2: Draw detections
            image_draw = image.copy()
            image_draw = self.draw_detections.draw(
                image_draw, trash, trash_classes, device
            )

            return image_draw, 'Trash detected'

        # The streamed results were empty: nothing was detected
        return image, 'No trash detected'
```
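The control flow above (empty stream, missing tracking IDs, successful detection) can be exercised with lightweight stand-ins for the segmentation and drawing components. All stub names below are hypothetical, for illustration only:

```python
import numpy as np

class StubSegmentation:
    """Stand-in for SegmentationModel: yields pre-baked results."""
    def __init__(self, results):
        self._results = results

    def inference(self, image):
        return iter(self._results), {0: "plastic"}, "cpu"

class StubDrawing:
    """Stand-in for Drawing: returns the frame unchanged."""
    def draw(self, image, trash, trash_classes, device):
        return image

class Result:
    """Minimal object exposing the .boxes.id attribute the processor checks."""
    def __init__(self, ids):
        self.boxes = type("Boxes", (), {"id": ids})()

def frame_processing(image, segmentation, drawing):
    trash_track, trash_classes, device = segmentation.inference(image.copy())
    for trash in trash_track:
        if trash.boxes.id is None:
            return image, "No trash detected"
        drawn = drawing.draw(image.copy(), trash, trash_classes, device)
        return drawn, "Trash detected"
    return image, "No trash detected"  # empty stream: nothing found

frame = np.zeros((4, 4, 3), dtype=np.uint8)
_, status = frame_processing(frame, StubSegmentation([Result(ids=[1])]), StubDrawing())
print(status)  # Trash detected
_, status = frame_processing(frame, StubSegmentation([Result(ids=None)]), StubDrawing())
print(status)  # No trash detected
```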

Data Flow

The system processes each video frame through the following pipeline:
1. Frame Input: the raw video frame (NumPy array) is passed to the processor.
2. Segmentation: the YOLO model detects and segments trash objects with tracking IDs.
   • Returns: Results object with masks, boxes, classes, and tracking IDs
   • Confidence threshold: 0.55
   • Image size: 640x640
3. Visualization: the drawing module creates a multi-layer visualization:
   1. Mask layer: semi-transparent colored regions
   2. Bounding box layer: labeled boxes with class names
   3. Tracking layer: movement trails showing object paths
4. Frame Output: the annotated frame with a detection status message, either "Trash detected" or "No trash detected".

Module Interactions

Device Management

The system automatically selects the best available hardware:

```python
class DeviceManager:
    @staticmethod
    def get_device() -> torch.device:
        if torch.backends.mps.is_available():
            device = torch.device("mps")      # Apple Silicon
        elif torch.cuda.is_available():
            device = torch.device("cuda")     # NVIDIA GPU
        else:
            device = torch.device("cpu")      # CPU fallback
        return device
```
The system prioritizes GPU acceleration (MPS for Apple Silicon, CUDA for NVIDIA) and falls back to CPU if no GPU is available.

Model Loading

The YOLO model is loaded once during initialization and reused across frames:

```python
class ModelLoader:
    def __init__(self, device: torch.device):
        self.model = YOLO(trash_model_path).to(device)

    def get_model(self) -> YOLO:
        return self.model
```

Performance Considerations

Several choices keep per-frame processing efficient:
  • Image copying: frames are copied before processing, so the original frame data is never mutated.
  • Streaming results: YOLO inference uses stream=True, which returns results as a generator instead of accumulating them in memory.
  • GPU acceleration: automatic device selection ensures the best performance the available hardware allows.
  • Persistent tracking: object IDs persist across frames (persist=True), giving consistent tracking visualization.
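The 50-point tracking history behind the movement trails can be kept per object ID with a bounded deque, so old points fall off automatically. A sketch under the assumption that one trail of box centers is stored per tracking ID (names are illustrative):

```python
from collections import defaultdict, deque

TRAIL_LENGTH = 50  # matches the 50-point history described above

# One bounded trail per tracking ID
trails = defaultdict(lambda: deque(maxlen=TRAIL_LENGTH))

def update_trail(track_id: int, box_xyxy: tuple) -> None:
    """Append the box center to this object's trail."""
    x1, y1, x2, y2 = box_xyxy
    trails[track_id].append(((x1 + x2) / 2, (y1 + y2) / 2))

# Simulate 60 frames for object 7: only the most recent 50 centers survive
for frame_idx in range(60):
    update_trail(7, (frame_idx, 0, frame_idx + 10, 20))

print(len(trails[7]))  # 50
print(trails[7][0])    # oldest surviving center: (15.0, 10.0)
```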

Design Patterns

The codebase follows several software engineering best practices.

Interface-based design: each module defines an abstract interface (ABC) that concrete implementations must follow:
  • SegmentationModelInterface
  • DrawingInterface
  • MaskDrawerInterface
  • BoundingBoxDrawerInterface
  • TrackDrawerInterface
Dependency injection: the TrashClassificator receives initialized components, making it easy to swap implementations or mock components for testing.
Single responsibility: each class has one clear purpose:
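That injection-friendly shape can be sketched as a constructor that accepts its collaborators instead of building them itself. This is a hypothetical variant for illustration; the class shown earlier constructs its own components:

```python
class FakeSegmentation:
    """Test double standing in for SegmentationModel."""
    def inference(self, image):
        return iter([]), {}, "cpu"

class FakeDrawing:
    """Test double standing in for Drawing."""
    def draw(self, image, *args):
        return image

class TrashClassificator:
    # Accepting components as parameters makes them swappable in tests
    def __init__(self, segmentation, drawing):
        self.segmentation = segmentation
        self.draw_detections = drawing

clf = TrashClassificator(FakeSegmentation(), FakeDrawing())
print(type(clf.segmentation).__name__)  # FakeSegmentation
```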
  • DeviceManager: Hardware selection
  • ModelLoader: Model initialization
  • SegmentationModel: Inference
  • MaskDrawer, BoundingBoxDrawer, TrackDrawer: Specific visualization tasks
This modular architecture makes it easy to extend the system with new visualization methods, different segmentation models, or additional processing steps.
