Overview
The Trash Classification AI System uses YOLO (You Only Look Once) for real-time object detection and instance segmentation. The model is specifically trained to detect and segment three categories of waste materials.

Model Architecture
YOLO for Instance Segmentation
YOLO is a state-of-the-art deep learning model that performs object detection and segmentation in a single forward pass:

Single-Stage Detector
Unlike two-stage detectors, YOLO processes the entire image in one pass, enabling real-time performance.
Instance Segmentation
Beyond bounding boxes, the model generates pixel-precise masks for each detected object.
Multi-Class Detection
Simultaneously detects and classifies multiple objects across three waste categories.
Object Tracking
Built-in tracking maintains consistent object IDs across video frames.
Model Loading
The model is loaded using the Ultralytics YOLO library. The model file trash_segmentation_model_v2.pt is a trained PyTorch model located in the trash_classificator/segmentation/models/ directory.

Input Specifications
Image Format
| Parameter | Value | Description |
|---|---|---|
| Input Type | np.ndarray | NumPy array representing the image |
| Shape | (H, W, 3) | Height × Width × Channels (RGB) |
| Color Format | RGB/BGR | Compatible with OpenCV and standard formats |
| Processing Size | 640×640 | Image is resized to 640×640 for inference |
The model automatically handles image resizing and preprocessing. Input images of any size are resized to 640×640 while maintaining aspect ratio with padding.
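Ultralytics performs this letterboxing internally, so no manual preprocessing is needed. Purely as an illustration of the idea, here is a NumPy-only sketch of resize-with-padding (nearest-neighbor resampling for brevity, where the real pipeline uses proper interpolation; the 114 padding gray is a common YOLO convention):

```python
import numpy as np

def letterbox(img: np.ndarray, size: int = 640, pad_value: int = 114) -> np.ndarray:
    """Resize to fit inside size x size, preserving aspect ratio, padding the rest."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor resize via index sampling (stand-in for a real resize call).
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows[:, None], cols]
    # Center the resized image on a padded square canvas.
    out = np.full((size, size, 3), pad_value, dtype=img.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out

# A 480x640 frame fits at full width, leaving 80-pixel pad bands top and bottom.
out = letterbox(np.zeros((480, 640, 3), dtype=np.uint8))
```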
Output Specifications
Results Object
The model returns a Results object (the Ultralytics results type) for each frame, containing:
Output Components
Masks (trash_track.masks.xy)
- Format: List of NumPy arrays
- Content: Polygon coordinates defining object boundaries
- Shape: Variable - depends on object complexity
- Usage: Used by MaskDrawer to create colored fill regions

Bounding Boxes (trash_track.boxes.xyxy)
- Format: Tensor (N × 4)
- Content: Box coordinates in [x1, y1, x2, y2] format
- Coordinates: Top-left (x1, y1) to bottom-right (x2, y2)
- Usage: Used by BoundingBoxDrawer and TrackDrawer

Tracking IDs (trash_track.boxes.id)
- Format: Integer tensor
- Content: Unique ID for each tracked object
- Persistence: IDs remain consistent across frames
- Usage: Used by TrackDrawer to maintain movement history

Class IDs (trash_track.boxes.cls)
- Format: Integer tensor
- Content: Class ID for each detection [0, 1, 2]
- Mapping: 0=cardboard/paper, 1=metal, 2=plastic
- Usage: Used to determine color and label for each object
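To make these fields concrete, the sketch below walks mock detections shaped like the outputs above. The values are invented for illustration (plain lists standing in for the tensors); in the real pipeline they come from trash_track:

```python
CLASS_NAMES = {0: "cardboard/paper", 1: "metal", 2: "plastic"}

# Invented values shaped like trash_track.boxes.xyxy / .id / .cls for two objects.
boxes_xyxy = [[12.0, 30.0, 200.0, 180.0],
              [220.0, 40.0, 400.0, 300.0]]
track_ids = [7, 8]
class_ids = [0, 2]

labels = []
for (x1, y1, x2, y2), tid, cls in zip(boxes_xyxy, track_ids, class_ids):
    # Combine class name and persistent track ID into a display label.
    labels.append(f"{CLASS_NAMES[int(cls)]} #{int(tid)}")

print(labels)  # → ['cardboard/paper #7', 'plastic #8']
```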
Confidence Thresholds
Detection Confidence
Default Threshold: 0.55 (55%)

Only detections with confidence scores ≥ 0.55 are returned. This threshold is tuned to balance detection sensitivity with false positive reduction.
Adjusting Confidence
You can modify the threshold based on your use case:

| Use Case | Recommended Threshold | Trade-off |
|---|---|---|
| High Precision | 0.70 - 0.80 | Fewer false positives, may miss some objects |
| Balanced | 0.50 - 0.60 | Good balance (current setting: 0.55) |
| High Recall | 0.30 - 0.45 | Detect more objects, more false positives |
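The trade-off in the table can be seen on a handful of hypothetical confidence scores:

```python
scores = [0.92, 0.74, 0.58, 0.41, 0.33]  # hypothetical detection confidences

for name, threshold in [("High Precision", 0.75),
                        ("Balanced", 0.55),
                        ("High Recall", 0.35)]:
    # Count detections that survive this threshold.
    kept = sum(1 for s in scores if s >= threshold)
    print(f"{name} (conf={threshold}): keeps {kept} of {len(scores)} detections")
```

Raising the threshold drops borderline detections first; lowering it admits them along with potential false positives.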
Tracking Parameters
Persistent Tracking
Tracking Features
Track History
The system maintains a movement history for each tracked object.

Model Parameters Summary
Inference Configuration
| Parameter | Value | Purpose |
|---|---|---|
| conf | 0.55 | Minimum detection confidence |
| imgsz | 640 | Input image size (640×640) |
| persist | True | Enable cross-frame tracking |
| stream | True | Stream results for memory efficiency |
| verbose | False | Disable logging output |
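Assuming the standard Ultralytics API, loading the model file from the Model Loading section and running tracking with exactly these parameters might look like the sketch below; run_tracker is a hypothetical helper, not part of the codebase, and in practice the model would be loaded once rather than per call:

```python
from pathlib import Path

MODEL_PATH = Path("trash_classificator/segmentation/models/trash_segmentation_model_v2.pt")

# Inference parameters from the table above.
TRACK_KWARGS = dict(conf=0.55, imgsz=640, persist=True, stream=True, verbose=False)

def run_tracker(frame, model_path=MODEL_PATH):
    # Imported here so the sketch can be read without ultralytics installed.
    from ultralytics import YOLO
    model = YOLO(str(model_path))
    # With stream=True this returns a generator of per-frame Results.
    return model.track(frame, **TRACK_KWARGS)
```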
Hardware Acceleration
CUDA (NVIDIA GPU)
Best performance for real-time processing

MPS (Apple Silicon)
Optimized for M1/M2/M3 Macs

CPU (Fallback)
Works on any system, slower inference
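The fallback order above (CUDA, then MPS, then CPU) can be expressed as a small selector. In practice the availability flags come from torch.cuda.is_available() and torch.backends.mps.is_available(); they are passed as parameters here so the sketch stays dependency-free:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Mirror the CUDA → MPS → CPU fallback order described above."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

print(pick_device(False, True))  # → mps
```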
Performance Characteristics
Inference Speed
Inference speed depends on hardware:
- NVIDIA GPU (CUDA): ~30-60 FPS (real-time)
- Apple Silicon (MPS): ~20-40 FPS
- CPU: ~5-15 FPS (below real-time)
Memory Usage
| Component | Typical Memory Usage |
|---|---|
| Model weights | ~50-100 MB |
| Input frame (640×640) | ~1.2 MB |
| Results per frame | ~1-5 MB (depends on detections) |
| Track history | ~100-500 KB (50 points × objects) |
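The track-history estimate in the table (50 points per object) suggests a bounded buffer keyed by track ID. A sketch using a deque, with update_history as a hypothetical helper:

```python
from collections import defaultdict, deque

MAX_HISTORY = 50  # points kept per object, matching the memory estimate above

# One bounded buffer per track ID; old points fall off automatically once full.
track_history = defaultdict(lambda: deque(maxlen=MAX_HISTORY))

def update_history(track_id, box_xyxy):
    # Store the box center as the object's position for this frame.
    x1, y1, x2, y2 = box_xyxy
    track_history[track_id].append(((x1 + x2) / 2, (y1 + y2) / 2))

# Feeding 60 frames still leaves only the 50 most recent points.
for i in range(60):
    update_history(7, (i, i, i + 10, i + 10))
print(len(track_history[7]))  # → 50
```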
Streaming Mode
Stream Mode Benefits:
- Reduced memory footprint for batch processing
- Results are generated on-demand
- Better performance when processing video streams
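The lazy, one-result-at-a-time behavior can be illustrated with a stand-in for the model (FakeModel is purely illustrative; the real object is an Ultralytics YOLO instance):

```python
class FakeModel:
    """Stand-in for the YOLO model, showing only the streaming pattern."""
    def track(self, source, stream=True, persist=True, verbose=False):
        for frame in source:
            # Each result is yielded as soon as its frame is processed, so
            # only one frame's results live in memory at a time.
            yield {"frame": frame}

processed = []
for result in FakeModel().track(range(3), stream=True):
    processed.append(result["frame"])  # draw/record, then discard
print(processed)  # → [0, 1, 2]
```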
Model Training
While this documentation focuses on inference, the model was trained with the resources below.

Training Script
training/model_train.py contains the training pipeline.

Model Version
Current version: trash_segmentation_model_v2.pt

Error Handling
No Detections
If no objects meet the confidence threshold, the frame's results contain no detections and trash_track.boxes.id = None; check for this before indexing into the tracking output.
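A defensive check before using tracking output might look like this; has_tracks is a hypothetical helper, and the stub classes merely stand in for a real Results object:

```python
def has_tracks(result):
    # boxes can be None or empty, and boxes.id stays None until tracking starts.
    boxes = result.boxes
    return boxes is not None and len(boxes) > 0 and boxes.id is not None

class _Stub:  # minimal stand-in for a Results object
    def __init__(self, boxes):
        self.boxes = boxes

class _Boxes(list):  # list subclass so len() works; .id set per instance
    id = None

empty = _Stub(None)
tracked = _Stub(_Boxes([object()]))
tracked.boxes.id = [3]
print(has_tracks(empty), has_tracks(tracked))  # → False True
```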
Device Errors
If GPU acceleration fails, the system automatically falls back to CPU.

Integration Reference
For complete integration examples, see:
- Architecture Overview - Full system pipeline
- Waste Categories - Class definitions and color coding
- API Reference - Detailed method documentation
The YOLO model is seamlessly integrated into the pipeline and requires no manual configuration for standard use cases.