Dataset Structure
The training pipeline expects a specific directory structure as defined in data.yaml. The dataset is organized into three splits:
Directory Layout
Each split (train/val/test) must contain both images/ and labels/ subdirectories. Image filenames must match their corresponding label files.
Class Configuration
The model is trained to detect 3 classes of recyclable materials:
| Class ID | Class Name | Description |
|---|---|---|
| 0 | cardboard paper | Cardboard boxes, paper products |
| 1 | metal | Aluminum cans, tin containers |
| 2 | plastic | Plastic bottles, containers |
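The directory layout and class list above come together in data.yaml. A minimal sketch, assuming the standard Ultralytics layout and the training/data root shown later in the annotation example (adjust paths to your own dataset):

```yaml
# Illustrative data.yaml -- paths are relative to the dataset root
path: training/data        # dataset root directory
train: train/images        # training images (labels are found automatically)
val: val/images            # validation images
test: test/images          # test images

names:
  0: cardboard paper
  1: metal
  2: plastic
```

The labels/ directories are not listed explicitly; YOLO locates them by replacing images/ with labels/ in each path.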
Image Requirements
Supported Formats
- Image Types: JPG, JPEG, PNG, BMP
- Recommended Format: JPG (for storage efficiency)
- Color Space: RGB
Image Specifications
Resolution
Images will be resized to 640x640 during training. Higher-resolution source images (1920x1080 or above) are recommended for better results.
Aspect Ratio
Any aspect ratio is supported. The training pipeline automatically handles letterboxing to maintain proportions.
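The letterboxing arithmetic can be sketched as follows; letterbox_params is a hypothetical helper for illustration, not part of the training pipeline:

```python
def letterbox_params(w, h, size=640):
    """Compute the resized dimensions and padding used to letterbox
    a w x h image into a size x size square without distortion."""
    r = size / max(w, h)                   # scale so the longer side fits
    new_w, new_h = round(w * r), round(h * r)
    pad_w, pad_h = size - new_w, size - new_h
    return new_w, new_h, pad_w / 2, pad_h / 2  # padding is split evenly per side

# A 1920x1080 source scales to 640x360 with 140 px of padding top and bottom
print(letterbox_params(1920, 1080))  # (640, 360, 0.0, 140.0)
```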
Best Practices
- Diversity: Include varied backgrounds, lighting conditions, and angles
- Occlusion: Include partially occluded objects to improve robustness
- Scale Variation: Capture objects at different distances/sizes
- Multiple Objects: Include scenes with multiple trash items
Annotation Format
YOLO format uses normalized coordinates for segmentation masks. Each annotation file is a plain-text file with the same name as its corresponding image.
YOLO Segmentation Format
Each line has the form: class_id x1 y1 x2 y2 ... xn yn
- class_id: Integer representing the class (0, 1, or 2)
- x, y: Normalized polygon coordinates (0.0 to 1.0)
- Each line represents one object instance
Example Annotation
training/data/train/labels/image1.txt
- One plastic object (class 2) with a 5-point polygon
- One cardboard paper object (class 0) with a 4-point polygon
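The file contents themselves are not shown above; a label file matching this description could look like the following (coordinate values are purely illustrative):

```text
2 0.50 0.30 0.62 0.35 0.60 0.48 0.48 0.50 0.42 0.38
0 0.10 0.10 0.30 0.10 0.30 0.25 0.10 0.25
```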
Coordinates are normalized by dividing pixel values by image width (for x) and height (for y). All values must be between 0.0 and 1.0.
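The normalization step can be sketched in Python; polygon_to_yolo_line is a hypothetical helper name, not part of any YOLO tooling:

```python
def polygon_to_yolo_line(class_id, points, img_w, img_h):
    """Convert pixel-space polygon points [(x, y), ...] into one
    YOLO segmentation label line with normalized coordinates."""
    coords = []
    for x, y in points:
        nx, ny = x / img_w, y / img_h
        if not (0.0 <= nx <= 1.0 and 0.0 <= ny <= 1.0):
            raise ValueError(f"point ({x}, {y}) lies outside the image")
        coords.extend([f"{nx:.6f}", f"{ny:.6f}"])
    return " ".join([str(class_id)] + coords)

# A plastic (class 2) triangle in a 1280x960 image
line = polygon_to_yolo_line(2, [(320, 240), (640, 240), (640, 480)], 1280, 960)
```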
Data Augmentation
YOLOv11 includes built-in augmentation strategies that are applied automatically during training:
Spatial Augmentations
- Mosaic: Combines 4 images into one for multi-scale learning
- Random Scaling: Varies object sizes (0.5x to 1.5x)
- Random Rotation: Up to ±10 degrees
- Random Flipping: Horizontal and vertical flips
- Random Translation: Shifts images within bounds
Pixel-level Augmentations
- HSV Augmentation: Varies hue, saturation, and value
- Brightness/Contrast: Random adjustments
- Noise Addition: Adds subtle Gaussian noise
Advanced Techniques
- MixUp: Blends two images and their labels
- CutOut: Randomly masks rectangular regions
- Perspective Transforms: Simulates different camera angles
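When training through the Ultralytics API, these augmentations map to hyperparameters that can be overridden at training time. The keys below are standard Ultralytics hyperparameter names; the values are illustrative, not tuned recommendations:

```python
# Illustrative augmentation overrides (values are examples, not recommendations)
aug_overrides = {
    "mosaic": 1.0,     # probability of 4-image mosaic augmentation
    "mixup": 0.1,      # probability of MixUp image/label blending
    "degrees": 10.0,   # max rotation (±10 degrees)
    "scale": 0.5,      # scaling gain (roughly 0.5x to 1.5x)
    "translate": 0.1,  # max translation as a fraction of image size
    "fliplr": 0.5,     # horizontal flip probability
    "flipud": 0.5,     # vertical flip probability
    "hsv_h": 0.015,    # hue jitter
    "hsv_s": 0.7,      # saturation jitter
    "hsv_v": 0.4,      # value (brightness) jitter
}
# e.g. model.train(data="data.yaml", **aug_overrides)
```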
Train/Val/Test Splits
Recommended Split Ratios
| Split | Percentage | Purpose |
|---|---|---|
| Train | 70-80% | Model learning and weight updates |
| Validation | 10-15% | Hyperparameter tuning and early stopping |
| Test | 10-15% | Final model evaluation |
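A simple way to produce these ratios with the standard library; split_dataset is a hypothetical helper (a fixed seed keeps the split reproducible):

```python
import random

def split_dataset(image_paths, train=0.8, val=0.1, seed=0):
    """Shuffle image paths and split into train/val/test lists.
    The test split receives whatever remains after train and val."""
    paths = sorted(image_paths)            # sort first so the seed fully determines the split
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train = int(n * train)
    n_val = int(n * val)
    return paths[:n_train], paths[n_train:n_train + n_val], paths[n_train + n_val:]

train_set, val_set, test_set = split_dataset([f"img{i}.jpg" for i in range(100)])
```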
Split Guidelines
- Split randomly, but keep the class distribution similar across all three splits
- Ensure no image appears in more than one split
- Keep near-duplicate images (e.g., consecutive video frames) within the same split to avoid leakage
Dataset Validation Checklist
Before starting training, verify:
- All images have corresponding annotation files
- Annotation files use correct class IDs (0, 1, 2)
- Coordinates are normalized (0.0 to 1.0)
- Directory structure matches data.yaml paths
- No corrupt or unreadable images
- Balanced class distribution across splits
- Minimum 300-500 images per class for good results
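Several of these checks can be automated. A minimal sketch (check_split is a hypothetical helper; it does not detect corrupt image data or class imbalance):

```python
from pathlib import Path

def check_split(split_dir, num_classes=3):
    """Return a list of problems found in one split's images/ and labels/ dirs."""
    problems = []
    images = Path(split_dir, "images")
    labels = Path(split_dir, "labels")
    for img in sorted(images.iterdir()):
        if img.suffix.lower() not in {".jpg", ".jpeg", ".png", ".bmp"}:
            continue
        label = labels / (img.stem + ".txt")
        if not label.exists():
            problems.append(f"missing label for {img.name}")
            continue
        for i, line in enumerate(label.read_text().splitlines(), 1):
            if not line.strip():
                continue
            parts = line.split()
            cls, coords = int(parts[0]), [float(v) for v in parts[1:]]
            if cls not in range(num_classes):
                problems.append(f"{label.name}:{i} bad class id {cls}")
            if any(not 0.0 <= v <= 1.0 for v in coords):
                problems.append(f"{label.name}:{i} coordinate outside [0, 1]")
    return problems
```

Running it over each of train/, val/, and test/ before training catches the most common pairing and normalization mistakes early.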
YOLOv11 will report dataset statistics at the start of training, including class distribution and any issues with annotations.
Next Steps
Once your dataset is prepared and validated:
Start Training
Configure and run the training script with your prepared dataset
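Assuming the Ultralytics CLI is installed, a training run might be launched with something like the following (the model checkpoint name and epoch count are illustrative):

```shell
yolo segment train data=data.yaml model=yolo11n-seg.pt epochs=100 imgsz=640
```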