Prerequisites
Install Dependencies
Install the required Python packages.
Requirements:
- Python 3.10+
- ultralytics >= 8.4.0
- matplotlib >= 3.7.0
- PyTorch (pre-installed on Lightning.ai A100)
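To confirm an environment meets these minimums before training, a small stdlib-only check can read installed versions with importlib.metadata. This is a sketch rather than part of the project's scripts. parse_version and check_requirements are hypothetical helpers, and the parser handles plain release numbers only: pre-release suffixes such as rc1 are ignored, and the packaging library is the more robust choice.

```python
import sys
from importlib import metadata

# Minimum versions taken from the requirements list above.
MINIMUMS = {"ultralytics": "8.4.0", "matplotlib": "3.7.0"}


def parse_version(text: str) -> tuple:
    """Turn '8.4.12' into (8, 4, 12), padded to three parts.

    Parsing stops at the first non-numeric part, so '8.4.0rc1' is read as (8, 4, 0).
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUMS) -> list:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if parse_version(installed) < parse_version(minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    found = check_requirements()
    for problem in found:
        print("Requirement problem:", problem)
    if not found:
        print("All requirements satisfied")
```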
Prepare the Dataset
Ensure your dataset follows the YOLO directory structure described in the Dataset Setup guide. The training script expects data.yaml to be present in the project root.
Training Script Overview
The train.py script fine-tunes YOLO26s on the drone dataset with hyperparameters optimized for A100 GPUs.
Basic Usage
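train.py expects data.yaml in the project root and labels in YOLO format. Before launching it, a quick preflight check can catch a missing config file or malformed label lines. A minimal sketch, assuming standard YOLO detection labels (one "class x_center y_center width height" line per box, coordinates normalized to [0, 1]). The helper names are hypothetical, and the label directory you pass in should match the layout from the Dataset Setup guide.

```python
from pathlib import Path


def check_project_root(root: str = ".") -> list:
    """Return problems found; an empty list means data.yaml is where train.py expects it."""
    problems = []
    if not (Path(root) / "data.yaml").is_file():
        problems.append(f"data.yaml not found in {Path(root).resolve()}")
    return problems


def validate_label_line(line: str) -> bool:
    """Check one YOLO label line: integer class id plus four coordinates in [0, 1]."""
    fields = line.split()
    if len(fields) != 5 or not fields[0].isdigit():
        return False
    try:
        coords = [float(value) for value in fields[1:]]
    except ValueError:
        return False
    return all(0.0 <= c <= 1.0 for c in coords)


def bad_label_lines(label_dir: str) -> list:
    """List (file name, line number) pairs under label_dir that fail validation."""
    bad = []
    for path in sorted(Path(label_dir).rglob("*.txt")):
        for number, line in enumerate(path.read_text().splitlines(), start=1):
            if line.strip() and not validate_label_line(line):
                bad.append((path.name, number))
    return bad
```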
GPU Optimizations
The script includes several A100-specific optimizations for maximum throughput.
What do these flags do?
- allow_tf32: Enables TensorFloat-32 on Ampere GPUs (A100) for faster matrix operations with minimal accuracy loss
- cudnn.benchmark: Auto-tunes cuDNN convolution algorithms for your specific input size (960×540). Since the image size is fixed, this provides a consistent speedup
- compile=True: Uses torch.compile() for 10-30% faster training on A100 + PyTorch 2.x
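The first two flags map onto standard PyTorch backend settings. A minimal sketch of how a script might enable them before building the model, assuming they are set globally at startup; the exact lines in train.py may differ, and enable_a100_speedups is a hypothetical name. compile=True is not shown because it is passed to the trainer as a training argument.

```python
def enable_a100_speedups() -> bool:
    """Enable TF32 and cuDNN autotuning; return False if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    # Let matmuls and cuDNN convolutions use TF32 tensor cores (Ampere and newer GPUs).
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    # Benchmark convolution algorithms once per input shape; pays off when shapes stay fixed.
    torch.backends.cudnn.benchmark = True
    return True
```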
Training Configuration
Core Hyperparameters
The training configuration is optimized for 11,387 annotated drone images.
Parameter Reference
| Parameter | Value | Description |
|---|---|---|
| data | "data.yaml" | Dataset configuration file |
| epochs | 100 | Number of training epochs |
| imgsz | 960 | Input image size (longest side). With rect=True, 16:9 frames train at about 960×540 |
| batch | 0.90 | Fraction of GPU memory to target; the batch size is auto-calculated to use ~90% |
| patience | 20 | Early stopping patience: stops if there is no improvement for 20 epochs |
| cache | "ram" | Cache the dataset in RAM for faster training (requires ~4GB of system memory) |
| workers | 8 | Number of dataloader workers (set to 0 on Windows due to multiprocessing issues) |
| cos_lr | True | Use a cosine learning rate schedule |
| deterministic | False | Allow non-deterministic operations for speed |
| compile | True | Enable torch.compile() for A100 acceleration |
| rect | True | Rectangular training: preserves the 16:9 aspect ratio and avoids ~44% padding waste |
| save_period | 10 | Save a checkpoint every 10 epochs |
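The table maps onto keyword arguments of the Ultralytics model.train() call. A sketch, assuming train.py passes them as a plain dict (drone_train_kwargs is a hypothetical helper). The second function works through the rect row: letterboxing a 960×540 frame onto a 960×960 square adds 420 rows of padding, and 420 / 960 = 43.75%, the roughly 44% quoted above.

```python
def drone_train_kwargs() -> dict:
    """Keyword arguments for model.train(), mirroring the parameter table above."""
    return {
        "data": "data.yaml",
        "epochs": 100,
        "imgsz": 960,          # longest side; with rect=True frames train at about 960x540
        "batch": 0.90,         # fraction of GPU memory used for automatic batch sizing
        "patience": 20,        # early-stop after 20 epochs without improvement
        "cache": "ram",        # use "disk" on machines with less than 16GB of RAM
        "workers": 8,          # set to 0 on Windows (see Troubleshooting)
        "cos_lr": True,
        "deterministic": False,
        "compile": True,
        "rect": True,
        "save_period": 10,
    }


def square_padding_fraction(imgsz: int, height: int) -> float:
    """Fraction of an imgsz x imgsz letterbox canvas that is padding for a frame of this height."""
    return (imgsz - height) / imgsz
```

With Ultralytics installed, the dict would be used as YOLO("yolo26s.pt").train(**drone_train_kwargs()). The compile key is kept because the table lists it; check that your installed Ultralytics version accepts it as a train argument.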
Memory Requirements:
cache="ram" requires about 4GB of system memory for the 650MB dataset. If your machine has less than 16GB of RAM, change it to cache="disk" in train.py:22.
Data Augmentation
The training applies the following augmentations to improve generalization:
| Augmentation | Value | Effect |
|---|---|---|
| degrees | 15.0 | Random rotation of ±15 degrees |
| flipud | 0.5 | Vertical flip (50% probability) |
| scale | 0.9 | Random scale gain between 0.1× and 1.9× (1 ± 0.9) |
| translate | 0.2 | Random translation of ±20% of the image size |
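To see what these values mean in practice, the sketch below draws one random set of parameters under the usual Ultralytics interpretation: rotation uniform in ±degrees, scale gain uniform in [1 - scale, 1 + scale], translation uniform in ±translate of the image size, and a vertical flip with probability flipud. It is illustrative only (sample_augmentation is hypothetical); the library itself applies the warp and adjusts the bounding boxes.

```python
import random


def sample_augmentation(degrees=15.0, scale=0.9, translate=0.2, flipud=0.5, rng=None):
    """Draw one set of augmentation parameters (illustrative, not the library code)."""
    rng = rng or random.Random()
    return {
        "angle_deg": rng.uniform(-degrees, degrees),      # rotation within ±degrees
        "scale_gain": rng.uniform(1 - scale, 1 + scale),  # 0.1x to 1.9x when scale=0.9
        "shift_x": rng.uniform(-translate, translate),    # fraction of image width
        "shift_y": rng.uniform(-translate, translate),    # fraction of image height
        "flip_vertical": rng.random() < flipud,           # True about half the time
    }
```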
Training Output
Results are saved to runs/drone_detect/.
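Ultralytics writes weights/best.pt (best validation fitness) and weights/last.pt (latest epoch) into the run directory, and with save_period=10 it also keeps periodic epoch*.pt checkpoints. A sketch for choosing which file to load, assuming the run directory is runs/drone_detect; pick_checkpoint is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def pick_checkpoint(run_dir: str = "runs/drone_detect") -> Optional[Path]:
    """Prefer best.pt, then last.pt, then the most recently written epoch*.pt file."""
    weights = Path(run_dir) / "weights"
    if not weights.is_dir():
        return None  # training has not produced any weights yet
    for name in ("best.pt", "last.pt"):
        candidate = weights / name
        if candidate.is_file():
            return candidate
    periodic = sorted(weights.glob("epoch*.pt"), key=lambda p: p.stat().st_mtime)
    return periodic[-1] if periodic else None
```

The returned path can be passed to YOLO(str(path)) for inference.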
Evaluation
After training completes, the script automatically runs validation and test evaluation.
Metrics Explained
mAP@50 vs mAP@50-95
- mAP@50: Mean Average Precision at an IoU (intersection over union) threshold of 0.5. A detection counts as correct if its IoU with a ground-truth box is at least 0.5
- mAP@50-95: Average of mAP across IoU thresholds from 0.5 to 0.95 in steps of 0.05. This is stricter and penalizes loose bounding boxes
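A worked example makes the difference concrete. The sketch computes IoU for boxes in (x1, y1, x2, y2) pixel form and lists which of the ten mAP@50-95 thresholds a prediction clears. Real mAP also ranks detections by confidence and integrates precision over recall; this only illustrates the IoU side, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds averaged by mAP@50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou_value):
    """Thresholds at which a correctly classified detection counts as a true positive."""
    return [t for t in IOU_THRESHOLDS if iou_value >= t]


ground_truth = (0, 0, 100, 100)
prediction = (10, 0, 110, 100)  # same size, shifted 10 px to the right
score = iou(ground_truth, prediction)
print(f"IoU = {score:.3f}, passes {len(thresholds_passed(score))}/10 thresholds")
```

This prints IoU = 0.818, passes 7/10 thresholds: the shifted box is a full hit for mAP@50 but clears only the thresholds from 0.50 to 0.80, so it earns less credit under mAP@50-95.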
Advanced: Multi-Model Comparison Study
The study.py script trains all five YOLO26 variants (nano, small, medium, large, xlarge) and compares their performance:
- Trains all 5 models for 100 epochs each
- Measures training time, accuracy, inference speed, and peak GPU memory
- Generates comparison charts
- Supports crash recovery (skips already-completed models on restart)
Outputs:
- runs/study/results_summary.json - JSON with all metrics
- runs/study/comparison_charts.png - Bar charts comparing models
- runs/study/efficiency_plots.png - Scatter plots (accuracy vs. size/speed/memory)
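Crash recovery of this kind usually works by persisting results after every model and skipping anything already recorded on restart. A minimal sketch of that pattern, assuming a summary file keyed by variant name. The real schema of results_summary.json and the variant identifiers (yolo26n through yolo26x, following the usual Ultralytics n/s/m/l/x suffixes) are assumptions here, and train_one is a hypothetical callback that trains one model and returns its metrics.

```python
import json
import os
from pathlib import Path

# Assumed identifiers for the five variants (nano, small, medium, large, xlarge).
VARIANTS = ["yolo26n", "yolo26s", "yolo26m", "yolo26l", "yolo26x"]


def load_summary(path: Path) -> dict:
    """Read saved results; a missing file means no model has finished yet."""
    return json.loads(path.read_text()) if path.is_file() else {}


def save_summary(path: Path, summary: dict) -> None:
    """Write via a temporary file so a crash mid-write leaves the old summary intact."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(summary, indent=2))
    os.replace(tmp, path)  # atomic on POSIX when both paths share a filesystem


def run_study(path: Path, train_one) -> dict:
    """Train each variant missing from the summary, saving after every model."""
    summary = load_summary(path)
    for variant in VARIANTS:
        if variant in summary:
            continue  # completed in an earlier run; skip on restart
        summary[variant] = train_one(variant)  # hypothetical callback returning metrics
        save_summary(path, summary)
    return summary
```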
Troubleshooting
Windows multiprocessing error
Problem: RuntimeError when workers > 0 on Windows
Solution: Set workers=0 in train.py:23. The cache="ram" setting compensates for single-threaded data loading. See the Known Issues page for more details.
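On Windows, Python starts DataLoader workers with the spawn method, which re-imports the training script in each worker; if the training call runs at module level, that typically raises this RuntimeError. Alongside workers=0, the standard remedy is to put the entry point behind an if __name__ == "__main__": guard. A sketch, where pick_workers and main are hypothetical names and the actual training call is left as a comment.

```python
import sys


def pick_workers(default: int = 8) -> int:
    """Use in-process loading on Windows, parallel workers elsewhere."""
    return 0 if sys.platform == "win32" else default


def main() -> None:
    # Hypothetical entry point: build the model and call model.train(..., workers=pick_workers()).
    print(f"dataloader workers: {pick_workers()}")


if __name__ == "__main__":
    # Spawned worker processes import this module without running main() again.
    main()
```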
Out of memory during training
Problem: CUDA out of memory error
Solutions:
- Reduce batch from 0.90 to 0.70 or lower
- Reduce imgsz from 960 to 640
- Disable cache="ram" (slower but uses no system memory)
- Use a smaller model variant (e.g., yolo26n.pt instead of yolo26s.pt)
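The GPU-side remedies (batch, imgsz, model size) can be tried one at a time until training fits in memory. A minimal sketch of that fallback ladder, assuming the settings live in a dict with an extra hypothetical "model" key for the weights file, which the caller pops and passes to YOLO() before calling train(); oom_fallbacks is a hypothetical helper.

```python
def oom_fallbacks(base: dict):
    """Yield successively lighter settings, applying one troubleshooting step at a time."""
    steps = [
        {"batch": 0.70},          # smaller GPU memory fraction for auto batch sizing
        {"imgsz": 640},           # fewer pixels per image
        {"model": "yolo26n.pt"},  # smaller network
    ]
    current = dict(base)  # never mutate the caller's dict
    for step in steps:
        current = {**current, **step}
        yield current
```

A caller would loop over the yielded configs and move on whenever training raises torch.cuda.OutOfMemoryError (available in PyTorch 1.13 and later).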
System RAM exhausted
Problem: System freezes or swaps heavily during training
Solution: Change cache="ram" to cache="disk" in train.py:22. This reads cached images from disk instead of holding the decoded dataset (about 4GB for the 650MB of images) in RAM.
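The 16GB rule of thumb from Memory Requirements can be automated. A sketch, assuming a POSIX system where os.sysconf exposes the physical page count; os.sysconf does not exist on Windows, so the helper falls back to the safer "disk" there. total_ram_bytes and choose_cache are hypothetical helpers.

```python
import os
from typing import Optional

# Decimal GB: the total reported by the OS is usually a little below the installed amount.
THRESHOLD_BYTES = 16 * 10**9


def total_ram_bytes() -> Optional[int]:
    """Physical memory via POSIX sysconf; None where unavailable (e.g. Windows)."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size


def choose_cache(ram_bytes: Optional[int], threshold: int = THRESHOLD_BYTES) -> str:
    """Return "ram" only when the machine has at least the threshold amount of memory."""
    if ram_bytes is None or ram_bytes < threshold:
        return "disk"
    return "ram"
```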
No 'drone' class in pretrained model
Expected Behavior: YOLO26s is pretrained on COCO (80 classes: person, car, bird, etc.) without a drone class. This is why fine-tuning is required. See the Known Issues page for more details.
Next Steps
After training completes:
- Run inference on test images using the Detection Guide
- Inspect checkpoints in runs/drone_detect/weights/
- Analyze results using the charts in runs/drone_detect/