Active Issues
1. COCO Pretrained Model Has No Drone Class
Status: Expected behavior, resolved by fine-tuning
Problem
YOLO26s comes pretrained on the COCO dataset, which includes 80 object classes (person, car, bird, bicycle, etc.) but does not include a "drone" class. Out-of-the-box inference with the pretrained model will not detect drones.
Why This Happens
The COCO dataset was created for general object detection and doesn't include specialized objects like drones. The pretrained weights have learned features for common objects but need fine-tuning for domain-specific detection.
Solution
This is why BeamFinder fine-tunes YOLO26s on the DeepSense Scenario 23 drone dataset. The training script (train.py) starts with pretrained COCO weights (yolo26s.pt) and adapts them to detect drones. After fine-tuning, the model's output layer is reconfigured for single-class detection (`nc: 1` in data.yaml:6).
Impact
- Before training: Model will not detect drones
- After training: Model achieves high mAP on drone detection
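For reference, here is a sketch of what the dataset config might look like. Only `nc: 1` at data.yaml:6 is confirmed by this page; the remaining keys assume the standard Ultralytics layout and are illustrative, not copied from the project:

```yaml
# data.yaml (illustrative sketch; only nc: 1 is confirmed by this page)
path: .              # dataset root (assumed)
train: images/train  # assumed standard Ultralytics layout
val: images/val
test: images/test
nc: 1                # single class (data.yaml:6)
names: ["drone"]     # assumed class name
```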
Run `python train.py` before using detect.py.
4. Windows Multiprocessing Doesn't Work
Status: Worked around with `workers=0`
Problem
On Windows, setting `workers > 0` in Ultralytics training causes a RuntimeError from Python's multiprocessing module.
Why This Happens
Windows doesn't support the `fork()` system call that Unix-based systems use for multiprocessing. Python's multiprocessing on Windows uses `spawn()` instead, which requires the main module to be importable.
Workaround
Both train.py and detect.py include the required fixes:
- Use `if __name__ == "__main__":` (already present in both scripts)
- Set `workers=0` to disable multiprocessing
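The guard matters because of how spawn-based multiprocessing re-imports the main module. A minimal standalone illustration of the pattern (not BeamFinder's actual code):

```python
import multiprocessing as mp

def double(x):
    return x * 2

# On Windows, workers are started with spawn(), which re-imports this
# module in each child process. Any code that creates workers must
# therefore sit behind the __main__ guard; without it, the re-import
# starts workers recursively and Python raises a RuntimeError.
if __name__ == "__main__":
    with mp.Pool(2) as pool:
        print(pool.map(double, [1, 2, 3]))  # [2, 4, 6]
```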
Impact
Training is slower on Windows due to single-threaded data loading. However, `cache="ram"` eliminates disk I/O as a bottleneck, which partially compensates for this.
Performance difference:
- Linux with `workers=8`: ~100% GPU utilization
- Windows with `workers=0`, `cache="ram"`: ~90-95% GPU utilization
On Linux/Mac, you can use `workers=8` for faster data loading (already set in train.py:23 for A100 training).
6. Aspect Ratio Mismatch
Status: Mitigated with `rect=True`
Problem
BeamFinder images are 960×540 pixels (16:9 aspect ratio), but YOLO defaults to square inputs (e.g., 640×640). Without rectangular training, about 44% of pixels would be black padding.
Why This Happens
Traditional YOLO implementations use square inputs for simplicity and efficiency. Images are letterboxed (padded with black bars) to fit the square, which wastes GPU compute on processing padding instead of actual image content.
Solution
BeamFinder enables rectangular training and inference with `rect=True` in both scripts. With `rect=True`, YOLO preserves the aspect ratio instead of padding to a square.
Impact
- Without `rect=True`: 44% of pixels are black padding, wasted compute
- With `rect=True`: full image utilization, ~20% faster training/inference
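The 44% figure is easy to verify with a quick back-of-the-envelope check (not project code):

```python
# A 960x540 (16:9) frame letterboxed into a square 640x640 input:
# the image is scaled to fit, and the remainder is black padding.
src_w, src_h = 960, 540
target = 640
scale = min(target / src_w, target / src_h)                # 2/3, limited by width
fit_w, fit_h = round(src_w * scale), round(src_h * scale)  # 640 x 360
pad_fraction = 1 - (fit_w * fit_h) / (target * target)
print(f"padding: {pad_fraction:.0%}")  # padding: 44%
```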
Rectangular training batches images with similar aspect ratios together. This is why you’ll see validation batches with different shapes during training.
Resolved Issues
2. Finding the Bounding Box Annotations
Status: Resolved
Dataset split: 70% train / 15% validation / 15% test.
Problem
Initially, it appeared that the DeepSense Scenario 23 dataset didn't include bounding box labels, only images.
Resolution
The 11,387 YOLO-format annotation files (.txt) were included in the original DeepSense download. They were paired with images and organized into the standard YOLO directory layout.
Impact
No impact on users. The dataset is properly structured for YOLO training.
Noted Limitations
3. Uneven Distribution Across Capture Sessions
Status: Noted, not an issue in practice
Description
The DeepSense dataset images come from 51 different capture sessions (subfolders) with very uneven counts:
- Some sessions: 1 image
- Other sessions: 1,000+ images
Mitigation
Images are shuffled before splitting into train/val/test sets, so each split has a reasonable mix of conditions.
Impact
This hasn't caused observable problems in practice; the model generalizes well across the test set despite the uneven session distribution. If you notice the model overfitting to specific backgrounds or lighting conditions, you can add more augmentation in train.py:32-35.
5. Memory Considerations
Status: Under control with proper configuration
Minimum system RAM: 8GB
Dataset Size
- On disk: ~650MB (11,387 images)
- In RAM: ~4GB when cached (`cache="ram"`)
- VRAM: depends on batch size and model variant
RAM Requirements
BeamFinder uses `cache="ram"` in train.py:22 to eliminate disk I/O as a bottleneck:
- 16GB recommended (4GB for dataset cache + 8GB for OS/apps + 4GB buffer)
- 8GB minimum (use `cache="disk"` instead, or disable caching with `cache=False`)
VRAM Requirements
BeamFinder was developed on two different GPUs:

| GPU | VRAM | Configuration | Batch Size |
|---|---|---|---|
| RTX 3050 | 4GB | `batch=0.85, amp=True, imgsz=960` | 2-4 |
| A100 | 40GB | `batch=0.90, amp=True, imgsz=960` | 32+ |
The `batch=0.85` parameter tells Ultralytics to automatically pick the largest batch size that fits in GPU memory.
What If You Don't Have Enough RAM?
Use disk caching instead: set `cache="disk"` in train.py.
What If You Don't Have Enough VRAM?
- Reduce the batch size
- Use a smaller model variant
- Reduce the image size
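As a sketch, these options correspond to Ultralytics `train()` arguments roughly as follows; the concrete values below are illustrative assumptions, not BeamFinder's settings:

```python
model.train(
    data="data.yaml",
    batch=8,      # fixed small batch instead of the 0.85 memory fraction
    imgsz=640,    # lower resolution than the native 960
)
# Or start from a smaller checkpoint, assuming the usual n/s/m naming:
# model = YOLO("yolo26n.pt")
```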
Development Notes
These issues were documented during development and serve as a reference for understanding design decisions in the codebase.
CUDA Optimizations
The training script includes A100-specific optimizations (train.py:9-12):
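The page doesn't reproduce those lines, but optimizations of this kind typically look like the following sketch (an assumption about what train.py:9-12 contains, not a copy of it):

```python
import torch

# TF32 trades a little precision for large matmul/conv speedups on
# Ampere GPUs such as the A100.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
# Let cuDNN autotune convolution algorithms for the fixed input shape.
torch.backends.cudnn.benchmark = True
```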
Augmentation Strategy
BeamFinder uses conservative augmentation (train.py:32-35):
- Drones can appear at any orientation (rotation helps)
- Images are captured from ground looking up (vertical flip simulates different perspectives)
- Drones appear at varying distances (scale helps)
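In Ultralytics, that strategy maps onto `train()` augmentation arguments along these lines; the values below are illustrative assumptions, and the real ones live in train.py:32-35:

```python
model.train(
    data="data.yaml",
    degrees=15.0,  # small rotations: drones appear at any orientation
    flipud=0.5,    # vertical flips: varies the ground-looking-up perspective
    scale=0.5,     # scale jitter: drones appear at varying distances
)
```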
Torch Compile
The A100 training configuration uses `compile=True` (train.py:26), which requires:
- PyTorch 2.0+
- CUDA GPU
- Linux (doesn’t work on Windows)
See the Troubleshooting page for solutions to common problems.