Problems encountered while building BeamFinder, and how they were addressed. Based on real issues from the development process.
These issues are documented for transparency. Most have workarounds or have been resolved.

Active Issues

Pretrained model does not detect drones

Status: Expected behavior, resolved by fine-tuning

Problem

YOLO26s comes pretrained on the COCO dataset, which includes 80 object classes (person, car, bird, bicycle, etc.) but does not include a “drone” class. Out-of-the-box inference with the pretrained model will not detect drones.

Why This Happens

The COCO dataset was created for general object detection and doesn’t include specialized objects like drones. The pretrained weights have learned features for common objects but need fine-tuning for domain-specific detection.

Solution

This is why BeamFinder fine-tunes YOLO26s on the DeepSense Scenario 23 drone dataset. The training script (train.py) starts with pretrained COCO weights (yolo26s.pt) and adapts them to detect drones:
model = YOLO("yolo26s.pt")  # Pretrained on COCO
model.train(
    data="data.yaml",  # DeepSense drone dataset
    epochs=100,
    # ... training args
)
After fine-tuning, the model’s output layer is reconfigured for single-class detection (nc: 1 in data.yaml:6).

Impact

  • Before training: Model will not detect drones
  • After training: Model achieves high mAP on drone detection
You must run python train.py before using detect.py.
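Since detect.py depends on weights that only exist after training, a pre-flight check gives a clearer error than a failed model load. A minimal sketch; the default weights path follows Ultralytics' usual output layout and is an assumption, not verified against this repo:

```python
from pathlib import Path

def weights_ready(path="runs/detect/train/weights/best.pt"):
    """Return True once train.py has produced fine-tuned weights.

    The default path is an assumption based on Ultralytics' usual layout.
    """
    return Path(path).exists()
```

A script could call this at startup and exit with "run `python train.py` first" when it returns False.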
Windows multiprocessing error during training

Status: Worked around with workers=0

Problem

On Windows, setting workers > 0 in Ultralytics training causes a RuntimeError from Python’s multiprocessing module:
RuntimeError: An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

Why This Happens

Windows doesn’t support the fork() system call that Unix-based systems use for multiprocessing. Python’s multiprocessing on Windows uses spawn() instead, which requires the main module to be importable.
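A minimal sketch of the pattern (not BeamFinder code): under spawn, a child process re-imports the main script, so any pool created at module top level would recursively start new workers and trigger the error above. The guard keeps pool creation out of the import path:

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # Safe under both fork (Linux/Mac) and spawn (Windows) start methods,
    # because the pool is only created when the script runs as the main module.
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```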

Workaround

Both train.py and detect.py include the required fixes:
  1. Use if __name__ == "__main__": (already present in both scripts)
  2. Set workers=0 to disable multiprocessing:
# train.py on Windows
model.train(
    workers=0,  # Required on Windows
    cache="ram",  # Compensates for single-threaded data loading
    # ... other args
)

Impact

Training is slower on Windows due to single-threaded data loading. However, cache="ram" eliminates disk I/O as a bottleneck, which partially compensates for this.

Performance difference:
  • Linux with workers=8: ~100% GPU utilization
  • Windows with workers=0, cache="ram": ~90-95% GPU utilization
On Linux/Mac, you can use workers=8 for faster data loading (already set in train.py:23 for A100 training).
16:9 images vs. square YOLO inputs

Status: Mitigated with rect=True

Problem

BeamFinder images are 960×540 pixels (16:9 aspect ratio), but YOLO defaults to square inputs (e.g., 640×640). Without rectangular training, about 44% of pixels would be black padding.

Why This Happens

Traditional YOLO implementations use square inputs for simplicity and efficiency. Images are letterboxed (padded with black bars) to fit the square:
960×540 → 960×960 (square)
↑ 420 pixels of black padding top/bottom
This wastes GPU compute on processing padding instead of actual image content.
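The 44% figure follows directly from the letterbox geometry and is easy to verify:

```python
# Letterboxing a 960x540 frame into a 960x960 square
img_w, img_h = 960, 540
square = max(img_w, img_h)

pad_rows = square - img_h           # 420 rows of black padding
pad_fraction = pad_rows / square    # fraction of the square that is padding
print(pad_rows, pad_fraction)       # 420 0.4375  (~44%)
```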

Solution

BeamFinder enables rectangular training and inference with rect=True in both scripts:
# train.py:30
model.train(
    rect=True,  # Preserve aspect ratio
    # ... other args
)

# detect.py uses rect=True by default
With rect=True, YOLO preserves the aspect ratio:
960×540 → 960×540 (no padding needed)

Impact

  • Without rect=True: 44% of pixels are black padding, wasted compute
  • With rect=True: Full image utilization, ~20% faster training/inference
Rectangular training batches images with similar aspect ratios together. This is why you’ll see validation batches with different shapes during training.
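The batching idea can be illustrated with a small sketch. This is an assumption about the general technique (grouping by aspect ratio so each batch shares one shape), not Ultralytics' exact implementation:

```python
from collections import defaultdict

# Group image indices by rounded aspect ratio so each batch shares one shape
shapes = [(960, 540), (960, 540), (640, 640), (960, 540)]
groups = defaultdict(list)
for i, (w, h) in enumerate(shapes):
    groups[round(w / h, 2)].append(i)

print(dict(groups))  # {1.78: [0, 1, 3], 1.0: [2]}
```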

Resolved Issues

Dataset appeared to be missing bounding box labels

Status: Resolved

Problem

Initially thought the DeepSense Scenario 23 dataset didn’t include bounding box labels, only images.

Resolution

The 11,387 YOLO-format annotation files (.txt files) were included in the original DeepSense download. They were paired with images and organized into the standard YOLO directory layout:
data/
├── images/
│   ├── train/         # 7,970 images
│   ├── validation/    # 1,708 images
│   └── test/          # 1,709 images
└── labels/
    ├── train/         # 7,970 .txt files
    ├── validation/    # 1,708 .txt files
    └── test/          # 1,709 .txt files
Dataset split: 70% train / 15% validation / 15% test.
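The published counts check out against the 70/15/15 split:

```python
train_n, val_n, test_n = 7970, 1708, 1709
total = train_n + val_n + test_n

print(total)                      # 11387
print(round(train_n / total, 2))  # 0.7
print(round(val_n / total, 2))    # 0.15
print(round(test_n / total, 2))   # 0.15
```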

Impact

No impact on users. The dataset is properly structured for YOLO training.
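For reference, each .txt label file holds one line per object in the standard YOLO format. A hypothetical parser sketch (the format is standard; the helper name is made up for illustration):

```python
def parse_yolo_label(line):
    """Parse '<class> <x_center> <y_center> <width> <height>' (all normalized 0-1)."""
    parts = line.split()
    cls = int(parts[0])
    x, y, w, h = map(float, parts[1:])
    return cls, x, y, w, h

# A drone (class 0) centered at (0.5, 0.4), covering 10% x 8% of the image
print(parse_yolo_label("0 0.5 0.4 0.10 0.08"))  # (0, 0.5, 0.4, 0.1, 0.08)
```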

Noted Limitations

Uneven image counts across capture sessions

Status: Noted, not an issue in practice

Description

The DeepSense dataset images come from 51 different capture sessions (subfolders) with very uneven counts:
  • Some sessions: 1 image
  • Other sessions: 1,000+ images
This means some environmental conditions (lighting, background, drone position) are overrepresented in the dataset.

Mitigation

Images are shuffled before splitting into train/val/test sets, so each split has a reasonable mix of conditions:
# Dataset preparation sketch (not in repo)
import random
from glob import glob

all_images = glob("raw_images/**/*.jpg", recursive=True)
random.shuffle(all_images)  # Shuffle before split
n = len(all_images)
train, val = all_images[:int(n * 0.70)], all_images[int(n * 0.70):int(n * 0.85)]
test = all_images[int(n * 0.85):]

Impact

Haven’t observed this cause problems in practice. Model generalizes well across the test set despite uneven session distribution.
If you notice the model overfitting to specific backgrounds or lighting conditions, you can add more augmentation in train.py:32-35.
Memory requirements (RAM and VRAM)

Status: Under control with proper configuration

Dataset Size

  • On disk: ~650MB (11,387 images)
  • In RAM: ~4GB when cached (cache="ram")
  • VRAM: Depends on batch size and model variant

RAM Requirements

BeamFinder uses cache="ram" in train.py:22 to eliminate disk I/O as a bottleneck:
model.train(
    cache="ram",  # Needs ~4GB system RAM
    # ... other args
)
Minimum system RAM:
  • 16GB recommended (4GB for dataset cache + 8GB for OS/apps + 4GB buffer)
  • 8GB minimum (use cache="disk" instead)

VRAM Requirements

BeamFinder was developed on two different GPUs:
GPU        VRAM   Configuration                      Batch Size
RTX 3050   4GB    batch=0.85, amp=True, imgsz=960    2-4
A100       40GB   batch=0.90, amp=True, imgsz=960    32+
A fractional batch value tells Ultralytics to automatically pick the batch size that uses that fraction of GPU memory (e.g., batch=0.85 targets 85%).

What If You Don’t Have Enough RAM?

Use disk caching instead:
model.train(
    cache="disk",  # Slower but less RAM
    # ... other args
)
Or disable caching:
model.train(
    cache=False,  # Slowest, minimal RAM
    # ... other args
)

What If You Don’t Have Enough VRAM?

  1. Reduce batch size:
    model.train(
        batch=0.50,  # Use 50% GPU memory instead of 90%
    )
    
  2. Use a smaller model variant:
    model = YOLO("yolo26n.pt")  # Nano instead of Small
    
  3. Reduce image size:
    model.train(
        imgsz=640,  # Smaller than 960
    )
    
Reducing imgsz below 960 may reduce detection accuracy, especially for small/distant drones.

Development Notes

These issues were documented during development and serve as a reference for understanding design decisions in the codebase.

CUDA Optimizations

The training script includes A100-specific optimizations (train.py:9-12):
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
torch.backends.cudnn.benchmark = True
These are safe to use on any CUDA GPU but provide the most benefit on Ampere architecture (RTX 30-series, A100, etc.).

Augmentation Strategy

BeamFinder uses conservative augmentation (train.py:32-35):
degrees=15.0,    # ±15° rotation
flipud=0.5,      # 50% vertical flip
scale=0.9,       # random scale (± gain)
translate=0.2,   # ±20% translation
This was chosen because:
  • Drones can appear at any orientation (rotation helps)
  • Images are captured from ground looking up (vertical flip simulates different perspectives)
  • Drones appear at varying distances (scale helps)
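For intuition, a vertical flip only needs to mirror the y-coordinate of each normalized bounding box; a minimal sketch of the transform (helper name is hypothetical):

```python
def flipud_box(box):
    """Mirror a normalized YOLO box (cls, x, y, w, h) across the horizontal axis."""
    cls, x, y, w, h = box
    return (cls, x, 1.0 - y, w, h)

print(flipud_box((0, 0.5, 0.2, 0.1, 0.1)))  # (0, 0.5, 0.8, 0.1, 0.1)
```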

Torch Compile

The A100 training configuration uses compile=True (train.py:26):
model.train(
    compile=True,  # 10-30% faster on A100 + PyTorch 2.x
)
Requires:
  • PyTorch 2.0+
  • CUDA GPU
  • Linux (doesn’t work on Windows)
On unsupported platforms, Ultralytics automatically disables it with a warning.
See the Troubleshooting page for solutions to common problems.
