BeamFinder fine-tunes YOLO26s on the DeepSense Scenario 23 drone dataset to detect drones in THz beam steering applications. This guide covers the complete training workflow from setup to evaluation.

Prerequisites

1. Install Dependencies

Install the required Python packages:
pip install -r requirements.txt
Requirements:
  • Python 3.10+
  • ultralytics >= 8.4.0
  • matplotlib >= 3.7.0
  • PyTorch (pre-installed on Lightning.ai A100)
2. Prepare the Dataset

Ensure your dataset follows the YOLO directory structure described in the Dataset Setup guide. The training script expects data.yaml to be present in the project root.
3. Download Pretrained Weights

The script automatically downloads YOLO26s pretrained weights (yolo26s.pt) from Ultralytics on first run. No manual download required.

Training Script Overview

The train.py script fine-tunes YOLO26s on the drone dataset with hyperparameters optimized for A100 GPUs.

Basic Usage

python train.py
Training runs for 100 epochs with automatic validation and test evaluation at the end.

GPU Optimizations

The script includes several A100-specific optimizations for maximum throughput:
import torch
from ultralytics import YOLO

# A100: maximize GPU throughput
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
torch.backends.cudnn.benchmark = True  # auto-tune convolutions for fixed imgsz=960

model = YOLO("yolo26s.pt")
  • allow_tf32: Enables TensorFloat-32 on Ampere GPUs (A100) for faster matrix operations with minimal accuracy loss
  • cudnn.benchmark: Auto-tunes cuDNN convolution algorithms for your specific input size (960×540). Since image size is fixed, this provides consistent speedup
  • compile=True: Uses torch.compile() for 10-30% faster training on A100 + PyTorch 2.x

Training Configuration

Core Hyperparameters

The training configuration is optimized for 11,387 annotated drone images:
model.train(
    data="data.yaml",
    epochs=100,
    imgsz=960,
    batch=0.90,
    patience=20,
    cache="ram",
    workers=8,
    cos_lr=True,
    deterministic=False,
    compile=True,
    project="runs",
    name="drone_detect",
    exist_ok=True,
    rect=True,
    save_period=10,
    # Data augmentation
    degrees=15.0,
    flipud=0.5,
    scale=0.9,
    translate=0.2,
)

Parameter Reference

| Parameter | Value | Description |
| --- | --- | --- |
| data | "data.yaml" | Dataset configuration file |
| epochs | 100 | Number of training epochs |
| imgsz | 960 | Input image size (long side). Frames train at 960×540, preserving the 16:9 aspect ratio |
| batch | 0.90 | Auto-calculate a batch size that uses 90% of available GPU memory |
| patience | 20 | Early stopping: halt if validation does not improve for 20 epochs |
| cache | "ram" | Cache the dataset in RAM for faster training (requires ~4GB system memory) |
| workers | 8 | Number of dataloader workers (set to 0 on Windows due to multiprocessing issues) |
| cos_lr | True | Use a cosine learning-rate schedule |
| deterministic | False | Allow non-deterministic operations for speed |
| compile | True | Enable torch.compile() for A100 acceleration |
| rect | True | Rectangular training: preserves the 16:9 aspect ratio and avoids 44% padding waste |
| save_period | 10 | Save a checkpoint every 10 epochs |
Memory Requirements: cache="ram" requires about 4GB of system memory for the 650MB dataset. If your machine has less than 16GB RAM, change to cache="disk" in train.py:22.
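The RAM-vs-disk decision above can also be automated. A minimal sketch (Linux-only, reading /proc/meminfo; the function name and the 4GB threshold are illustrative, not part of train.py):

```python
def choose_cache_mode(meminfo_path="/proc/meminfo", threshold_gb=4.0):
    """Return "ram" if enough memory is free to cache the dataset, else "disk"."""
    available_kb = 0
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                available_kb = int(line.split()[1])  # /proc/meminfo reports kB
                break
    available_gb = available_kb / (1024 ** 2)
    return "ram" if available_gb >= threshold_gb else "disk"
```

The returned string can be passed straight to the cache argument of model.train().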

Data Augmentation

The training applies augmentations to improve generalization:
| Augmentation | Value | Effect |
| --- | --- | --- |
| degrees | 15.0 | Random rotation ±15 degrees |
| flipud | 0.5 | Vertical flip (50% probability) |
| scale | 0.9 | Random scale gain ±90% (zoom factors roughly 0.1–1.9×) |
| translate | 0.2 | Random translation ±20% of image size |
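Ultralytics treats degrees, scale, and translate as ± gains around the identity transform. A quick sketch of the ranges each value samples from (helper names are illustrative):

```python
def scale_bounds(gain):
    # scale=g samples a zoom factor uniformly from [1 - g, 1 + g]
    return (1.0 - gain, 1.0 + gain)

def rotation_bounds(degrees):
    # degrees=d samples a rotation uniformly from [-d, +d] degrees
    return (-degrees, degrees)

def translate_bounds(frac):
    # translate=t shifts the image by up to ±t of its size on each axis
    return (-frac, frac)
```

For example, scale_bounds(0.9) gives zoom factors between roughly 0.1× and 1.9×.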

Training Output

Results are saved to runs/drone_detect/ with the following structure:
runs/drone_detect/
├── weights/
│   ├── best.pt          # Best checkpoint (highest mAP@50-95)
│   ├── last.pt          # Latest checkpoint
│   ├── epoch10.pt       # Checkpoint at epoch 10
│   ├── epoch20.pt       # Checkpoint at epoch 20
│   └── ...
├── results.csv          # Training metrics per epoch
├── results.png          # Loss and mAP curves
├── confusion_matrix.png # Validation confusion matrix
└── val_batch*_pred.jpg  # Validation predictions
The best.pt checkpoint is automatically selected based on validation mAP@50-95. Use this for inference.
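To see which epoch produced best.pt, you can scan results.csv for the highest validation mAP@50-95. A hedged sketch (the exact column header, assumed here to be metrics/mAP50-95(B), can vary across Ultralytics versions, and some versions pad columns with whitespace):

```python
import csv

def best_epoch(csv_path, metric="metrics/mAP50-95(B)"):
    """Return (epoch, value) for the row with the highest validation mAP50-95."""
    with open(csv_path, newline="") as f:
        # strip padding from headers and values (older Ultralytics aligns columns)
        rows = [{k.strip(): v.strip() for k, v in row.items()}
                for row in csv.DictReader(f)]
    best = max(rows, key=lambda r: float(r[metric]))
    return int(float(best["epoch"])), float(best[metric])
```

Usage: best_epoch("runs/drone_detect/results.csv") returns something like (87, 0.6891).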

Evaluation

After training completes, the script automatically runs validation and test evaluation:
# Validation set evaluation
metrics = model.val(imgsz=960, half=True)
print(f"Val  — mAP50: {metrics.box.map50:.4f}  mAP50-95: {metrics.box.map:.4f}")

# Test set evaluation
test_metrics = model.val(split="test", imgsz=960, half=True)
print(f"Test — mAP50: {test_metrics.box.map50:.4f}  mAP50-95: {test_metrics.box.map:.4f}")
Example output:
Val  — mAP50: 0.9234  mAP50-95: 0.6891
Test — mAP50: 0.9187  mAP50-95: 0.6823

Metrics Explained

  • mAP@50: Mean Average Precision at IoU threshold 0.5. A detection counts as correct if the bounding box overlaps the ground truth by at least 50%
  • mAP@50-95: Average of mAP across IoU thresholds from 0.5 to 0.95 in steps of 0.05. This is stricter and penalizes loose bounding boxes
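The IoU test behind both metrics is simple to state in code. A minimal sketch for axis-aligned boxes given as (x1, y1, x2, y2); the function name is illustrative:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# mAP@50-95 averages AP over these ten IoU thresholds
THRESHOLDS = [0.5 + 0.05 * i for i in range(10)]
```

A detection counts toward mAP@50 when iou(pred, truth) >= 0.5; mAP@50-95 repeats that check at each of the ten thresholds and averages.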

Advanced: Multi-Model Comparison Study

The study.py script trains all five YOLO26 variants (nano, small, medium, large, xlarge) and compares their performance:
python study.py
This script:
  • Trains all 5 models for 100 epochs each
  • Measures training time, accuracy, inference speed, and peak GPU memory
  • Generates comparison charts
  • Supports crash recovery (skips already-completed models on restart)
MODELS = ["yolo26n.pt", "yolo26s.pt", "yolo26m.pt", "yolo26l.pt", "yolo26x.pt"]

TRAIN_ARGS = dict(
    data="data.yaml", epochs=100, imgsz=960, batch=0.90,
    patience=20, cache="ram", workers=8, cos_lr=True,
    deterministic=False, rect=True, save_period=10,
    compile=True,
    degrees=15.0, flipud=0.5, scale=0.9, translate=0.2,
)
Results are saved to:
  • runs/study/results_summary.json - JSON with all metrics
  • runs/study/comparison_charts.png - Bar charts comparing models
  • runs/study/efficiency_plots.png - Scatter plots (accuracy vs size/speed/memory)
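The crash-recovery behavior could be implemented by checking which models already have an entry in the summary file before training. A hedged sketch, assuming results_summary.json maps model names to their metrics (an assumption about the file's layout, not a documented format):

```python
import json
import os

def models_to_run(all_models, summary_path="runs/study/results_summary.json"):
    """Skip models that already have an entry in the summary file."""
    done = set()
    if os.path.exists(summary_path):
        with open(summary_path) as f:
            done = set(json.load(f))  # keys are model names, by assumption
    return [m for m in all_models if m not in done]
```

On a fresh run the summary file is absent and every model trains; after a crash, only the unfinished models remain in the list.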

Troubleshooting

Problem: RuntimeError when workers > 0 on Windows
Solution: Set workers=0 in train.py:23. The cache="ram" setting compensates for single-threaded data loading. See the Known Issues page for more details.
Problem: CUDA out of memory error
Solutions:
  • Reduce batch from 0.90 to 0.70 or lower
  • Reduce imgsz from 960 to 640
  • Disable cache="ram" (slower but uses no system memory)
  • Use a smaller model variant (e.g., yolo26n.pt instead of yolo26s.pt)
Problem: System freezes or swapping during training
Solution: Change cache="ram" to cache="disk" in train.py:22. This uses disk I/O instead of caching the 650MB dataset in memory.
Expected Behavior: YOLO26s is pretrained on COCO (80 classes: person, car, bird, etc.) without a drone class. This is why fine-tuning is required. See the Known Issues page for more details.

Next Steps

After training completes:
  1. Run inference on test images using the Detection Guide
  2. Inspect checkpoints in runs/drone_detect/weights/
  3. Analyze results using the charts in runs/drone_detect/
