This guide covers the full BeamFinder installation process, including environment setup, dataset preparation, and model training.

System Requirements

Hardware

Recommended

  • GPU: NVIDIA A100 (40GB+ VRAM)
  • RAM: 16GB+ (for cache="ram")
  • Storage: 2GB for dataset + models
  • OS: Linux (Ubuntu 22.04)

Minimum

  • GPU: NVIDIA RTX 3050 (4GB VRAM)
  • RAM: 8GB (use cache="disk")
  • Storage: 2GB
  • OS: Windows/Linux/macOS

Software

  • Python: 3.10 or higher
  • CUDA: 11.8+ (for GPU acceleration)
  • PyTorch: 2.x (for compile=True support)
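
Before installing anything, you can sanity-check the environment with a small stdlib script. This is a convenience sketch, not part of BeamFinder; the package list and check_env helper are ours, so adjust them as needed:

```python
import importlib.util
import sys

def check_env(packages=("torch", "ultralytics", "matplotlib"), min_python=(3, 10)):
    """Return (python_ok, {package: installed}) without importing heavy packages."""
    python_ok = sys.version_info >= min_python
    installed = {p: importlib.util.find_spec(p) is not None for p in packages}
    return python_ok, installed

python_ok, installed = check_env()
print("Python >= 3.10:", python_ok)
for pkg, present in installed.items():
    print(f"{pkg}: {'installed' if present else 'MISSING'}")
```

CUDA availability is checked separately (e.g. `torch.cuda.is_available()`) once PyTorch is installed.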

Installation

1. Clone or Download BeamFinder

Get the source code:
git clone https://github.com/yourusername/beamfinder.git
cd beamfinder

2. Install Python Dependencies

Install required packages:
pip install -r requirements.txt
The requirements.txt contains:
requirements.txt
ultralytics>=8.4.0
matplotlib>=3.7.0
On Lightning.ai A100 instances, CUDA and PyTorch come pre-installed. On other systems, install PyTorch first:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

3. Download Pretrained Weights

Download the YOLO26s pretrained model:
wget https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo26s.pt
Place yolo26s.pt in the project root directory.
The COCO-pretrained model does not include a “drone” class. You must fine-tune on drone data before running detection.

Dataset Preparation

BeamFinder expects the DeepSense Scenario 23 dataset (or any dataset in YOLO format).

1. Download Dataset

Obtain the DeepSense Scenario 23 drone dataset, which includes:
  • 11,387 images (960×540 resolution)
  • 11,387 YOLO-format annotation files (.txt)
The dataset comes from 51 different capture sessions with varying conditions.

2. Organize Directory Structure

Create the following directory layout:
mkdir -p data/images/train
mkdir -p data/images/validation
mkdir -p data/images/test
mkdir -p data/labels/train
mkdir -p data/labels/validation
mkdir -p data/labels/test
Expected structure:
BeamFinder/
├── data/
│   ├── images/
│   │   ├── train/         # 7,970 images (70%)
│   │   ├── validation/    # 1,708 images (15%)
│   │   └── test/          # 1,709 images (15%)
│   └── labels/            # Matching .txt files
│       ├── train/
│       ├── validation/
│       └── test/

3. Split Dataset

Distribute images and labels into train/validation/test splits (70/15/15):
import random
import shutil
from pathlib import Path

random.seed(0)  # optional: fix the seed for a reproducible split

images = list(Path("raw_data").glob("*.jpg"))
random.shuffle(images)

n = len(images)
splits = {
    "train": images[: int(0.70 * n)],
    "validation": images[int(0.70 * n) : int(0.85 * n)],
    "test": images[int(0.85 * n) :],
}

for split, imgs in splits.items():
    for img in imgs:
        shutil.copy(img, f"data/images/{split}/")
        shutil.copy(img.with_suffix(".txt"), f"data/labels/{split}/")
Shuffling before splitting ensures each set has a diverse mix of the 51 capture sessions.
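
After splitting, it is worth verifying that every image has a matching label file. A minimal stdlib check (the check_pairs helper is ours; the data/ layout is the one created above):

```python
from pathlib import Path

def check_pairs(root="data", splits=("train", "validation", "test")):
    """For each split, count images and list image stems that lack a .txt label."""
    report = {}
    for split in splits:
        imgs = {p.stem for p in Path(root, "images", split).glob("*.jpg")}
        lbls = {p.stem for p in Path(root, "labels", split).glob("*.txt")}
        report[split] = (len(imgs), sorted(imgs - lbls))
    return report

for split, (n, missing) in check_pairs().items():
    print(f"{split}: {n} images, {len(missing)} missing labels")
```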

4. Create data.yaml

Create a data.yaml configuration file in the project root:
data.yaml
path: data
train: images/train
val: images/validation
test: images/test

nc: 1
names: ["drone"]
This tells YOLO where to find images and defines the single class “drone”.

YOLO Annotation Format

Each .txt file contains one line per bounding box:
<class_id> <x_center> <y_center> <width> <height>
  • class_id: 0 (for “drone”)
  • x_center, y_center: Normalized center coordinates (0-1)
  • width, height: Normalized box dimensions (0-1)
Example (image_001.txt):
0 0.5104 0.4861 0.1354 0.1574
0 0.7234 0.3125 0.0987 0.1203
This represents 2 drones in image_001.jpg.
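
To see what the normalized values mean in pixels, here is a small illustrative converter (the yolo_to_pixels helper is ours; 960×540 matches the dataset resolution):

```python
def yolo_to_pixels(line, img_w=960, img_h=540):
    """Convert one YOLO label line to (class_id, (x1, y1, x2, y2)) in pixels."""
    cls, xc, yc, w, h = line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(cls), (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

cls_id, box = yolo_to_pixels("0 0.5104 0.4861 0.1354 0.1574")
print(cls_id, [round(v) for v in box])  # 0 [425, 220, 555, 305]
```

So the first line of the example above is a roughly 130×85 px box centered slightly left of the image midpoint.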

Training

1. Configure Training Script

The train.py script contains optimized hyperparameters for A100 GPUs:
train.py
from pathlib import Path
import torch
from ultralytics import YOLO

SCRIPT_DIR = Path(__file__).resolve().parent

if __name__ == "__main__":
    # A100 optimizations
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    torch.backends.cudnn.benchmark = True

    model = YOLO("yolo26s.pt")

    model.train(
        data="data.yaml",
        epochs=100,
        imgsz=960,
        batch=0.90,              # 90% GPU memory utilization
        patience=20,             # Early stopping patience
        cache="ram",             # Cache dataset in RAM
        workers=8,               # Data loading workers
        cos_lr=True,             # Cosine learning rate scheduler
        deterministic=False,
        compile=True,            # torch.compile (10-30% speedup)
        project=str(SCRIPT_DIR / "runs"),
        name="drone_detect",
        exist_ok=True,
        rect=True,               # Rectangular training (16:9)
        save_period=10,          # Save checkpoint every 10 epochs
        # Data augmentation
        degrees=15.0,            # Random rotation ±15°
        flipud=0.5,              # Vertical flip 50% probability
        scale=0.9,               # Random scale (±90% gain)
        translate=0.2,           # Translate ±20%
    )

    # Validation metrics
    metrics = model.val(imgsz=960, half=True)
    print(f"Val  — mAP50: {metrics.box.map50:.4f}  mAP50-95: {metrics.box.map:.4f}")

    # Test metrics
    test_metrics = model.val(split="test", imgsz=960, half=True)
    print(f"Test — mAP50: {test_metrics.box.map50:.4f}  mAP50-95: {test_metrics.box.map:.4f}")

2. Start Training

Run the training script:
python train.py
Training will:
  1. Load YOLO26s pretrained weights
  2. Fine-tune on drone dataset for up to 100 epochs
  3. Save checkpoints every 10 epochs
  4. Stop early if validation mAP doesn’t improve for 20 epochs
  5. Run final validation on val and test sets

3. Monitor Progress

Ultralytics automatically logs training metrics. Watch for:
Epoch   GPU_mem   box_loss   cls_loss   dfl_loss   mAP50   mAP50-95
1/100    12.4G     1.234      0.456      1.123     0.612    0.423
...
45/100   12.4G     0.234      0.056      0.323     0.945    0.782
Training results are saved to runs/drone_detect/:
runs/drone_detect/
├── weights/
│   ├── best.pt       # Best model (highest mAP)
│   └── last.pt       # Last epoch
├── results.png       # Training curves
├── confusion_matrix.png
└── args.yaml         # Training arguments
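
Ultralytics also writes a results.csv of per-epoch metrics in the run directory. A small helper to find the best epoch after the fact (the column name metrics/mAP50(B) is an assumption based on recent Ultralytics versions; verify it against your file's header):

```python
import csv

def best_epoch(csv_path, metric="metrics/mAP50(B)"):
    """Return (epoch, value) for the row with the highest metric in results.csv."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    # Some Ultralytics versions pad cells with spaces, hence the strip().
    best = max(rows, key=lambda r: float(r[metric].strip()))
    return int(best["epoch"].strip()), float(best[metric].strip())
```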

Hyperparameter Explanation

The dataset images are 960×540. Using imgsz=960 with rect=True preserves the aspect ratio and avoids excessive padding. With square inputs (rect=False), ~44% of pixels would be wasted on black padding.
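
The padding figure is simple to derive: letterboxing a 960×540 frame into a 960×960 square leaves a 960×420 black border.

```python
# Fraction of a square 960x960 input that would be padding for a 960x540 image
img_w, img_h = 960, 540
square = max(img_w, img_h)
padding_fraction = 1 - (img_w * img_h) / (square * square)
print(f"{padding_fraction:.1%}")  # 43.8%
```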
Setting batch to a fraction (0.90 = 90%) tells Ultralytics to automatically select the largest batch size that fits in GPU memory. On A100 this is typically batch 16-32, on RTX 3050 it’s batch 2-4.
Caching the dataset in RAM eliminates disk I/O bottlenecks during training. The 11,387 images (~650MB on disk) require ~4GB RAM when cached. If you have less than 16GB system RAM, use cache="disk" instead.
compile=True enables torch.compile(), which optimizes the model graph for 10-30% faster training on PyTorch 2.x with an A100. Requires PyTorch ≥2.0.
rect=True preserves the image aspect ratio during training instead of forcing square inputs, which is critical for 16:9 images to avoid wasting compute on padding.
  • degrees=15.0: Random rotation ±15°
  • flipud=0.5: Vertical flip with 50% probability
  • scale=0.9: Random scaling (±90% gain)
  • translate=0.2: Random translation ±20%
These augmentations improve generalization to new drone positions and orientations.

Windows-Specific Configuration

On Windows, Python multiprocessing requires special handling. Change these settings in train.py:
workers=0  # Must be 0 on Windows (single-threaded data loading)
Training will be slower, but cache="ram" compensates by eliminating disk I/O.

Validation & Testing

After training completes, the script automatically runs validation:
# Validation set
metrics = model.val(imgsz=960, half=True)
print(f"Val  — mAP50: {metrics.box.map50:.4f}  mAP50-95: {metrics.box.map:.4f}")

# Test set
test_metrics = model.val(split="test", imgsz=960, half=True)
print(f"Test — mAP50: {test_metrics.box.map50:.4f}  mAP50-95: {test_metrics.box.map:.4f}")
Key metrics:
  • mAP50: Mean average precision at IoU=0.5 (standard COCO metric)
  • mAP50-95: Mean average precision averaged over IoU thresholds 0.5-0.95 (stricter)
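
For intuition, the IoU underlying both metrics is just intersection area over union area. A minimal reference implementation (illustrative, not the Ultralytics internals):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

A prediction counts as a true positive for mAP50 when its IoU with a ground-truth box is at least 0.5.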

Troubleshooting

CUDA out of memory

Reduce the batch size:
batch=0.70  # Use 70% of GPU memory instead of 90%
# or
batch=8     # Fixed batch size

Multiprocessing errors on Windows

Set workers=0 in train.py and ensure the training code is inside an if __name__ == "__main__": block.

Low mAP or poor detections

  • Verify annotations are correct (YOLO format, normalized coordinates)
  • Check class IDs (should be 0 for the single "drone" class)
  • Ensure train/val/test splits are representative
  • Try training for more epochs or reducing patience

Slow training

  • Enable cache="ram" (if you have enough RAM)
  • Use compile=True (requires PyTorch 2.x)
  • Increase workers (Linux/macOS only)
  • Use a GPU instead of CPU

High RAM usage

Change cache="ram" to cache="disk" in train.py. This reduces RAM usage but slows training due to disk I/O.

Next Steps

Once training completes, you’re ready to run detection:

Run Detection

Use your trained model to detect drones in test images

Configuration Reference

Explore all training and detection parameters

Output Format

Learn about the CSV output structure and coordinate system

Known Issues

Common problems and their solutions

Quick Reference

Full training workflow:
# 1. Install dependencies
pip install -r requirements.txt

# 2. Download pretrained weights
wget https://github.com/ultralytics/assets/releases/download/v8.4.0/yolo26s.pt

# 3. Organize dataset into data/images/{train,validation,test}

# 4. Create data.yaml

# 5. Train
python train.py

# Output: runs/drone_detect/weights/best.pt
Expected timeline (A100):
  • Epoch time: ~2-3 minutes
  • 100 epochs: ~3-5 hours
  • Early stopping typically triggers around epoch 40-60
