
Overview

BeamFinder includes a comprehensive model comparison study (study.py) that trains and evaluates all five YOLO26 detection variants on the DeepSense Scenario 23 drone dataset. The study compares:
  • Accuracy - validation and test mAP metrics
  • Speed - inference time breakdown (preprocess + inference + postprocess)
  • Efficiency - training time and GPU memory usage
Results help select the optimal model variant for your deployment constraints.

YOLO26 Model Variants

YOLO26 comes in five sizes, from lightweight (Nano) to maximum accuracy (XLarge):
yolo26n.pt (Nano) - Smallest and fastest variant
  • Best for: Edge deployment, real-time applications, resource-constrained hardware
  • Tradeoff: Lower accuracy than larger models
  • Typical use case: Mobile devices, embedded systems, high-throughput pipelines
yolo26s.pt (Small) - Balanced speed and accuracy; the recommended default for BeamFinder
yolo26m.pt (Medium) - Higher accuracy at moderate compute cost; suited to offline batch processing
yolo26l.pt (Large) - Near-maximum accuracy; requires substantial training VRAM
yolo26x.pt (XLarge) - Maximum accuracy; intended for research and benchmarking rather than production

Running the Comparison Study

The study script trains all five models sequentially with identical hyperparameters.

Quick Start

python study.py

Study Configuration

Shared training arguments defined at study.py:41-47:
TRAIN_ARGS = dict(
    data="data.yaml",
    epochs=100,
    imgsz=960,
    batch=0.90,        # 90% GPU memory utilization
    patience=20,
    cache="ram",
    workers=8,
    cos_lr=True,
    deterministic=False,
    rect=True,
    save_period=10,
    compile=True,      # torch.compile for A100 speedup
    degrees=15.0,
    flipud=0.5,
    scale=0.9,
    translate=0.2,
)
All models use identical training settings to ensure fair comparison. Only the model architecture changes.

Crash Recovery

The study supports automatic crash recovery (study.py:76-82):
  • Results are saved to runs/study/results_summary.json after each model completes
  • If the script crashes or is interrupted, re-running python study.py will skip already-completed models
  • Only remaining models will be trained
if RESULTS_FILE.exists():
    results = json.loads(RESULTS_FILE.read_text())
    done = {r["model"] for r in results}
    print(f"Resuming — {len(done)} models already done: {done}")
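The save half of this recovery loop is not shown in the excerpt above. A minimal sketch of how it could work, assuming a save_result helper (hypothetical name — only the file path and JSON format come from study.py):

```python
import json
from pathlib import Path

RESULTS_FILE = Path("runs/study/results_summary.json")

def save_result(results: list, entry: dict) -> None:
    # Hypothetical helper: rewrite the full summary after each model finishes,
    # so a crash never loses more than the model currently training.
    results.append(entry)
    RESULTS_FILE.parent.mkdir(parents=True, exist_ok=True)
    RESULTS_FILE.write_text(json.dumps(results, indent=2))

results = []
save_result(results, {"model": "yolo26n", "test_mAP50_95": 0.6045})

# On a re-run, the resume check shown above recovers the completed set:
done = {r["model"] for r in json.loads(RESULTS_FILE.read_text())}
```

Rewriting the whole file after each model (rather than appending) keeps the JSON valid even if the process dies mid-study.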

Metrics Collected

For each model variant, the study collects comprehensive performance metrics (study.py:122-134):

Model Characteristics

  • model (string) - Model name: yolo26n, yolo26s, yolo26m, yolo26l, or yolo26x.
  • params_M (float) - Number of trainable parameters in millions. Calculated from sum(p.numel() for p in model.model.parameters()) / 1e6 at study.py:100.

Training Performance

  • train_time_min (float) - Total training time in minutes for 100 epochs (or early stopping). Measured from time.time() before and after model.train() at study.py:102-104.
  • train_peak_mem_GB (float) - Peak GPU memory usage during training in gigabytes. Captured via torch.cuda.max_memory_allocated() / 1e9 at study.py:106.

Accuracy Metrics

  • val_mAP50 (float) - Validation set mean Average Precision at IoU threshold 0.50. Range: 0.0 to 1.0 (higher is better).
  • val_mAP50_95 (float) - Validation set mean Average Precision averaged over IoU thresholds 0.50 to 0.95. A stricter metric than mAP50. Range: 0.0 to 1.0.
  • test_mAP50 (float) - Test set mean Average Precision at IoU threshold 0.50. Final metric on the held-out test set (1,709 images).
  • test_mAP50_95 (float) - Test set mean Average Precision averaged over IoU thresholds 0.50 to 0.95. Primary metric for model comparison.

Inference Speed

Speed breakdown measured in milliseconds per image (study.py:131-133):
  • preprocess_ms (float) - Time spent on image preprocessing (resize, normalize, etc.) in milliseconds.
  • inference_ms (float) - Time spent on the neural network forward pass in milliseconds. This is the core detection time.
  • postprocess_ms (float) - Time spent on post-processing (NMS, coordinate conversion) in milliseconds.
Speed measurements are taken from the validation run with half=True (FP16 precision) at imgsz=960.
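Per-image latency is the sum of all three stages, so end-to-end throughput is lower than the inference-only figure. A quick conversion sketch using the Nano numbers from the example results in this document:

```python
def end_to_end_fps(preprocess_ms: float, inference_ms: float, postprocess_ms: float) -> float:
    """Frames per second when all three stages run sequentially on one stream."""
    total_ms = preprocess_ms + inference_ms + postprocess_ms
    return 1000.0 / total_ms

# Nano example values: 2.34 + 4.12 + 1.56 = 8.02 ms/image
fps = end_to_end_fps(2.34, 4.12, 1.56)
print(round(fps, 1))  # ~125 FPS end-to-end, versus ~243 FPS from inference alone
```

In a pipelined deployment (preprocessing the next frame while the current one runs inference), achievable throughput lands between these two bounds.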

Results Output

JSON Results File

All metrics are saved to runs/study/results_summary.json in this format:
[
  {
    "model": "yolo26n",
    "params_M": 3.2,
    "train_time_min": 45.3,
    "train_peak_mem_GB": 12.4,
    "val_mAP50": 0.8234,
    "val_mAP50_95": 0.6123,
    "test_mAP50": 0.8156,
    "test_mAP50_95": 0.6045,
    "preprocess_ms": 2.34,
    "inference_ms": 4.12,
    "postprocess_ms": 1.56
  },
  {
    "model": "yolo26s",
    "params_M": 11.2,
    "train_time_min": 68.7,
    "train_peak_mem_GB": 18.9,
    "val_mAP50": 0.8567,
    "val_mAP50_95": 0.6534,
    "test_mAP50": 0.8489,
    "test_mAP50_95": 0.6478,
    "preprocess_ms": 2.45,
    "inference_ms": 6.78,
    "postprocess_ms": 1.67
  }
]
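Once results_summary.json exists, downstream scripts can rank variants without re-running anything. A minimal sketch using the field names from the format above (the inline JSON here is a trimmed stand-in for reading the real file):

```python
import json

summary = """[
  {"model": "yolo26n", "test_mAP50_95": 0.6045, "inference_ms": 4.12},
  {"model": "yolo26s", "test_mAP50_95": 0.6478, "inference_ms": 6.78}
]"""

results = json.loads(summary)  # in practice: json.loads(RESULTS_FILE.read_text())
best = max(results, key=lambda r: r["test_mAP50_95"])  # rank by the primary metric
print(best["model"])
```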

Console Summary Table

The study prints a formatted comparison table (study.py:166-187):
Model      Params   Train   VRAM  Val50   Val95   Test50  Test95    Pre    Inf   Post
             (M)    (min)   (GB)                                    (ms)   (ms)   (ms)
────────────────────────────────────────────────────────────────────────────────────────
Nano          3.2    45.3   12.4  0.8234  0.6123  0.8156  0.6045   2.34   4.12   1.56
Small        11.2    68.7   18.9  0.8567  0.6534  0.8489  0.6478   2.45   6.78   1.67
Medium       25.9    92.4   26.3  0.8723  0.6789  0.8645  0.6712   2.56   9.34   1.78
Large        43.7   124.8   34.1  0.8834  0.6912  0.8756  0.6845   2.67  12.45   1.89
XLarge       68.2   178.3   39.8  0.8912  0.7023  0.8834  0.6934   2.78  16.23   2.01

Visualization Charts

Two sets of charts are automatically generated:

Comparison Charts (runs/study/comparison_charts.png)

Four-panel comparison generated at study.py:193-257:
  1. mAP@50 - Validation vs Test accuracy (bar chart)
  2. mAP@50-95 - Validation vs Test accuracy (bar chart)
  3. Training Time - Total training minutes (bar chart)
  4. Inference Speed Breakdown - Stacked bar chart showing preprocess + inference + postprocess

Efficiency Plots (runs/study/efficiency_plots.png)

Three scatter plots generated at study.py:263-310:
  1. Accuracy vs Model Size - Test mAP@50-95 vs parameters (M)
  2. Accuracy vs Speed - Test mAP@50-95 vs inference time (ms)
  3. Accuracy vs Memory - Test mAP@50-95 vs peak training VRAM (GB)
These plots help identify the Pareto frontier - models with the best accuracy for a given resource constraint.
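The frontier itself is cheap to compute from the JSON results. A minimal sketch (sort by cost, keep each model that beats the best accuracy seen so far; field names match the results format in this document):

```python
def pareto_frontier(results, cost_key="inference_ms", score_key="test_mAP50_95"):
    """Return models for which no other model is both cheaper and more accurate."""
    frontier, best_score = [], float("-inf")
    for r in sorted(results, key=lambda r: r[cost_key]):
        if r[score_key] > best_score:
            frontier.append(r["model"])
            best_score = r[score_key]
    return frontier

# Example-results numbers from this document; since each step up buys some
# accuracy, every variant shown is Pareto-optimal on the accuracy-vs-speed axis.
results = [
    {"model": "yolo26n", "inference_ms": 4.12, "test_mAP50_95": 0.6045},
    {"model": "yolo26s", "inference_ms": 6.78, "test_mAP50_95": 0.6478},
    {"model": "yolo26m", "inference_ms": 9.34, "test_mAP50_95": 0.6712},
]
print(pareto_frontier(results))
```

The same function works for the memory and model-size axes by swapping cost_key.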

Interpreting Results

Generally, larger models achieve higher accuracy:
Nano < Small < Medium < Large < XLarge
However, the accuracy gain decreases with each step up:
  • Nano → Small: largest improvement (~3-5% mAP)
  • Small → Medium: moderate improvement (~1-3% mAP)
  • Medium → Large: small improvement (~0.5-1.5% mAP)
  • Large → XLarge: minimal improvement (~0.3-0.8% mAP)

Speed vs Accuracy Tradeoff

Inference time increases with model size:
Nano (4ms) < Small (7ms) < Medium (9ms) < Large (12ms) < XLarge (16ms)
These are approximate times on A100 GPU with FP16. Actual speed depends on hardware.

Memory Requirements

Training VRAM increases significantly with model size:
  • Nano: ~12 GB (fits on RTX 3090)
  • Small: ~19 GB (fits on RTX 3090)
  • Medium: ~26 GB (requires A100 or A6000)
  • Large: ~34 GB (requires A100)
  • XLarge: ~40 GB (requires A100 40GB)
These measurements assume batch=0.90, imgsz=960, and cache="ram". Reduce batch size or image size to fit smaller GPUs.
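Those figures imply a simple selection rule for a given VRAM budget. A sketch using the peak-memory numbers from the list above (illustrative measurements, valid only under the stated settings):

```python
from typing import Optional

# Measured peak training VRAM from this document, smallest to largest variant.
PEAK_VRAM_GB = {"nano": 12.4, "small": 18.9, "medium": 26.3, "large": 34.1, "xlarge": 39.8}

def largest_trainable(budget_gb: float) -> Optional[str]:
    """Largest variant whose measured peak training VRAM fits the budget."""
    fitting = [m for m, gb in PEAK_VRAM_GB.items() if gb <= budget_gb]
    return fitting[-1] if fitting else None  # dict preserves small-to-large order

print(largest_trainable(24.0))  # an RTX 3090/4090-class budget selects Small
```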

Model Selection Guidelines

For Real-Time Applications

Choose: Nano or Small
  • Target: >30 FPS on your deployment hardware
  • Nano achieves ~4ms inference (250 FPS on A100)
  • Small achieves ~7ms inference (143 FPS on A100)

For Batch Processing

Choose: Medium or Large
  • Speed is less critical than accuracy
  • Can process images offline in batches
  • Maximize detection quality

For Research & Benchmarking

Choose: XLarge
  • Establish upper bound on achievable accuracy
  • Compare against state-of-the-art methods
  • Not recommended for production

For BeamFinder (Default)

Recommended: Small (yolo26s.pt)
  • Good balance of speed and accuracy
  • Fits comfortably on A100 with room for larger batches
  • Fast enough for near-real-time beam steering
  • High enough accuracy for reliable drone detection

Customizing the Study

Changing Training Parameters

Edit TRAIN_ARGS in study.py:41-47:
TRAIN_ARGS = dict(
    # ...keep the other keys from study.py unchanged...
    epochs=200,        # More epochs for higher accuracy
    imgsz=1280,        # Larger image size (slower but more accurate)
    batch=0.70,        # Reduce for smaller GPUs
    patience=30,       # More patience for longer training
)

Adding Custom Models

Add model weights to the MODELS list at study.py:34:
MODELS = [
    "yolo26n.pt",
    "yolo26s.pt",
    "yolo26m.pt",
    "yolo26l.pt",
    "yolo26x.pt",
    "custom_model.pt",  # Your custom weights
]

Subset Comparison

To compare only specific variants, comment out others:
MODELS = [
    "yolo26n.pt",
    "yolo26s.pt",
    # "yolo26m.pt",  # Skip Medium
    # "yolo26l.pt",  # Skip Large
    # "yolo26x.pt",  # Skip XLarge
]

Technical Implementation

Weight Pre-Download

The study pre-downloads all model weights before training (study.py:53-61):
def download_all_weights():
    """Download all pretrained weights up front so training isn't stalled."""
    for pt_file in MODELS:
        if not Path(pt_file).exists():
            YOLO(pt_file)  # triggers auto-download
This prevents training interruptions from slow downloads.

Memory Tracking

Peak GPU memory is reset before each model (study.py:97):
torch.cuda.reset_peak_memory_stats()
model = YOLO(pt_file)
model.train(**TRAIN_ARGS)
train_peak_mem_gb = torch.cuda.max_memory_allocated() / 1e9

Best Weights Evaluation

The study loads the best checkpoint (highest validation fitness) for final evaluation (study.py:109-111):
best_pt = run_dir / "weights" / "best.pt"
if best_pt.exists():
    model = YOLO(str(best_pt))
This ensures metrics reflect the optimal model state, not the final epoch.

Study Duration

Approximate total time on A100 40GB:
  • Nano: 45 min
  • Small: 70 min
  • Medium: 90 min
  • Large: 125 min
  • XLarge: 180 min
Total: ~8.5 hours for all five models
Actual time varies based on dataset size, early stopping, and hardware. The study prints progress after each model completes.

Example Results Interpretation

Suppose you run the study and get these results:
Model     Test mAP50-95   Inference (ms)   VRAM (GB)
──────────────────────────────────────────────────────
Nano         0.6045            4.12           12.4
Small        0.6478            6.78           18.9
Medium       0.6712            9.34           26.3
Large        0.6845           12.45           34.1
XLarge       0.6934           16.23           39.8
Analysis:
  1. Nano → Small: +4.3% mAP, only +2.66ms slower → strong value
  2. Small → Medium: +2.3% mAP, +2.56ms slower → diminishing returns
  3. Medium → XLarge: +2.2% mAP, +6.89ms slower → not worth it
Recommendation: Use Small for most applications. Upgrade to Medium only if you need the extra 2% accuracy and have VRAM budget.
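The marginal-gain analysis above can be automated as mAP points gained per extra millisecond of inference. A sketch using the example numbers:

```python
# (name, test mAP50-95, inference ms) from the example results in this document.
models = [
    ("Nano",   0.6045,  4.12),
    ("Small",  0.6478,  6.78),
    ("Medium", 0.6712,  9.34),
]

ratios = []
for (a, map_a, ms_a), (b, map_b, ms_b) in zip(models, models[1:]):
    gain_pts = (map_b - map_a) * 100   # mAP points gained at the step up
    cost_ms = ms_b - ms_a              # extra latency paid
    ratios.append(gain_pts / cost_ms)
    print(f"{a} -> {b}: +{gain_pts:.1f} pts for +{cost_ms:.2f} ms "
          f"= {gain_pts / cost_ms:.2f} pts/ms")
```

The Nano-to-Small step delivers well over one mAP point per millisecond; later steps fall below one, which is the quantitative form of the diminishing-returns pattern described above.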
