Overview
BeamFinder includes a comprehensive model comparison study (study.py) that trains and evaluates all five YOLO26 detection variants on the DeepSense Scenario 23 drone dataset.
The study compares:
- Accuracy - validation and test mAP metrics
- Speed - inference time breakdown (preprocess + inference + postprocess)
- Efficiency - training time and GPU memory usage
YOLO26 Model Variants
YOLO26 comes in five sizes, from lightweight (Nano) to maximum accuracy (XLarge):
- Nano (n)
- Small (s)
- Medium (m)
- Large (l)
- XLarge (x)
yolo26n.pt - Smallest and fastest variant
- Best for: Edge deployment, real-time applications, resource-constrained hardware
- Tradeoff: Lower accuracy than larger models
- Typical use case: Mobile devices, embedded systems, high-throughput pipelines
Running the Comparison Study
The study script trains all five models sequentially with identical hyperparameters.
Quick Start
Study Configuration
Shared training arguments are defined at study.py:41-47:
All models use identical training settings to ensure fair comparison. Only the model architecture changes.
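Purely as an illustration of this pattern (the real values live in study.py:41-47, and the field names here are assumptions in the Ultralytics keyword style), a shared-arguments dict reused across all five variants might look like:

```python
# Illustrative only: hypothetical shared training arguments; only the
# model weights file changes between runs, never these settings.
TRAIN_ARGS = {
    "data": "data.yaml",  # dataset config file (assumed name)
    "epochs": 100,        # matches the 100-epoch budget described below
    "imgsz": 960,         # image size used throughout the study
    "batch": 0.90,        # auto-batch: target ~90% of GPU memory
    "cache": "ram",       # cache images in RAM for faster epochs
}

def build_runs(models, train_args):
    """Sketch: every model gets the same kwargs, ensuring a fair comparison."""
    return [{"model": m, **train_args} for m in models]

runs = build_runs(["yolo26n.pt", "yolo26s.pt"], TRAIN_ARGS)
```

Because the dict is shared, any change to it applies to every variant in the study.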
Crash Recovery
The study supports automatic crash recovery (study.py:76-82):
- Results are saved to runs/study/results_summary.json after each model completes
- If the script crashes or is interrupted, re-running python study.py will skip already-completed models
- Only remaining models will be trained
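The recovery logic described above can be sketched like this (a minimal, self-contained version; the helper names and the assumption that the JSON is keyed by model name are mine, not study.py's):

```python
import json
from pathlib import Path

RESULTS_FILE = Path("runs/study/results_summary.json")  # path from the docs

def load_completed(results_file=RESULTS_FILE):
    """Return the set of model names already recorded, or an empty set."""
    if results_file.exists():
        return set(json.loads(results_file.read_text()).keys())
    return set()

def models_to_run(all_models, completed):
    """Skip any variant that already has saved results."""
    return [m for m in all_models if Path(m).stem not in completed]

# Example: yolo26n already finished before a crash, so only the rest run.
remaining = models_to_run(
    ["yolo26n.pt", "yolo26s.pt", "yolo26m.pt"],
    completed={"yolo26n"},
)
```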
Metrics Collected
For each model variant, the study collects comprehensive performance metrics (study.py:122-134):
Model Characteristics
- Model name: yolo26n, yolo26s, yolo26m, yolo26l, or yolo26x
- Parameters (M): number of trainable parameters in millions, calculated from sum(p.numel() for p in model.model.parameters()) / 1e6 at study.py:100
Training Performance
- Training time (min): total training time for 100 epochs (or early stopping), measured from time.time() before and after model.train() at study.py:102-104
- Peak VRAM (GB): peak GPU memory usage during training, captured via torch.cuda.max_memory_allocated() / 1e9 at study.py:106
Accuracy Metrics
- Val mAP50: validation-set mean Average Precision at IoU threshold 0.50; range 0.0 to 1.0 (higher is better)
- Val mAP50-95: validation-set mAP averaged over IoU thresholds 0.50 to 0.95; stricter than mAP50, same 0.0 to 1.0 range
- Test mAP50: test-set mAP at IoU threshold 0.50; the final metric on the held-out test set (1,709 images)
- Test mAP50-95: test-set mAP averaged over IoU thresholds 0.50 to 0.95; the primary metric for model comparison
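The mAP50-95 metric averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal sketch of that averaging (the per-threshold AP values are hypothetical):

```python
# The ten IoU thresholds behind mAP@50-95: 0.50, 0.55, ..., 0.95.
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

def map_50_95(ap_per_threshold):
    """Average AP over the ten thresholds; stricter than mAP@50 alone."""
    return sum(ap_per_threshold) / len(ap_per_threshold)

# Hypothetical per-threshold AP values that fall off as IoU tightens.
ap = [0.85, 0.83, 0.81, 0.78, 0.74, 0.69, 0.62, 0.52, 0.38, 0.18]
score = map_50_95(ap)
```

This is why mAP50-95 is always at or below mAP50: the tighter thresholds pull the average down.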
Inference Speed
Speed breakdown measured in milliseconds per image (study.py:131-133):
- Preprocess: time spent on image preprocessing (resize, normalize, etc.)
- Inference: time spent on the neural network forward pass; this is the core detection time
- Postprocess: time spent on post-processing (NMS, coordinate conversion)
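These three components sum to the end-to-end per-image latency. A minimal sketch with hypothetical numbers:

```python
# Hypothetical per-image timings in milliseconds, one per component.
speed = {"preprocess": 0.8, "inference": 6.78, "postprocess": 1.1}

total_ms = sum(speed.values())  # end-to-end time per image
fps = 1000.0 / total_ms         # throughput if images are run back-to-back
```

Note that quoting only the inference time overstates throughput; the pre- and post-processing overhead matters for real deployments.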
Speed measurements are from a validation run with half=True (FP16 precision) at imgsz=960.
Results Output
JSON Results File
All metrics are saved to runs/study/results_summary.json in this format:
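The exact schema is defined in study.py. Purely as an illustration of a plausible shape inferred from the metrics listed above (every field name here is an assumption, not the file's real schema):

```python
import json

# Illustrative only: one results entry, keyed by model name as the
# crash-recovery section describes. All values are hypothetical.
results = {
    "yolo26n": {
        "params_millions": 2.6,
        "train_time_min": 45.0,
        "peak_vram_gb": 12.4,
        "val_map50": 0.842,
        "val_map50_95": 0.611,
        "test_map50": 0.835,
        "test_map50_95": 0.604,
        "speed_ms": {"preprocess": 0.6, "inference": 4.1, "postprocess": 1.0},
    },
}

serialized = json.dumps(results, indent=2)  # what gets written to disk
restored = json.loads(serialized)           # what a re-run reads back
```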
Console Summary Table
The study prints a formatted comparison table (study.py:166-187):
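A fixed-width table like the study's console output can be produced with plain f-string alignment. A minimal sketch (column names and the two rows are hypothetical, taken from the example results later in this page):

```python
# Hypothetical rows: (model, test mAP50-95, inference ms).
rows = [("yolo26n", 0.6045, 4.12), ("yolo26s", 0.6478, 6.78)]

header = f"{'Model':<10}{'Test mAP50-95':>15}{'Infer (ms)':>12}"
lines = [header, "-" * len(header)]
for name, m, ms in rows:
    # Left-align the name; right-align numbers with fixed precision.
    lines.append(f"{name:<10}{m:>15.4f}{ms:>12.2f}")
table = "\n".join(lines)
print(table)
```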
Visualization Charts
Two sets of charts are automatically generated.
Comparison Charts (runs/study/comparison_charts.png)
Four-panel comparison generated at study.py:193-257:
- mAP@50 - Validation vs Test accuracy (bar chart)
- mAP@50-95 - Validation vs Test accuracy (bar chart)
- Training Time - Total training minutes (bar chart)
- Inference Speed Breakdown - Stacked bar chart showing preprocess + inference + postprocess
Efficiency Plots (runs/study/efficiency_plots.png)
Three scatter plots generated at study.py:263-310:
- Accuracy vs Model Size - Test mAP@50-95 vs parameters (M)
- Accuracy vs Speed - Test mAP@50-95 vs inference time (ms)
- Accuracy vs Memory - Test mAP@50-95 vs peak training VRAM (GB)
Interpreting Results
Accuracy Trends
Generally, larger models achieve higher accuracy:
- Nano → Small: largest improvement (~3-5% mAP)
- Small → Medium: moderate improvement (~1-3% mAP)
- Medium → Large: small improvement (~0.5-1.5% mAP)
- Large → XLarge: minimal improvement (~0.3-0.8% mAP)
Speed vs Accuracy Tradeoff
Inference time increases with model size.
Memory Requirements
Training VRAM increases significantly with model size:
- Nano: ~12 GB (fits on RTX 3090)
- Small: ~19 GB (fits on RTX 3090)
- Medium: ~26 GB (requires A100 or A6000)
- Large: ~34 GB (requires A100)
- XLarge: ~40 GB (requires A100 40GB)
These measurements assume batch=0.90, imgsz=960, and cache="ram". Reduce the batch size or image size to fit smaller GPUs.
Model Selection Guidelines
For Real-Time Applications
Choose: Nano or Small
- Target: >30 FPS on your deployment hardware
- Nano achieves ~4ms inference (250 FPS on A100)
- Small achieves ~7ms inference (143 FPS on A100)
For Batch Processing
Choose: Medium or Large
- Speed is less critical than accuracy
- Can process images offline in batches
- Maximize detection quality
For Research & Benchmarking
Choose: XLarge
- Establish upper bound on achievable accuracy
- Compare against state-of-the-art methods
- Not recommended for production
For BeamFinder (Default)
Recommended: Small (yolo26s.pt)
- Good balance of speed and accuracy
- Fits comfortably on A100 with room for larger batches
- Fast enough for near-real-time beam steering
- High enough accuracy for reliable drone detection
Customizing the Study
Changing Training Parameters
Edit TRAIN_ARGS in study.py:41-47:
Adding Custom Models
Add model weights to the MODELS list at study.py:34:
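Purely as an illustration of the shape (the real list is at study.py:34), a MODELS list with a custom entry appended might look like:

```python
# Illustrative only: the five stock variants plus a hypothetical
# custom-weights entry; each string is passed to training unchanged.
MODELS = [
    "yolo26n.pt",
    "yolo26s.pt",
    "yolo26m.pt",
    "yolo26l.pt",
    "yolo26x.pt",
    # "my_custom_weights.pt",  # hypothetical custom entry: uncomment to add
]
```

The same list is where a subset comparison happens: commenting out a line removes that variant from the study.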
Subset Comparison
To compare only specific variants, comment out the others in the MODELS list.
Technical Implementation
Weight Pre-Download
The study pre-downloads all model weights before training (study.py:53-61):
Memory Tracking
Peak GPU memory is reset before each model (study.py:97):
Best Weights Evaluation
The study loads the best checkpoint (lowest validation loss) for final evaluation (study.py:109-111):
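In Ultralytics-style run directories, the best checkpoint is conventionally saved at `<run_dir>/weights/best.pt`. A minimal sketch of locating it for final evaluation (the helper name and directory layout are assumptions, not study.py's code):

```python
import tempfile
from pathlib import Path

def best_checkpoint(run_dir):
    """Return the path to best.pt under an Ultralytics-style run directory."""
    ckpt = Path(run_dir) / "weights" / "best.pt"
    if not ckpt.exists():
        raise FileNotFoundError(f"no best.pt under {run_dir}")
    return ckpt

# Demonstrate against a throwaway directory that mimics the layout.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "weights").mkdir()
    (Path(d) / "weights" / "best.pt").touch()
    found_name = best_checkpoint(d).name
```

Evaluating best.pt rather than last.pt matters: the final epoch is not necessarily the best one when early stopping is in play.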
Study Duration
Approximate total time on an A100 40GB:
- Nano: 45 min
- Small: 70 min
- Medium: 90 min
- Large: 125 min
- XLarge: 180 min
Actual time varies based on dataset size, early stopping, and hardware. The study prints progress after each model completes.
Example Results Interpretation
Suppose you run the study and get these results:

| Model | Test mAP50-95 | Inference (ms) | VRAM (GB) |
|---|---|---|---|
| Nano | 0.6045 | 4.12 | 12.4 |
| Small | 0.6478 | 6.78 | 18.9 |
| Medium | 0.6712 | 9.34 | 26.3 |
| Large | 0.6845 | 12.45 | 34.1 |
| XLarge | 0.6934 | 16.23 | 39.8 |
- Nano → Small: +4.3% mAP, only +2.66ms slower → strong value
- Small → Medium: +2.3% mAP, +2.56ms slower → diminishing returns
- Medium → XLarge: +2.2% mAP, +6.89ms slower → not worth it
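The tradeoffs above come straight from differencing the table. A small sketch of that arithmetic (the helper name is mine):

```python
# Example results from the table: model -> (test mAP50-95, inference ms).
results = {
    "nano":   (0.6045, 4.12),
    "small":  (0.6478, 6.78),
    "medium": (0.6712, 9.34),
    "xlarge": (0.6934, 16.23),
}

def step(a, b):
    """Return (mAP gain in points, added latency in ms) going from a to b."""
    (m1, t1), (m2, t2) = results[a], results[b]
    return round((m2 - m1) * 100, 1), round(t2 - t1, 2)

nano_to_small = step("nano", "small")    # strong value
medium_to_xl = step("medium", "xlarge")  # diminishing returns
```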