Overview
BeamFinder includes a comprehensive model comparison study (study.py) that trains and evaluates all five YOLO26 detection variants on the DeepSense Scenario 23 drone dataset.
The study compares:
- Accuracy - validation and test mAP metrics
- Speed - inference time breakdown (preprocess + inference + postprocess)
- Efficiency - training time and GPU memory usage
YOLO26 Model Variants
YOLO26 comes in five sizes, from lightweight (Nano) to maximum accuracy (XLarge):
- Nano (n)
- Small (s)
- Medium (m)
- Large (l)
- XLarge (x)
yolo26n.pt - Smallest and fastest variant
- Best for: Edge deployment, real-time applications, resource-constrained hardware
- Tradeoff: Lower accuracy than larger models
- Typical use case: Mobile devices, embedded systems, high-throughput pipelines
Running the Comparison Study
The study script trains all five models sequentially with identical hyperparameters.
Quick Start
Study Configuration
Shared training arguments are defined at study.py:41-47:
All models use identical training settings to ensure fair comparison. Only the model architecture changes.
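Purely as an illustration of this pattern (the real values live in study.py:41-47, and the field names here are assumptions in the Ultralytics keyword style), a shared-arguments dict reused across all five variants might look like:

```python
# Illustrative only: hypothetical shared training arguments; only the
# model weights file changes between runs, never these settings.
TRAIN_ARGS = {
    "data": "data.yaml",  # dataset config file (assumed name)
    "epochs": 100,        # matches the 100-epoch budget described below
    "imgsz": 960,         # image size used throughout the study
    "batch": 0.90,        # auto-batch: target ~90% of GPU memory
    "cache": "ram",       # cache images in RAM for faster epochs
}

def build_runs(models, train_args):
    """Sketch: every model gets the same kwargs, ensuring a fair comparison."""
    return [{"model": m, **train_args} for m in models]

runs = build_runs(["yolo26n.pt", "yolo26s.pt"], TRAIN_ARGS)
```

Because the dict is shared, any change to it applies to every variant in the study.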
Crash Recovery
The study supports automatic crash recovery (study.py:76-82):
- Results are saved to runs/study/results_summary.json after each model completes
- If the script crashes or is interrupted, re-running python study.py will skip already-completed models
- Only remaining models will be trained
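The recovery logic described above can be sketched like this (a minimal, self-contained version; the helper names and the assumption that the JSON is keyed by model name are mine, not study.py's):

```python
import json
from pathlib import Path

RESULTS_FILE = Path("runs/study/results_summary.json")  # path from the docs

def load_completed(results_file=RESULTS_FILE):
    """Return the set of model names already recorded, or an empty set."""
    if results_file.exists():
        return set(json.loads(results_file.read_text()).keys())
    return set()

def models_to_run(all_models, completed):
    """Skip any variant that already has saved results."""
    return [m for m in all_models if Path(m).stem not in completed]

# Example: yolo26n already finished before a crash, so only the rest run.
remaining = models_to_run(
    ["yolo26n.pt", "yolo26s.pt", "yolo26m.pt"],
    completed={"yolo26n"},
)
```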
Metrics Collected
For each model variant, the study collects comprehensive performance metrics (study.py:122-134):
Model Characteristics
- Model name: yolo26n, yolo26s, yolo26m, yolo26l, or yolo26x
- Parameters (M): number of trainable parameters in millions, calculated from sum(p.numel() for p in model.model.parameters()) / 1e6 at study.py:100
Training Performance
- Training time (min): total training time for 100 epochs (or early stopping), measured from time.time() before and after model.train() at study.py:102-104
- Peak VRAM (GB): peak GPU memory usage during training, captured via torch.cuda.max_memory_allocated() / 1e9 at study.py:106
Accuracy Metrics
- Val mAP50: validation-set mean Average Precision at IoU threshold 0.50; range 0.0 to 1.0 (higher is better)
- Val mAP50-95: validation-set mAP averaged over IoU thresholds 0.50 to 0.95; stricter than mAP50, same 0.0 to 1.0 range
- Test mAP50: test-set mAP at IoU threshold 0.50; the final metric on the held-out test set (1,709 images)
- Test mAP50-95: test-set mAP averaged over IoU thresholds 0.50 to 0.95; the primary metric for model comparison
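The mAP50-95 metric averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. A minimal sketch of that averaging (the per-threshold AP values are hypothetical):

```python
# The ten IoU thresholds behind mAP@50-95: 0.50, 0.55, ..., 0.95.
thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]

def map_50_95(ap_per_threshold):
    """Average AP over the ten thresholds; stricter than mAP@50 alone."""
    return sum(ap_per_threshold) / len(ap_per_threshold)

# Hypothetical per-threshold AP values that fall off as IoU tightens.
ap = [0.85, 0.83, 0.81, 0.78, 0.74, 0.69, 0.62, 0.52, 0.38, 0.18]
score = map_50_95(ap)
```

This is why mAP50-95 is always at or below mAP50: the tighter thresholds pull the average down.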
Inference Speed
Speed breakdown measured in milliseconds per image (study.py:131-133):
- Preprocess: time spent on image preprocessing (resize, normalize, etc.)
- Inference: time spent on the neural network forward pass; this is the core detection time
- Postprocess: time spent on post-processing (NMS, coordinate conversion)
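These three components sum to the end-to-end per-image latency. A minimal sketch with hypothetical numbers:

```python
# Hypothetical per-image timings in milliseconds, one per component.
speed = {"preprocess": 0.8, "inference": 6.78, "postprocess": 1.1}

total_ms = sum(speed.values())  # end-to-end time per image
fps = 1000.0 / total_ms         # throughput if images are run back-to-back
```

Note that quoting only the inference time overstates throughput; the pre- and post-processing overhead matters for real deployments.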
Speed measurements are from a validation run with half=True (FP16 precision) at imgsz=960.
Results Output
JSON Results File
All metrics are saved to runs/study/results_summary.json in this format:
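The exact schema is defined in study.py. Purely as an illustration of a plausible shape inferred from the metrics listed above (every field name here is an assumption, not the file's real schema):

```python
import json

# Illustrative only: one results entry, keyed by model name as the
# crash-recovery section describes. All values are hypothetical.
results = {
    "yolo26n": {
        "params_millions": 2.6,
        "train_time_min": 45.0,
        "peak_vram_gb": 12.4,
        "val_map50": 0.842,
        "val_map50_95": 0.611,
        "test_map50": 0.835,
        "test_map50_95": 0.604,
        "speed_ms": {"preprocess": 0.6, "inference": 4.1, "postprocess": 1.0},
    },
}

serialized = json.dumps(results, indent=2)  # what gets written to disk
restored = json.loads(serialized)           # what a re-run reads back
```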
Console Summary Table
The study prints a formatted comparison table (study.py:166-187):
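A fixed-width table like the study's console output can be produced with plain f-string alignment. A minimal sketch (column names and the two rows are hypothetical, taken from the example results later in this page):

```python
# Hypothetical rows: (model, test mAP50-95, inference ms).
rows = [("yolo26n", 0.6045, 4.12), ("yolo26s", 0.6478, 6.78)]

header = f"{'Model':<10}{'Test mAP50-95':>15}{'Infer (ms)':>12}"
lines = [header, "-" * len(header)]
for name, m, ms in rows:
    # Left-align the name; right-align numbers with fixed precision.
    lines.append(f"{name:<10}{m:>15.4f}{ms:>12.2f}")
table = "\n".join(lines)
print(table)
```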
Visualization Charts
Two sets of charts are automatically generated.
Comparison Charts (runs/study/comparison_charts.png)
Four-panel comparison generated at study.py:193-257:
- mAP@50 - Validation vs Test accuracy (bar chart)
- mAP@50-95 - Validation vs Test accuracy (bar chart)
- Training Time - Total training minutes (bar chart)
- Inference Speed Breakdown - Stacked bar chart showing preprocess + inference + postprocess
Efficiency Plots (runs/study/efficiency_plots.png)
Three scatter plots generated at study.py:263-310:
- Accuracy vs Model Size - Test mAP@50-95 vs parameters (M)
- Accuracy vs Speed - Test mAP@50-95 vs inference time (ms)
- Accuracy vs Memory - Test mAP@50-95 vs peak training VRAM (GB)
Interpreting Results
Accuracy Trends
Generally, larger models achieve higher accuracy:
- Nano → Small: largest improvement (~3-5% mAP)
- Small → Medium: moderate improvement (~1-3% mAP)
- Medium → Large: small improvement (~0.5-1.5% mAP)
- Large → XLarge: minimal improvement (~0.3-0.8% mAP)
Speed vs Accuracy Tradeoff
Inference time increases with model size.
Memory Requirements
Training VRAM increases significantly with model size:
- Nano: ~12 GB (fits on RTX 3090)
- Small: ~19 GB (fits on RTX 3090)
- Medium: ~26 GB (requires A100 or A6000)
- Large: ~34 GB (requires A100)
- XLarge: ~40 GB (requires A100 40GB)
These measurements assume batch=0.90, imgsz=960, and cache="ram". Reduce the batch size or image size to fit smaller GPUs.
Model Selection Guidelines
For Real-Time Applications
Choose: Nano or Small
- Target: >30 FPS on your deployment hardware
- Nano achieves ~4ms inference (250 FPS on A100)
- Small achieves ~7ms inference (143 FPS on A100)
For Batch Processing
Choose: Medium or Large
- Speed is less critical than accuracy
- Can process images offline in batches
- Maximize detection quality
For Research & Benchmarking
Choose: XLarge
- Establish upper bound on achievable accuracy
- Compare against state-of-the-art methods
- Not recommended for production
For BeamFinder (Default)
Recommended: Small (yolo26s.pt)
- Good balance of speed and accuracy
- Fits comfortably on A100 with room for larger batches
- Fast enough for near-real-time beam steering
- High enough accuracy for reliable drone detection
Customizing the Study
Changing Training Parameters
Edit TRAIN_ARGS in study.py:41-47:
Adding Custom Models
Add model weights to the MODELS list at study.py:34:
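Purely as an illustration of the shape (the real list is at study.py:34), a MODELS list with a custom entry appended might look like:

```python
# Illustrative only: the five stock variants plus a hypothetical
# custom-weights entry; each string is passed to training unchanged.
MODELS = [
    "yolo26n.pt",
    "yolo26s.pt",
    "yolo26m.pt",
    "yolo26l.pt",
    "yolo26x.pt",
    # "my_custom_weights.pt",  # hypothetical custom entry: uncomment to add
]
```

The same list is where a subset comparison happens: commenting out a line removes that variant from the study.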
Subset Comparison
To compare only specific variants, comment out the others in the MODELS list.
Technical Implementation
Weight Pre-Download
The study pre-downloads all model weights before training (study.py:53-61):
Memory Tracking
Peak GPU memory is reset before each model (study.py:97):
Best Weights Evaluation
The study loads the best checkpoint (lowest validation loss) for final evaluation (study.py:109-111):
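In Ultralytics-style run directories, the best checkpoint is conventionally saved at `<run_dir>/weights/best.pt`. A minimal sketch of locating it for final evaluation (the helper name and directory layout are assumptions, not study.py's code):

```python
import tempfile
from pathlib import Path

def best_checkpoint(run_dir):
    """Return the path to best.pt under an Ultralytics-style run directory."""
    ckpt = Path(run_dir) / "weights" / "best.pt"
    if not ckpt.exists():
        raise FileNotFoundError(f"no best.pt under {run_dir}")
    return ckpt

# Demonstrate against a throwaway directory that mimics the layout.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "weights").mkdir()
    (Path(d) / "weights" / "best.pt").touch()
    found_name = best_checkpoint(d).name
```

Evaluating best.pt rather than last.pt matters: the final epoch is not necessarily the best one when early stopping is in play.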
Study Duration
Approximate total time on an A100 40GB:
- Nano: 45 min
- Small: 70 min
- Medium: 90 min
- Large: 125 min
- XLarge: 180 min
Actual time varies based on dataset size, early stopping, and hardware. The study prints progress after each model completes.
Example Results Interpretation
Suppose you run the study and get these results:

| Model | Test mAP50-95 | Inference (ms) | VRAM (GB) |
|---|---|---|---|
| Nano | 0.6045 | 4.12 | 12.4 |
| Small | 0.6478 | 6.78 | 18.9 |
| Medium | 0.6712 | 9.34 | 26.3 |
| Large | 0.6845 | 12.45 | 34.1 |
| XLarge | 0.6934 | 16.23 | 39.8 |
- Nano → Small: +4.3% mAP, only +2.66ms slower → strong value
- Small → Medium: +2.3% mAP, +2.56ms slower → diminishing returns
- Medium → XLarge: +2.2% mAP, +6.89ms slower → not worth it
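The tradeoffs above come straight from differencing the table. A small sketch of that arithmetic (the helper name is mine):

```python
# Example results from the table: model -> (test mAP50-95, inference ms).
results = {
    "nano":   (0.6045, 4.12),
    "small":  (0.6478, 6.78),
    "medium": (0.6712, 9.34),
    "xlarge": (0.6934, 16.23),
}

def step(a, b):
    """Return (mAP gain in points, added latency in ms) going from a to b."""
    (m1, t1), (m2, t2) = results[a], results[b]
    return round((m2 - m1) * 100, 1), round(t2 - t1, 2)

nano_to_small = step("nano", "small")    # strong value
medium_to_xl = step("medium", "xlarge")  # diminishing returns
```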