The detect.py script runs inference on test images and exports bounding box detections to a CSV file for downstream THz beam steering applications.

Prerequisites

1. Train the Model

You need a trained model checkpoint. Follow the Training Guide to train YOLO26s on the drone dataset. The detection script expects the best checkpoint at:
runs/drone_detect/weights/best.pt
2. Prepare Test Images

Test images should be in the data/images/test/ directory. If you followed the Dataset Setup guide, this is already configured.

Running Detection

Basic Usage

python detect.py
The script will:
  1. Load the trained model from runs/drone_detect/weights/best.pt
  2. Run inference on all images in data/images/test/
  3. Save detections to output/detections.csv
  4. Save annotated images to output/annotated/
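Before launching, you can sanity-check that these inputs exist. The helper below is an illustrative sketch (not part of detect.py) that uses only the paths described above:

```python
from pathlib import Path

def check_prereqs(script_dir: Path) -> list[str]:
    """Return a list of human-readable problems; empty means ready to run."""
    problems = []
    model = script_dir / "runs" / "drone_detect" / "weights" / "best.pt"
    image_dir = script_dir / "data" / "images" / "test"
    if not model.is_file():
        problems.append(f"missing checkpoint: {model}")
    if not image_dir.is_dir():
        problems.append(f"missing test image directory: {image_dir}")
    elif not any(image_dir.glob("*.jpg")):
        problems.append(f"no .jpg images found in {image_dir}")
    return problems
```

An empty list means both the checkpoint and the test images are in place.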

Expected Output

1247 detections saved to output/detections.csv

Configuration

The detection configuration is defined at the top of detect.py:
from pathlib import Path
from ultralytics import YOLO

SCRIPT_DIR = Path(__file__).resolve().parent
MODEL = str(SCRIPT_DIR / "runs" / "drone_detect" / "weights" / "best.pt")
IMAGE_DIR = SCRIPT_DIR / "data" / "images" / "test"
OUTPUT_DIR = SCRIPT_DIR / "output"
CONF = 0.4
IMGSZ = 960

Configuration Parameters

| Parameter | Default | Description |
|---|---|---|
| MODEL | runs/drone_detect/weights/best.pt | Path to trained model checkpoint |
| IMAGE_DIR | data/images/test | Directory containing test images |
| OUTPUT_DIR | output | Directory for results (CSV + annotated images) |
| CONF | 0.4 | Confidence threshold (0-1); only detections above this score are kept |
| IMGSZ | 960 | Input image size (must match training) |
To use a different model checkpoint (e.g., last.pt or a specific epoch), modify the MODEL variable:
MODEL = str(SCRIPT_DIR / "runs" / "drone_detect" / "weights" / "last.pt")
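If training saved several checkpoints, a small helper can list what is available before you pick one. This is an illustrative sketch (the function is ours, and it assumes checkpoints are .pt files in the weights directory):

```python
from pathlib import Path

def list_checkpoints(weights_dir: Path) -> list[Path]:
    """Return all .pt checkpoints in weights_dir, newest first."""
    return sorted(weights_dir.glob("*.pt"),
                  key=lambda p: p.stat().st_mtime, reverse=True)
```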

Inference Parameters

The prediction call uses optimized settings for A100 GPUs:
results = model.predict(
    source=str(IMAGE_DIR),
    conf=CONF,
    imgsz=IMGSZ,
    save=True,
    project=str(OUTPUT_DIR),
    name="annotated",
    exist_ok=True,
    half=True,
    batch=16,
)

Parameter Reference

| Parameter | Value | Description |
|---|---|---|
| source | data/images/test | Input image directory |
| conf | 0.4 | Confidence threshold for filtering detections |
| imgsz | 960 | Input image size for inference |
| save | True | Save annotated images with bounding boxes drawn |
| project | output | Project directory for saving results |
| name | annotated | Subdirectory name for annotated images |
| exist_ok | True | Overwrite existing output directory |
| half | True | Use FP16 (half precision) for 2× faster inference on GPU |
| batch | 16 | Process 16 images per batch (adjust based on GPU memory) |
FP16 Inference: half=True uses half-precision floating point (FP16) which is 2× faster on modern GPUs with minimal accuracy loss. Requires a GPU with FP16 support (Pascal architecture or newer).

Output Format

CSV Structure

Detections are saved to output/detections.csv with the following columns:
| Column | Type | Description | Example |
|---|---|---|---|
| image | string | Source image filename | image_BS1_1234_17_56_02.jpg |
| x_center | float | Bounding box center X coordinate (pixels) | 519.23 |
| y_center | float | Bounding box center Y coordinate (pixels) | 329.19 |
| width | float | Bounding box width (pixels) | 45.0 |
| height | float | Bounding box height (pixels) | 41.04 |
| confidence | float | Detection confidence score (0-1) | 0.9234 |
| class | string | Object class name | drone |
Example rows:
image,x_center,y_center,width,height,confidence,class
image_BS1_9998_17_56_02.jpg,519.23,329.19,45.0,41.04,0.9234,drone
image_BS1_9992_17_56_01.jpg,487.56,318.72,48.12,43.87,0.8876,drone
image_BS1_9967_17_55_57.jpg,502.89,335.41,44.23,40.15,0.9512,drone

Bounding Box Format

The output uses center-based coordinates (x_center, y_center, width, height) in pixels. This matches the YOLO internal format.
If you need top-left corner coordinates (x1, y1, x2, y2), use:
x1 = x_center - width / 2
y1 = y_center - height / 2
x2 = x_center + width / 2
y2 = y_center + height / 2
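The same conversion as a small standalone function, handy when post-processing the CSV (the function name is ours):

```python
def xywh_to_xyxy(x_center: float, y_center: float,
                 width: float, height: float) -> tuple[float, float, float, float]:
    """Convert center-based box coordinates to top-left / bottom-right corners."""
    x1 = x_center - width / 2
    y1 = y_center - height / 2
    x2 = x_center + width / 2
    y2 = y_center + height / 2
    return x1, y1, x2, y2
```

For the first example row above, `xywh_to_xyxy(519.23, 329.19, 45.0, 41.04)` yields corners near (496.73, 308.67) and (541.73, 349.71).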

Annotated Images

Annotated images with bounding boxes drawn are saved to output/annotated/:
output/
├── detections.csv
└── annotated/
    ├── image_BS1_9998_17_56_02.jpg
    ├── image_BS1_9992_17_56_01.jpg
    └── ...
Each annotated image shows:
  • Green bounding boxes around detected drones
  • Confidence scores above each box
  • Class label (“drone”)

Processing Detections

The script extracts bounding box coordinates from YOLO results and writes them to CSV:
import csv

with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["image", "x_center", "y_center", "width", "height", "confidence", "class"])

    results = model.predict(
        source=str(IMAGE_DIR), conf=CONF, imgsz=IMGSZ,
        save=True, project=str(OUTPUT_DIR), name="annotated",
        exist_ok=True, half=True, batch=16,
    )
    
    for r in results:
        name = Path(r.path).name
        if r.boxes is not None and len(r.boxes):
            for box in r.boxes:
                cx, cy, w, h = box.xywh[0].tolist()
                writer.writerow([
                    name,
                    round(cx, 2),
                    round(cy, 2),
                    round(w, 2),
                    round(h, 2),
                    round(box.conf.item(), 4),
                    r.names[int(box.cls.item())]
                ])

Accessing Detection Results

Each result object contains:
| Attribute | Description |
|---|---|
| r.path | Source image path |
| r.boxes | Detected bounding boxes (None if no detections) |
| r.boxes.xywh | Bounding boxes in (x_center, y_center, width, height) format |
| r.boxes.xyxy | Bounding boxes in (x1, y1, x2, y2) format |
| r.boxes.conf | Confidence scores |
| r.boxes.cls | Class indices |
| r.names | Dictionary mapping class indices to names |

GPU Optimizations

Like the training script, detection applies A100-specific optimizations:
import torch

# A100: maximize GPU throughput
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
torch.backends.cudnn.benchmark = True
These flags enable TensorFloat-32 and auto-tuned convolutions for faster inference.

Adjusting Confidence Threshold

The confidence threshold (CONF = 0.4) controls the precision/recall tradeoff:
  • Lower threshold (e.g., 0.2): More detections, higher recall, but more false positives
  • Higher threshold (e.g., 0.6): Fewer detections, higher precision, but may miss some drones
  • Default (0.4): Balanced setting that works well for most scenarios
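Because the CSV stores the confidence of every kept detection, you can also explore stricter thresholds after the fact without re-running inference. A minimal sketch using only the standard library (the sample rows are illustrative):

```python
import csv
import io

def count_above(csv_text: str, threshold: float) -> int:
    """Count rows in a detections.csv whose confidence exceeds threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if float(row["confidence"]) > threshold)

sample = """\
image,x_center,y_center,width,height,confidence,class
a.jpg,519.23,329.19,45.0,41.04,0.9234,drone
b.jpg,487.56,318.72,48.12,43.87,0.8876,drone
c.jpg,502.89,335.41,44.23,40.15,0.4512,drone
"""
```

Note that this only tightens the threshold relative to what was saved; to see detections below the original CONF you must re-run inference with a lower value.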
To experiment with different thresholds, modify the CONF variable in detect.py:12:
CONF = 0.6  # More conservative detections

Batch Size Tuning

The default batch size is 16 images. Adjust based on GPU memory:
| GPU VRAM | Recommended Batch Size |
|---|---|
| 4 GB | 4-8 |
| 8 GB | 8-16 |
| 12+ GB | 16-32 |
| 40 GB (A100) | 32-64 |
Modify batch in the predict call:
results = model.predict(
    source=str(IMAGE_DIR),
    conf=CONF,
    imgsz=IMGSZ,
    batch=32,  # Increase for larger GPUs
    half=True,
)
If you get CUDA out of memory errors during inference, reduce the batch size.
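The table above can be folded into a small rule-of-thumb helper; this is only a sketch (the function and cutoffs are ours, taken from the conservative end of each row):

```python
def suggest_batch_size(vram_gb: float) -> int:
    """Rough batch-size suggestion from available GPU VRAM in GB."""
    if vram_gb >= 40:
        return 32
    if vram_gb >= 12:
        return 16
    if vram_gb >= 8:
        return 8
    return 4
```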

Using Detections for Beam Steering

The CSV output is designed for THz beam steering applications. Each detection provides:
  1. Spatial coordinates: (x_center, y_center) for pointing the beam
  2. Drone size: (width, height) for estimating distance or filtering by target size
  3. Confidence: For filtering low-quality detections
Example workflow:
import pandas as pd

# Load detections
df = pd.read_csv("output/detections.csv")

# Filter high-confidence detections
df = df[df['confidence'] > 0.7]

# Get beam steering coordinates for each detection
for _, row in df.iterrows():
    x, y = row['x_center'], row['y_center']
    # Send (x, y) to beam steering controller
    steer_beam(x, y)
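As a sketch of how box size could feed a range estimate, a simple pinhole-camera model relates pixel width to distance. Everything here is an assumption for illustration: the drone wingspan, the focal length in pixels, and the function itself are hypothetical, not part of detect.py:

```python
def estimate_distance_m(box_width_px: float,
                        target_width_m: float = 0.35,    # assumed drone wingspan (m)
                        focal_length_px: float = 1200.0  # assumed focal length (px)
                        ) -> float:
    """Pinhole model: distance = real width * focal length / pixel width."""
    return target_width_m * focal_length_px / box_width_px
```

Smaller bounding boxes yield larger distance estimates, which can be useful for filtering targets by range before steering.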

Troubleshooting

Problem: FileNotFoundError: runs/drone_detect/weights/best.pt
Solution: Train the model first using python train.py. The detection script requires a trained checkpoint.
Problem: detections.csv is empty or contains very few detections
Possible Causes:
  • Confidence threshold too high (try lowering CONF from 0.4 to 0.2)
  • Model not trained properly (check validation mAP in training output)
  • Test images don’t contain drones
  • Wrong model checkpoint (verify you’re using best.pt, not untrained weights)
Problem: CUDA out of memory during inference
Solutions:
  • Reduce batch from 16 to 8 or 4
  • Disable half=True (slower but uses less memory)
  • Process images one at a time with batch=1
Problem: Inference takes several seconds per image
Solution: Ensure you have a CUDA-capable GPU and PyTorch with CUDA support installed. Check with:
import torch
print(torch.cuda.is_available())  # Should print True

Next Steps

  • Integrate detections with your THz beam steering controller
  • Experiment with different confidence thresholds
  • Run detection on live video streams using Ultralytics’ video inference mode
