
Installation Issues

Problem: Python cannot find the Ultralytics package after installation.
Solution:
pip install -r requirements.txt
BeamFinder requires ultralytics>=8.4.0. If you’re using a virtual environment, make sure it’s activated before installing.
Verify the installation:
python -c "import ultralytics; print(ultralytics.__version__)"
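If you want to check the minimum version programmatically, note that plain string comparison is wrong for dotted versions ("8.10.0" sorts before "8.4.0" lexicographically). A minimal sketch; `meets_min_version` is a hypothetical helper that assumes plain numeric x.y.z versions (use `packaging.version` for anything fancier):

```python
def meets_min_version(installed: str, required: str = "8.4.0") -> bool:
    """Compare dotted version strings numerically; lexicographic string
    comparison fails because "8.10.0" would sort before "8.4.0"."""
    as_tuple = lambda v: tuple(int(part) for part in v.split(".")[:3])
    return as_tuple(installed) >= as_tuple(required)
```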
Problem: PyTorch is not installed.
Solution:
The requirements.txt assumes PyTorch is already installed (this is the case on Lightning.ai A100 instances). Install PyTorch manually:
# CPU only
pip install torch torchvision

# CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
See the official PyTorch installation guide for your specific setup.

GPU & CUDA Issues

Problem: Training crashes with a CUDA out-of-memory error.
Solution:
BeamFinder uses batch=0.90 (90% GPU memory utilization) by default in train.py:20. If you’re running other processes on the GPU, reduce this:
model.train(
    batch=0.70,  # Use 70% instead of 90%
    # ... other args
)
Or disable automatic mixed precision (uses more VRAM but might help with fragmentation):
model.train(
    amp=False,  # Disable FP16
    # ... other args
)
The RTX 3050 (4GB VRAM) can handle batch=0.85 with imgsz=960 and amp=True, resulting in batch size 2-4.
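If other processes share the GPU, you can estimate a batch fraction from their memory usage (a rough heuristic, not part of BeamFinder; `safe_batch_fraction` is an illustrative name, and the usage numbers come from nvidia-smi):

```python
def safe_batch_fraction(total_vram_gb: float, other_procs_gb: float,
                        headroom_gb: float = 0.5) -> float:
    """Fraction of total GPU memory to pass as YOLO's batch= argument,
    leaving room for the other processes plus a small safety margin."""
    free_gb = total_vram_gb - other_procs_gb - headroom_gb
    # Never go below 10%, or auto batch sizing becomes useless
    return max(0.1, round(free_gb / total_vram_gb, 2))

# e.g. a 24GB card with 6GB already in use -> pass batch=0.73
```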
Problem: Training is extremely slow because PyTorch is using the CPU instead of the GPU.
Check whether CUDA is available:
python -c "import torch; print(torch.cuda.is_available())"
python -c "import torch; print(torch.version.cuda)"
Solution:
  1. Verify NVIDIA driver installation:
    nvidia-smi
    
    Should show GPU info and CUDA version.
  2. Reinstall PyTorch with CUDA support:
    pip uninstall torch torchvision
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
    
  3. Check CUDA toolkit version compatibility: PyTorch CUDA version must match your NVIDIA driver’s supported CUDA version (shown in nvidia-smi).
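The three checks above can be folded into one heuristic helper (a sketch; `cuda_diagnosis` and its messages are illustrative, not part of BeamFinder). Feed it the results of torch.cuda.is_available(), torch.version.cuda (None on CPU-only builds), and the CUDA version shown by nvidia-smi:

```python
from typing import Optional

def cuda_diagnosis(cuda_available: bool, torch_cuda: Optional[str],
                   driver_cuda: str) -> str:
    """Map the three checks above to the most likely cause."""
    if cuda_available:
        return "ok"
    if torch_cuda is None:
        return "cpu-only build"    # reinstall with a --index-url CUDA wheel
    if int(torch_cuda.split(".")[0]) > int(driver_cuda.split(".")[0]):
        return "driver too old"    # driver supports an older CUDA major version
    return "check driver install"  # run nvidia-smi and check permissions
```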
Problem: Detection (detect.py) crashes with an out-of-memory error.
Solution:
Reduce the batch size in detect.py:37:
results = model.predict(
    source=str(IMAGE_DIR),
    batch=4,  # Reduce from 16 to 4 or 1
    # ... other args
)
The test set has 1,709 images. With batch=16 and half=True, inference needs ~2GB VRAM at imgsz=960.

Dataset Issues

Problem: Training fails because the dataset directory structure is incorrect.
Solution:
BeamFinder expects this exact directory structure (defined in data.yaml:1-4):
BeamFinder/
└── data/
    ├── images/
    │   ├── train/          # 7,970 images
    │   ├── validation/     # 1,708 images
    │   └── test/           # 1,709 images
    └── labels/
        ├── train/          # 7,970 .txt files
        ├── validation/     # 1,708 .txt files
        └── test/           # 1,709 .txt files
Each .txt file in labels/ must have the same name as its corresponding image in images/ (e.g., IMG_001.jpg → IMG_001.txt).
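A quick stdlib check for mismatched pairs, assuming the layout above (`find_unpaired` is a hypothetical helper, not part of BeamFinder):

```python
from pathlib import Path

def find_unpaired(split: str, root: Path = Path("data")) -> list:
    """Return file stems that exist as an image without a label file,
    or as a label file without an image, for one dataset split."""
    exts = {".jpg", ".jpeg", ".png"}
    images = {p.stem for p in (root / "images" / split).iterdir()
              if p.suffix.lower() in exts}
    labels = {p.stem for p in (root / "labels" / split).glob("*.txt")}
    return sorted(images ^ labels)  # symmetric difference: unmatched on either side

# e.g. find_unpaired("train") should return [] for a correct dataset
```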
Problem: YOLO can’t find annotation files during training.
Solution:
  1. Check label format: Each .txt file should contain bounding boxes in YOLO format:
    0 0.5 0.5 0.3 0.2
    
    Format: class_id x_center y_center width height (normalized 0-1)
  2. Verify label files exist:
    ls data/labels/train/ | wc -l  # Should show 7,970
    
  3. Check data.yaml path: The path: data in data.yaml:1 is relative to the script location. If you moved data.yaml, update the path to an absolute path:
    path: /absolute/path/to/BeamFinder/data
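To audit label files in bulk, the format rules from step 1 can be expressed as a line validator (a sketch; `validate_label_line` is an illustrative name):

```python
def validate_label_line(line: str, num_classes: int = 1):
    """Return None if a YOLO-format label line is valid, else an error string."""
    parts = line.split()
    if len(parts) != 5:
        return "expected 5 fields: class_id x_center y_center width height"
    cls, *coords = parts
    if not cls.isdigit() or int(cls) >= num_classes:
        return f"class_id must be in [0, {num_classes - 1}]"
    try:
        values = [float(v) for v in coords]
    except ValueError:
        return "coordinates must be numeric"
    if not all(0.0 <= v <= 1.0 for v in values):
        return "coordinates must be normalized to the 0-1 range"
    return None
```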
    
Problem: YOLO reports image size warnings during training.
Solution:
This is usually harmless. BeamFinder images are 960×540 (16:9 aspect ratio), and YOLO resizes them to imgsz=960 with rect=True to preserve the aspect ratio.
If you want to eliminate the warnings entirely, ensure all images are exactly 960×540:
# Check image sizes
python -c "from PIL import Image; import sys; img = Image.open(sys.argv[1]); print(img.size)" data/images/train/IMG_001.jpg
The rect=True parameter in train.py:30 avoids wasting 44% of pixels on black padding for 16:9 images.

Training Issues

Problem: Training crashes on Windows with multiprocessing errors.
Solution:
This is a known Windows limitation. Set workers=0 in train.py:23:
model.train(
    workers=0,  # Required on Windows
    # ... other args
)
Also ensure your training script uses if __name__ == "__main__": (already present in train.py:8).
Training will be slower with workers=0 (single-threaded data loading). Use cache="ram" to compensate.
See the Known Issues page for details on Windows multiprocessing.
Problem: Training takes much longer than expected.
Checklist:
  1. GPU is being used:
    watch -n 1 nvidia-smi  # Monitor GPU utilization during training
    
    GPU utilization should be 80-100% during training.
  2. Enable dataset caching:
    model.train(
        cache="ram",  # Cache in RAM (needs ~4GB system memory)
        # Or:
        cache="disk",  # Cache to disk (slower but less RAM)
    )
    
  3. Use mixed precision (FP16): BeamFinder uses amp=True by default, but verify:
    model.train(
        amp=True,  # Automatic Mixed Precision
    )
    
  4. Enable torch.compile (PyTorch 2.x + CUDA):
    model.train(
        compile=True,  # 10-30% faster on A100
    )
    
    Already enabled in train.py:26 for A100 training.
  5. Increase worker count (Linux/Mac only):
    model.train(
        workers=8,  # Parallel data loading
    )
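The checklist above collapses into one set of keyword arguments (values taken from this page; `FAST_TRAIN_ARGS` is an illustrative name, and you should drop compile/workers on setups that don't support them):

```python
# Speed-oriented training settings from the checklist above;
# pass them along with your other args: model.train(**FAST_TRAIN_ARGS, ...)
FAST_TRAIN_ARGS = {
    "cache": "ram",    # cache dataset in RAM (needs ~4GB system memory)
    "amp": True,       # FP16 automatic mixed precision
    "compile": True,   # PyTorch 2.x + CUDA only; 10-30% faster on A100
    "workers": 8,      # parallel data loading (Linux/Mac; use 0 on Windows)
}
```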
    
Problem: Disk runs out of space during training.
Solution:
Training generates checkpoints every 10 epochs (save_period=10 in train.py:31). These accumulate in runs/drone_detect/weights/:
# Remove intermediate checkpoints, keep only best.pt and last.pt
rm runs/drone_detect/weights/epoch*.pt
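The same cleanup in Python, which also works on Windows where rm isn't available (`prune_checkpoints` is a hypothetical helper):

```python
from pathlib import Path

def prune_checkpoints(weights_dir: Path) -> list:
    """Delete intermediate epoch*.pt checkpoints; best.pt and last.pt
    don't match the glob, so they are kept."""
    removed = []
    for ckpt in sorted(weights_dir.glob("epoch*.pt")):
        ckpt.unlink()
        removed.append(ckpt.name)
    return removed

# e.g. prune_checkpoints(Path("runs/drone_detect/weights"))
```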
Or reduce checkpoint frequency:
model.train(
    save_period=25,  # Save every 25 epochs instead of 10
)
Or disable intermediate checkpoints entirely:
model.train(
    save_period=-1,  # Only save best.pt and last.pt
)
Problem: Model accuracy on the validation set is unexpectedly low.
Possible causes:
  1. Labels are incorrect or missing: Verify a few annotation files manually:
    cat data/labels/train/IMG_001.txt
    
  2. Class ID mismatch: BeamFinder uses a single class (nc: 1 in data.yaml:6). All labels should have class_id = 0:
    0 0.5 0.5 0.3 0.2
    
  3. Training hasn’t converged: 100 epochs with patience=20 should be sufficient, but you can train longer:
    model.train(
        epochs=200,
        patience=40,
    )
    
  4. Model is too small: Try a larger YOLO variant (yolo26m.pt or yolo26l.pt instead of yolo26s.pt).
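Causes 1 and 2 can be checked across a whole split at once (a stdlib sketch; `class_histogram` is an illustrative name). For nc: 1, every key in the result should be "0":

```python
from collections import Counter
from pathlib import Path

def class_histogram(label_dir: Path) -> Counter:
    """Count class_ids over all YOLO label files in a directory."""
    counts = Counter()
    for txt in label_dir.glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():
                counts[line.split()[0]] += 1
    return counts

# e.g. class_histogram(Path("data/labels/train"));
# any key other than "0" indicates a class ID mismatch for nc: 1
```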

Detection Issues

Problem: Detection script can’t find the trained model weights.
Solution:
You need to train the model first:
python train.py
This creates runs/drone_detect/weights/best.pt. The detection script loads this model at detect.py:9:
MODEL = str(SCRIPT_DIR / "runs" / "drone_detect" / "weights" / "best.pt")
If you moved the weights file, update the MODEL path in detect.py.
Problem: output/detections.csv contains only the header row, with no bounding boxes.
Solution:
  1. Check confidence threshold: BeamFinder uses CONF=0.4 in detect.py:12. Lower it to detect less confident predictions:
    CONF = 0.25  # Reduce from 0.4 to 0.25
    
  2. Verify test images exist:
    ls data/images/test/ | wc -l  # Should show 1,709
    
  3. Check model performance: Run validation to see if the model learned anything:
    python -c "from ultralytics import YOLO; m = YOLO('runs/drone_detect/weights/best.pt'); print(m.val(data='data.yaml', split='test'))"
    
  4. Verify image directory: Make sure IMAGE_DIR in detect.py:10 points to the correct folder:
    IMAGE_DIR = SCRIPT_DIR / "data" / "images" / "test"
    
Problem: Processing 1,709 test images takes too long.
Solution:
  1. Use GPU inference: Verify the model is running on GPU:
    from ultralytics import YOLO
    model = YOLO("runs/drone_detect/weights/best.pt")
    print(model.device)  # Should show 'cuda:0'
    
  2. Enable half precision: Already enabled in detect.py:37 (half=True). Verify:
    results = model.predict(
        half=True,  # FP16 inference (2x faster)
    )
    
  3. Increase batch size:
    results = model.predict(
        batch=32,  # Increase from 16 to 32 (if VRAM allows)
    )
    
  4. Reduce image size:
    IMGSZ = 640  # Reduce from 960 to 640
    
    Faster but may reduce detection accuracy.

Memory Issues

Problem: System runs out of RAM (not VRAM) during training.
Solution:
BeamFinder caches the entire dataset in RAM (cache="ram" in train.py:22). The 11,387 images total ~650MB on disk but expand to ~4GB in RAM.
If you have less than 16GB of system RAM, use disk caching instead:
model.train(
    cache="disk",  # Cache to disk instead of RAM
    # ... other args
)
Or disable caching entirely:
model.train(
    cache=False,  # No caching (slowest)
)
See the Known Issues page for memory requirements.
Problem: cache="disk" creates large cache files that fill up the disk.
Solution:
Ultralytics creates .cache files in the dataset directory; remove them with:
rm data/images/train/*.cache
rm data/images/validation/*.cache
rm data/images/test/*.cache
Or switch to RAM caching if you have enough memory:
model.train(
    cache="ram",  # Needs ~4GB system RAM
)

General Tips

1. Check the Ultralytics version

BeamFinder requires ultralytics>=8.4.0:
pip show ultralytics
Update if needed:
pip install --upgrade ultralytics
2. Enable verbose logging

Add verbose=True to training/detection for detailed logs:
model.train(verbose=True, ...)
model.predict(verbose=True, ...)
3. Monitor GPU during training

Watch GPU utilization and memory in real-time:
watch -n 1 nvidia-smi
Or use nvtop for a better interface:
sudo apt install nvtop
nvtop
4. Check the YOLO documentation

Many issues are documented in the official Ultralytics docs.
If you’re still stuck, check the Known Issues page for documented limitations and workarounds.
