Installation Issues
ImportError: No module named 'ultralytics'
ultralytics>=8.4.0. If you’re using a virtual environment, make sure it’s activated before installing.
Verify installation:
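One way to confirm that the package is visible from the active environment is a short standard-library check. This is a minimal sketch; the helper name installed_version is illustrative and not part of BeamFinder.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this with the same interpreter (and virtual environment) used for training.
    for name in ("ultralytics", "torch"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

If a package is reported as missing even though it was installed, the install most likely went into a different environment than the one running the script.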
ModuleNotFoundError: No module named 'torch'
requirements.txt assumes PyTorch is already installed (this is the case on Lightning.ai A100 instances). Install PyTorch manually:
GPU & CUDA Issues
RuntimeError: CUDA out of memory
CUDA out of memory error.
Solution:
BeamFinder uses batch=0.90 (90% GPU memory utilization) by default in train.py:20. If you’re running other processes on the GPU, reduce this:
batch=0.85 with imgsz=960 and amp=True, resulting in batch size 2-4.
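Lowering the memory fraction could look roughly like the sketch below. It assumes the Ultralytics behavior where a float batch value between 0 and 1 selects AutoBatch by GPU memory fraction. The value 0.70, the helper names, and the yolo26s.pt and data.yaml arguments are illustrative and may not match train.py exactly.

```python
def build_train_args(gpu_fraction: float = 0.70) -> dict:
    """Training arguments with AutoBatch targeting `gpu_fraction` of GPU memory."""
    if not 0.0 < gpu_fraction < 1.0:
        raise ValueError("gpu_fraction must be strictly between 0 and 1")
    return {
        "data": "data.yaml",
        "imgsz": 960,
        "batch": gpu_fraction,  # float in (0, 1): AutoBatch by memory fraction
        "amp": True,
    }


def train(gpu_fraction: float = 0.70):
    """Start training with the reduced memory target (requires Ultralytics)."""
    # Imported lazily so build_train_args() works without Ultralytics installed.
    from ultralytics import YOLO

    model = YOLO("yolo26s.pt")  # assumed starting weights
    return model.train(**build_train_args(gpu_fraction))
```

If AutoBatch still overshoots, passing a small fixed integer batch size instead of a fraction is the more predictable fallback.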
CUDA not available / training on CPU
- Verify NVIDIA driver installation:
  Running nvidia-smi should show GPU info and the CUDA version.
- Reinstall PyTorch with CUDA support:
- Check CUDA toolkit version compatibility:
  The CUDA version PyTorch was built with must not exceed the CUDA version supported by your NVIDIA driver (shown in nvidia-smi).
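From the Python side, the checks above can be summarized with a small report. This is a sketch, not BeamFinder code; the function name cuda_report is made up for illustration, and it falls back gracefully when PyTorch is not installed.

```python
def cuda_report() -> dict:
    """Summarize whether PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return {
            "torch_installed": False,
            "cuda_available": False,
            "torch_cuda_version": None,
            "device_name": None,
        }

    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "cuda_available": available,
        # None here means a CPU-only PyTorch build is installed.
        "torch_cuda_version": torch.version.cuda,
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

A torch_cuda_version of None points to a CPU-only build, which reinstalling PyTorch with CUDA support should fix. A version string combined with cuda_available being False usually points to a driver problem.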
torch.cuda.OutOfMemoryError during inference
detect.py) crashes with out-of-memory error.
Solution:
Reduce batch size in detect.py:37:
batch=16 and half=True, inference needs ~2GB VRAM at imgsz=960.
Dataset Issues
FileNotFoundError: data/images/train not found
data.yaml:1-4):
.txt file in labels/ must have the same name as its corresponding image in images/ (e.g., IMG_001.jpg → IMG_001.txt).
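The naming rule can be checked mechanically. The sketch below compares file stems between an image folder and a label folder. The function name, the list of image extensions, and the data/labels/train path in the usage note are assumptions rather than details taken from BeamFinder.

```python
from pathlib import Path

# Assumed set of image extensions; extend it if the dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def unmatched_files(images_dir, labels_dir):
    """Return (images without a label file, label files without an image), by stem."""
    image_stems = {
        p.stem for p in Path(images_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    label_stems = {p.stem for p in Path(labels_dir).glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

For example, unmatched_files("data/images/train", "data/labels/train") returns two lists of stems. Both should be empty, unless some images are intentionally left unlabeled as background examples.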
ValueError: No labels found in dataset
- Check label format: Each .txt file should contain bounding boxes in YOLO format:
  Format: class_id x_center y_center width height (normalized 0-1)
- Verify label files exist:
- Check data.yaml path:
  The path: data in data.yaml:1 is relative to the script location. If you moved data.yaml, update the path to an absolute path:
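The label format described above is easy to validate line by line. The following is a minimal sketch of such a validator. The function name and the num_classes default of 1 (matching a single-class dataset) are illustrative choices.

```python
def label_line_errors(line: str, num_classes: int = 1) -> list:
    """Return a list of problems found in one YOLO label line (empty if valid)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, found {len(parts)}"]

    errors = []
    try:
        class_id = int(parts[0])
        if not 0 <= class_id < num_classes:
            errors.append(f"class_id {class_id} outside 0..{num_classes - 1}")
    except ValueError:
        errors.append(f"class_id {parts[0]!r} is not an integer")

    for name, raw in zip(("x_center", "y_center", "width", "height"), parts[1:]):
        try:
            value = float(raw)
        except ValueError:
            errors.append(f"{name} {raw!r} is not a number")
            continue
        if not 0.0 <= value <= 1.0:
            errors.append(f"{name} {value} is not normalized to 0-1")
    return errors
```

A line such as "0 480 270 50 40" fails on all four coordinates, which is the typical symptom of labels exported in pixel units instead of normalized units.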
WARNING: Image size mismatch
imgsz=960 with rect=True to preserve aspect ratio.
If you want to eliminate warnings entirely, ensure all images are exactly 960×540:
rect=True parameter in train.py:30 avoids wasting 44% of pixels on black padding for 16:9 images.
Training Issues
RuntimeError: Dataloader workers multiprocessing error (Windows)
workers=0 in train.py:23:
if __name__ == "__main__": (already present in train.py:8).
See the Known Issues page for details on Windows multiprocessing.
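A platform-aware worker setting might be sketched as follows. The helper names are hypothetical, the default of 8 workers is an assumption, and the training arguments only approximate what train.py passes.

```python
import platform


def dataloader_workers(preferred: int = 8, system: str = "") -> int:
    """Return 0 on Windows (no worker subprocesses), else the preferred count."""
    system = system or platform.system()
    return 0 if system == "Windows" else preferred


def train():
    """Start training with a platform-appropriate worker count (requires Ultralytics)."""
    from ultralytics import YOLO  # lazy import

    model = YOLO("yolo26s.pt")  # assumed starting weights
    return model.train(data="data.yaml", imgsz=960, workers=dataloader_workers())


if __name__ == "__main__":
    # Windows starts worker processes by re-importing this module, so anything
    # that launches training (such as a call to train()) belongs under this guard.
    print("workers =", dataloader_workers())
```

With workers=0, data loading runs in the main process. That is slower, but it avoids the multiprocessing error entirely.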
Training is very slow
- Check that the GPU is being used:
  GPU utilization should be 80-100% during training.
- Enable dataset caching:
- Use mixed precision (FP16):
  BeamFinder uses amp=True by default, but verify:
- Enable torch.compile (PyTorch 2.x + CUDA):
  Already enabled in train.py:26 for A100 training.
- Increase worker count (Linux/Mac only):
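To check GPU utilization from a script rather than by eye, nvidia-smi can be queried in CSV mode. This is a sketch under the assumption that nvidia-smi is on PATH and reports numeric values for every field (some GPUs report [N/A], which this parser does not handle). The function names are illustrative.

```python
import shutil
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text: str) -> list:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (field.strip() for field in line.split(","))
        stats.append({
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return stats


def gpu_stats():
    """Return per-GPU stats, or None when nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Calling gpu_stats() a few times during an epoch gives a rough picture. Utilization that stays well below 80% suggests the data pipeline, not the GPU, is the bottleneck.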
OSError: [Errno 28] No space left on device
save_period=10 in train.py:31). These accumulate in runs/drone_detect/weights/:
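Before deleting anything, it helps to see how much space the periodic checkpoints take. The sketch below assumes they are saved with names matching epoch*.pt, so best.pt and last.pt are never matched. Verify that naming against your own weights folder first. The function name is illustrative.

```python
import shutil
from pathlib import Path


def periodic_checkpoints(weights_dir):
    """List periodic checkpoint files and their combined size in bytes.

    Assumes periodic saves are named epoch*.pt; best.pt and last.pt never match.
    """
    files = sorted(Path(weights_dir).glob("epoch*.pt"))
    return files, sum(f.stat().st_size for f in files)


def free_bytes(path="."):
    """Free space, in bytes, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free
```

After reviewing the list returned for runs/drone_detect/weights, the files can be removed one by one with Path.unlink(). Keeping best.pt and last.pt preserves the ability to run detection and resume training.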
mAP is very low (< 0.5)
- Labels are incorrect or missing:
  Verify a few annotation files manually:
- Class ID mismatch:
  BeamFinder uses a single class (nc: 1 in data.yaml:6). All labels should have class_id = 0:
- Training hasn’t converged:
  100 epochs with patience=20 should be sufficient, but you can train longer:
- Model is too small:
  Try a larger YOLO variant (yolo26m.pt or yolo26l.pt instead of yolo26s.pt).
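For the class ID check in particular, a tally over the whole label folder is quicker than opening files by hand. This is a minimal sketch; the function name is illustrative and the folder path is whatever your dataset uses.

```python
from collections import Counter
from pathlib import Path


def class_id_counts(labels_dir) -> Counter:
    """Count how often each class_id (first field) occurs across all label files."""
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                counts[fields[0]] += 1
    return counts
```

For a single-class dataset the result should contain only the key "0". Any other key identifies labels that need to be remapped.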
Detection Issues
FileNotFoundError: runs/drone_detect/weights/best.pt not found
runs/drone_detect/weights/best.pt. Detection script loads this model at detect.py:9:
MODEL path in detect.py.
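If the weights ended up in a differently named run folder, a search under runs/ can locate them. The sketch below is an assumption about where to look, not part of detect.py, and the function name is made up for illustration.

```python
from pathlib import Path


def find_weights(runs_dir="runs", filename="best.pt"):
    """Return every <run>/weights/<filename> under runs_dir, newest first."""
    matches = Path(runs_dir).glob(f"**/weights/{filename}")
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)
```

The first entry returned by find_weights() is the most recently written best.pt, which is usually the one to point MODEL at.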
No detections in output CSV
output/detections.csv only contains headers, no bounding boxes.
Solution:
- Check confidence threshold:
  BeamFinder uses CONF=0.4 in detect.py:12. Lower it to detect less confident predictions:
- Verify test images exist:
- Check model performance:
  Run validation to see if the model learned anything:
- Verify image directory:
  Make sure IMAGE_DIR in detect.py:10 points to the correct folder:
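After each change, it is useful to confirm whether the CSV actually gained rows. The sketch below counts data rows without assuming any particular column names. The function name is illustrative.

```python
import csv


def count_detection_rows(csv_path) -> int:
    """Count non-empty data rows in a CSV file, excluding the header line."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row if present
        return sum(1 for row in reader if row)
```

A result of 0 for output/detections.csv confirms the header-only case. If lowering the threshold raises the count, the confidence setting was the cause rather than the paths.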
Detection is very slow
- Use GPU inference:
  Verify the model is running on GPU:
- Enable half precision:
  Already enabled in detect.py:37 (half=True). Verify:
- Increase batch size:
- Reduce image size:
  Faster but may reduce detection accuracy.
Memory Issues
System RAM exhausted during training
cache="ram" in train.py:22). The 11,387 images total ~650MB on disk but expand to ~4GB in RAM.
If you have less than 16GB system RAM, use disk caching instead:
Disk cache fills up storage
cache="disk" creates large cache files that fill up disk.
Solution:
Ultralytics creates .cache files in the dataset directory:
General Tips
Monitor GPU during training
Check YOLO documentation