
Installation Issues

Problem: Python cannot find the Ultralytics package after installation.
Solution:
pip install -r requirements.txt
BeamFinder requires ultralytics>=8.4.0. If you’re using a virtual environment, make sure it’s activated before installing.
Verify the installation:
python -c "import ultralytics; print(ultralytics.__version__)"
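If you want to check the minimum version programmatically, note that plain string comparison is wrong for dotted versions ("8.10.0" sorts before "8.4.0" lexicographically). A minimal sketch; `meets_min_version` is a hypothetical helper that assumes plain numeric x.y.z versions (use `packaging.version` for anything fancier):

```python
def meets_min_version(installed: str, required: str = "8.4.0") -> bool:
    """Compare dotted version strings numerically; lexicographic string
    comparison fails because "8.10.0" would sort before "8.4.0"."""
    as_tuple = lambda v: tuple(int(part) for part in v.split(".")[:3])
    return as_tuple(installed) >= as_tuple(required)
```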
Problem: PyTorch is not installed.
Solution:
The requirements.txt assumes PyTorch is already installed (this is the case on Lightning.ai A100 instances). Install PyTorch manually:
# CPU only
pip install torch torchvision

# CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
See the official PyTorch installation guide for your specific setup.

GPU & CUDA Issues

Problem: Training crashes with a CUDA out-of-memory error.
Solution:
BeamFinder uses batch=0.90 (90% GPU memory utilization) by default in train.py:20. If you’re running other processes on the GPU, reduce this:
model.train(
    batch=0.70,  # Use 70% instead of 90%
    # ... other args
)
Or disable automatic mixed precision (uses more VRAM but might help with fragmentation):
model.train(
    amp=False,  # Disable FP16
    # ... other args
)
The RTX 3050 (4GB VRAM) can handle batch=0.85 with imgsz=960 and amp=True, resulting in batch size 2-4.
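If other processes share the GPU, you can estimate a batch fraction from their memory usage (a rough heuristic, not part of BeamFinder; `safe_batch_fraction` is an illustrative name, and the usage numbers come from nvidia-smi):

```python
def safe_batch_fraction(total_vram_gb: float, other_procs_gb: float,
                        headroom_gb: float = 0.5) -> float:
    """Fraction of total GPU memory to pass as YOLO's batch= argument,
    leaving room for the other processes plus a small safety margin."""
    free_gb = total_vram_gb - other_procs_gb - headroom_gb
    # Never go below 10%, or auto batch sizing becomes useless
    return max(0.1, round(free_gb / total_vram_gb, 2))

# e.g. a 24GB card with 6GB already in use -> pass batch=0.73
```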
Problem: Training is extremely slow because PyTorch is using the CPU instead of the GPU.
Check whether CUDA is available:
python -c "import torch; print(torch.cuda.is_available())"
python -c "import torch; print(torch.version.cuda)"
Solution:
  1. Verify NVIDIA driver installation:
    nvidia-smi
    
    Should show GPU info and CUDA version.
  2. Reinstall PyTorch with CUDA support:
    pip uninstall torch torchvision
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
    
  3. Check CUDA toolkit version compatibility: PyTorch CUDA version must match your NVIDIA driver’s supported CUDA version (shown in nvidia-smi).
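The three checks above can be folded into one heuristic helper (a sketch; `cuda_diagnosis` and its messages are illustrative, not part of BeamFinder). Feed it the results of torch.cuda.is_available(), torch.version.cuda (None on CPU-only builds), and the CUDA version shown by nvidia-smi:

```python
from typing import Optional

def cuda_diagnosis(cuda_available: bool, torch_cuda: Optional[str],
                   driver_cuda: str) -> str:
    """Map the three checks above to the most likely cause."""
    if cuda_available:
        return "ok"
    if torch_cuda is None:
        return "cpu-only build"    # reinstall with a --index-url CUDA wheel
    if int(torch_cuda.split(".")[0]) > int(driver_cuda.split(".")[0]):
        return "driver too old"    # driver supports an older CUDA major version
    return "check driver install"  # run nvidia-smi and check permissions
```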
Problem: Detection (detect.py) crashes with an out-of-memory error.
Solution:
Reduce the batch size in detect.py:37:
results = model.predict(
    source=str(IMAGE_DIR),
    batch=4,  # Reduce from 16 to 4 or 1
    # ... other args
)
The test set has 1,709 images. With batch=16 and half=True, inference needs ~2GB VRAM at imgsz=960.

Dataset Issues

Problem: Training fails because the dataset directory structure is incorrect.
Solution:
BeamFinder expects this exact directory structure (defined in data.yaml:1-4):
BeamFinder/
└── data/
    ├── images/
    │   ├── train/          # 7,970 images
    │   ├── validation/     # 1,708 images
    │   └── test/           # 1,709 images
    └── labels/
        ├── train/          # 7,970 .txt files
        ├── validation/     # 1,708 .txt files
        └── test/           # 1,709 .txt files
Each .txt file in labels/ must have the same name as its corresponding image in images/ (e.g., IMG_001.jpg → IMG_001.txt).
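A quick stdlib check for mismatched pairs, assuming the layout above (`find_unpaired` is a hypothetical helper, not part of BeamFinder):

```python
from pathlib import Path

def find_unpaired(split: str, root: Path = Path("data")) -> list:
    """Return file stems that exist as an image without a label file,
    or as a label file without an image, for one dataset split."""
    exts = {".jpg", ".jpeg", ".png"}
    images = {p.stem for p in (root / "images" / split).iterdir()
              if p.suffix.lower() in exts}
    labels = {p.stem for p in (root / "labels" / split).glob("*.txt")}
    return sorted(images ^ labels)  # symmetric difference: unmatched on either side

# e.g. find_unpaired("train") should return [] for a correct dataset
```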
Problem: YOLO can’t find annotation files during training.
Solution:
  1. Check label format: Each .txt file should contain bounding boxes in YOLO format:
    0 0.5 0.5 0.3 0.2
    
    Format: class_id x_center y_center width height (normalized 0-1)
  2. Verify label files exist:
    ls data/labels/train/ | wc -l  # Should show 7,970
    
  3. Check data.yaml path: The path: data in data.yaml:1 is relative to the script location. If you moved data.yaml, update the path to an absolute path:
    path: /absolute/path/to/BeamFinder/data
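To audit label files in bulk, the format rules from step 1 can be expressed as a line validator (a sketch; `validate_label_line` is an illustrative name):

```python
def validate_label_line(line: str, num_classes: int = 1):
    """Return None if a YOLO-format label line is valid, else an error string."""
    parts = line.split()
    if len(parts) != 5:
        return "expected 5 fields: class_id x_center y_center width height"
    cls, *coords = parts
    if not cls.isdigit() or int(cls) >= num_classes:
        return f"class_id must be in [0, {num_classes - 1}]"
    try:
        values = [float(v) for v in coords]
    except ValueError:
        return "coordinates must be numeric"
    if not all(0.0 <= v <= 1.0 for v in values):
        return "coordinates must be normalized to the 0-1 range"
    return None
```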
    
Problem: YOLO reports image size warnings during training.
Solution:
This is usually harmless. BeamFinder images are 960×540 (16:9 aspect ratio), and YOLO resizes them to imgsz=960 with rect=True to preserve the aspect ratio.
If you want to eliminate the warnings entirely, ensure all images are exactly 960×540:
# Check image sizes
python -c "from PIL import Image; import sys; img = Image.open(sys.argv[1]); print(img.size)" data/images/train/IMG_001.jpg
The rect=True parameter in train.py:30 avoids wasting 44% of pixels on black padding for 16:9 images.

Training Issues

Problem: Training crashes on Windows with multiprocessing errors.
Solution:
This is a known Windows limitation. Set workers=0 in train.py:23:
model.train(
    workers=0,  # Required on Windows
    # ... other args
)
Also ensure your training script uses if __name__ == "__main__": (already present in train.py:8).
Training will be slower with workers=0 (single-threaded data loading). Use cache="ram" to compensate.
See the Known Issues page for details on Windows multiprocessing.
Problem: Training takes much longer than expected.
Checklist:
  1. GPU is being used:
    watch -n 1 nvidia-smi  # Monitor GPU utilization during training
    
    GPU utilization should be 80-100% during training.
  2. Enable dataset caching:
    model.train(
        cache="ram",  # Cache in RAM (needs ~4GB system memory)
        # Or:
        cache="disk",  # Cache to disk (slower but less RAM)
    )
    
  3. Use mixed precision (FP16): BeamFinder uses amp=True by default, but verify:
    model.train(
        amp=True,  # Automatic Mixed Precision
    )
    
  4. Enable torch.compile (PyTorch 2.x + CUDA):
    model.train(
        compile=True,  # 10-30% faster on A100
    )
    
    Already enabled in train.py:26 for A100 training.
  5. Increase worker count (Linux/Mac only):
    model.train(
        workers=8,  # Parallel data loading
    )
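The checklist above collapses into one set of keyword arguments (values taken from this page; `FAST_TRAIN_ARGS` is an illustrative name, and you should drop compile/workers on setups that don't support them):

```python
# Speed-oriented training settings from the checklist above;
# pass them along with your other args: model.train(**FAST_TRAIN_ARGS, ...)
FAST_TRAIN_ARGS = {
    "cache": "ram",    # cache dataset in RAM (needs ~4GB system memory)
    "amp": True,       # FP16 automatic mixed precision
    "compile": True,   # PyTorch 2.x + CUDA only; 10-30% faster on A100
    "workers": 8,      # parallel data loading (Linux/Mac; use 0 on Windows)
}
```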
    
Problem: Disk runs out of space during training.
Solution:
Training generates checkpoints every 10 epochs (save_period=10 in train.py:31). These accumulate in runs/drone_detect/weights/:
# Remove intermediate checkpoints, keep only best.pt and last.pt
rm runs/drone_detect/weights/epoch*.pt
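The same cleanup in Python, which also works on Windows where rm isn't available (`prune_checkpoints` is a hypothetical helper):

```python
from pathlib import Path

def prune_checkpoints(weights_dir: Path) -> list:
    """Delete intermediate epoch*.pt checkpoints; best.pt and last.pt
    don't match the glob, so they are kept."""
    removed = []
    for ckpt in sorted(weights_dir.glob("epoch*.pt")):
        ckpt.unlink()
        removed.append(ckpt.name)
    return removed

# e.g. prune_checkpoints(Path("runs/drone_detect/weights"))
```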
Or reduce checkpoint frequency:
model.train(
    save_period=25,  # Save every 25 epochs instead of 10
)
Or disable intermediate checkpoints entirely:
model.train(
    save_period=-1,  # Only save best.pt and last.pt
)
Problem: Model accuracy on the validation set is unexpectedly low.
Possible causes:
  1. Labels are incorrect or missing: Verify a few annotation files manually:
    cat data/labels/train/IMG_001.txt
    
  2. Class ID mismatch: BeamFinder uses a single class (nc: 1 in data.yaml:6). All labels should have class_id = 0:
    0 0.5 0.5 0.3 0.2
    
  3. Training hasn’t converged: 100 epochs with patience=20 should be sufficient, but you can train longer:
    model.train(
        epochs=200,
        patience=40,
    )
    
  4. Model is too small: Try a larger YOLO variant (yolo26m.pt or yolo26l.pt instead of yolo26s.pt).
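Causes 1 and 2 can be checked across a whole split at once (a stdlib sketch; `class_histogram` is an illustrative name). For nc: 1, every key in the result should be "0":

```python
from collections import Counter
from pathlib import Path

def class_histogram(label_dir: Path) -> Counter:
    """Count class_ids over all YOLO label files in a directory."""
    counts = Counter()
    for txt in label_dir.glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():
                counts[line.split()[0]] += 1
    return counts

# e.g. class_histogram(Path("data/labels/train"));
# any key other than "0" indicates a class ID mismatch for nc: 1
```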

Detection Issues

Problem: Detection script can’t find the trained model weights.
Solution:
You need to train the model first:
python train.py
This creates runs/drone_detect/weights/best.pt. The detection script loads this model at detect.py:9:
MODEL = str(SCRIPT_DIR / "runs" / "drone_detect" / "weights" / "best.pt")
If you moved the weights file, update the MODEL path in detect.py.
Problem: output/detections.csv contains only the header row, with no bounding boxes.
Solution:
  1. Check confidence threshold: BeamFinder uses CONF=0.4 in detect.py:12. Lower it to detect less confident predictions:
    CONF = 0.25  # Reduce from 0.4 to 0.25
    
  2. Verify test images exist:
    ls data/images/test/ | wc -l  # Should show 1,709
    
  3. Check model performance: Run validation to see if the model learned anything:
    python -c "from ultralytics import YOLO; m = YOLO('runs/drone_detect/weights/best.pt'); print(m.val(data='data.yaml', split='test'))"
    
  4. Verify image directory: Make sure IMAGE_DIR in detect.py:10 points to the correct folder:
    IMAGE_DIR = SCRIPT_DIR / "data" / "images" / "test"
    
Problem: Processing 1,709 test images takes too long.
Solution:
  1. Use GPU inference: Verify the model is running on GPU:
    from ultralytics import YOLO
    model = YOLO("runs/drone_detect/weights/best.pt")
    print(model.device)  # Should show 'cuda:0'
    
  2. Enable half precision: Already enabled in detect.py:37 (half=True). Verify:
    results = model.predict(
        half=True,  # FP16 inference (2x faster)
    )
    
  3. Increase batch size:
    results = model.predict(
        batch=32,  # Increase from 16 to 32 (if VRAM allows)
    )
    
  4. Reduce image size:
    IMGSZ = 640  # Reduce from 960 to 640
    
    Faster but may reduce detection accuracy.

Memory Issues

Problem: System runs out of RAM (not VRAM) during training.
Solution:
BeamFinder caches the entire dataset in RAM (cache="ram" in train.py:22). The 11,387 images total ~650MB on disk but expand to ~4GB in RAM.
If you have less than 16GB of system RAM, use disk caching instead:
model.train(
    cache="disk",  # Cache to disk instead of RAM
    # ... other args
)
Or disable caching entirely:
model.train(
    cache=False,  # No caching (slowest)
)
See the Known Issues page for memory requirements.
Problem: cache="disk" creates large cache files that fill up the disk.
Solution:
Ultralytics creates .cache files in the dataset directory; remove them with:
rm data/images/train/*.cache
rm data/images/validation/*.cache
rm data/images/test/*.cache
Or switch to RAM caching if you have enough memory:
model.train(
    cache="ram",  # Needs ~4GB system RAM
)

General Tips

1. Check the Ultralytics version

BeamFinder requires ultralytics>=8.4.0:
pip show ultralytics
Update if needed:
pip install --upgrade ultralytics
2. Enable verbose logging

Add verbose=True to training/detection for detailed logs:
model.train(verbose=True, ...)
model.predict(verbose=True, ...)
3. Monitor GPU during training

Watch GPU utilization and memory in real-time:
watch -n 1 nvidia-smi
Or use nvtop for a better interface:
sudo apt install nvtop
nvtop
4. Check the YOLO documentation

Many issues are documented in the official Ultralytics docs.
If you’re still stuck, check the Known Issues page for documented limitations and workarounds.
