Overview
The YOLODetector class uses the Ultralytics YOLO implementation for face detection. It supports both predefined model presets (YOLOv8n, YOLOv12n) and custom model paths.
Initialization
preset: Name of a predefined YOLO model from the YOLO_MODELS config. Either preset or model_path must be provided. Available presets:
- "yolov8n" - YOLOv8 nano model (faster)
- "yolov12n" - YOLOv12 nano model (more accurate)
model_path: Path to custom YOLO model weights (.pt file). Use this to load your own trained YOLO face detection model. Either preset or model_path must be provided.
Confidence threshold: Minimum confidence (0-1) for valid detections. Detections below this value are rejected.
Attributes
Ultralytics YOLO model instance loaded from preset or custom path.
Minimum confidence threshold for detections (0-1 range).
Methods
detect()
Detects a face in the given frame using YOLO.
Input: image in BGR or RGB format.
Returns: bounding box as (x, y, width, height) in pixels, or None if no detection meets the confidence threshold. When several detections exceed the threshold, the one with the highest confidence is returned.
- x: X-coordinate of top-left corner
- y: Y-coordinate of top-left corner
- width: Width of bounding box
- height: Height of bounding box
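The return contract above can be illustrated with a minimal sketch that works on plain lists rather than on Ultralytics result objects. The function name best_box_xywh, the min_confidence parameter name, and its 0.5 default are assumptions made for illustration; they are not taken from the YOLODetector source. Treating a confidence exactly equal to the threshold as valid is also an assumption.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]


def best_box_xywh(
    boxes_xyxy: Sequence[Sequence[float]],
    confidences: Sequence[float],
    min_confidence: float = 0.5,  # hypothetical default, not from the source
) -> Optional[Box]:
    """Return the highest-confidence box as (x, y, width, height), or None."""
    # Keep only detections that meet the threshold.
    candidates = [
        (conf, box)
        for conf, box in zip(confidences, boxes_xyxy)
        if conf >= min_confidence
    ]
    if not candidates:
        return None

    # Pick the single best detection by confidence.
    _, (x1, y1, x2, y2) = max(candidates, key=lambda item: item[0])

    # Convert corner format (XYXY) to top-left plus size (XYWH).
    return int(x1), int(y1), int(x2 - x1), int(y2 - y1)


boxes = [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 50.0]]
scores = [0.6, 0.9]
print(best_box_xywh(boxes, scores))         # the 0.9 box, as XYWH
print(best_box_xywh(boxes, [0.2, 0.3]))     # nothing meets the threshold
```

In a real detector the corner coordinates and confidences would come from the model output; only the selection and conversion steps are shown here.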
close()
Cleanup method (no actual cleanup needed for YOLO).
Model Configuration
YOLO models are configured in src/config.py.
Model Files
Model weights should be placed in:
- src/weights_models/yolov8n-face.pt - YOLOv8 nano face detection
- src/weights_models/yolov12n-face.pt - YOLOv12 nano face detection
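As a rough sketch of how the presets and weight files relate, the mapping below pairs each preset name with the file listed above. The dictionary shape, the WEIGHTS_DIR constant, and the missing_weights helper are assumptions for illustration; the actual YOLO_MODELS definition in src/config.py may be structured differently.

```python
from pathlib import Path

# Hypothetical shape of YOLO_MODELS; the real config may differ.
WEIGHTS_DIR = Path("src/weights_models")
YOLO_MODELS = {
    "yolov8n": WEIGHTS_DIR / "yolov8n-face.pt",
    "yolov12n": WEIGHTS_DIR / "yolov12n-face.pt",
}


def missing_weights(models: dict) -> list:
    """Return the preset names whose weight files are not on disk."""
    return [name for name, path in models.items() if not Path(path).is_file()]


if __name__ == "__main__":
    # Report presets that cannot be loaded yet because the file is absent.
    for name in missing_weights(YOLO_MODELS):
        print(f"Missing weights for preset {name!r}: {YOLO_MODELS[name]}")
```

Running a check like this before constructing a detector gives a clearer message than a load failure deep inside the model library.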
Usage Examples
Using YOLOv8 Preset
Using YOLOv12 for Higher Accuracy
Custom Model Path
With FaceDetector Manager
Recommended usage through the unified manager:
Complete Example with Video
Benchmarking Example
From experiments/advance_run_complete.py:
Performance Characteristics
Speed
- Average FPS: ~15 FPS (moderate speed)
- Detection Time: ~65ms per frame on typical hardware
- Real-time capable: Yes, but slower than MediaPipe/Haar
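The two speed figures above are consistent with each other: 65 ms per frame corresponds to 1000 / 65, or roughly 15.4 FPS. Numbers like these can be reproduced with a small timing harness such as the sketch below. The benchmark function and the fake_detect stand-in are hypothetical and are not taken from experiments/advance_run_complete.py; substitute a real detect() call and real frames to measure your own hardware.

```python
import time
from typing import Any, Callable, Iterable, Tuple


def benchmark(detect: Callable[[Any], Any], frames: Iterable[Any]) -> Tuple[float, float]:
    """Return (average milliseconds per frame, average FPS) for a detect callable."""
    count = 0
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0 or elapsed <= 0:
        raise ValueError("Need at least one frame and a measurable duration.")
    avg_ms = 1000.0 * elapsed / count
    return avg_ms, 1000.0 / avg_ms


def fake_detect(frame: Any) -> None:
    # Stand-in detector that only sleeps; replace with a real detect() call.
    time.sleep(0.002)


avg_ms, fps = benchmark(fake_detect, range(20))
print(f"{avg_ms:.1f} ms/frame, {fps:.1f} FPS")
```

Excluding the first few frames from the measurement is often worthwhile, since model warm-up can inflate the average.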
Accuracy
- Model Quality: Excellent detection accuracy
- False Positives: Very low with proper confidence threshold
- Robustness: Works well in challenging conditions
- YOLOv12 vs YOLOv8: v12 offers slightly better accuracy
Resource Usage
- CPU Usage: Moderate to high
- Memory: Higher than MediaPipe/Haar
- GPU: Can leverage GPU if available (faster)
Implementation Details
Highest Confidence Selection
When multiple faces are detected, only the highest-confidence one is returned.
Coordinate Format Conversion
YOLO returns XYXY format (top-left and bottom-right corners), which is converted to XYWH (width = x2 - x1, height = y2 - y1).
Verbose Mode Disabled
Detection runs in silent mode to avoid console spam.
Error Handling
Invalid Preset
Missing Parameters
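Both error cases in this section can be sketched with a small argument check. This is only an illustration of the rule that either preset or model_path must be provided: the resolve_weights helper, the use of ValueError, the message wording, and the choice to let model_path take precedence when both are given are all assumptions, and the preset table is a stand-in for YOLO_MODELS.

```python
from typing import Optional

# Hypothetical preset table; stands in for YOLO_MODELS from src/config.py.
YOLO_MODELS = {
    "yolov8n": "src/weights_models/yolov8n-face.pt",
    "yolov12n": "src/weights_models/yolov12n-face.pt",
}


def resolve_weights(preset: Optional[str] = None, model_path: Optional[str] = None) -> str:
    """Pick the weights file from a custom path or a preset name."""
    if model_path is not None:
        # Assumption: an explicit custom path wins over a preset.
        return model_path
    if preset is None:
        # Missing parameters: neither argument was supplied.
        raise ValueError("Either preset or model_path must be provided.")
    if preset not in YOLO_MODELS:
        # Invalid preset: name is not in the preset table.
        available = ", ".join(sorted(YOLO_MODELS))
        raise ValueError(f"Unknown preset {preset!r}. Available presets: {available}")
    return YOLO_MODELS[preset]


for kwargs in ({"preset": "yolov8n"}, {"preset": "yolov99"}, {}):
    try:
        print(resolve_weights(**kwargs))
    except ValueError as err:
        print(f"Error: {err}")
```

Listing the available presets in the error message makes a typo in the preset name easy to spot.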
Model Selection Guide
Use YOLOv8n when:
- You need good accuracy with moderate speed
- Processing pre-recorded videos
- GPU acceleration available
- Lower latency preferred over maximum accuracy
Use YOLOv12n when:
- Maximum accuracy is critical
- Processing challenging lighting/angles
- Can accept slightly slower processing
- Need lowest false positive rate
Comparison with Other Detectors
| Feature | YOLO | MediaPipe | Haar | MTCNN |
|---|---|---|---|---|
| Speed | Moderate (~15 FPS) | Fast (~25 FPS) | Fastest (~30+ FPS) | Slow (~10 FPS) |
| Accuracy | Excellent | Very Good | Good | Excellent |
| GPU Support | Yes | Limited | No | Limited |
| Model Size | Medium | Small | Tiny | Large |
| Best For | High accuracy | Balanced | Low-power | Max accuracy |
The detector automatically handles YOLO’s XYXY coordinate format and converts it to the standard XYWH format for consistency with other detectors.