Overview
The MTCNNDetector class uses MTCNN (Multi-task Cascaded Convolutional Networks) for face detection. It is the most accurate of the available detectors, at the cost of speed, and exposes a configurable confidence threshold.
Initialization
min_confidence: Minimum confidence score (0-1) for a detection to be considered valid.
- Higher values (0.95+): fewer false positives, but some faces may be missed
- Lower values (0.7-0.8): more detections, at a higher false-positive rate
- Default: 0.9 (90%), a good balance between the two
Attributes
- An MTCNN detector instance from the mtcnn package.
- min_confidence: Minimum confidence threshold for valid detections (0-1 range).
Methods
detect()
Detects a face in the given frame.
- Input: an image in BGR format (the OpenCV default). It is converted to RGB internally for MTCNN processing.
- Returns: a bounding box as (x, y, width, height) in pixels, or None if no face meets the confidence threshold. Only the first (highest-confidence) detection that exceeds min_confidence is returned.
  - x: X-coordinate of the top-left corner
  - y: Y-coordinate of the top-left corner
  - width: Width of the bounding box
  - height: Height of the bounding box
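As a rough illustration of the behaviour described above, here is a minimal sketch. The class name MTCNNDetectorSketch and the optional detector argument (used only to inject a stand-in for testing) are hypothetical; only MTCNN and its detect_faces() method come from the mtcnn package. Picking the maximum-confidence result explicitly is an assumption chosen to match the documented "first (highest confidence)" behaviour without relying on the order of MTCNN's output.

```python
from typing import Any, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


class MTCNNDetectorSketch:
    """Hypothetical stand-in for MTCNNDetector, not the project's actual code."""

    def __init__(self, min_confidence: float = 0.9, detector: Optional[Any] = None) -> None:
        if not 0.0 <= min_confidence <= 1.0:
            raise ValueError("min_confidence must be between 0 and 1")
        self.min_confidence = min_confidence
        if detector is None:
            # Imported lazily so a fake detector can be injected without mtcnn installed.
            from mtcnn import MTCNN
            detector = MTCNN()
        self.detector = detector

    def detect(self, frame: Any) -> Optional[Box]:
        # frame is an H x W x 3 BGR array; reversing the last axis gives RGB,
        # the same result as cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).
        rgb = frame[..., ::-1].copy()
        results = self.detector.detect_faces(rgb)
        if not results:
            return None
        # Take the highest-confidence result explicitly rather than trusting list order.
        best = max(results, key=lambda r: r["confidence"])
        if best["confidence"] < self.min_confidence:
            return None
        x, y, w, h = best["box"]
        return int(x), int(y), int(w), int(h)

    def close(self) -> None:
        # Nothing to release for MTCNN, matching the documented no-op close().
        pass
```

Injecting the detector keeps the threshold and box-format logic testable with a fake object, while the default path still builds a real MTCNN instance.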
close()
Cleanup method; MTCNN needs no explicit cleanup, so this method does nothing.
Usage Examples
Basic Detection
High Confidence Detection
Sensitive Detection
Video Processing
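Because MTCNN runs at roughly 10 FPS, video is usually processed offline. The sketch below separates frame reading from detection so that any detect(frame) callable, such as the detect method of an MTCNNDetector instance, can be plugged in. The helper names read_frames and detect_in_frames and the stride parameter (process every n-th frame to trade temporal coverage for speed) are hypothetical additions, not part of the documented API; only cv2.VideoCapture, read() and release() are OpenCV calls.

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def read_frames(path: str) -> Iterator[Any]:
    """Yield BGR frames from a video file using OpenCV (imported on first use)."""
    import cv2

    cap = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream or read error
                break
            yield frame
    finally:
        cap.release()


def detect_in_frames(
    frames: Iterable[Any],
    detect: Callable[[Any], Optional[Box]],
    stride: int = 1,
) -> Iterator[Tuple[int, Box]]:
    """Run detect() on every `stride`-th frame; yield (frame_index, box) when a face is found."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    for index, frame in enumerate(frames):
        if index % stride != 0:
            continue  # skipped frame: fewer detections, higher throughput
        box = detect(frame)
        if box is not None:
            yield index, box
```

With the documented class, usage might look like detect_in_frames(read_frames("input.mp4"), detector.detect, stride=3), followed by detector.close() once the loop finishes; the file name is a placeholder.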
With FaceDetector Manager
Recommended usage is through the unified FaceDetector manager.
Performance Characteristics
Speed
- Average FPS: ~10 FPS (the slowest of the available detectors)
- Detection Time: ~100ms per frame on typical hardware
- Real-time capable: Limited; best suited to offline processing
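The ~10 FPS and ~100 ms figures depend on hardware, image size, and backend, so it may be worth measuring them on your own machine. The helper below is a hypothetical utility, not part of the detector: it times any detect(frame) callable with time.perf_counter and skips the first warmup call(s), on the assumption that early calls may include one-off model setup.

```python
import time
from typing import Any, Callable, Dict, Iterable


def measure_detection_speed(
    detect: Callable[[Any], Any],
    frames: Iterable[Any],
    warmup: int = 1,
) -> Dict[str, float]:
    """Time detect() per frame, ignoring the first `warmup` calls."""
    timings = []
    for index, frame in enumerate(frames):
        start = time.perf_counter()
        detect(frame)
        elapsed = time.perf_counter() - start
        if index >= warmup:  # early calls may include one-off model setup
            timings.append(elapsed)
    if not timings:
        raise ValueError("need more frames than warmup calls")
    mean_s = sum(timings) / len(timings)
    return {
        "frames": len(timings),
        "ms_per_frame": mean_s * 1000.0,
        "fps": 1.0 / mean_s if mean_s > 0 else float("inf"),
    }
```

Passing a detector's detect method and a list of representative frames gives an estimate of per-frame latency that can be compared against the figures above.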
Accuracy
- Detection Quality: Excellent - highest accuracy
- False Positives: Very low with default 0.9 threshold
- Robustness: Handles various angles, lighting, and occlusions well
- Profile Faces: Better than Haar, comparable to YOLO
Resource Usage
- CPU Usage: High
- Memory: Higher than MediaPipe/Haar
- GPU: Can leverage GPU for faster processing
- Best For: Offline processing, maximum accuracy requirements
Implementation Details
Color Space Conversion
MTCNN requires RGB input, so BGR frames from OpenCV are converted to RGB before detection.
Confidence Filtering
Only detections meeting the confidence threshold (min_confidence) are returned.
Bounding Box Format
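MTCNN returns each detection as a dictionary of the form {'box': [x, y, width, height], 'confidence': score} (the mtcnn package also includes a 'keypoints' entry for facial landmarks). The box is returned in the same (x, y, width, height) order.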
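To make the mapping from MTCNN's output to (x, y, width, height) boxes concrete, here is a small sketch. The function boxes_above_threshold is hypothetical and goes slightly beyond the documented class: instead of keeping only one face, it returns every detection that passes the threshold, sorted strongest first, so its first element corresponds to the single box detect() returns.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def boxes_above_threshold(
    results: List[Dict], min_confidence: float = 0.9
) -> List[Tuple[Box, float]]:
    """Convert MTCNN result dicts to (box, confidence) pairs, highest confidence first."""
    kept = []
    for r in results:
        if r["confidence"] >= min_confidence:
            x, y, w, h = r["box"]
            kept.append(((int(x), int(y), int(w), int(h)), float(r["confidence"])))
    # Sort descending by confidence so kept[0] is the strongest face.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept
```

The input is a list in the format shown above, as returned by detect_faces(); the 'keypoints' entry, if present, is simply ignored.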
Single Face Detection
Only the first (highest-confidence) detection is processed; any other faces in the frame are ignored.
Confidence Threshold Guide
| Threshold | False Positives | Missed Faces | Use Case |
|---|---|---|---|
| 0.6-0.7 | Higher | Fewer | Maximize detections |
| 0.8-0.85 | Moderate | Some | Balanced approach |
| 0.9 | Low | Occasional | Recommended default |
| 0.95+ | Very low | More | Critical applications |
Use Cases
Ideal For:
- Offline Video Processing: Accuracy over speed
- Face Recognition Preprocessing: High-quality face extraction
- Security Applications: Low false positive tolerance
- Challenging Conditions: Poor lighting, angles, partial occlusions
- Quality over Speed: When accuracy is paramount
Not Recommended For:
- Real-time Webcam: Too slow for smooth video
- Embedded Devices: Resource requirements too high
- High FPS Requirements: Cannot maintain 30+ FPS
- Battery-Powered Devices: High power consumption
Accuracy vs Performance Tradeoffs
Maximum Accuracy
Balanced
Maximum Recall
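The three profiles above differ only in min_confidence. A minimal sketch of that tradeoff follows; the preset names and the THRESHOLD_PRESETS mapping are hypothetical, with 0.95 and 0.7 taken from the threshold guide table and 0.9 from the documented default. They are not constants defined by MTCNNDetector.

```python
from typing import Dict, Iterable

# Hypothetical presets: 0.95 and 0.7 come from the threshold guide table,
# 0.9 is the documented default. They are not constants defined by MTCNNDetector.
THRESHOLD_PRESETS: Dict[str, float] = {
    "max_accuracy": 0.95,
    "balanced": 0.9,
    "max_recall": 0.7,
}


def faces_kept(confidences: Iterable[float], min_confidence: float) -> int:
    """Count how many raw MTCNN confidence scores would pass a threshold."""
    return sum(1 for c in confidences if c >= min_confidence)
```

For the illustrative scores [0.99, 0.93, 0.86, 0.72, 0.65], the three presets keep 1, 2, and 4 faces respectively. A profile would then be applied as MTCNNDetector(min_confidence=THRESHOLD_PRESETS["max_recall"]), assuming the constructor accepts min_confidence as a keyword argument.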
Comparison with Other Detectors
| Feature | MTCNN | YOLO | MediaPipe | Haar |
|---|---|---|---|---|
| Speed | Slow (10 FPS) | Moderate (15 FPS) | Fast (25 FPS) | Fastest (30+ FPS) |
| Accuracy | Excellent | Excellent | Very Good | Good |
| Profile Faces | Very Good | Very Good | Good | Poor |
| Lighting Robustness | Excellent | Excellent | Very Good | Fair |
| False Positives | Very Low | Very Low | Low | Moderate |
| Resource Usage | High | Moderate | Low | Very Low |
| Best For | Max accuracy | Balanced | Real-time | Embedded |
MTCNN Architecture
MTCNN uses a three-stage cascade:
- P-Net (Proposal Network): Fast scanning for candidate regions
- R-Net (Refine Network): Refines candidates and rejects false positives
- O-Net (Output Network): Final detection with facial landmarks
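P-Net's scan runs over an image pyramid: P-Net looks at 12 x 12 windows, so the image is rescaled so that a face of the minimum size maps to 12 pixels, then shrunk repeatedly by a constant factor. The sketch below computes those scales the way common MTCNN implementations do. The defaults min_face_size=20 and scale_factor=0.709 are the conventional values from the reference implementation, assumed here; MTCNNDetector does not document them.

```python
from typing import List


def pyramid_scales(
    height: int, width: int, min_face_size: int = 20, scale_factor: float = 0.709
) -> List[float]:
    """Scales at which P-Net would scan an image of the given size."""
    if min_face_size <= 0 or not 0.0 < scale_factor < 1.0:
        raise ValueError("min_face_size must be > 0 and scale_factor in (0, 1)")
    m = 12.0 / min_face_size          # map a min_face_size face onto 12 pixels
    min_layer = min(height, width) * m
    scales = []
    level = 0
    while min_layer >= 12:            # stop once the shorter side is below one window
        scales.append(m * scale_factor ** level)
        min_layer *= scale_factor
        level += 1
    return scales
```

A larger minimum face size yields fewer pyramid levels, which is one reason MTCNN's cost grows with image resolution and with how small the faces you want to find are.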
Common Patterns
Batch Processing Videos
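A minimal batch-processing sketch, assuming a flat directory of video files with the suffixes listed (an assumption to adjust for your data). The helpers find_videos and count_face_frames are hypothetical; read_frames is any callable that yields BGR frames for a path, for example a loop over cv2.VideoCapture(path).read(). The main design choice is to pass in one detect callable for the whole batch, since constructing an MTCNN instance loads its networks and is likely the expensive part.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
VIDEO_SUFFIXES = {".mp4", ".avi", ".mov", ".mkv"}  # assumption: adjust to your data


def find_videos(directory: str) -> List[Path]:
    """List video files directly inside `directory`, sorted for a reproducible order."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    )


def count_face_frames(
    videos: Iterable[Path],
    read_frames: Callable[[str], Iterable[Any]],
    detect: Callable[[Any], Optional[Box]],
) -> Dict[str, int]:
    """Count frames with a detected face in each video, reusing one detect() callable."""
    counts: Dict[str, int] = {}
    for video in videos:
        counts[video.name] = sum(
            1 for frame in read_frames(str(video)) if detect(frame) is not None
        )
    return counts
```

With the documented class this might be driven as count_face_frames(find_videos("videos"), read_frames, detector.detect), calling detector.close() afterwards; the directory name is a placeholder.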
Quality Face Extraction
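For face extraction, the returned box is typically padded and clamped to the image before cropping, for example with a high min_confidence such as 0.95. The sketch below is hypothetical: the 20% margin is an arbitrary default, and the clamping also covers boxes that MTCNN can occasionally report starting slightly outside the image. crop_face assumes a NumPy-style H x W x C array.

```python
from typing import Any, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def crop_bounds(
    box: Box, frame_width: int, frame_height: int, margin: float = 0.2
) -> Tuple[int, int, int, int]:
    """Expand a box by `margin` (fraction of its size) per side and clamp to the frame.

    Returns (x0, y0, x1, y1) for frame[y0:y1, x0:x1] slicing.
    """
    x, y, w, h = box
    dx, dy = round(w * margin), round(h * margin)
    x0 = max(0, x - dx)
    y0 = max(0, y - dy)
    x1 = min(frame_width, x + w + dx)
    y1 = min(frame_height, y + h + dy)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box lies outside the frame")
    return x0, y0, x1, y1


def crop_face(frame: Any, box: Box, margin: float = 0.2) -> Any:
    """Crop a face from an H x W x C image array using crop_bounds()."""
    height, width = frame.shape[:2]
    x0, y0, x1, y1 = crop_bounds(box, width, height, margin)
    return frame[y0:y1, x0:x1]
```

Keeping the bounds computation separate from the slicing makes the clamping logic easy to check without loading any images.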
The default 0.9 confidence threshold gives high accuracy while still detecting most faces. Adjust it according to how you weigh precision (fewer false positives) against recall (fewer missed faces).