Overview
Face detection is the critical first stage of the vital signs monitoring pipeline. The system identifies the facial region in each video frame and extracts a stable Region of Interest (ROI) that serves as input to the EVM processing chain. Four detection backends are supported (Haar Cascade, MTCNN, YOLO, MediaPipe) behind a unified interface, with built-in temporal stabilization to reduce jitter.
FaceDetector Architecture
The `FaceDetector` class provides a unified interface to multiple detection models with integrated stabilization:
src/face_detector/manager.py:13-41
Detection Backends
Haar Cascade
Type: Classical computer vision

Pros:
- Extremely fast (CPU-only)
- No dependencies on deep learning frameworks
- Works well for frontal faces

Cons:
- Lower accuracy on rotated faces
- Sensitive to lighting conditions
- More false positives
MTCNN
Type: Multi-stage CNN

Pros:
- High accuracy
- Handles rotation and scale variations
- Includes facial landmark detection

Cons:
- Slower than Haar
- Requires more CPU/GPU resources
YOLO
Type: Single-shot detector

Pros:
- Excellent speed/accuracy trade-off
- Robust to occlusions
- Supports YOLOv8 and YOLOv12 models

Cons:
- Requires model weights file
- Higher memory footprint
MediaPipe
Type: Google’s ML solution

Pros:
- Optimized for real-time performance
- Cross-platform support
- Built-in face mesh capability

Cons:
- Requires MediaPipe framework
- Less customizable
Model Configuration
YOLO models require pre-trained weights specified in configuration:
src/config.py:25-29
ROI Stabilization
Raw face detection results often exhibit frame-to-frame jitter due to detection uncertainty, head motion, and algorithmic noise. The stabilization system addresses this through temporal smoothing.
Stabilization Buffer
src/face_detector/manager.py:43-55
The system maintains a sliding window of the 5 most recent detections using a deque for efficient O(1) append and pop operations.
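This sliding-window behavior can be sketched with Python's `collections.deque` (a minimal illustration; the actual buffer lives in `src/face_detector/manager.py`):

```python
from collections import deque

# Sliding window of the 5 most recent detections (x, y, w, h).
# With maxlen=5, appending a 6th item automatically evicts the oldest
# in O(1), so no manual pop is needed.
history = deque(maxlen=5)

for i in range(7):  # simulate 7 incoming detections
    history.append((100 + i, 80, 120, 120))

print(len(history))  # → 5: the buffer never exceeds its window size
print(history[0])    # → (102, 80, 120, 120): oldest retained detection
```

Using `maxlen` delegates the eviction logic to the deque itself, which keeps the buffer code free of explicit length checks.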
Weighted Averaging
The stabilization algorithm applies weighted averaging with increasing weights for recent frames:
src/face_detector/manager.py:57-93
Stabilization Weights
The weighting scheme prioritizes recent detections while still considering historical context:
src/config.py:33
Weight Selection Rationale
The weights `[0.1, 0.15, 0.2, 0.25, 0.3]` provide:
- Responsiveness: 30% weight on the newest frame allows tracking of head movement
- Stability: 70% weight on historical frames smooths out detection jitter
- Linear increase: Simple pattern balances past and present

Alternative schemes:
- Exponential weights `[0.05, 0.10, 0.15, 0.25, 0.45]`: more responsive, less stable
- Uniform weights `[0.2, 0.2, 0.2, 0.2, 0.2]`: more stable, less responsive

The current linear scheme provides the best balance for vital signs monitoring.
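As a worked example, the linear weights applied to five buffered boxes can be sketched as follows (the real implementation is in `src/face_detector/manager.py:57-93`; the renormalization for short histories is an assumption of this sketch):

```python
# Weighted average of buffered (x, y, w, h) detections.
# Weights are ordered oldest -> newest and sum to 1.0.
WEIGHTS = [0.1, 0.15, 0.2, 0.25, 0.3]

def stabilize(history):
    """Return the weighted-average ROI over the detection history."""
    n = len(history)
    weights = WEIGHTS[-n:]   # use the newest n weights
    total = sum(weights)     # renormalize when the buffer is not yet full
    roi = [0.0, 0.0, 0.0, 0.0]
    for w, box in zip(weights, history):
        for k in range(4):
            roi[k] += w * box[k]
    return tuple(round(v / total) for v in roi)

# Five detections drifting 2 px right per frame:
history = [(100 + 2 * i, 80, 120, 120) for i in range(5)]
print(stabilize(history))  # → (105, 80, 120, 120)
```

Note that the smoothed x (105) lags slightly behind the newest detection (108): that lag is the price of suppressing jitter, and the linear weights keep it small.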
Change Detection Threshold
To avoid unnecessary updates for minimal changes, the system includes a significance threshold:
src/face_detector/manager.py:132-158
The threshold is configured to ignore small fluctuations:
src/config.py:32
A 20-pixel threshold means that detection variations smaller than 20 pixels in any dimension (x, y, width, height) are ignored, preventing micro-jitter while still tracking genuine head movements.
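A minimal sketch of such a check (the function name `is_significant_change` is an assumption; the real logic is in `src/face_detector/manager.py:132-158`):

```python
ROI_CHANGE_THRESHOLD = 20  # pixels, mirroring src/config.py:32

def is_significant_change(new_roi, prev_roi, threshold=ROI_CHANGE_THRESHOLD):
    """True if any of x, y, w, h moved by at least `threshold` pixels."""
    return any(abs(n - p) >= threshold for n, p in zip(new_roi, prev_roi))

prev = (100, 80, 120, 120)
print(is_significant_change((110, 85, 120, 120), prev))  # micro-jitter → False
print(is_significant_change((140, 80, 120, 120), prev))  # real movement → True
```

Because the check is per-dimension, a 40-pixel horizontal head movement triggers an update even when the box size is unchanged.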
Face Detection Pipeline
The complete detection process with stabilization:
src/face_detector/manager.py:95-130
Significance Check
Compare new detection with previous stable ROI. If change is below threshold, return previous ROI without updating.
Weighted Stabilization
If change is significant, add to history buffer and compute weighted average.
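The two steps above can be combined into a per-frame update loop, sketched here with hypothetical names (the actual pipeline is in `src/face_detector/manager.py:95-130`):

```python
from collections import deque

WEIGHTS = [0.1, 0.15, 0.2, 0.25, 0.3]
THRESHOLD = 20

class RoiStabilizer:
    """Minimal sketch of the detect-check-smooth loop."""

    def __init__(self):
        self.history = deque(maxlen=len(WEIGHTS))
        self.stable_roi = None

    def update(self, detection):
        # Step 1: significance check against the previous stable ROI.
        if self.stable_roi is not None and all(
            abs(n - p) < THRESHOLD for n, p in zip(detection, self.stable_roi)
        ):
            return self.stable_roi  # ignore micro-jitter

        # Step 2: weighted stabilization over the history buffer.
        self.history.append(detection)
        weights = WEIGHTS[-len(self.history):]
        total = sum(weights)
        self.stable_roi = tuple(
            round(sum(w * box[k] for w, box in zip(weights, self.history)) / total)
            for k in range(4)
        )
        return self.stable_roi

stab = RoiStabilizer()
print(stab.update((100, 80, 120, 120)))  # first frame seeds the buffer
print(stab.update((105, 82, 120, 120)))  # below threshold → unchanged ROI
```

The second call returns the same ROI as the first because every per-dimension change is below the 20-pixel threshold.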
ROI Extraction and Sizing
Once the face is detected, the ROI is extracted and optionally resized for processing:
src/config.py:17-23
Why 320x240 Resolution?
Rationale for downsampling:
- Computational efficiency: Lower resolution → faster pyramid construction and filtering
- Memory footprint: 320×240 = 76,800 pixels vs. 1920×1080 = 2,073,600 pixels (27× reduction)
- Signal quality: EVM operates on spatial averages; high resolution provides diminishing returns
- Physiological frequency preservation: Temporal resolution (FPS) matters more than spatial resolution

Trade-offs:
- Loss of fine spatial detail (acceptable for vital signs)
- Faster processing (critical for real-time performance)
- Reduced noise (larger effective averaging area)
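The pixel-count arithmetic and the noise-reduction effect of averaging can both be demonstrated directly (a NumPy sketch; the project's actual resize path is not shown here, and the 960×1280 source size is chosen only so the blocks divide evenly):

```python
import numpy as np

# 27x fewer pixels at ROI resolution than at full HD:
print((1920 * 1080) / (320 * 240))  # → 27.0

# Block-averaging a noisy frame: each output pixel averages a 4x4 block
# of inputs, so per-pixel noise shrinks while the mean level is preserved.
rng = np.random.default_rng(0)
full = rng.normal(loc=128, scale=10, size=(960, 1280))
small = full.reshape(240, 4, 320, 4).mean(axis=(1, 3))

print(small.shape)               # → (240, 320)
print(full.std() > small.std())  # → True: averaging reduced the noise
```

This is the "larger effective averaging area" point in practice: averaging N independent noisy samples scales the noise standard deviation by roughly 1/√N.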
Detector Base Interface
All detector backends implement the `BaseFaceDetector` abstract class:
src/face_detector/base.py:1-34
This abstraction allows seamless swapping of detection backends without changing downstream code.
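The shape of that abstraction can be sketched with Python's `abc` module (the method name `detect` is an assumption; see `src/face_detector/base.py:1-34` for the real contract):

```python
from abc import ABC, abstractmethod

class BaseFaceDetector(ABC):
    """Sketch of the shared detector interface."""

    @abstractmethod
    def detect(self, frame):
        """Return a bounding box (x, y, w, h) or None if no face is found."""

class DummyDetector(BaseFaceDetector):
    """Trivial backend showing that any subclass is interchangeable."""

    def detect(self, frame):
        return (0, 0, 100, 100)

detector: BaseFaceDetector = DummyDetector()
print(detector.detect(frame=None))  # → (0, 0, 100, 100)
```

Because downstream code depends only on the abstract `detect` signature, swapping Haar for YOLO is a one-line configuration change rather than a refactor.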
Performance Characteristics
Detection Speed (Approximate)
| Backend | FPS (CPU) | FPS (GPU) | Latency | Accuracy |
|---|---|---|---|---|
| Haar Cascade | 60-100 | N/A | ~10ms | Medium |
| MTCNN | 10-20 | 30-50 | ~50ms | High |
| YOLO (v8n) | 30-45 | 100-200 | ~20ms | High |
| MediaPipe | 40-60 | 80-120 | ~15ms | High |
Performance varies significantly based on hardware. Values shown are for 640×480 input on typical hardware (Intel i5, NVIDIA GTX 1060).
Stabilization Impact
Without Stabilization:
- ROI jitter: ±5-20 pixels per frame
- Signal noise: High-frequency artifacts in temporal signal
- Detection failures: Occasional frame drops create signal discontinuities

With Stabilization:
- ROI jitter: ±1-3 pixels per frame (80-90% reduction)
- Signal noise: Smoother temporal signals improve FFT quality
- Detection failures: Historical buffer maintains ROI through brief failures
Usage Example
Basic Usage
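The snippet below sketches the expected call pattern. The constructor argument and method name are assumptions about the `FaceDetector` API in `src/face_detector/manager.py`, and a tiny stand-in class is included so the example runs on its own:

```python
import numpy as np

class FaceDetector:
    """Stand-in with the same shape as the real class (hypothetical API)."""

    def __init__(self, backend="haar"):
        self.backend = backend

    def detect(self, frame):
        h, w = frame.shape[:2]
        # The real backend would localize the face; here we return a
        # fixed centered box purely for illustration.
        return (w // 4, h // 4, w // 2, h // 2)

detector = FaceDetector(backend="haar")
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a video frame
x, y, w, h = detector.detect(frame)
face_crop = frame[y:y + h, x:x + w]
print((x, y, w, h), face_crop.shape)  # → (160, 120, 320, 240) (240, 320, 3)
```

In the real pipeline the frame would come from `cv2.VideoCapture` and the crop would feed the EVM stage.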
Advanced: Multiple Backends
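Because every backend sits behind the same interface, comparing them is a loop over backend names. The sketch below uses the backend names from this document with a stand-in detector class (hypothetical API) so it runs standalone:

```python
import time

class FaceDetector:
    """Stand-in for the real unified detector (hypothetical API)."""

    def __init__(self, backend):
        self.backend = backend

    def detect(self, frame):
        return (160, 120, 320, 240)  # placeholder result

# Time the same workload through each backend via the shared interface.
results = {}
for backend in ("haar", "mtcnn", "yolo", "mediapipe"):
    detector = FaceDetector(backend=backend)
    start = time.perf_counter()
    for _ in range(100):
        detector.detect(frame=None)
    results[backend] = time.perf_counter() - start

print(sorted(results))  # all four backends ran through one interface
```

With the real detectors substituted in, this pattern yields per-backend timings like those in the performance table above.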
Integration with EVM
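Conceptually, each stabilized ROI crop is resized to the 320×240 target and appended to a temporal buffer that the EVM stage processes as a whole. The sketch below uses hypothetical names (`evm_buffer`, `BUFFER_LEN`) and a dependency-free nearest-neighbor resize; the real pipeline would use the project's configured resize path:

```python
import numpy as np

TARGET_SIZE = (240, 320)   # (height, width) after ROI resizing
BUFFER_LEN = 150           # e.g. ~5 s of frames at 30 FPS (assumption)

evm_buffer = []
for _ in range(BUFFER_LEN):
    frame = np.zeros((480, 640, 3), dtype=np.uint8)
    x, y, w, h = (160, 120, 320, 240)       # stabilized ROI for this frame
    crop = frame[y:y + h, x:x + w]
    # Nearest-neighbor resize via index selection keeps the sketch
    # dependency-free; a real pipeline would typically use cv2.resize.
    rows = np.linspace(0, crop.shape[0] - 1, TARGET_SIZE[0]).astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, TARGET_SIZE[1]).astype(int)
    evm_buffer.append(crop[np.ix_(rows, cols)])

stack = np.stack(evm_buffer)  # (time, height, width, channels) tensor for EVM
print(stack.shape)            # → (150, 240, 320, 3)
```

Stacking along the time axis is what lets the EVM stage apply its temporal band-pass filter per pixel across frames.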
Troubleshooting
No Face Detected
Possible causes:
- Poor lighting conditions
- Face too small or too large in frame
- Extreme head pose (profile view)
- Occlusions (glasses, mask, hair)

Solutions:
- Ensure adequate, even lighting
- Position camera 0.5-2 meters from subject
- Use frontal or near-frontal poses
- Try a different detector backend (MTCNN or YOLO for challenging cases)
Jittery ROI
Possible causes:
- Detector confidence fluctuations
- Subject movement
- Insufficient stabilization history

Solutions:
- Wait for 3-5 frames to build the stabilization buffer
- Increase `ROI_WEIGHTS` for older frames
- Reduce `ROI_CHANGE_THRESHOLD` for more aggressive stabilization
Multiple Faces Detected
Behavior: Most detectors return only the largest/most confident face.

If this causes issues:
- Modify detector to track specific person (custom logic needed)
- Ensure only one person in frame
- Use detector with face recognition capability (e.g., MTCNN with embedding comparison)
Related Concepts
Eulerian Video Magnification
Learn how ROI frames are processed for vital signs
Signal Processing
Understand how stable ROI improves signal quality
System Overview
See face detection in the full pipeline
API Reference
Explore FaceDetector API documentation