The Voice Activity Detection (VAD) API is planned for a future release.
Overview
Voice Activity Detection (VAD) is a technique for detecting the presence or absence of human speech in an audio signal. It’s commonly used to:
- Reduce computational load by processing only speech segments
- Improve speech recognition accuracy by filtering out silence and noise
- Enable push-to-talk and voice-triggered applications
- Optimize audio streaming and bandwidth
Planned Features
The VAD API will provide:
- Real-time voice detection: Detect speech in live audio streams
- Batch processing: Analyze audio files for speech segments
- Configurable sensitivity: Adjust detection thresholds
- Multiple VAD models: Support for different VAD architectures
- Integration with STT: Seamless integration with speech recognition
Expected Usage (Preview)
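The planned API has not been published, so the following is only a minimal TypeScript sketch of the underlying idea, not the react-native-sherpa-onnx API. It assumes a simple RMS energy threshold in place of a Silero or WebRTC model, and every name in it (`VadOptions`, `SpeechSegment`, `detectSpeechSegments`, `threshold`) is hypothetical. It illustrates two of the planned features: batch analysis of a buffer into speech segments, and a configurable sensitivity threshold.

```typescript
// Hypothetical sketch: none of these names come from react-native-sherpa-onnx.
// It uses a plain energy threshold, not a trained VAD model.

interface VadOptions {
  sampleRate: number; // samples per second
  frameMs: number;    // analysis frame length in milliseconds
  threshold: number;  // RMS level at or above which a frame counts as speech
}

interface SpeechSegment {
  startSec: number;
  endSec: number;
}

// Root-mean-square energy of samples[start, end).
function frameRms(samples: Float32Array, start: number, end: number): number {
  let sum = 0;
  for (let i = start; i < end; i++) {
    sum += samples[i] * samples[i];
  }
  return Math.sqrt(sum / (end - start));
}

// Splits the buffer into fixed-size frames and merges consecutive
// speech frames into segments. A trailing partial frame is ignored.
function detectSpeechSegments(samples: Float32Array, opts: VadOptions): SpeechSegment[] {
  const frameLen = Math.floor((opts.sampleRate * opts.frameMs) / 1000);
  if (frameLen <= 0) {
    throw new Error("frameMs is too short for the given sampleRate");
  }

  const segments: SpeechSegment[] = [];
  let segStart = -1; // sample index where the open segment began, or -1

  for (let start = 0; start + frameLen <= samples.length; start += frameLen) {
    const isSpeech = frameRms(samples, start, start + frameLen) >= opts.threshold;
    if (isSpeech && segStart < 0) {
      segStart = start; // speech begins
    } else if (!isSpeech && segStart >= 0) {
      segments.push({ startSec: segStart / opts.sampleRate, endSec: start / opts.sampleRate });
      segStart = -1; // speech ends
    }
  }

  // Close a segment that is still open at the end of the last full frame.
  if (segStart >= 0) {
    const lastFrameEnd = Math.floor(samples.length / frameLen) * frameLen;
    segments.push({ startSec: segStart / opts.sampleRate, endSec: lastFrameEnd / opts.sampleRate });
  }
  return segments;
}
```

Lowering `threshold` makes this sketch more sensitive (more frames count as speech); raising it makes it stricter. A model-based detector would replace the RMS test with a per-frame speech probability.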
Model Support
Planned support for popular VAD models:
- Silero VAD: Lightweight and accurate VAD model
- WebRTC VAD: Fast, low-latency detection
- Custom models: Bring your own ONNX VAD models
Integration Example
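Until the API is available, the general pattern can be sketched as a gate that sits between audio capture and the recognizer and forwards only frames judged to contain speech. The sketch below is an assumption about how such a gate might look, not the library's design: `FrameClassifier`, `FrameSink`, `createVadGate`, and `hangoverFrames` are hypothetical names, and the recognizer is represented only by a callback.

```typescript
// Hypothetical sketch: these types are NOT part of react-native-sherpa-onnx.

// Returns true when the frame is judged to contain speech.
type FrameClassifier = (frame: Float32Array) => boolean;

// Receives frames that pass the gate, e.g. the input of a streaming recognizer.
type FrameSink = (frame: Float32Array) => void;

// Builds a gate that forwards speech frames to `sink`, plus up to
// `hangoverFrames` non-speech frames after each speech frame so that
// word endings are not clipped. Returns whether the frame was forwarded.
function createVadGate(
  isSpeech: FrameClassifier,
  sink: FrameSink,
  hangoverFrames: number
): (frame: Float32Array) => boolean {
  let remaining = 0; // frames still allowed through, including the current one

  return (frame: Float32Array): boolean => {
    if (isSpeech(frame)) {
      remaining = hangoverFrames + 1; // this frame plus the hangover window
    }
    if (remaining > 0) {
      remaining--;
      sink(frame);
      return true;
    }
    return false; // silence outside the hangover window is dropped
  };
}
```

In this arrangement the recognizer never sees long stretches of silence, which is where the reduction in computational load comes from. The hangover window trades a little extra processing for fewer truncated words.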
Combining VAD with streaming STT lets the recognizer process only the segments that contain speech.
Availability
This API is not yet implemented. Track progress on the react-native-sherpa-onnx GitHub repository.
See Also
- Streaming STT - Real-time speech recognition
- Audio Utilities - Audio capture and processing
- Diarization API - Speaker identification (planned)