Overview
Voice Activity Detection (VAD) identifies the segments of an audio file that contain speech, filtering out silence and non-speech sounds. This is useful for:
- Preprocessing audio before transcription
- Reducing processing time by skipping silent segments
- Improving accuracy by focusing on speech portions
- Building voice-triggered applications
Installation
VAD will be included in the main package.
Basic Usage
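A minimal sketch of the documented flow (initialize, detect, release). Stub implementations stand in for the real library exports here, and the option fields and `VoiceSegment` field names are assumptions, not the library's confirmed API:

```typescript
// VoiceSegment shape is an illustrative guess (start/end in seconds).
interface VoiceSegment {
  start: number;
  end: number;
}

// Stubs standing in for the library's documented functions.
async function initializeVAD(options: { modelPath?: string } = {}): Promise<void> {
  // The real implementation loads the VAD model here.
}

async function detectVoiceActivity(filePath: string): Promise<VoiceSegment[]> {
  // The real implementation analyzes the file; this stub returns fixed segments.
  return [
    { start: 0.5, end: 2.0 },
    { start: 3.2, end: 5.8 },
  ];
}

async function unloadVAD(): Promise<void> {
  // The real implementation releases model resources.
}

// Typical flow: initialize once, detect on one or more files, then release.
async function run(): Promise<VoiceSegment[]> {
  await initializeVAD({ modelPath: "models/vad.onnx" });
  const segments = await detectVoiceActivity("recording.wav");
  await unloadVAD();
  return segments;
}
```

In the real package these functions would be imported rather than defined locally.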
API Reference
initializeVAD()
Initialize the Voice Activity Detection model.
Parameters
Configuration options for VAD initialization
Returns
Promise that resolves when VAD is initialized.
Example
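A sketch of initializing the model. A stub stands in for the real export, and the `modelPath` option field is an assumption:

```typescript
// Stub with the documented signature; the real function loads the model natively.
let vadReady = false;

async function initializeVAD(options: { modelPath?: string } = {}): Promise<void> {
  // A real implementation would load the model at options.modelPath here.
  vadReady = true;
}

initializeVAD({ modelPath: "models/vad.onnx" }).then(() => {
  console.log("VAD initialized:", vadReady);
});
```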
detectVoiceActivity()
Detect voice activity segments in an audio file.
Parameters
Path to the audio file to analyze
Returns
Promise that resolves to an array of voice segments.
Example
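A sketch of consuming the returned segments. A stub stands in for the real call, and the `start`/`end` field names (in seconds) are assumptions:

```typescript
interface VoiceSegment {
  start: number; // assumed field: segment start, seconds
  end: number;   // assumed field: segment end, seconds
}

// Stub standing in for the library call; returns fixed segments for illustration.
async function detectVoiceActivity(filePath: string): Promise<VoiceSegment[]> {
  return [
    { start: 0.5, end: 2.0 },
    { start: 3.2, end: 5.8 },
  ];
}

// Example consumer: total speech duration across all detected segments.
async function totalSpeechSeconds(path: string): Promise<number> {
  const segments = await detectVoiceActivity(path);
  return segments.reduce((sum, s) => sum + (s.end - s.start), 0);
}
```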
unloadVAD()
Release VAD model resources.
Returns
Promise that resolves when resources are released.
Example
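A sketch of releasing resources once detection work is done. The stub just flips a flag where the real function would free native model memory:

```typescript
let modelLoaded = true; // pretend a model was loaded by initializeVAD

// Stub standing in for the library's unloadVAD.
async function unloadVAD(): Promise<void> {
  modelLoaded = false; // the real implementation frees native model memory here
}

// Call unloadVAD when VAD is no longer needed, e.g. on app teardown.
unloadVAD().then(() => {
  console.log("model loaded:", modelLoaded);
});
```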
Types
VADInitializeOptions
VoiceSegment
ModelPathConfig
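A plausible TypeScript shape for the three types above; every field name here is an illustrative assumption, not the library's actual definition:

```typescript
// All field names below are illustrative assumptions, not the real definitions.
interface ModelPathConfig {
  modelPath: string; // path to a local VAD model file
}

interface VADInitializeOptions {
  model?: ModelPathConfig;
  threshold?: number;  // detection sensitivity (see Best Practices below)
  sampleRate?: number; // expected input rate, e.g. 16000
}

interface VoiceSegment {
  start: number; // segment start time, seconds
  end: number;   // segment end time, seconds
}

const example: VoiceSegment = { start: 0.5, end: 2.0 };
```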
Best Practices
Choose appropriate VAD models
Different VAD models have different characteristics:
- Silero VAD: Fast, lightweight, good for real-time applications
- WebRTC VAD: Classic algorithm, very fast but less accurate
- Deep learning models: More accurate but slower
Adjust sensitivity for your use case
VAD sensitivity affects the trade-off between:
- High sensitivity: Catches more speech but may include noise
- Low sensitivity: More conservative, may miss quiet speech
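To make the trade-off concrete, here is a toy energy-threshold detector (not the library's algorithm): lowering the threshold raises sensitivity and catches the quiet frame at the risk of flagging noise, while raising it does the opposite:

```typescript
// Toy frame classifier: a frame is "speech" if its energy exceeds the threshold.
function detectFrames(energies: number[], threshold: number): boolean[] {
  return energies.map((e) => e > threshold);
}

// Frame energies: noise (0.1), quiet speech (0.2), loud speech (0.9), noise (0.1).
const energies = [0.1, 0.2, 0.9, 0.1];

const sensitive = detectFrames(energies, 0.15);   // catches the quiet speech frame
const conservative = detectFrames(energies, 0.5); // keeps only the loud frame
```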
Preprocess audio for better results
VAD works best with:
- Clean audio without heavy background noise
- Consistent volume levels
- Appropriate sample rates (typically 16kHz)
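As one concrete preprocessing step, a minimal peak-normalization sketch addressing the consistent-volume point (a full pipeline would also resample to 16kHz and reduce background noise):

```typescript
// Scale samples so the loudest one reaches `target`, evening out volume levels.
function normalizePeak(samples: Float32Array, target = 0.9): Float32Array {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return samples; // all-silent input: nothing to scale
  const gain = target / peak;
  return samples.map((s) => s * gain);
}
```

After normalization the loudest sample sits at `target`, regardless of the original recording level.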
Error Handling
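A sketch of guarding VAD calls, assuming the library rejects its promises with ordinary `Error` values; the stub below forces a failure for illustration:

```typescript
// Stub that always fails, e.g. because the model file is missing.
async function initializeVAD(): Promise<void> {
  throw new Error("model file not found");
}

// Wrap VAD calls so a failure degrades gracefully instead of crashing.
async function initWithFallback(): Promise<string> {
  try {
    await initializeVAD();
    return "vad";
  } catch (err) {
    // e.g. skip VAD preprocessing and transcribe the full audio instead
    return err instanceof Error ? err.message : "unknown error";
  }
}
```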
Related
Speech Enhancement: Improve audio quality before VAD
Speech Recognition: Transcribe detected speech segments