Coming Soon - This feature is planned but not yet implemented. The API interface is subject to change.

Overview

Voice Activity Detection (VAD) identifies segments of audio that contain speech, filtering out silence and non-speech sounds. This is useful for:
  • Preprocessing audio before transcription
  • Reducing processing time by skipping silent segments
  • Improving accuracy by focusing on speech portions
  • Building voice-triggered applications

Installation

VAD will be included in the main package:
npm install react-native-sherpa-onnx

Basic Usage

import { initializeVAD, detectVoiceActivity } from 'react-native-sherpa-onnx/vad';

// Initialize VAD with model
await initializeVAD({
  modelPath: {
    type: 'auto',
    path: 'models/vad-model'
  }
});

// Detect voice segments
const segments = await detectVoiceActivity('path/to/audio.wav');

console.log('Voice segments:', segments);
// [{ start: 0.5, end: 3.2 }, { start: 4.1, end: 7.8 }]
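The API above is still provisional, but the segment objects it is planned to return can already be worked with in plain TypeScript. As a sketch, a hypothetical helper (not part of the library) that totals the detected speech time:

```typescript
// Matches the planned VoiceSegment shape: times in seconds.
interface VoiceSegment {
  start: number;
  end: number;
}

// Sum the duration of all detected speech segments.
function totalSpeechSeconds(segments: VoiceSegment[]): number {
  return segments.reduce((sum, s) => sum + (s.end - s.start), 0);
}

const segments: VoiceSegment[] = [
  { start: 0.5, end: 3.2 },
  { start: 4.1, end: 7.8 },
];

console.log(totalSpeechSeconds(segments)); // ≈ 6.4 seconds of speech
```

A ratio of speech time to total file duration computed this way can, for example, tell you whether a recording is worth sending to transcription at all.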

API Reference

initializeVAD()

Initialize the Voice Activity Detection model.
await initializeVAD(options: VADInitializeOptions): Promise<void>

Parameters

options
VADInitializeOptions
required
Configuration options for VAD initialization

Returns

Promise that resolves when VAD is initialized.

Example

await initializeVAD({
  modelPath: {
    type: 'auto',
    path: 'models/silero-vad'
  }
});

detectVoiceActivity()

Detect voice activity segments in an audio file.
await detectVoiceActivity(filePath: string): Promise<VoiceSegment[]>

Parameters

filePath
string
required
Path to the audio file to analyze

Returns

Promise that resolves to an array of voice segments.
VoiceSegment[]
array

Example

const segments = await detectVoiceActivity('/path/to/recording.wav');

segments.forEach(segment => {
  console.log(`Speech from ${segment.start}s to ${segment.end}s`);
});

unloadVAD()

Release VAD model resources.
await unloadVAD(): Promise<void>

Returns

Promise that resolves when resources are released.

Example

// When done with VAD
await unloadVAD();

Types

VADInitializeOptions

interface VADInitializeOptions {
  modelPath: ModelPathConfig;
  // Additional options will be added in future versions
}

VoiceSegment

interface VoiceSegment {
  start: number;  // Start time in seconds
  end: number;    // End time in seconds
  // Additional fields will be added in future versions
}

ModelPathConfig

interface ModelPathConfig {
  type: 'auto' | 'file';
  path: string;
}
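Assuming the planned ModelPathConfig shape above, the two type values would be used like this (the comments describe assumed behavior, since the final semantics of 'auto' and 'file' are not yet documented):

```typescript
interface ModelPathConfig {
  type: 'auto' | 'file';
  path: string;
}

// 'auto': a path resolved by the library (assumption: relative to bundled assets)
const bundled: ModelPathConfig = { type: 'auto', path: 'models/silero-vad' };

// 'file': a path on the device filesystem (assumption: used as-is)
const onDisk: ModelPathConfig = { type: 'file', path: '/data/models/vad.onnx' };
```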

Best Practices

Choose the Right Model

Different VAD models have different characteristics:
  • Silero VAD: Fast, lightweight, good for real-time applications
  • WebRTC VAD: Classic algorithm, very fast but less accurate
  • Deep learning models: More accurate but slower
Choose based on your accuracy and performance requirements.

Tune Sensitivity

VAD sensitivity affects the trade-off between:
  • High sensitivity: Catches more speech but may include noise
  • Low sensitivity: More conservative, may miss quiet speech
Tune based on your audio quality and application needs.

Prepare Your Audio

VAD works best with:
  • Clean audio without heavy background noise
  • Consistent volume levels
  • Appropriate sample rates (typically 16 kHz)
Consider using speech enhancement before VAD if needed.
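Sensitivity trade-offs can also be smoothed out after detection. As a sketch in plain TypeScript (not part of the planned API; the maxGap and minLength thresholds are illustrative assumptions), short gaps between segments can be bridged and very short detections dropped:

```typescript
interface VoiceSegment {
  start: number; // seconds
  end: number;   // seconds
}

// Merge segments separated by less than `maxGap` seconds,
// then drop merged segments shorter than `minLength` seconds.
function smoothSegments(
  segments: VoiceSegment[],
  maxGap = 0.3,
  minLength = 0.25,
): VoiceSegment[] {
  const merged: VoiceSegment[] = [];
  for (const seg of segments) {
    const last = merged[merged.length - 1];
    if (last && seg.start - last.end < maxGap) {
      last.end = seg.end; // bridge the short gap
    } else {
      merged.push({ ...seg });
    }
  }
  return merged.filter((s) => s.end - s.start >= minLength);
}

const raw: VoiceSegment[] = [
  { start: 0.5, end: 1.0 },
  { start: 1.1, end: 3.2 }, // 0.1 s gap: merged with the previous segment
  { start: 5.0, end: 5.1 }, // 0.1 s long: dropped as noise
];

console.log(smoothSegments(raw)); // [{ start: 0.5, end: 3.2 }]
```

Raising maxGap makes the result behave more like a low-sensitivity detector (fewer, longer segments); raising minLength discards likely noise bursts from a high-sensitivity pass.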

Error Handling

try {
  await initializeVAD({
    modelPath: {
      type: 'auto',
      path: 'models/vad'
    }
  });
  
  const segments = await detectVoiceActivity('audio.wav');
  
  if (segments.length === 0) {
    console.log('No speech detected in audio');
  }
} catch (error) {
  console.error('VAD error:', error);
} finally {
  await unloadVAD();
}

Speech Enhancement

Improve audio quality before VAD

Speech Recognition

Transcribe detected speech segments
