Overview
Voice Activity Detection (VAD) identifies the segments of an audio file that contain speech, filtering out silence and non-speech sounds. This is useful for:
- Preprocessing audio before transcription
- Reducing processing time by skipping silent segments
- Improving accuracy by focusing on speech portions
- Building voice-triggered applications
Installation
VAD will be included in the main package.
Basic Usage
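A minimal sketch of the documented flow (initialize, detect, release). Stub implementations stand in for the real library exports here, and the option fields and `VoiceSegment` field names are assumptions, not the library's confirmed API:

```typescript
// VoiceSegment shape is an illustrative guess (start/end in seconds).
interface VoiceSegment {
  start: number;
  end: number;
}

// Stubs standing in for the library's documented functions.
async function initializeVAD(options: { modelPath?: string } = {}): Promise<void> {
  // The real implementation loads the VAD model here.
}

async function detectVoiceActivity(filePath: string): Promise<VoiceSegment[]> {
  // The real implementation analyzes the file; this stub returns fixed segments.
  return [
    { start: 0.5, end: 2.0 },
    { start: 3.2, end: 5.8 },
  ];
}

async function unloadVAD(): Promise<void> {
  // The real implementation releases model resources.
}

// Typical flow: initialize once, detect on one or more files, then release.
async function run(): Promise<VoiceSegment[]> {
  await initializeVAD({ modelPath: "models/vad.onnx" });
  const segments = await detectVoiceActivity("recording.wav");
  await unloadVAD();
  return segments;
}
```

In the real package these functions would be imported rather than defined locally.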
API Reference
initializeVAD()
Initialize the Voice Activity Detection model.
Parameters
Configuration options for VAD initialization
Returns
Promise that resolves when VAD is initialized.
Example
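A sketch of initializing the model. A stub stands in for the real export, and the `modelPath` option field is an assumption:

```typescript
// Stub with the documented signature; the real function loads the model natively.
let vadReady = false;

async function initializeVAD(options: { modelPath?: string } = {}): Promise<void> {
  // A real implementation would load the model at options.modelPath here.
  vadReady = true;
}

initializeVAD({ modelPath: "models/vad.onnx" }).then(() => {
  console.log("VAD initialized:", vadReady);
});
```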
detectVoiceActivity()
Detect voice activity segments in an audio file.
Parameters
Path to the audio file to analyze
Returns
Promise that resolves to an array of voice segments.
Example
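A sketch of consuming the returned segments. A stub stands in for the real call, and the `start`/`end` field names (in seconds) are assumptions:

```typescript
interface VoiceSegment {
  start: number; // assumed field: segment start, seconds
  end: number;   // assumed field: segment end, seconds
}

// Stub standing in for the library call; returns fixed segments for illustration.
async function detectVoiceActivity(filePath: string): Promise<VoiceSegment[]> {
  return [
    { start: 0.5, end: 2.0 },
    { start: 3.2, end: 5.8 },
  ];
}

// Example consumer: total speech duration across all detected segments.
async function totalSpeechSeconds(path: string): Promise<number> {
  const segments = await detectVoiceActivity(path);
  return segments.reduce((sum, s) => sum + (s.end - s.start), 0);
}
```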
unloadVAD()
Release VAD model resources.
Returns
Promise that resolves when resources are released.
Example
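A sketch of releasing resources once detection work is done. The stub just flips a flag where the real function would free native model memory:

```typescript
let modelLoaded = true; // pretend a model was loaded by initializeVAD

// Stub standing in for the library's unloadVAD.
async function unloadVAD(): Promise<void> {
  modelLoaded = false; // the real implementation frees native model memory here
}

// Call unloadVAD when VAD is no longer needed, e.g. on app teardown.
unloadVAD().then(() => {
  console.log("model loaded:", modelLoaded);
});
```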
Types
VADInitializeOptions
VoiceSegment
ModelPathConfig
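A plausible TypeScript shape for the three types above; every field name here is an illustrative assumption, not the library's actual definition:

```typescript
// All field names below are illustrative assumptions, not the real definitions.
interface ModelPathConfig {
  modelPath: string; // path to a local VAD model file
}

interface VADInitializeOptions {
  model?: ModelPathConfig;
  threshold?: number;  // detection sensitivity (see Best Practices below)
  sampleRate?: number; // expected input rate, e.g. 16000
}

interface VoiceSegment {
  start: number; // segment start time, seconds
  end: number;   // segment end time, seconds
}

const example: VoiceSegment = { start: 0.5, end: 2.0 };
```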
Best Practices
Choose appropriate VAD models
Different VAD models have different characteristics:
- Silero VAD: Fast, lightweight, good for real-time applications
- WebRTC VAD: Classic algorithm, very fast but less accurate
- Deep learning models: More accurate but slower
Adjust sensitivity for your use case
VAD sensitivity affects the trade-off between:
- High sensitivity: Catches more speech but may include noise
- Low sensitivity: More conservative, may miss quiet speech
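To make the trade-off concrete, here is a toy energy-threshold detector (not the library's algorithm): lowering the threshold raises sensitivity and catches the quiet frame at the risk of flagging noise, while raising it does the opposite:

```typescript
// Toy frame classifier: a frame is "speech" if its energy exceeds the threshold.
function detectFrames(energies: number[], threshold: number): boolean[] {
  return energies.map((e) => e > threshold);
}

// Frame energies: noise (0.1), quiet speech (0.2), loud speech (0.9), noise (0.1).
const energies = [0.1, 0.2, 0.9, 0.1];

const sensitive = detectFrames(energies, 0.15);   // catches the quiet speech frame
const conservative = detectFrames(energies, 0.5); // keeps only the loud frame
```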
Preprocess audio for better results
VAD works best with:
- Clean audio without heavy background noise
- Consistent volume levels
- Appropriate sample rates (typically 16kHz)
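As one concrete preprocessing step, a minimal peak-normalization sketch addressing the consistent-volume point (a full pipeline would also resample to 16kHz and reduce background noise):

```typescript
// Scale samples so the loudest one reaches `target`, evening out volume levels.
function normalizePeak(samples: Float32Array, target = 0.9): Float32Array {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return samples; // all-silent input: nothing to scale
  const gain = target / peak;
  return samples.map((s) => s * gain);
}
```

After normalization the loudest sample sits at `target`, regardless of the original recording level.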
Error Handling
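A sketch of guarding VAD calls, assuming the library rejects its promises with ordinary `Error` values; the stub below forces a failure for illustration:

```typescript
// Stub that always fails, e.g. because the model file is missing.
async function initializeVAD(): Promise<void> {
  throw new Error("model file not found");
}

// Wrap VAD calls so a failure degrades gracefully instead of crashing.
async function initWithFallback(): Promise<string> {
  try {
    await initializeVAD();
    return "vad";
  } catch (err) {
    // e.g. skip VAD preprocessing and transcribe the full audio instead
    return err instanceof Error ? err.message : "unknown error";
  }
}
```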
Related
Speech Enhancement: Improve audio quality before VAD
Speech Recognition: Transcribe detected speech segments