The WhisperVadContext class provides methods for detecting speech segments in audio files and data using the Silero VAD model.

Properties

id
number
Unique context identifier
gpu
boolean
Whether GPU/Metal acceleration is active
reasonNoGPU
string
Explanation if GPU is not available (empty string if GPU is active)

Methods

detectSpeech

Detect speech segments in an audio file or base64-encoded WAV data.
detectSpeech(
  filePathOrBase64: string | number,
  options?: VadOptions
): Promise<VadSegment[]>

Parameters

filePathOrBase64
string | number
required
Audio source to analyze:
  • File path: '/path/to/audio.wav'
  • Base64 WAV: 'data:audio/wav;base64,...'
  • Asset: require('./assets/audio.wav')
Note: Remote URLs (http/https) are not supported. Download the file first.
options
VadOptions
Voice activity detection configuration options. See VadOptions for details.

Returns

segments
Promise<VadSegment[]>
Array of detected speech segments

Example

// Detect speech in a file
const segments = await vadContext.detectSpeech('/path/to/audio.wav')

console.log(`Found ${segments.length} speech segments`)
segments.forEach((segment, i) => {
  const duration = (segment.t1 - segment.t0) / 1000
  console.log(`Segment ${i + 1}: ${segment.t0}ms - ${segment.t1}ms (${duration.toFixed(2)}s)`)
})

// Detect speech with custom options
const tunedSegments = await vadContext.detectSpeech('/path/to/audio.wav', {
  threshold: 0.6,
  minSpeechDurationMs: 500,
  minSilenceDurationMs: 200,
})

// Detect speech in base64 WAV data
const base64Audio = 'data:audio/wav;base64,UklGRi...'
const base64Segments = await vadContext.detectSpeech(base64Audio)
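The returned segments are plain objects, so they can be post-processed with ordinary array operations. The following helpers are an illustrative sketch (not part of the library) that assumes each segment exposes t0 and t1 in milliseconds, as described in the Returns section:

```typescript
// Local stand-in for the library's VadSegment type; t0/t1 are milliseconds.
interface VadSegment {
  t0: number
  t1: number
}

// Total speech time (in seconds) across all detected segments.
function totalSpeechSeconds(segments: VadSegment[]): number {
  return segments.reduce((sum, s) => sum + (s.t1 - s.t0), 0) / 1000
}

// Drop segments shorter than minMs, e.g. to ignore brief noise bursts.
function filterShortSegments(segments: VadSegment[], minMs: number): VadSegment[] {
  return segments.filter((s) => s.t1 - s.t0 >= minMs)
}
```

These compose naturally with detectSpeech, e.g. filterShortSegments(await vadContext.detectSpeech(path), 300).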

detectSpeechData

Detect speech segments in raw audio data (base64-encoded PCM or ArrayBuffer).
detectSpeechData(
  audioData: string | ArrayBuffer,
  options?: VadOptions
): Promise<VadSegment[]>

Parameters

audioData
string | ArrayBuffer
required
Raw audio data:
  • Base64-encoded float32 PCM data (mono, 16kHz)
  • ArrayBuffer containing 16-bit PCM data (mono, 16kHz)
When using ArrayBuffer, the data is transferred efficiently via JSI bindings.
options
VadOptions
Voice activity detection configuration options. See VadOptions for details.

Returns

segments
Promise<VadSegment[]>
Array of detected speech segments with t0 (start time) and t1 (end time) in milliseconds

Example

// Detect speech in an ArrayBuffer (efficient JSI transfer)
const audioBuffer: ArrayBuffer = getAudioData() // 16-bit PCM, mono, 16kHz
const segments = await vadContext.detectSpeechData(audioBuffer, {
  threshold: 0.5,
  minSpeechDurationMs: 250,
})

// Detect speech in base64-encoded PCM data
const base64Pcm = 'AAECAw...' // float32 PCM, base64-encoded
const pcmSegments = await vadContext.detectSpeechData(base64Pcm)

release

Release the VAD context and free its memory.
release(): Promise<void>

Example

// Always release the context when finished
await vadContext.release()

// Or use try-finally to ensure cleanup
const vadContext = await initWhisperVad({
  filePath: '/path/to/silero_vad.bin',
})

try {
  const segments = await vadContext.detectSpeech('/path/to/audio.wav')
  // Process segments...
} finally {
  await vadContext.release()
}
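The try-finally pattern above can be factored into a small wrapper so callers cannot forget the release() call. This is an illustrative sketch, not a library API; it only assumes the context has a release() method returning a promise:

```typescript
// Minimal interface for anything that must be released;
// WhisperVadContext satisfies it via its release() method.
interface Releasable {
  release(): Promise<void>
}

// Run `fn` with the context, guaranteeing release() even if `fn` throws.
async function withContext<C extends Releasable, T>(
  context: C,
  fn: (context: C) => Promise<T>
): Promise<T> {
  try {
    return await fn(context)
  } finally {
    await context.release()
  }
}
```

Usage: const segments = await withContext(vadContext, (vad) => vad.detectSpeech('/path/to/audio.wav')).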

Complete Example

import { initWhisperVad } from 'whisper.rn'

async function detectSpeechInAudio() {
  // Initialize VAD context
  const vadContext = await initWhisperVad({
    filePath: require('./assets/silero_vad.bin'),
    useGpu: true,
  })

  console.log('GPU enabled:', vadContext.gpu)
  if (!vadContext.gpu) {
    console.log('Reason:', vadContext.reasonNoGPU)
  }

  try {
    // Detect speech segments
    const segments = await vadContext.detectSpeech(
      '/path/to/audio.wav',
      {
        threshold: 0.5,
        minSpeechDurationMs: 250,
        minSilenceDurationMs: 100,
        maxSpeechDurationS: 30,
        speechPadMs: 30,
      }
    )

    // Process results
    console.log(`Detected ${segments.length} speech segments:`)
    segments.forEach((segment, i) => {
      const start = segment.t0 / 1000
      const end = segment.t1 / 1000
      const duration = end - start
      console.log(`  ${i + 1}. ${start.toFixed(2)}s - ${end.toFixed(2)}s (${duration.toFixed(2)}s)`)
    })
  } finally {
    // Clean up
    await vadContext.release()
  }
}

Audio Format Requirements

  • Sample rate: 16kHz
  • Channels: Mono (1 channel)
  • Format: 16-bit PCM (for ArrayBuffer) or float32 PCM (for base64)
  • Supported inputs: WAV files, base64-encoded WAV/PCM, ArrayBuffer
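If your capture pipeline produces float32 samples (values in [-1, 1]) but you want to use the ArrayBuffer input path, a conversion along these lines works. This is an illustrative sketch, not part of whisper.rn; it assumes the input is already mono 16kHz:

```typescript
// Convert float32 samples in [-1, 1] to a 16-bit PCM ArrayBuffer,
// the format detectSpeechData expects for ArrayBuffer input.
function float32ToInt16Pcm(samples: Float32Array): ArrayBuffer {
  const pcm = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] before scaling to the int16 range.
    const s = Math.max(-1, Math.min(1, samples[i]))
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff
  }
  return pcm.buffer
}
```

Pass the result directly: await vadContext.detectSpeechData(float32ToInt16Pcm(samples)).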

Notes

  • ArrayBuffer input uses efficient JSI bindings for better performance
  • Remote URLs are not supported; download files before processing
  • Always release contexts when finished to avoid memory leaks
  • Use releaseAllWhisperVad() to release all VAD contexts at once
