Overview

The WhisperContext class provides methods for transcribing audio files and data. It is created by calling initWhisper().

Properties

id
number
Unique identifier for this context
ptr
number
Native context pointer (for internal use)
gpu
boolean
Whether GPU/Metal acceleration is active for this context
reasonNoGPU
string
Explanation if GPU is not available (empty string if GPU is active)

Methods

transcribe()

Transcribe an audio file or base64-encoded WAV data.
transcribe(
  filePathOrBase64: string | number,
  options?: TranscribeFileOptions
): { stop: () => Promise<void>, promise: Promise<TranscribeResult> }

Parameters

filePathOrBase64
string | number
required
Audio file path, base64 WAV data, or asset require. Supported formats:
  • File path: '/path/to/audio.wav'
  • Asset: require('../assets/sample.wav')
  • Base64 WAV: 'data:audio/wav;base64,...'
  • file:// URI (automatically normalized)
Remote URLs are not supported. Download the file first.
options
TranscribeFileOptions
Transcription options.
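
As a sketch of how these input forms differ, the helper below distinguishes the supported shapes and rejects remote URLs up front. Note that classifyAudioInput is a hypothetical name for illustration, not a library export:

```typescript
// Hypothetical helper (not part of whisper.rn) that mirrors the
// supported input forms of transcribe()'s first argument.
type AudioInputKind = 'asset' | 'base64-wav' | 'file-path'

function classifyAudioInput(input: string | number): AudioInputKind {
  if (typeof input === 'number') return 'asset' // result of require(...)
  if (/^https?:\/\//.test(input)) {
    throw new Error('Remote URLs are not supported; download the file first')
  }
  if (input.startsWith('data:audio/wav;base64,')) return 'base64-wav'
  // Plain paths and file:// URIs are both treated as file paths;
  // the library normalizes file:// URIs itself.
  return 'file-path'
}
```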

Returns

stop
() => Promise<void>
Function to abort the transcription
promise
Promise<TranscribeResult>
Promise that resolves with the transcription result

Example

const { stop, promise } = whisperContext.transcribe(
  require('../assets/sample.wav'),
  {
    language: 'en',
    maxLen: 1,
    tokenTimestamps: true,
    onProgress: (progress) => {
      console.log(`Progress: ${progress}%`)
    },
  }
)

const { result, segments } = await promise
console.log('Transcription:', result)

segments.forEach((segment) => {
  console.log(`[${segment.t0}ms -> ${segment.t1}ms] ${segment.text}`)
})
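
The segment list lends itself to subtitle export. The sketch below formats segments as SRT, assuming t0/t1 are milliseconds as printed in the example above; segmentsToSrt is a hypothetical helper, not part of the library:

```typescript
// Minimal segment shape used by this sketch (subset of TranscribeResult segments).
interface Segment { t0: number; t1: number; text: string }

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
function msToSrtTime(ms: number): string {
  const pad = (n: number, w: number) => String(n).padStart(w, '0')
  const h = Math.floor(ms / 3600000)
  const m = Math.floor((ms % 3600000) / 60000)
  const s = Math.floor((ms % 60000) / 1000)
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`
}

// Render segments as numbered SRT cues.
function segmentsToSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${msToSrtTime(seg.t0)} --> ${msToSrtTime(seg.t1)}\n${seg.text.trim()}\n`)
    .join('\n')
}
```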

transcribeData()

Transcribe raw audio data from base64-encoded float32 PCM or ArrayBuffer.
transcribeData(
  data: string | ArrayBuffer,
  options?: TranscribeFileOptions
): { stop: () => Promise<void>, promise: Promise<TranscribeResult> }

Parameters

data
string | ArrayBuffer
required
Raw audio data. Must be mono, 16kHz sample rate.
  • string: Base64-encoded float32 PCM data
  • ArrayBuffer: Raw 16-bit PCM data (uses JSI for efficient transfer)
options
TranscribeFileOptions
Same options as transcribe()

Returns

Same return type as transcribe().

Example

// Using ArrayBuffer (recommended for performance)
const audioBuffer = new Int16Array([...]).buffer
const { promise } = whisperContext.transcribeData(audioBuffer, {
  language: 'en',
})
const { result } = await promise

// Using base64 PCM data
const base64Data = 'base64EncodedFloat32PCM...'
const { promise: promise2 } = whisperContext.transcribeData(base64Data)
const { result: result2 } = await promise2
ArrayBuffer transcription uses JSI bindings for efficient memory transfer, avoiding JSON serialization overhead.
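
If your audio source yields float32 samples in [-1, 1] (as microphone capture libraries commonly do), they can be converted to either accepted form. These helpers are a sketch, not library exports; the base64 encoder is written out by hand so it works without Buffer or btoa:

```typescript
// For the ArrayBuffer path: convert float32 samples to raw 16-bit PCM.
function float32ToInt16Pcm(samples: Float32Array): ArrayBuffer {
  const out = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])) // clamp to [-1, 1]
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff
  }
  return out.buffer
}

const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

// Self-contained base64 encoder (no Buffer/btoa dependency).
function bytesToBase64(bytes: Uint8Array): string {
  let out = ''
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i]
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0
    out += B64[b0 >> 2] + B64[((b0 & 3) << 4) | (b1 >> 4)]
    out += i + 1 < bytes.length ? B64[((b1 & 15) << 2) | (b2 >> 6)] : '='
    out += i + 2 < bytes.length ? B64[b2 & 63] : '='
  }
  return out
}

// For the base64 string path: encode the raw float32 bytes.
function float32ToBase64(samples: Float32Array): string {
  return bytesToBase64(
    new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength)
  )
}
```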

transcribeRealtime() Deprecated

Transcribe audio from the device microphone in real-time.
This method is deprecated. Use RealtimeTranscriber instead for enhanced features including VAD auto-slicing and better memory management.
transcribeRealtime(
  options?: TranscribeRealtimeOptions
): Promise<{
  stop: () => Promise<void>,
  subscribe: (callback: (event: TranscribeRealtimeEvent) => void) => void
}>

Parameters

options
TranscribeRealtimeOptions
Extends TranscribeFileOptions with additional realtime options.

Example

const { stop, subscribe } = await whisperContext.transcribeRealtime({
  language: 'en',
  realtimeAudioSec: 60,
  realtimeAudioSliceSec: 25,
  audioOutputPath: '/path/to/save/recording.wav',
})

subscribe((event) => {
  const { isCapturing, data, processTime, recordingTime } = event
  console.log(`Capturing: ${isCapturing}`)
  console.log(`Result: ${data?.result}`)
  
  if (!isCapturing) {
    console.log('Finished recording')
  }
})

// Stop after 10 seconds
setTimeout(() => stop(), 10000)

bench()

Run a benchmark test to measure model performance.
bench(maxThreads: number): Promise<BenchResult>

Parameters

maxThreads
number
required
Maximum number of threads to use for the benchmark

Returns

BenchResult
object

Example

const benchResult = await whisperContext.bench(4)
console.log('Benchmark results:')
console.log(`  Threads: ${benchResult.nThreads}`)
console.log(`  Encode: ${benchResult.encodeMs}ms`)
console.log(`  Decode: ${benchResult.decodeMs}ms`)

release()

Release the context and free its memory.
release(): Promise<void>
Always release contexts when done to prevent memory leaks. Contexts hold significant native memory.

Example

try {
  // Use the context
  const { promise } = whisperContext.transcribe(audioFile)
  await promise
} finally {
  // Always release
  await whisperContext.release()
}
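
The try/finally pattern above can be factored into a reusable wrapper. This is a sketch of one possible convenience helper (withReleased is a hypothetical name, not a library export); it works with any object exposing release():

```typescript
// Run a function against a context and guarantee release() afterwards,
// whether the function resolves or throws.
async function withReleased<C extends { release(): Promise<void> }, T>(
  ctx: C,
  fn: (ctx: C) => Promise<T>
): Promise<T> {
  try {
    return await fn(ctx)
  } finally {
    await ctx.release()
  }
}
```

Usage would then look like `await withReleased(whisperContext, async (ctx) => { ... })`, with no way to forget the release.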

Audio Format Requirements

For all transcription methods:
  • Sample rate: 16kHz (required by Whisper)
  • Channels: Mono (1 channel)
  • Format: 16-bit PCM
  • Supported inputs: WAV files, base64 WAV, raw PCM data, ArrayBuffer
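
Before passing a WAV file through, you can verify it meets these requirements by reading its header. The sketch below assumes a canonical WAV layout (fmt chunk immediately after the RIFF header); checkWavHeader is a hypothetical helper, not a library export:

```typescript
// Read a canonical WAV header and return its format fields so they can
// be checked against the requirements above (16kHz, mono, 16-bit PCM).
function checkWavHeader(
  buf: ArrayBuffer
): { sampleRate: number; channels: number; bitsPerSample: number } {
  const v = new DataView(buf)
  const tag = (off: number) =>
    String.fromCharCode(
      v.getUint8(off), v.getUint8(off + 1), v.getUint8(off + 2), v.getUint8(off + 3)
    )
  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE') throw new Error('Not a WAV file')
  // Assumes the fmt chunk starts at byte 12 (true for canonical WAV files).
  if (tag(12) !== 'fmt ') throw new Error('Unexpected chunk layout')
  const channels = v.getUint16(22, true)      // little-endian per RIFF
  const sampleRate = v.getUint32(24, true)
  const bitsPerSample = v.getUint16(34, true)
  return { sampleRate, channels, bitsPerSample }
}
```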

Error Handling

try {
  const { promise } = whisperContext.transcribe(audioFile, {
    language: 'en',
  })
  const { result } = await promise
} catch (error) {
  if (error.message.includes('Invalid asset')) {
    console.error('Audio file not found')
  } else if (error.message.includes('remote file')) {
    console.error('Cannot transcribe remote URL')
  } else {
    console.error('Transcription failed:', error)
  }
}
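
The message checks above can be collected into a small classifier so handling logic stays in one place. This is a sketch built around the error messages shown here; the exact substrings may differ between library versions, and classifyTranscribeError is a hypothetical name:

```typescript
type TranscribeErrorKind = 'missing-asset' | 'remote-url' | 'unknown'

// Map a transcription error to a coarse category based on its message.
function classifyTranscribeError(error: Error): TranscribeErrorKind {
  if (error.message.includes('Invalid asset')) return 'missing-asset'
  if (error.message.includes('remote file')) return 'remote-url'
  return 'unknown'
}
```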

Performance Tips

  • Use maxLen: 1 for better real-time performance
  • Enable tokenTimestamps only when needed (adds overhead)
  • Adjust maxThreads based on device capabilities
  • Use beamSize for better accuracy (slower than greedy search)
  • Test in Release mode for accurate timing
