Overview
The WhisperContext class provides methods for transcribing audio files and data. It is created by calling initWhisper().
Properties
- id: Unique identifier for this context
- Native context pointer (for internal use)
- gpu: Whether GPU/Metal acceleration is active for this context
- reasonNoGPU: Explanation if GPU is not available (empty string if GPU is active)
Methods
transcribe()
Transcribe an audio file or base64-encoded WAV data.
transcribe(
filePathOrBase64: string | number,
options?: TranscribeFileOptions
): { stop: () => Promise<void>, promise: Promise<TranscribeResult> }
Parameters
filePathOrBase64
string | number
required
Audio file path, base64 WAV data, or asset require. Supported formats:
- File path: '/path/to/audio.wav'
- file:// URI (automatically normalized)
- Asset: require('../assets/sample.wav')
- Base64 WAV: 'data:audio/wav;base64,...'
Remote URLs are not supported. Download the file first.
options
TranscribeFileOptions
Transcription options:
- language: Spoken language code (e.g., 'en', 'es', 'fr'). Use 'auto' for auto-detection.
- translate: Translate from the source language to English.
- maxThreads: Number of threads for computation. Default: 2 on 4-core devices, 4 on devices with more cores.
- maxLen: Maximum segment length in characters.
- tokenTimestamps: Enable word-level timestamps.
- offset: Time offset in milliseconds to start transcribing from.
- duration: Duration of audio to process in milliseconds.
- temperature: Initial decoding temperature for sampling.
- beamSize: Beam size for beam search (slower but more accurate than greedy decoding).
- prompt: Initial prompt to guide transcription.
onProgress
(progress: number) => void
Progress callback, receives values from 0-100
onNewSegments
(result: TranscribeNewSegmentsResult) => void
Callback invoked when new segments are transcribed. Receives a TranscribeNewSegmentsResult:
- nNew: Number of new segments in this batch
- totalNNew: Total number of new segments so far
- result: Accumulated transcription text
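Because `result` in the callback payload is accumulated text, you can derive just the newly appended portion between callbacks. The helper below is an illustrative sketch (the function name and callback wiring are not part of the API); it assumes only the field shapes described above.

```typescript
// Illustrative helper: compute the text appended since the previous
// onNewSegments callback, given that `result` is accumulated text.
function newText(prevResult: string, currentResult: string): string {
  // The delta is the suffix past the previously seen length
  return currentResult.slice(prevResult.length)
}

// Usage sketch inside the callback:
let lastResult = ''
const onNewSegments = (res: { nNew: number; totalNNew: number; result: string }) => {
  const appended = newText(lastResult, res.result)
  lastResult = res.result
  console.log(`+${res.nNew} segments: ${appended}`)
}
```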
Returns
stop
() => Promise<void>
Function to abort the transcription
promise
Promise<TranscribeResult>
Promise that resolves with the transcription result:
- result: The complete transcribed text
- language: Detected or specified language code
- segments: Array of transcribed segments with timestamps (each with t0 start time and t1 end time in milliseconds, plus the segment text)
- isAborted: Whether the transcription was aborted
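Segment timestamps are plain milliseconds; a small display formatter (illustrative, not part of the API) can render them for UI or subtitle output:

```typescript
// Format a millisecond timestamp as mm:ss.mmm for display.
function formatMs(ms: number): string {
  const minutes = Math.floor(ms / 60000)
  const seconds = Math.floor((ms % 60000) / 1000)
  const millis = Math.floor(ms % 1000)
  const pad = (n: number, w: number) => String(n).padStart(w, '0')
  return `${pad(minutes, 2)}:${pad(seconds, 2)}.${pad(millis, 3)}`
}
```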
Example
const { stop, promise } = whisperContext.transcribe(
  require('../assets/sample.wav'),
  {
    language: 'en',
    maxLen: 1,
    tokenTimestamps: true,
    onProgress: (progress) => {
      console.log(`Progress: ${progress}%`)
    },
  }
)
const { result, segments } = await promise
console.log('Transcription:', result)
segments.forEach((segment) => {
  console.log(`[${segment.t0}ms -> ${segment.t1}ms] ${segment.text}`)
})
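The returned stop() can be combined with a timer to bound transcription time. The wrapper below is a sketch (the function name is illustrative); it works with any { stop, promise } pair shaped like the return value of transcribe().

```typescript
// Abort a transcription job if it exceeds `timeoutMs`. Works with any
// { stop, promise } pair shaped like the return value of transcribe().
async function transcribeWithTimeout<T>(
  job: { stop: () => Promise<void>; promise: Promise<T> },
  timeoutMs: number
): Promise<T> {
  const timer = setTimeout(() => {
    // Fire-and-forget; the job's promise settles with isAborted set
    job.stop()
  }, timeoutMs)
  try {
    return await job.promise
  } finally {
    clearTimeout(timer)
  }
}
```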
transcribeData()
Transcribe raw audio data from base64-encoded float32 PCM or ArrayBuffer.
transcribeData(
data: string | ArrayBuffer,
options?: TranscribeFileOptions
): { stop: () => Promise<void>, promise: Promise<TranscribeResult> }
Parameters
data
string | ArrayBuffer
required
Raw audio data, mono, 16kHz. The expected sample format depends on the input type:
- string: Base64-encoded float32 PCM data
- ArrayBuffer: Raw 16-bit PCM data (uses JSI for efficient transfer)
Same options as transcribe()
Returns
Same return type as transcribe().
Example
// Using ArrayBuffer (recommended for performance)
const audioBuffer = new Int16Array([...]).buffer
const { promise } = whisperContext.transcribeData(audioBuffer, {
  language: 'en',
})
const { result } = await promise
// Using base64 PCM data
const base64Data = 'base64EncodedFloat32PCM...'
const { promise: promise2 } = whisperContext.transcribeData(base64Data)
const { result: result2 } = await promise2
ArrayBuffer transcription uses JSI bindings for efficient memory transfer, avoiding JSON serialization overhead.
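To produce the base64 float32 PCM string on the JS side, you can serialize a Float32Array. The sketch below uses Node's Buffer for the base64 step; in a React Native app you would substitute a compatible base64 encoder (e.g. the base64-js package), which is an assumption on our part, not part of this API.

```typescript
// Encode mono 16kHz float32 samples as the base64 string accepted by
// transcribeData(). Buffer handles the base64 step here; swap in a
// React Native-compatible base64 encoder in an app.
function float32ToBase64(samples: Float32Array): string {
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength)
  return Buffer.from(bytes).toString('base64')
}
```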
transcribeRealtime() Deprecated
Transcribe audio from the device microphone in real-time.
This method is deprecated. Use RealtimeTranscriber instead for enhanced features including VAD auto-slicing and better memory management.
transcribeRealtime(
options?: TranscribeRealtimeOptions
): Promise<{
stop: () => Promise<void>,
subscribe: (callback: (event: TranscribeRealtimeEvent) => void) => void
}>
Parameters
options
TranscribeRealtimeOptions
Extends TranscribeFileOptions with additional realtime options.
- realtimeAudioSec: Realtime record max duration in seconds. Because whisper.cpp processes audio in 30-second chunks, values ≤ 30 are recommended.
- realtimeAudioSliceSec: Audio slice duration for processing. Set below 30 for performance improvements. Default: equal to realtimeAudioSec.
- realtimeAudioMinSec: Minimum audio duration in seconds to start transcription (between 0.5 and realtimeAudioSliceSec).
- audioOutputPath: Output path to save the recorded audio file. If not set, audio is not saved.
- useVad: Enable Voice Activity Detection to start transcription only when speech is detected.
- vadMs: Length of audio collected for VAD (minimum 2000ms).
- audioSessionOnStartIos: iOS audio session settings applied when transcription starts.
- audioSessionOnStopIos (string | AudioSessionSettingIos): iOS audio session settings applied when stopping. Use 'restore' to restore the previous state.
Example
const { stop, subscribe } = await whisperContext.transcribeRealtime({
  language: 'en',
  realtimeAudioSec: 60,
  realtimeAudioSliceSec: 25,
  audioOutputPath: '/path/to/save/recording.wav',
})
subscribe((event) => {
  const { isCapturing, data, processTime, recordingTime } = event
  console.log(`Capturing: ${isCapturing}`)
  console.log(`Result: ${data?.result}`)
  if (!isCapturing) {
    console.log('Finished recording')
  }
})
// Stop after 10 seconds
setTimeout(() => stop(), 10000)
bench()
Run a benchmark test to measure model performance.
bench(maxThreads: number): Promise<BenchResult>
Parameters
maxThreads
number
Maximum number of threads to use for the benchmark
Returns
Promise that resolves with a BenchResult:
- config: Model configuration string
- nThreads: Number of threads used
- encodeMs: Encoder time in milliseconds
- decodeMs: Decoder time in milliseconds
- batchMs: Batch processing time in milliseconds
- promptMs: Prompt processing time in milliseconds
Example
const benchResult = await whisperContext.bench(4)
console.log('Benchmark results:')
console.log(` Threads: ${benchResult.nThreads}`)
console.log(` Encode: ${benchResult.encodeMs}ms`)
console.log(` Decode: ${benchResult.decodeMs}ms`)
release()
Release the context and free its memory.
Always release contexts when done to prevent memory leaks. Contexts hold significant native memory.
Example
try {
  // Use the context
  const { promise } = whisperContext.transcribe(audioFile)
  await promise
} finally {
  // Always release
  await whisperContext.release()
}
Audio Format Requirements
For all transcription methods:
- Sample rate: 16kHz (required by Whisper)
- Channels: Mono (1 channel)
- Format: 16-bit PCM
- Supported inputs: WAV files, base64 WAV, raw PCM data, ArrayBuffer
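If your capture pipeline produces interleaved stereo float32 audio, a minimal downmix-and-quantize step yields the mono 16-bit PCM buffer that transcribeData() accepts from an ArrayBuffer. This is a sketch under the assumption that the input is already at 16kHz; resampling from other rates is out of scope here.

```typescript
// Downmix interleaved stereo float32 samples (already at 16kHz) to
// mono 16-bit PCM suitable for transcribeData(out.buffer).
function stereoFloat32ToMonoInt16(interleaved: Float32Array): Int16Array {
  const out = new Int16Array(interleaved.length / 2)
  for (let i = 0; i < out.length; i++) {
    // Average the two channels, clamp to [-1, 1], scale to int16 range
    const mixed = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2
    const clamped = Math.max(-1, Math.min(1, mixed))
    out[i] = Math.round(clamped * 32767)
  }
  return out
}
```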
Error Handling
try {
  const { promise } = whisperContext.transcribe(audioFile, {
    language: 'en',
  })
  const { result } = await promise
} catch (error) {
  if (error.message.includes('Invalid asset')) {
    console.error('Audio file not found')
  } else if (error.message.includes('remote file')) {
    console.error('Cannot transcribe remote URL')
  } else {
    console.error('Transcription failed:', error)
  }
}
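The message checks above can be factored into a small classifier. This is an illustrative helper, not part of the library; the substrings mirror those matched in the example.

```typescript
type TranscribeErrorKind = 'asset-not-found' | 'remote-url' | 'unknown'

// Map an error message to a coarse category using the same substrings
// matched in the try/catch example above.
function classifyTranscribeError(message: string): TranscribeErrorKind {
  if (message.includes('Invalid asset')) return 'asset-not-found'
  if (message.includes('remote file')) return 'remote-url'
  return 'unknown'
}
```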
- Use maxLen: 1 for better real-time performance
- Enable tokenTimestamps only when needed (adds overhead)
- Adjust maxThreads based on device capabilities
- Use beamSize for better accuracy (slower than greedy search)
- Test in Release mode for accurate timing
See Also