Overview
The WhisperContext class provides methods for transcribing audio files and data. It is created by calling initWhisper().
Properties
- id: Unique identifier for this context
- Native context pointer (for internal use)
- gpu: Whether GPU/Metal acceleration is active for this context
- reasonNoGPU: Explanation if GPU is not available (empty string if GPU is active)
Methods
transcribe()
Transcribe an audio file or base64-encoded WAV data.
transcribe(
filePathOrBase64: string | number,
options?: TranscribeFileOptions
): { stop: () => Promise<void>, promise: Promise<TranscribeResult> }
Parameters
filePathOrBase64
string | number
required
Audio file path, base64 WAV data, or asset require. Supported formats:
- File path: '/path/to/audio.wav'
- file:// URI (automatically normalized)
- Asset: require('../assets/sample.wav')
- Base64 WAV: 'data:audio/wav;base64,...'
Remote URLs are not supported. Download the file first.
options
TranscribeFileOptions
Transcription options:
- language: Spoken language code (e.g., 'en', 'es', 'fr'). Use 'auto' for auto-detection.
- translate: Translate from the source language to English.
- maxThreads: Number of threads for computation. Default: 2 on 4-core devices, 4 on devices with more cores.
- maxLen: Maximum segment length in characters.
- tokenTimestamps: Enable word-level timestamps.
- offset: Time offset in milliseconds to start transcribing from.
- duration: Duration of audio to process in milliseconds.
- temperature: Initial decoding temperature for sampling.
- beamSize: Beam size for beam search (slower but more accurate than greedy decoding).
- prompt: Initial prompt to guide transcription.
onProgress
(progress: number) => void
Progress callback, receives values from 0-100
onNewSegments
(result: TranscribeNewSegmentsResult) => void
Callback invoked when new segments are transcribed. Receives a TranscribeNewSegmentsResult:
- nNew: Number of new segments in this batch
- totalNNew: Total number of new segments so far
- result: Accumulated transcription text
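Because `result` in the callback payload is accumulated text, you can derive just the newly appended portion between callbacks. The helper below is an illustrative sketch (the function name and callback wiring are not part of the API); it assumes only the field shapes described above.

```typescript
// Illustrative helper: compute the text appended since the previous
// onNewSegments callback, given that `result` is accumulated text.
function newText(prevResult: string, currentResult: string): string {
  // The delta is the suffix past the previously seen length
  return currentResult.slice(prevResult.length)
}

// Usage sketch inside the callback:
let lastResult = ''
const onNewSegments = (res: { nNew: number; totalNNew: number; result: string }) => {
  const appended = newText(lastResult, res.result)
  lastResult = res.result
  console.log(`+${res.nNew} segments: ${appended}`)
}
```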
Returns
stop
() => Promise<void>
Function to abort the transcription
promise
Promise<TranscribeResult>
Promise that resolves with the transcription result:
- result: The complete transcribed text
- language: Detected or specified language code
- segments: Array of transcribed segments with timestamps (each with t0 start time and t1 end time in milliseconds, plus the segment text)
- isAborted: Whether the transcription was aborted
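Segment timestamps are plain milliseconds; a small display formatter (illustrative, not part of the API) can render them for UI or subtitle output:

```typescript
// Format a millisecond timestamp as mm:ss.mmm for display.
function formatMs(ms: number): string {
  const minutes = Math.floor(ms / 60000)
  const seconds = Math.floor((ms % 60000) / 1000)
  const millis = Math.floor(ms % 1000)
  const pad = (n: number, w: number) => String(n).padStart(w, '0')
  return `${pad(minutes, 2)}:${pad(seconds, 2)}.${pad(millis, 3)}`
}
```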
Example
const { stop, promise } = whisperContext.transcribe(
  require('../assets/sample.wav'),
  {
    language: 'en',
    maxLen: 1,
    tokenTimestamps: true,
    onProgress: (progress) => {
      console.log(`Progress: ${progress}%`)
    },
  }
)
const { result, segments } = await promise
console.log('Transcription:', result)
segments.forEach((segment) => {
  console.log(`[${segment.t0}ms -> ${segment.t1}ms] ${segment.text}`)
})
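The returned stop() can be combined with a timer to bound transcription time. The wrapper below is a sketch (the function name is illustrative); it works with any { stop, promise } pair shaped like the return value of transcribe().

```typescript
// Abort a transcription job if it exceeds `timeoutMs`. Works with any
// { stop, promise } pair shaped like the return value of transcribe().
async function transcribeWithTimeout<T>(
  job: { stop: () => Promise<void>; promise: Promise<T> },
  timeoutMs: number
): Promise<T> {
  const timer = setTimeout(() => {
    // Fire-and-forget; the job's promise settles with isAborted set
    job.stop()
  }, timeoutMs)
  try {
    return await job.promise
  } finally {
    clearTimeout(timer)
  }
}
```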
transcribeData()
Transcribe raw audio data from base64-encoded float32 PCM or ArrayBuffer.
transcribeData(
data: string | ArrayBuffer,
options?: TranscribeFileOptions
): { stop: () => Promise<void>, promise: Promise<TranscribeResult> }
Parameters
data
string | ArrayBuffer
required
Raw audio data, mono, 16kHz. The expected sample format depends on the input type:
- string: Base64-encoded float32 PCM data
- ArrayBuffer: Raw 16-bit PCM data (uses JSI for efficient transfer)
Same options as transcribe()
Returns
Same return type as transcribe().
Example
// Using ArrayBuffer (recommended for performance)
const audioBuffer = new Int16Array([...]).buffer
const { promise } = whisperContext.transcribeData(audioBuffer, {
  language: 'en',
})
const { result } = await promise
// Using base64 PCM data
const base64Data = 'base64EncodedFloat32PCM...'
const { promise: promise2 } = whisperContext.transcribeData(base64Data)
const { result: result2 } = await promise2
ArrayBuffer transcription uses JSI bindings for efficient memory transfer, avoiding JSON serialization overhead.
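To produce the base64 float32 PCM string on the JS side, you can serialize a Float32Array. The sketch below uses Node's Buffer for the base64 step; in a React Native app you would substitute a compatible base64 encoder (e.g. the base64-js package), which is an assumption on our part, not part of this API.

```typescript
// Encode mono 16kHz float32 samples as the base64 string accepted by
// transcribeData(). Buffer handles the base64 step here; swap in a
// React Native-compatible base64 encoder in an app.
function float32ToBase64(samples: Float32Array): string {
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength)
  return Buffer.from(bytes).toString('base64')
}
```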
transcribeRealtime() Deprecated
Transcribe audio from the device microphone in real-time.
This method is deprecated. Use RealtimeTranscriber instead for enhanced features including VAD auto-slicing and better memory management.
transcribeRealtime(
options?: TranscribeRealtimeOptions
): Promise<{
stop: () => Promise<void>,
subscribe: (callback: (event: TranscribeRealtimeEvent) => void) => void
}>
Parameters
options
TranscribeRealtimeOptions
Extends TranscribeFileOptions with additional realtime options.
- realtimeAudioSec: Realtime record max duration in seconds. Because whisper.cpp processes audio in 30-second chunks, values ≤ 30 are recommended.
- realtimeAudioSliceSec: Audio slice duration for processing. Set below 30 for performance improvements. Default: equal to realtimeAudioSec.
- realtimeAudioMinSec: Minimum audio duration in seconds to start transcription (between 0.5 and realtimeAudioSliceSec).
- audioOutputPath: Output path to save the recorded audio file. If not set, audio is not saved.
- useVad: Enable Voice Activity Detection to start transcription only when speech is detected.
- vadMs: Length of audio collected for VAD (minimum 2000ms).
- audioSessionOnStartIos: iOS audio session settings applied when transcription starts.
- audioSessionOnStopIos (string | AudioSessionSettingIos): iOS audio session settings applied when stopping. Use 'restore' to restore the previous state.
Example
const { stop, subscribe } = await whisperContext.transcribeRealtime({
  language: 'en',
  realtimeAudioSec: 60,
  realtimeAudioSliceSec: 25,
  audioOutputPath: '/path/to/save/recording.wav',
})
subscribe((event) => {
  const { isCapturing, data, processTime, recordingTime } = event
  console.log(`Capturing: ${isCapturing}`)
  console.log(`Result: ${data?.result}`)
  if (!isCapturing) {
    console.log('Finished recording')
  }
})
// Stop after 10 seconds
setTimeout(() => stop(), 10000)
bench()
Run a benchmark test to measure model performance.
bench(maxThreads: number): Promise<BenchResult>
Parameters
maxThreads
number
Maximum number of threads to use for the benchmark
Returns
Promise that resolves with a BenchResult:
- config: Model configuration string
- nThreads: Number of threads used
- encodeMs: Encoder time in milliseconds
- decodeMs: Decoder time in milliseconds
- batchMs: Batch processing time in milliseconds
- promptMs: Prompt processing time in milliseconds
Example
const benchResult = await whisperContext.bench(4)
console.log('Benchmark results:')
console.log(` Threads: ${benchResult.nThreads}`)
console.log(` Encode: ${benchResult.encodeMs}ms`)
console.log(` Decode: ${benchResult.decodeMs}ms`)
release()
Release the context and free its memory.
Always release contexts when done to prevent memory leaks. Contexts hold significant native memory.
Example
try {
  // Use the context
  const { promise } = whisperContext.transcribe(audioFile)
  await promise
} finally {
  // Always release
  await whisperContext.release()
}
Audio Format Requirements
For all transcription methods:
- Sample rate: 16kHz (required by Whisper)
- Channels: Mono (1 channel)
- Format: 16-bit PCM
- Supported inputs: WAV files, base64 WAV, raw PCM data, ArrayBuffer
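If your capture pipeline produces interleaved stereo float32 audio, a minimal downmix-and-quantize step yields the mono 16-bit PCM buffer that transcribeData() accepts from an ArrayBuffer. This is a sketch under the assumption that the input is already at 16kHz; resampling from other rates is out of scope here.

```typescript
// Downmix interleaved stereo float32 samples (already at 16kHz) to
// mono 16-bit PCM suitable for transcribeData(out.buffer).
function stereoFloat32ToMonoInt16(interleaved: Float32Array): Int16Array {
  const out = new Int16Array(interleaved.length / 2)
  for (let i = 0; i < out.length; i++) {
    // Average the two channels, clamp to [-1, 1], scale to int16 range
    const mixed = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2
    const clamped = Math.max(-1, Math.min(1, mixed))
    out[i] = Math.round(clamped * 32767)
  }
  return out
}
```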
Error Handling
try {
  const { promise } = whisperContext.transcribe(audioFile, {
    language: 'en',
  })
  const { result } = await promise
} catch (error) {
  if (error.message.includes('Invalid asset')) {
    console.error('Audio file not found')
  } else if (error.message.includes('remote file')) {
    console.error('Cannot transcribe remote URL')
  } else {
    console.error('Transcription failed:', error)
  }
}
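The message checks above can be factored into a small classifier. This is an illustrative helper, not part of the library; the substrings mirror those matched in the example.

```typescript
type TranscribeErrorKind = 'asset-not-found' | 'remote-url' | 'unknown'

// Map an error message to a coarse category using the same substrings
// matched in the try/catch example above.
function classifyTranscribeError(message: string): TranscribeErrorKind {
  if (message.includes('Invalid asset')) return 'asset-not-found'
  if (message.includes('remote file')) return 'remote-url'
  return 'unknown'
}
```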
- Use maxLen: 1 for better real-time performance
- Enable tokenTimestamps only when needed (adds overhead)
- Adjust maxThreads based on device capabilities
- Use beamSize for better accuracy (slower than greedy search)
- Test in Release mode for accurate timing
See Also