This guide helps you migrate from the deprecated `transcribeRealtime()` method to the modern `RealtimeTranscriber` class.
## Why Migrate?

The `RealtimeTranscriber` class provides significant improvements over the legacy `transcribeRealtime()` API:

Key improvements:

- Better VAD (Voice Activity Detection) integration with auto-slicing
- Improved memory management with configurable slice limits
- More flexible audio stream adapters
- Queue-based transcription processing
- Enhanced stats and monitoring
- Better error handling
- Prompt chaining for improved context
The `transcribeRealtime()` method is deprecated and will show a warning. While it still works, it lacks the advanced features and optimizations of `RealtimeTranscriber`.
## Quick Comparison

### Legacy API (`transcribeRealtime`)

```js
const { stop, subscribe } = await whisperContext.transcribeRealtime({
  language: 'en',
  realtimeAudioSec: 30,
  realtimeAudioSliceSec: 30,
  audioOutputPath: '/path/to/output.wav',
})

subscribe((event) => {
  if (event.isCapturing) {
    console.log('Capturing:', event.data?.result)
  } else {
    console.log('Final:', event.data?.result)
  }
})

// Later
await stop()
```
### Modern API (`RealtimeTranscriber`)

```js
import { RealtimeTranscriber } from 'whisper.rn'
import { AudioPcmStreamAdapter } from 'whisper.rn/realtime-transcription'

const audioStream = new AudioPcmStreamAdapter()

const transcriber = new RealtimeTranscriber(
  { whisperContext, vadContext, audioStream },
  {
    audioSliceSec: 30,
    transcribeOptions: { language: 'en' },
    audioOutputPath: '/path/to/output.wav',
  },
  {
    onTranscribe: (event) => {
      console.log('Result:', event.result)
    },
  }
)

await transcriber.start()

// Later
await transcriber.stop()
```
## Step-by-Step Migration

### Install audio stream dependency

The `RealtimeTranscriber` requires an audio stream adapter. Install the recommended adapter:

```bash
npm install @fugood/react-native-audio-pcm-stream
# or
yarn add @fugood/react-native-audio-pcm-stream
```

iOS setup: run `pod install` in the `ios` directory after installing. Android setup: no additional steps needed.
### Initialize VAD context (optional but recommended)

While VAD is optional, it significantly improves realtime transcription:

```js
import { initWhisperVad } from 'whisper.rn'

const vadContext = await initWhisperVad({
  filePath: require('./models/silero_vad.onnx'),
  useGpu: true,
})
```

Download the Silero VAD model from the whisper.rn repository.
### Create audio stream adapter

```js
import { AudioPcmStreamAdapter } from 'whisper.rn/realtime-transcription'

const audioStream = new AudioPcmStreamAdapter()
```
### Migrate options and callbacks

Update your configuration to use the new API structure.

Before (`transcribeRealtime`):

```js
const options = {
  language: 'en',
  translate: false,
  maxLen: 1,
  maxThreads: 4,
  realtimeAudioSec: 30,
  realtimeAudioSliceSec: 30,
  realtimeAudioMinSec: 1,
  audioOutputPath: outputPath,
  useVad: true,
  vadMs: 2000,
  vadThold: 0.6,
  vadFreqThold: 100,
}
```

After (`RealtimeTranscriber`):

```js
import { RealtimeTranscriber } from 'whisper.rn'

const transcriber = new RealtimeTranscriber(
  {
    whisperContext,
    vadContext, // VAD now uses a dedicated context
    audioStream,
  },
  {
    // Slice configuration
    audioSliceSec: 30,
    audioMinSec: 1,
    maxSlicesInMemory: 3,

    // Transcription options
    transcribeOptions: {
      language: 'en',
      translate: false,
      maxLen: 1,
      maxThreads: 4,
    },

    // Output
    audioOutputPath: outputPath,

    // Prompt configuration
    initialPrompt: 'Your initial prompt here',
    promptPreviousSlices: true,
  },
  {
    // Callbacks (see next step)
  }
)
```
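The `initialPrompt` and `promptPreviousSlices` options enable prompt chaining. As a rough illustration of the idea only (this is not whisper.rn's actual implementation, and the helper name is hypothetical), each new slice can be transcribed with a prompt built from the initial prompt plus text recognized in earlier slices:

```typescript
// Hypothetical helper (not a whisper.rn API): builds a prompt for the next
// slice from the initial prompt plus previously transcribed slice text.
function buildSlicePrompt(
  initialPrompt: string | undefined,
  previousSliceResults: string[],
  maxChars = 200,
): string {
  const parts: string[] = []
  if (initialPrompt) parts.push(initialPrompt)
  parts.push(...previousSliceResults)
  const joined = parts.join(' ').trim()
  // If the accumulated context grows too long, keep only the most recent part
  return joined.length > maxChars ? joined.slice(joined.length - maxChars) : joined
}
```

Carrying earlier text forward this way gives the model context across slice boundaries, which is what improves transcription continuity.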
### Update event handling

Before (`transcribeRealtime`):

```js
subscribe((event) => {
  if (event.isCapturing) {
    // Realtime updates while recording
    console.log('Progress:', event.data?.result)
  } else {
    // Final result
    console.log('Final:', event.data?.result)
    console.log('Process time:', event.processTime)
  }
})
```

After (`RealtimeTranscriber`):

```js
const transcriber = new RealtimeTranscriber(
  { whisperContext, vadContext, audioStream },
  { /* options */ },
  {
    onTranscribe: (event) => {
      console.log('Transcription:', {
        slice: event.slice,
        result: event.result,
        isFinal: event.isFinal,
        processTime: event.processTime,
      })
    },
    onVad: (event) => {
      console.log('VAD event:', event.event) // speech_start, speech_continue, speech_end
    },
    onStats: (stats) => {
      console.log('Stats:', {
        slicesInMemory: stats.memoryUsage.slicesInMemory,
        currentSliceAudioSec: stats.currentSlice.audioSec,
      })
    },
    onError: (error) => {
      console.error('Error:', error)
    },
    onStatusChange: (isActive) => {
      console.log('Status:', isActive ? 'Recording' : 'Stopped')
    },
  }
)
```
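The `onVad` callback receives `speech_start`, `speech_continue`, and `speech_end` events. A common consumer-side pattern is to track speech turns from these events, e.g. to show a "speaking" indicator in the UI. Here is a minimal sketch (the class is illustrative app code, not part of whisper.rn):

```typescript
// Illustrative app-side helper: derive "is the user speaking?" and a turn
// count from the VAD event names documented above.
type VadEventName = 'speech_start' | 'speech_continue' | 'speech_end'

class SpeechTurnTracker {
  private inSpeech = false
  private turns = 0

  handle(event: VadEventName): void {
    if (event === 'speech_start') {
      this.inSpeech = true
      this.turns += 1
    } else if (event === 'speech_end') {
      this.inSpeech = false
    }
    // speech_continue leaves the state unchanged
  }

  get speaking(): boolean { return this.inSpeech }
  get turnCount(): number { return this.turns }
}
```

You would call `tracker.handle(event.event)` from inside `onVad` and bind `tracker.speaking` to your UI.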
### Update start/stop logic

Before:

```js
const { stop, subscribe } = await whisperContext.transcribeRealtime(options)
subscribe(handleEvent)

// Later
await stop()
```

After:

```js
const transcriber = new RealtimeTranscriber(
  { whisperContext, vadContext, audioStream },
  options,
  callbacks // Events are passed during construction
)

await transcriber.start()

// Later
await transcriber.stop()
```
### Clean up resources

Don't forget to release all contexts:

```js
try {
  await transcriber.start()
  // ... transcription logic
  await transcriber.stop()
} finally {
  // Release contexts
  await whisperContext.release()
  await vadContext?.release()
}
```
## Complete Migration Example

### Before: Using `transcribeRealtime`

```js
import { initWhisper } from 'whisper.rn'
import { PermissionsAndroid, Platform } from 'react-native'
import RNFS from 'react-native-fs'

// Request microphone permission (Android)
if (Platform.OS === 'android') {
  await PermissionsAndroid.request(
    PermissionsAndroid.PERMISSIONS.RECORD_AUDIO
  )
}

// Initialize context
const whisperContext = await initWhisper({
  filePath: require('./models/ggml-base.en.bin'),
})

// Start realtime transcription
const { stop, subscribe } = await whisperContext.transcribeRealtime({
  language: 'en',
  realtimeAudioSec: 30,
  realtimeAudioSliceSec: 30,
  audioOutputPath: `${RNFS.DocumentDirectoryPath}/recording.wav`,
  useVad: true,
  vadThold: 0.6,
})

// Handle events
subscribe((event) => {
  if (event.isCapturing) {
    setTranscriptionText(event.data?.result || '')
  } else {
    setFinalResult(event.data?.result || '')
  }
})

// Stop and cleanup
const handleStop = async () => {
  await stop()
  await whisperContext.release()
}
```
### After: Using `RealtimeTranscriber`

```js
import { initWhisper, initWhisperVad, RealtimeTranscriber } from 'whisper.rn'
import { AudioPcmStreamAdapter } from 'whisper.rn/realtime-transcription'
import { PermissionsAndroid, Platform } from 'react-native'
import RNFS from 'react-native-fs'

// Request microphone permission (Android)
if (Platform.OS === 'android') {
  await PermissionsAndroid.request(
    PermissionsAndroid.PERMISSIONS.RECORD_AUDIO
  )
}

// Initialize contexts
const whisperContext = await initWhisper({
  filePath: require('./models/ggml-base.en.bin'),
})

const vadContext = await initWhisperVad({
  filePath: require('./models/silero_vad.onnx'),
})

// Create audio stream
const audioStream = new AudioPcmStreamAdapter()

// Create transcriber
const transcriber = new RealtimeTranscriber(
  { whisperContext, vadContext, audioStream },
  {
    audioSliceSec: 30,
    maxSlicesInMemory: 3,
    transcribeOptions: {
      language: 'en',
    },
    audioOutputPath: `${RNFS.DocumentDirectoryPath}/recording.wav`,
  },
  {
    onTranscribe: (event) => {
      if (event.isFinal) {
        setFinalResult(event.result)
      } else {
        setTranscriptionText(event.result)
      }
    },
    onVad: (event) => {
      console.log('VAD:', event.event)
    },
    onStats: (stats) => {
      setStats(stats)
    },
    onError: (error) => {
      console.error('Transcription error:', error)
    },
  }
)

// Start transcription
await transcriber.start()

// Stop and cleanup
const handleStop = async () => {
  await transcriber.stop()
  await whisperContext.release()
  await vadContext.release()
}
```
## Key Differences

### Audio Session Management (iOS)

Old API:

```js
await whisperContext.transcribeRealtime({
  audioSessionOnStartIos: {
    category: 'playAndRecord',
    options: ['defaultToSpeaker'],
    mode: 'default',
  },
  audioSessionOnStopIos: 'restore',
})
```

New API:

Handle the audio session manually using `AudioSessionIos`:

```js
import { AudioSessionIos } from 'whisper.rn'

// Before starting
await AudioSessionIos.setCategory('playAndRecord', ['defaultToSpeaker'])
await AudioSessionIos.setMode('default')
await AudioSessionIos.setActive(true)

const transcriber = new RealtimeTranscriber( /* ... */ )
await transcriber.start()

// After stopping
await transcriber.stop()
await AudioSessionIos.setActive(false)
```
### VAD Configuration

Old API:

VAD was configured through options with limited control:

```js
await whisperContext.transcribeRealtime({
  useVad: true,
  vadMs: 2000,
  vadThold: 0.6,
  vadFreqThold: 100,
})
```

New API:

VAD now uses a dedicated context with more configuration options:

```js
import { initWhisperVad, VAD_PRESETS } from 'whisper.rn'
import { RingBufferVad } from 'whisper.rn/realtime-transcription'

const vadContext = await initWhisperVad({
  filePath: require('./models/silero_vad.onnx'),
  useGpu: true,
})

// Use presets or custom configuration
const preset = VAD_PRESETS.QUALITY // or FAST, BALANCED

const realtimeVadContext = new RingBufferVad(
  vadContext,
  preset // Includes all VAD thresholds and settings
)

const transcriber = new RealtimeTranscriber(
  { whisperContext, vadContext: realtimeVadContext, audioStream },
  { /* options */ }
)
```
### Memory Management

Old API: limited control over memory usage.

New API: fine-grained control:

```js
const transcriber = new RealtimeTranscriber(
  { whisperContext, vadContext, audioStream },
  {
    maxSlicesInMemory: 3, // Keep only the last 3 slices
  },
  {
    onStats: (stats) => {
      console.log('Memory usage:', {
        slicesInMemory: stats.memoryUsage.slicesInMemory,
        totalBytes: stats.memoryUsage.totalBytes,
        oldestSlice: stats.memoryUsage.oldestSlice,
        newestSlice: stats.memoryUsage.newestSlice,
      })
    },
  }
)
```
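Conceptually, a `maxSlicesInMemory`-style cap works like a bounded buffer: when a new slice arrives and the cap is reached, the oldest slice is evicted. A minimal sketch of the idea (illustrative only, not whisper.rn's internal data structure):

```typescript
// Illustrative bounded buffer: keeps at most `maxSlices` audio slices,
// evicting the oldest when a new one is pushed past the cap.
class SliceBuffer<T> {
  private slices: T[] = []

  constructor(private maxSlices: number) {}

  push(slice: T): void {
    this.slices.push(slice)
    while (this.slices.length > this.maxSlices) {
      this.slices.shift() // evict the oldest slice
    }
  }

  get inMemory(): number { return this.slices.length }
  get oldest(): T | undefined { return this.slices[0] }
}
```

This is why memory usage stays flat during long sessions: only the most recent slices are retained, while older audio is discarded (or, in the real library, written to `audioOutputPath` if configured).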
## Troubleshooting Migration

### Error: AudioPcmStreamAdapter not found

Make sure you've installed the audio stream dependency:

```bash
yarn add @fugood/react-native-audio-pcm-stream
cd ios && pod install
```

Import the adapter:

```js
import { AudioPcmStreamAdapter } from 'whisper.rn/realtime-transcription'
```

### VAD not working

Ensure you:

- Initialized the VAD context:

  ```js
  const vadContext = await initWhisperVad({
    filePath: require('./models/silero_vad.onnx'),
  })
  ```

- Downloaded the VAD model file from the example app
- Passed the VAD context to the transcriber:

  ```js
  new RealtimeTranscriber(
    { whisperContext, vadContext, audioStream },
    { /* options */ }
  )
  ```

### Callbacks not firing

Make sure you're passing callbacks during construction, not after:

```js
// ✅ Correct
const transcriber = new RealtimeTranscriber(
  dependencies,
  options,
  {
    onTranscribe: (event) => { /* ... */ },
    onVad: (event) => { /* ... */ },
  }
)

// ❌ Wrong - callbacks cannot be added after construction
const transcriber = new RealtimeTranscriber(dependencies, options)
transcriber.onTranscribe = (event) => { /* Won't work */ }
```

### Audio output file not saved

Ensure you:

- Installed a filesystem library (e.g. `react-native-fs`)
- Passed the `fs` dependency:

  ```js
  import RNFS from 'react-native-fs'

  new RealtimeTranscriber(
    { whisperContext, vadContext, audioStream, fs: RNFS },
    { audioOutputPath: `${RNFS.DocumentDirectoryPath}/output.wav` }
  )
  ```
## Benefits After Migration

Once migrated, you'll have access to:

- **Better VAD**: More accurate speech detection with configurable presets
- **Memory control**: Limit slices in memory to prevent crashes
- **Prompt chaining**: Context from previous slices improves transcription continuity
- **Stats monitoring**: Real-time stats for debugging and optimization
- **Flexible adapters**: Custom audio stream sources
- **Queue processing**: Controlled transcription processing
- **Better errors**: More detailed error reporting
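To illustrate the queue-processing idea in the list above (a sketch of the general technique, not whisper.rn's internals), a serial queue guarantees that one transcription job finishes before the next starts, so a slow slice never overlaps the next one:

```typescript
// Illustrative serial job queue: each enqueued async job runs only after
// the previous one has settled, preserving submission order.
class SerialQueue {
  private chain: Promise<void> = Promise.resolve()

  enqueue<T>(job: () => Promise<T>): Promise<T> {
    const result = this.chain.then(job)
    // Keep the chain alive even if a job rejects
    this.chain = result.then(() => undefined, () => undefined)
    return result
  }
}
```

Applied to transcription, each audio slice becomes a job, which keeps results in order and avoids running the model concurrently on a constrained device.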
See the Realtime Transcription guide for advanced features and usage patterns.