Overview
RealtimeTranscriber provides real-time audio transcription with Voice Activity Detection (VAD) support. It automatically manages audio slices, detects speech segments, and processes transcriptions in a queue-based system.
Key Features:

- Automatic slice management based on duration
- VAD-based speech detection and auto-slicing
- Configurable auto-slice mechanism that triggers on speech_end/silence events
- Memory management for audio slices
- Queue-based transcription processing
- Prompt chaining from previous transcriptions
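The duration-based slice management above can be sketched as a simple counter over incoming samples. This is only an illustration of the idea, not the library's actual `SliceManager` logic; the 16 kHz sample rate is an assumption (it is what Whisper models typically expect):

```ts
// Sketch: an audio slice holds audioSliceSec worth of samples; once the
// running sample count crosses that boundary, subsequent audio belongs
// to the next slice. Assumes 16 kHz mono PCM (an assumption, not the
// library's documented behavior).
const SAMPLE_RATE = 16000

function sliceIndexForSample(totalSamples: number, audioSliceSec: number): number {
  const samplesPerSlice = SAMPLE_RATE * audioSliceSec
  return Math.floor(totalSamples / samplesPerSlice)
}
```

With 30-second slices, the first 480,000 samples land in slice 0, and the next sample begins slice 1.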
Constructor
```ts
new RealtimeTranscriber(
  dependencies: RealtimeTranscriberDependencies,
  options?: RealtimeOptions,
  callbacks?: RealtimeTranscriberCallbacks
)
```
Parameters
dependencies (`RealtimeTranscriberDependencies`, required)

Required dependencies for the transcriber:

- `whisperContext` (`WhisperContextLike`, required): Whisper context for transcription. Must implement a `transcribeData(data: ArrayBuffer, options: TranscribeOptions)` method.
- `vadContext` (optional): VAD context for speech detection. If provided, enables automatic speech detection and slicing.
- `audioStream` (`AudioStreamInterface`, required): Audio stream interface that provides audio data. Must implement methods such as `initialize()`, `start()`, `stop()`, and `onData()`.
- `fs` (optional): Filesystem interface for writing WAV files. Required if `audioOutputPath` is specified in options.
options (`RealtimeOptions`, optional)

Configuration options for realtime transcription:

- `audioSliceSec`: Duration of each audio slice in seconds
- Minimum audio duration in seconds before transcription
- `maxSlicesInMemory`: Maximum number of audio slices to keep in memory. Older slices are automatically released.
- Options to pass to whisper transcription (e.g., language, translate, etc.)
- `initialPrompt`: Initial prompt text to guide transcription
- Whether to include previous slice transcriptions as context in the prompt
- `audioOutputPath`: Path to write audio to a WAV file (requires the `fs` dependency)
- Audio stream configuration (`sampleRate`, `channels`, `bitsPerSample`, `bufferSize`, `audioSource`)
- `realtimeProcessingPauseMs`: Minimum interval in milliseconds between realtime transcription updates
- Wait time in milliseconds before starting the first realtime transcription
- `logger` (`(message: string) => void`): Custom logger function. Defaults to a no-op.
callbacks (`RealtimeTranscriberCallbacks`, optional)

Event callbacks for transcription events:

- `onBeginTranscribe` (`(sliceInfo) => Promise<boolean>`): Called before transcription starts. Return `false` to skip transcription for this slice. Receives `{ audioData: Uint8Array, sliceIndex: number, duration: number, vadEvent?: RealtimeVadEvent }`.
- `onTranscribe` (`(event: RealtimeTranscribeEvent) => void`): Called when transcription starts or completes.
- `onBeginVad` (`(sliceInfo) => Promise<boolean>`): Called before VAD processing. Return `false` to skip VAD for this audio chunk. Receives `{ audioData: Uint8Array, sliceIndex: number, duration: number }`.
- `onVad` (`(event: RealtimeVadEvent) => void`): Called when VAD detects speech events (start, continue, end, silence).
- `onError`: Called when an error occurs.
- `onStatusChange` (`(isActive: boolean) => void`): Called when transcription status changes (started/stopped).
- `onStatsUpdate` (`(event: RealtimeStatsEvent) => void`): Called when statistics update (memory usage, slice counts, etc.).
- `onSliceTranscriptionStabilized`: Called when a final transcription for a slice is ready (after `speech_end`).
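One common use of `onBeginTranscribe` is skipping very short slices. A pure helper like the following can back that callback; the 0.5-second threshold is an arbitrary example, and treating `duration` as seconds is an assumption (the payload description above does not state the unit):

```ts
// Decide whether a slice is worth transcribing. The sliceInfo shape
// matches the onBeginTranscribe payload described above; the duration
// unit (seconds) and the default threshold are assumptions.
interface SliceInfo {
  audioData: Uint8Array
  sliceIndex: number
  duration: number
}

function shouldTranscribe(sliceInfo: SliceInfo, minDurationSec = 0.5): boolean {
  return sliceInfo.duration >= minDurationSec
}

// Wiring it into the callbacks object:
// onBeginTranscribe: async (sliceInfo) => shouldTranscribe(sliceInfo)
```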
Methods
start()
Starts real-time transcription.
```ts
await transcriber.start(): Promise<void>
```
Throws an error if transcription is already active. Initializes the audio stream and begins processing.
stop()
Stops real-time transcription.
```ts
await transcriber.stop(): Promise<void>
```
Stops the audio stream, processes remaining queued transcriptions, waits for active transcriptions to complete, and releases resources.
updateCallbacks()
Updates event callbacks dynamically.
```ts
transcriber.updateCallbacks(callbacks: Partial<RealtimeTranscriberCallbacks>): void
```
- `callbacks` (`Partial<RealtimeTranscriberCallbacks>`, required): Callbacks to update (merged with existing callbacks)
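The "merged with existing callbacks" behavior presumably amounts to a shallow merge. A sketch of those semantics (an assumption about the implementation, shown only to clarify what "merged" means here):

```ts
// Shallow merge: keys present in the update replace the existing
// handlers; untouched keys keep their previous handlers. This mirrors
// the assumed semantics of updateCallbacks, not its actual source.
type Callbacks = Record<string, Function | undefined>

function mergeCallbacks<T extends Callbacks>(existing: T, update: Partial<T>): T {
  return { ...existing, ...update }
}
```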
updateVadOptions()
Updates VAD options dynamically (if VAD context is available).
```ts
transcriber.updateVadOptions(options: Partial<VadOptions>): void
```
- `options` (`Partial<VadOptions>`, required): VAD options to update (e.g., `threshold`, `minSpeechDurationMs`, etc.)
getStatistics()
Returns current statistics about the transcription session.
```ts
transcriber.getStatistics(): Statistics
```
Returns an object including:

- Whether transcription is currently active
- Whether a transcription is currently being processed
- Whether the audio stream is recording
- Total samples accumulated
- VAD statistics (`null` if VAD is disabled): whether a VAD context is available and the timestamp of the last detected speech
- Slice manager statistics (see `SliceManager.getCurrentSliceInfo()`)
getTranscriptionResults()
Returns all transcription results from completed slices.
```ts
transcriber.getTranscriptionResults(): Array<{
  slice: AudioSliceNoData,
  transcribeEvent: RealtimeTranscribeEvent
}>
```
Returns an array of transcription results:

- `slice`: Slice metadata (without audio data): `index`, `sampleCount`, `startTime`, `endTime`, `isProcessed`, `isReleased`
- `transcribeEvent`: Transcription event with result data
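The result array can be flattened into a single transcript string. The sketch below assumes each event carries its text at `data?.result`, as in the usage example at the end of this page; the minimal shapes declared here are illustrative, not the library's full types:

```ts
// Join per-slice transcription results into one transcript, ordered by
// slice index. Only the fields needed for the join are modeled here.
interface SliceResult {
  slice: { index: number }
  transcribeEvent: { data?: { result?: string } }
}

function joinTranscript(results: SliceResult[]): string {
  return results
    .slice() // avoid mutating the caller's array
    .sort((a, b) => a.slice.index - b.slice.index)
    .map((r) => r.transcribeEvent.data?.result ?? '')
    .filter((text) => text.length > 0)
    .join(' ')
}
```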
nextSlice()
Forces a move to the next audio slice, finalizing the current one regardless of capacity.
```ts
await transcriber.nextSlice(): Promise<void>
```
Useful for manually triggering transcription of the current audio buffer.
reset()
Resets all internal state (slices, queues, transcription results).
```ts
transcriber.reset(): void
```
This does not stop transcription; call `stop()` first if needed.
release()
Releases all resources and cleans up.
```ts
await transcriber.release(): Promise<void>
```
Stops transcription if active and releases the audio stream and VAD context.
Event Types
RealtimeTranscribeEvent

- `type` (`'start' | 'transcribe' | 'end' | 'error'`): Event type
- `sliceIndex`: Index of the audio slice being transcribed
- `data`: Transcription result (only for the `'transcribe'` type)
- Whether the audio stream is currently recording
- Time taken to process the transcription, in milliseconds
- Duration of the audio being transcribed, in milliseconds
- Current memory usage statistics
- Associated VAD event, if available
RealtimeVadEvent

- `type` (`'speech_start' | 'speech_end' | 'speech_continue' | 'silence'`): VAD event type
- Timestamp when speech was last detected
- Duration of the audio segment in seconds
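A consumer of `RealtimeVadEvent` can reconstruct speech segments by pairing `speech_start` and `speech_end` events. A minimal fold over the event stream might look like this; the `type` union is taken from the table above, but timestamps are supplied by the caller here because the event's timestamp field name is not shown:

```ts
// Fold a stream of (type, timestamp) pairs into closed speech segments.
// speech_continue and silence events do not affect segment boundaries.
type VadEventType = 'speech_start' | 'speech_end' | 'speech_continue' | 'silence'

interface Segment {
  startSec: number
  endSec: number
}

function collectSegments(events: Array<{ type: VadEventType; atSec: number }>): Segment[] {
  const segments: Segment[] = []
  let openStart: number | null = null
  for (const e of events) {
    if (e.type === 'speech_start') {
      openStart = e.atSec
    } else if (e.type === 'speech_end' && openStart !== null) {
      segments.push({ startSec: openStart, endSec: e.atSec })
      openStart = null
    }
  }
  return segments
}
```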
Example Usage
```ts
import { RealtimeTranscriber, initWhisper } from 'whisper.rn'

const whisperContext = await initWhisper({
  filePath: 'ggml-base.en.bin',
})

const transcriber = new RealtimeTranscriber(
  {
    whisperContext,
    audioStream: myAudioStreamAdapter,
    vadContext: myVadContext, // Optional
  },
  {
    audioSliceSec: 30,
    maxSlicesInMemory: 3,
    initialPrompt: 'Technical discussion about AI',
  },
  {
    onTranscribe: (event) => {
      if (event.type === 'transcribe') {
        console.log('Transcription:', event.data?.result)
      }
    },
    onVad: (event) => {
      console.log('VAD event:', event.type, event.confidence)
    },
    onError: (error) => {
      console.error('Error:', error)
    },
  }
)

// Start transcription
await transcriber.start()

// Later: stop transcription
await transcriber.stop()

// Get all results
const results = transcriber.getTranscriptionResults()

// Release resources
await transcriber.release()
```