
Overview

RealtimeTranscriber provides real-time audio transcription with Voice Activity Detection (VAD) support. It automatically manages audio slices, detects speech segments, and processes transcriptions in a queue-based system.

Key features:
  • Automatic slice management based on duration
  • VAD-based speech detection and auto-slicing
  • Configurable auto-slice mechanism that triggers on speech_end/silence events
  • Memory management for audio slices
  • Queue-based transcription processing
  • Prompt chaining from previous transcriptions

Constructor

new RealtimeTranscriber(
  dependencies: RealtimeTranscriberDependencies,
  options?: RealtimeOptions,
  callbacks?: RealtimeTranscriberCallbacks
)

Parameters

  • dependencies (RealtimeTranscriberDependencies, required): Required dependencies for the transcriber
  • options (RealtimeOptions, optional): Configuration options for realtime transcription
  • callbacks (RealtimeTranscriberCallbacks, optional): Event callbacks for transcription events

Methods

start()

Starts real-time transcription.
await transcriber.start(): Promise<void>
Throws an error if transcription is already active. Initializes the audio stream and begins processing.

stop()

Stops real-time transcription.
await transcriber.stop(): Promise<void>
Stops the audio stream, processes remaining queued transcriptions, waits for active transcriptions to complete, and releases resources.

updateCallbacks()

Updates event callbacks dynamically.
transcriber.updateCallbacks(callbacks: Partial<RealtimeTranscriberCallbacks>): void
  • callbacks (Partial<RealtimeTranscriberCallbacks>, required): Callbacks to update (merged with existing callbacks)
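The "merged with existing callbacks" behaviour can be sketched as a shallow merge. The helper and `Callbacks` shape below are illustrative only, not the library's actual implementation:

```typescript
// Minimal mirror of the callbacks shape, for this sketch only.
interface Callbacks {
  onTranscribe?: (event: unknown) => void
  onVad?: (event: unknown) => void
  onError?: (error: Error) => void
}

// Shallow merge: keys present in the patch replace the existing
// callbacks; keys absent from the patch are left untouched.
function mergeCallbacks(
  current: Callbacks,
  patch: Partial<Callbacks>,
): Callbacks {
  return { ...current, ...patch }
}
```

So a call like `transcriber.updateCallbacks({ onVad: newVadHandler })` replaces only `onVad`, keeping the existing `onTranscribe` and `onError` handlers.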

updateVadOptions()

Updates VAD options dynamically (if VAD context is available).
transcriber.updateVadOptions(options: Partial<VadOptions>): void
  • options (Partial<VadOptions>, required): VAD options to update (e.g., threshold, minSpeechDurationMs)

getStatistics()

Returns current statistics about the transcription session.
transcriber.getStatistics(): Statistics
  • isActive (boolean): Whether transcription is currently active
  • isTranscribing (boolean): Whether a transcription is currently being processed
  • vadEnabled (boolean): Whether VAD is enabled
  • audioStats (object): Audio stream statistics
  • vadStats (object | null): VAD statistics (null when VAD is not in use)
  • sliceStats (object): Slice manager statistics (see SliceManager.getCurrentSliceInfo())
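As a sketch of how these fields might be consumed, for instance in a debug overlay. The `Statistics` shape below mirrors only the documented fields, with `audioStats`, `vadStats`, and `sliceStats` left loosely typed:

```typescript
// Mirrors the documented statistics fields for this sketch.
interface Statistics {
  isActive: boolean
  isTranscribing: boolean
  vadEnabled: boolean
  audioStats: object
  vadStats: object | null
  sliceStats: object
}

// Renders a one-line status summary from the statistics object.
function formatStatus(stats: Statistics): string {
  const state = stats.isActive
    ? stats.isTranscribing
      ? 'transcribing'
      : 'listening'
    : 'idle'
  return `${state} (VAD ${stats.vadEnabled ? 'on' : 'off'})`
}
```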

getTranscriptionResults()

Returns all transcription results from completed slices.
transcriber.getTranscriptionResults(): Array<{
  slice: AudioSliceNoData,
  transcribeEvent: RealtimeTranscribeEvent
}>
Returns an array of transcription results, one per completed slice.
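Combining the per-slice results into a single transcript can be sketched as below. The local types mirror only the fields this sketch uses; the real `AudioSliceNoData` and `RealtimeTranscribeEvent` types carry more:

```typescript
// Minimal mirror of one entry from getTranscriptionResults().
interface SliceResult {
  slice: { index?: number }
  transcribeEvent: { data?: { result?: string } }
}

// Concatenates the text of every completed slice, skipping
// slices whose transcription produced no result text.
function joinTranscript(results: SliceResult[]): string {
  return results
    .map((r) => r.transcribeEvent.data?.result?.trim() ?? '')
    .filter((text) => text.length > 0)
    .join(' ')
}
```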

nextSlice()

Forces move to the next audio slice, finalizing the current one regardless of capacity.
await transcriber.nextSlice(): Promise<void>
Useful for manually triggering transcription of the current audio buffer.

reset()

Resets all internal state (slices, queues, transcription results).
transcriber.reset(): void
This does not stop transcription; call stop() first if needed.

release()

Releases all resources and cleans up.
await transcriber.release(): Promise<void>
Stops transcription if active and releases the audio stream and VAD context.

Event Types

RealtimeTranscribeEvent

  • type ('start' | 'transcribe' | 'end' | 'error'): Event type
  • sliceIndex (number): Index of the audio slice being transcribed
  • data (TranscribeResult): Transcription result (only for 'transcribe' type)
  • isCapturing (boolean): Whether the audio stream is currently recording
  • processTime (number): Time taken to process the transcription, in milliseconds
  • recordingTime (number): Duration of the audio being transcribed, in milliseconds
  • memoryUsage (MemoryUsage): Current memory usage statistics
  • vadEvent (RealtimeVadEvent): Associated VAD event, if available
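A callback can narrow on `type` as in the sketch below. The local type mirrors only the documented fields used here, and treating `data` as optional outside 'transcribe' events is an assumption:

```typescript
// Partial mirror of RealtimeTranscribeEvent for this sketch.
interface TranscribeEvent {
  type: 'start' | 'transcribe' | 'end' | 'error'
  sliceIndex: number
  data?: { result?: string }
  isCapturing: boolean
  processTime: number
  recordingTime: number
}

// Produces a human-readable log line for each event type.
function describeEvent(event: TranscribeEvent): string {
  switch (event.type) {
    case 'start':
      return `slice ${event.sliceIndex}: started`
    case 'transcribe':
      return `slice ${event.sliceIndex}: ${event.data?.result ?? ''}`
    case 'end':
      return `slice ${event.sliceIndex}: done in ${event.processTime}ms`
    case 'error':
      return `slice ${event.sliceIndex}: error`
  }
}
```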

RealtimeVadEvent

  • type ('speech_start' | 'speech_end' | 'speech_continue' | 'silence'): VAD event type
  • timestamp (number): Event timestamp
  • lastSpeechDetectedTime (number): Timestamp when speech was last detected
  • confidence (number): VAD confidence score
  • duration (number): Duration of the audio segment, in seconds
  • sliceIndex (number): Associated slice index
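As a sketch, a consumer might filter these events to keep only confident end-of-speech segments. The helper and the 0.6 threshold below are hypothetical, and the local type mirrors only the documented fields:

```typescript
// Mirror of the documented RealtimeVadEvent fields.
interface VadEvent {
  type: 'speech_start' | 'speech_end' | 'speech_continue' | 'silence'
  timestamp: number
  lastSpeechDetectedTime: number
  confidence: number
  duration: number // seconds, per the field table above
  sliceIndex: number
}

// Keeps speech_end events whose confidence clears a threshold,
// e.g. to decide which segments are worth acting on.
function confidentSpeechEnds(
  events: VadEvent[],
  threshold = 0.6,
): VadEvent[] {
  return events.filter(
    (e) => e.type === 'speech_end' && e.confidence >= threshold,
  )
}
```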

Example Usage

import { initWhisper, RealtimeTranscriber } from 'whisper.rn'

const whisperContext = await initWhisper({
  filePath: 'ggml-base.en.bin'
})

const transcriber = new RealtimeTranscriber(
  {
    whisperContext,
    audioStream: myAudioStreamAdapter,
    vadContext: myVadContext, // Optional
  },
  {
    audioSliceSec: 30,
    maxSlicesInMemory: 3,
    initialPrompt: 'Technical discussion about AI',
  },
  {
    onTranscribe: (event) => {
      if (event.type === 'transcribe') {
        console.log('Transcription:', event.data?.result)
      }
    },
    onVad: (event) => {
      console.log('VAD event:', event.type, event.confidence)
    },
    onError: (error) => {
      console.error('Error:', error)
    },
  }
)

// Start transcription
await transcriber.start()

// Later: stop transcription
await transcriber.stop()

// Get all results
const results = transcriber.getTranscriptionResults()

// Release resources
await transcriber.release()
