Overview
RealtimeTranscriber provides real-time audio transcription with Voice Activity Detection (VAD) support. It automatically manages audio slices, detects speech segments, and processes transcriptions in a queue-based system.
Key Features:

- Automatic slice management based on duration
- VAD-based speech detection and auto-slicing
- Configurable auto-slice mechanism that triggers on speech_end/silence events
- Memory management for audio slices
- Queue-based transcription processing
- Prompt chaining from previous transcriptions
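The duration-based slice management above can be sketched as a simple counter over incoming samples. This is only an illustration of the idea, not the library's actual `SliceManager` logic; the 16 kHz sample rate is an assumption (it is what Whisper models typically expect):

```ts
// Sketch: an audio slice holds audioSliceSec worth of samples; once the
// running sample count crosses that boundary, subsequent audio belongs
// to the next slice. Assumes 16 kHz mono PCM (an assumption, not the
// library's documented behavior).
const SAMPLE_RATE = 16000

function sliceIndexForSample(totalSamples: number, audioSliceSec: number): number {
  const samplesPerSlice = SAMPLE_RATE * audioSliceSec
  return Math.floor(totalSamples / samplesPerSlice)
}
```

With 30-second slices, the first 480,000 samples land in slice 0, and the next sample begins slice 1.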
Constructor
```ts
new RealtimeTranscriber(
  dependencies: RealtimeTranscriberDependencies,
  options?: RealtimeOptions,
  callbacks?: RealtimeTranscriberCallbacks
)
```
Parameters
dependencies (`RealtimeTranscriberDependencies`, required)

Required dependencies for the transcriber:

- `whisperContext` (`WhisperContextLike`, required): Whisper context for transcription. Must implement a `transcribeData(data: ArrayBuffer, options: TranscribeOptions)` method.
- `vadContext` (optional): VAD context for speech detection. If provided, enables automatic speech detection and slicing.
- `audioStream` (`AudioStreamInterface`, required): Audio stream interface that provides audio data. Must implement methods such as `initialize()`, `start()`, `stop()`, and `onData()`.
- `fs` (optional): Filesystem interface for writing WAV files. Required if `audioOutputPath` is specified in options.
options (`RealtimeOptions`, optional)

Configuration options for realtime transcription:

- `audioSliceSec`: Duration of each audio slice in seconds
- Minimum audio duration in seconds before transcription
- `maxSlicesInMemory`: Maximum number of audio slices to keep in memory. Older slices are automatically released.
- Options to pass to whisper transcription (e.g., language, translate, etc.)
- `initialPrompt`: Initial prompt text to guide transcription
- Whether to include previous slice transcriptions as context in the prompt
- `audioOutputPath`: Path to write audio to a WAV file (requires the `fs` dependency)
- Audio stream configuration (`sampleRate`, `channels`, `bitsPerSample`, `bufferSize`, `audioSource`)
- `realtimeProcessingPauseMs`: Minimum interval in milliseconds between realtime transcription updates
- Wait time in milliseconds before starting the first realtime transcription
- `logger` (`(message: string) => void`): Custom logger function. Defaults to a no-op.
callbacks (`RealtimeTranscriberCallbacks`, optional)

Event callbacks for transcription events:

- `onBeginTranscribe` (`(sliceInfo) => Promise<boolean>`): Called before transcription starts. Return `false` to skip transcription for this slice. Receives `{ audioData: Uint8Array, sliceIndex: number, duration: number, vadEvent?: RealtimeVadEvent }`.
- `onTranscribe` (`(event: RealtimeTranscribeEvent) => void`): Called when transcription starts or completes.
- `onBeginVad` (`(sliceInfo) => Promise<boolean>`): Called before VAD processing. Return `false` to skip VAD for this audio chunk. Receives `{ audioData: Uint8Array, sliceIndex: number, duration: number }`.
- `onVad` (`(event: RealtimeVadEvent) => void`): Called when VAD detects speech events (start, continue, end, silence).
- `onError`: Called when an error occurs.
- `onStatusChange` (`(isActive: boolean) => void`): Called when transcription status changes (started/stopped).
- `onStatsUpdate` (`(event: RealtimeStatsEvent) => void`): Called when statistics update (memory usage, slice counts, etc.).
- `onSliceTranscriptionStabilized`: Called when a final transcription for a slice is ready (after `speech_end`).
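One common use of `onBeginTranscribe` is skipping very short slices. A pure helper like the following can back that callback; the 0.5-second threshold is an arbitrary example, and treating `duration` as seconds is an assumption (the payload description above does not state the unit):

```ts
// Decide whether a slice is worth transcribing. The sliceInfo shape
// matches the onBeginTranscribe payload described above; the duration
// unit (seconds) and the default threshold are assumptions.
interface SliceInfo {
  audioData: Uint8Array
  sliceIndex: number
  duration: number
}

function shouldTranscribe(sliceInfo: SliceInfo, minDurationSec = 0.5): boolean {
  return sliceInfo.duration >= minDurationSec
}

// Wiring it into the callbacks object:
// onBeginTranscribe: async (sliceInfo) => shouldTranscribe(sliceInfo)
```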
Methods
start()
Starts real-time transcription.
```ts
await transcriber.start(): Promise<void>
```
Throws an error if transcription is already active. Initializes the audio stream and begins processing.
stop()
Stops real-time transcription.
```ts
await transcriber.stop(): Promise<void>
```
Stops the audio stream, processes remaining queued transcriptions, waits for active transcriptions to complete, and releases resources.
updateCallbacks()
Updates event callbacks dynamically.
```ts
transcriber.updateCallbacks(callbacks: Partial<RealtimeTranscriberCallbacks>): void
```
- `callbacks` (`Partial<RealtimeTranscriberCallbacks>`, required): Callbacks to update (merged with existing callbacks)
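The "merged with existing callbacks" behavior presumably amounts to a shallow merge. A sketch of those semantics (an assumption about the implementation, shown only to clarify what "merged" means here):

```ts
// Shallow merge: keys present in the update replace the existing
// handlers; untouched keys keep their previous handlers. This mirrors
// the assumed semantics of updateCallbacks, not its actual source.
type Callbacks = Record<string, Function | undefined>

function mergeCallbacks<T extends Callbacks>(existing: T, update: Partial<T>): T {
  return { ...existing, ...update }
}
```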
updateVadOptions()
Updates VAD options dynamically (if VAD context is available).
```ts
transcriber.updateVadOptions(options: Partial<VadOptions>): void
```
- `options` (`Partial<VadOptions>`, required): VAD options to update (e.g., `threshold`, `minSpeechDurationMs`, etc.)
getStatistics()
Returns current statistics about the transcription session.
```ts
transcriber.getStatistics(): Statistics
```
Returns an object including:

- Whether transcription is currently active
- Whether a transcription is currently being processed
- Whether the audio stream is recording
- Total samples accumulated
- VAD statistics (`null` if VAD is disabled): whether a VAD context is available and the timestamp of the last detected speech
- Slice manager statistics (see `SliceManager.getCurrentSliceInfo()`)
getTranscriptionResults()
Returns all transcription results from completed slices.
```ts
transcriber.getTranscriptionResults(): Array<{
  slice: AudioSliceNoData,
  transcribeEvent: RealtimeTranscribeEvent
}>
```
Returns an array of transcription results:

- `slice`: Slice metadata (without audio data): `index`, `sampleCount`, `startTime`, `endTime`, `isProcessed`, `isReleased`
- `transcribeEvent`: Transcription event with result data
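The result array can be flattened into a single transcript string. The sketch below assumes each event carries its text at `data?.result`, as in the usage example at the end of this page; the minimal shapes declared here are illustrative, not the library's full types:

```ts
// Join per-slice transcription results into one transcript, ordered by
// slice index. Only the fields needed for the join are modeled here.
interface SliceResult {
  slice: { index: number }
  transcribeEvent: { data?: { result?: string } }
}

function joinTranscript(results: SliceResult[]): string {
  return results
    .slice() // avoid mutating the caller's array
    .sort((a, b) => a.slice.index - b.slice.index)
    .map((r) => r.transcribeEvent.data?.result ?? '')
    .filter((text) => text.length > 0)
    .join(' ')
}
```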
nextSlice()
Forces a move to the next audio slice, finalizing the current one regardless of capacity.
```ts
await transcriber.nextSlice(): Promise<void>
```
Useful for manually triggering transcription of the current audio buffer.
reset()
Resets all internal state (slices, queues, transcription results).
```ts
transcriber.reset(): void
```
This does not stop transcription; call `stop()` first if needed.
release()
Releases all resources and cleans up.
```ts
await transcriber.release(): Promise<void>
```
Stops transcription if active and releases the audio stream and VAD context.
Event Types
RealtimeTranscribeEvent

- `type` (`'start' | 'transcribe' | 'end' | 'error'`): Event type
- `sliceIndex`: Index of the audio slice being transcribed
- `data`: Transcription result (only for the `'transcribe'` type)
- Whether the audio stream is currently recording
- Time taken to process the transcription, in milliseconds
- Duration of the audio being transcribed, in milliseconds
- Current memory usage statistics
- Associated VAD event, if available
RealtimeVadEvent

- `type` (`'speech_start' | 'speech_end' | 'speech_continue' | 'silence'`): VAD event type
- Timestamp when speech was last detected
- Duration of the audio segment in seconds
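A consumer of `RealtimeVadEvent` can reconstruct speech segments by pairing `speech_start` and `speech_end` events. A minimal fold over the event stream might look like this; the `type` union is taken from the table above, but timestamps are supplied by the caller here because the event's timestamp field name is not shown:

```ts
// Fold a stream of (type, timestamp) pairs into closed speech segments.
// speech_continue and silence events do not affect segment boundaries.
type VadEventType = 'speech_start' | 'speech_end' | 'speech_continue' | 'silence'

interface Segment {
  startSec: number
  endSec: number
}

function collectSegments(events: Array<{ type: VadEventType; atSec: number }>): Segment[] {
  const segments: Segment[] = []
  let openStart: number | null = null
  for (const e of events) {
    if (e.type === 'speech_start') {
      openStart = e.atSec
    } else if (e.type === 'speech_end' && openStart !== null) {
      segments.push({ startSec: openStart, endSec: e.atSec })
      openStart = null
    }
  }
  return segments
}
```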
Example Usage
```ts
import { RealtimeTranscriber, initWhisper } from 'whisper.rn'

const whisperContext = await initWhisper({
  filePath: 'ggml-base.en.bin',
})

const transcriber = new RealtimeTranscriber(
  {
    whisperContext,
    audioStream: myAudioStreamAdapter,
    vadContext: myVadContext, // Optional
  },
  {
    audioSliceSec: 30,
    maxSlicesInMemory: 3,
    initialPrompt: 'Technical discussion about AI',
  },
  {
    onTranscribe: (event) => {
      if (event.type === 'transcribe') {
        console.log('Transcription:', event.data?.result)
      }
    },
    onVad: (event) => {
      console.log('VAD event:', event.type, event.confidence)
    },
    onError: (error) => {
      console.error('Error:', error)
    },
  }
)

// Start transcription
await transcriber.start()

// Later: stop transcription
await transcriber.stop()

// Get all results
const results = transcriber.getTranscriptionResults()

// Release resources
await transcriber.release()
```