whisper.rn provides two primary methods for transcribing audio: transcribe() for file-based transcription and transcribeData() for raw audio data.

Method Signatures

transcribe()

Transcribe audio from a file path, base64-encoded WAV, or bundled asset.
whisperContext.transcribe(
  filePathOrBase64: string | number,
  options?: TranscribeFileOptions
): {
  stop: () => Promise<void>
  promise: Promise<TranscribeResult>
}
Parameters:
  • filePathOrBase64: File path, asset ID (require()), or base64 WAV with data:audio/wav;base64, prefix
  • options: Transcription options including callbacks

transcribeData()

Transcribe raw PCM audio data (mono, 16kHz) from a base64-encoded string or an ArrayBuffer.
whisperContext.transcribeData(
  data: string | ArrayBuffer,
  options?: TranscribeFileOptions
): {
  stop: () => Promise<void>
  promise: Promise<TranscribeResult>
}
Parameters:
  • data: Base64-encoded float32 PCM string, or an ArrayBuffer of 16-bit PCM samples (passed via JSI for efficiency)
  • options: Same as transcribe()
When passing an ArrayBuffer, the JSI bindings must be installed (this happens automatically via initWhisper()).
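As a minimal sketch of the transcribeData() flow: the WhisperContextLike interface below is a structural stand-in for the relevant slice of whisper.rn's context (not imported from the library); in an app you would get the real context from initWhisper().

```typescript
// Structural stand-in for the parts of whisper.rn's context used here.
interface TranscribeResultLike { result: string; isAborted: boolean }
interface WhisperContextLike {
  transcribeData(
    data: string | ArrayBuffer,
    options?: { language?: string }
  ): { stop: () => Promise<void>; promise: Promise<TranscribeResultLike> }
}

// Transcribe an in-memory buffer of 16-bit PCM samples (mono, 16kHz).
async function transcribeBuffer(
  ctx: WhisperContextLike,
  pcm: ArrayBuffer
): Promise<string> {
  const { promise } = ctx.transcribeData(pcm, { language: 'en' })
  const { result, isAborted } = await promise
  return isAborted ? '' : result
}
```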

Basic Usage

import { initWhisper } from 'whisper.rn'

const whisperContext = await initWhisper({
  filePath: 'file:///path/to/ggml-base.bin'
})

const { stop, promise } = whisperContext.transcribe(
  'file:///path/to/audio.wav',
  { language: 'en' }
)

const { result, segments } = await promise
console.log('Transcription:', result)

Transcription Options

type TranscribeFileOptions = {
  // Language & Translation
  language?: string           // Default: 'auto' (auto-detect)
  translate?: boolean         // Translate to English (default: false)
  
  // Performance
  maxThreads?: number         // Default: 2 for 4-core, 4 for 8+ cores
  nProcessors?: number        // Parallel processing (default: 1)
  
  // Context & Prompting
  maxContext?: number         // Max context tokens
  prompt?: string             // Initial prompt to guide transcription
  
  // Timestamps & Segmentation
  tokenTimestamps?: boolean   // Enable token-level timestamps
  wordThold?: number          // Word timestamp probability threshold
  offset?: number             // Time offset in ms
  duration?: number           // Process duration in ms
  
  // Quality Settings
  temperature?: number        // Decoding temperature (default: 0.0)
  temperatureInc?: number     // Temperature increment on failure
  beamSize?: number           // Beam search size
  bestOf?: number             // Number of best candidates
  
  // Advanced
  maxLen?: number             // Max segment length in characters
  tdrzEnable?: boolean        // Enable tinydiarize (requires tdrz model)
  
  // Callbacks
  onProgress?: (progress: number) => void
  onNewSegments?: (result: TranscribeNewSegmentsResult) => void
}

Progress & New Segments

1. Track progress

Use onProgress to receive updates from 0-100%:
const { promise } = whisperContext.transcribe(
  audioPath,
  {
    language: 'en',
    onProgress: (progress) => {
      console.log(`Progress: ${progress}%`)
      // Update UI progress bar
    }
  }
)
2. Stream segments as they're decoded

Use onNewSegments to receive results incrementally:
const { promise } = whisperContext.transcribe(
  audioPath,
  {
    language: 'en',
    onNewSegments: ({ nNew, totalNNew, result, segments }) => {
      console.log(`New segments: ${nNew}, Total: ${totalNNew}`)
      console.log(`Current result: ${result}`)
      // segments contains the new segments with timestamps
    }
  }
)
3. Stop transcription

Call stop() to abort an ongoing transcription:
const { stop, promise } = whisperContext.transcribe(audioPath, options)

// Later, to cancel:
await stop()

const result = await promise
console.log('Aborted:', result.isAborted) // true

Transcription Result

type TranscribeResult = {
  result: string              // Full transcription text
  language: string            // Detected/specified language code
  segments: Array<{
    text: string              // Segment text
    t0: number                // Start time in ms
    t1: number                // End time in ms
  }>
  isAborted: boolean          // True if stopped via stop()
}
Example result:
{
  result: "And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.",
  language: "en",
  segments: [
    { text: "And so my fellow Americans,", t0: 0, t1: 2000 },
    { text: "ask not what your country can do for you,", t0: 2000, t1: 5000 },
    { text: "ask what you can do for your country.", t0: 5000, t1: 8000 }
  ],
  isAborted: false
}
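Since t0 and t1 are in milliseconds, a small helper (illustrative, not part of whisper.rn) can turn segments into readable timestamped lines:

```typescript
// Format a millisecond offset as mm:ss.mmm.
function formatMs(ms: number): string {
  const minutes = Math.floor(ms / 60000)
  const seconds = Math.floor((ms % 60000) / 1000)
  const millis = ms % 1000
  const pad = (n: number, w: number) => String(n).padStart(w, '0')
  return `${pad(minutes, 2)}:${pad(seconds, 2)}.${pad(millis, 3)}`
}

// Render each segment as "[start - end] text".
function segmentLines(
  segments: Array<{ text: string; t0: number; t1: number }>
): string[] {
  return segments.map(
    (s) => `[${formatMs(s.t0)} - ${formatMs(s.t1)}] ${s.text.trim()}`
  )
}
```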

Audio Format Requirements

Whisper requires audio in a specific format. Ensure your audio meets these requirements:
Property       Requirement
Sample Rate    16kHz
Channels       Mono (1 channel)
Bit Depth      16-bit PCM
Format         WAV file or raw PCM data
For ArrayBuffer:
  • Use transcribeData() with raw 16-bit PCM samples
  • Data must be mono, 16kHz
  • JSI binding provides zero-copy transfer for best performance
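If your audio pipeline produces float32 samples (common for microphone capture), a conversion helper along these lines (an illustrative sketch, not a whisper.rn API) produces the 16-bit PCM ArrayBuffer that transcribeData() expects:

```typescript
// Convert float32 samples in [-1, 1] to a 16-bit PCM ArrayBuffer.
function float32ToInt16Pcm(samples: Float32Array): ArrayBuffer {
  const out = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] before scaling to the int16 range.
    const s = Math.max(-1, Math.min(1, samples[i]))
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff)
  }
  return out.buffer
}
```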

Common Patterns

Auto-detect Language

const { promise } = whisperContext.transcribe(audioPath, {
  language: 'auto' // or omit the language option
})

const { result, language } = await promise
console.log(`Detected language: ${language}`)

Translate to English

const { promise } = whisperContext.transcribe(audioPath, {
  language: 'es',  // Spanish audio
  translate: true  // Translate to English
})

const { result } = await promise // English translation

Use Initial Prompt

Prompts help guide the model’s vocabulary and style. They don’t need to match the audio content.
const { promise } = whisperContext.transcribe(audioPath, {
  language: 'en',
  prompt: 'Medical terminology: diagnosis, treatment, prescription'
})

Process Audio Segment

const { promise } = whisperContext.transcribe(audioPath, {
  language: 'en',
  offset: 5000,      // Start at 5 seconds
  duration: 10000    // Process 10 seconds
})

Performance Tips

  1. Use ArrayBuffer for in-memory audio - JSI bindings avoid serialization overhead
  2. Adjust thread count - Default is optimal for most devices, but you can tune it
  3. Enable GPU/Core ML - Set useGpu: true or useCoreMLIos: true in initWhisper()
  4. Choose appropriate model size - Smaller models (tiny, base) are faster but less accurate
  5. Use quantized models - q8 or q5 models reduce memory and improve speed
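By way of illustration, tips 3-5 might combine into an options object for initWhisper() like the one below (the quantized model filename is a placeholder; useGpu and useCoreMLIos are the flags named in tip 3):

```typescript
// Illustrative initWhisper() options: a quantized model plus the
// GPU / Core ML flags mentioned above. The model path is a placeholder.
const contextOptions = {
  filePath: 'file:///path/to/ggml-base-q8_0.bin', // quantized model (tip 5)
  useGpu: true,        // GPU acceleration where supported (tip 3)
  useCoreMLIos: true,  // Core ML on iOS (tip 3)
}
```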

Error Handling

try {
  const { promise } = whisperContext.transcribe(audioPath, options)
  const result = await promise
  
  if (result.isAborted) {
    console.log('Transcription was cancelled')
  } else {
    console.log('Success:', result.result)
  }
} catch (error) {
  console.error('Transcription failed:', error)
  // Handle errors (invalid file, unsupported format, etc.)
}
