whisper.rn supports multiple audio input formats: file paths, bundled assets, URLs, base64-encoded data, and ArrayBuffers. This guide shows you how to work with each format.
## Supported Formats

whisper.rn accepts audio in the following formats:

- **File Paths**: local file system paths (e.g., recorded audio)
- **Assets**: bundled app assets via `require()`
- **URLs**: remote audio files via HTTP/HTTPS
- **Base64**: base64-encoded WAV or PCM data
- **ArrayBuffer**: raw PCM data via JSI (high performance)
## File Paths

Transcribe audio files from the device file system:

```ts
import RNFS from 'react-native-fs';
import { initWhisper } from 'whisper.rn';

const context = await initWhisper({
  filePath: require('../assets/ggml-base.bin'),
});

// Transcribe a local file
const audioPath = `${RNFS.DocumentDirectoryPath}/recording.wav`;
const { promise } = context.transcribe(audioPath, {
  language: 'en',
});
const { result } = await promise;
console.log('Result:', result);
```
## Bundled Assets

Use audio files bundled with your app.

### Add to Metro Config

First, configure Metro to bundle audio files:

```js
const { getDefaultConfig } = require('@react-native/metro-config');

module.exports = (async () => {
  const defaultConfig = await getDefaultConfig(__dirname);
  return {
    ...defaultConfig,
    resolver: {
      ...defaultConfig.resolver,
      assetExts: [
        ...defaultConfig.resolver.assetExts,
        'bin', // For model files
        'mil', // For Core ML files
        'wav', // For audio files
        'mp3', // For MP3 files
      ],
    },
  };
})();
```
### Require and Transcribe

Use `require()` to reference bundled assets:

```ts
const sampleAudio = require('../assets/jfk.wav');

const { promise } = context.transcribe(sampleAudio, {
  language: 'en',
});
const { result } = await promise;
console.log('Result:', result);
```
> **Note:** The maximum asset size in React Native is 2GB. For larger models, download them at runtime instead.
## Remote URLs

Transcribe audio files from URLs by downloading them first:

```ts
import RNFS from 'react-native-fs';

const audioUrl = 'https://example.com/audio.wav';
const localPath = `${RNFS.DocumentDirectoryPath}/downloaded-audio.wav`;

// Download the file first
await RNFS.downloadFile({
  fromUrl: audioUrl,
  toFile: localPath,
  progress: (res) => {
    const progress = (res.bytesWritten / res.contentLength) * 100;
    console.log(`Download: ${progress.toFixed(1)}%`);
  },
}).promise;

// Transcribe the downloaded file
const { promise } = context.transcribe(localPath, {
  language: 'en',
});
const { result } = await promise;
console.log('Result:', result);
```
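Since remote files are transcribed from a local copy, it is worth skipping the download when the file is already cached. Below is a minimal sketch of that check, with the file operations injected so the same logic works with RNFS or any other file-system module; the `FileOps` type and `ensureDownloaded` name are illustrative, not part of whisper.rn:

```typescript
// Minimal file-system interface so the caching logic stays testable;
// in an app these would be backed by RNFS.exists / RNFS.downloadFile.
type FileOps = {
  exists: (path: string) => Promise<boolean>;
  download: (url: string, path: string) => Promise<void>;
};

// Download a remote file only if it is not already on disk.
async function ensureDownloaded(
  url: string,
  path: string,
  fs: FileOps,
): Promise<string> {
  if (!(await fs.exists(path))) {
    await fs.download(url, path);
  }
  return path;
}
```

With RNFS, `exists` maps to `RNFS.exists` and `download` wraps `RNFS.downloadFile({ fromUrl, toFile }).promise`; the returned path can then be passed straight to `context.transcribe()`.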
## Base64 WAV Data

Transcribe base64-encoded WAV files using `transcribeData()`:

```ts
import RNFS from 'react-native-fs';

// Read WAV file as base64
const wavFilePath = `${RNFS.DocumentDirectoryPath}/recording.wav`;
const base64Data = await RNFS.readFile(wavFilePath, 'base64');

// Transcribe base64 data
const { promise } = context.transcribeData(base64Data, {
  language: 'en',
  onProgress: (progress) => {
    console.log(`Progress: ${progress}%`);
  },
});
const { result } = await promise;
console.log('Result:', result);
```
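Whisper expects 16kHz audio, and a WAV file records its sample rate in the header of its `fmt ` chunk: in the canonical 44-byte RIFF/WAVE header, it is a little-endian uint32 at byte offset 24. Here is a small sketch for sanity-checking decoded WAV bytes before transcribing; `wavSampleRate` is an illustrative helper, not a whisper.rn API, and it assumes the canonical header layout:

```typescript
// Read the sample rate from a canonical RIFF/WAVE header.
// Assumes the `fmt ` chunk starts at byte 12, as written by most encoders.
function wavSampleRate(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Sample rate is a little-endian uint32 at offset 24.
  return view.getUint32(24, true);
}
```

After `Buffer.from(base64Data, 'base64')`, this can flag files that would need resampling before transcription.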
## Raw PCM Data

Transcribe raw PCM audio data (16kHz, mono, 16-bit):

### Recording to PCM

```ts
import LiveAudioStream from '@fugood/react-native-audio-pcm-stream';
import { Buffer } from 'buffer';

const audioOptions = {
  sampleRate: 16000,
  channels: 1,
  bitsPerSample: 16,
  audioSource: 6,
  bufferSize: 16 * 1024,
};

let recordedData: Uint8Array | null = null;

// Initialize and start recording
LiveAudioStream.init(audioOptions);
LiveAudioStream.on('data', (data: string) => {
  const newData = new Uint8Array(Buffer.from(data, 'base64'));
  if (!recordedData) {
    recordedData = newData;
  } else {
    // Concatenate audio chunks
    const combined = new Uint8Array(recordedData.length + newData.length);
    combined.set(recordedData);
    combined.set(newData, recordedData.length);
    recordedData = combined;
  }
});
LiveAudioStream.start();

// Later, stop recording
await LiveAudioStream.stop();

if (recordedData) {
  // Convert to base64 for transcription
  const base64Data = Buffer.from(recordedData).toString('base64');
  const { promise } = context.transcribeData(base64Data, {
    language: 'en',
  });
  const { result } = await promise;
  console.log('Transcription:', result);
}
```
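With 16kHz, mono, 16-bit audio, each second of recording is 16000 × 1 × 2 = 32000 bytes, so the recorded duration can be derived directly from the accumulated buffer length. A small helper for that arithmetic (the function name and defaults are illustrative, not part of whisper.rn):

```typescript
// Duration in seconds of raw PCM data, given its format.
function pcmDurationSeconds(
  byteLength: number,
  sampleRate = 16000,
  channels = 1,
  bitsPerSample = 16,
): number {
  const bytesPerSecond = sampleRate * channels * (bitsPerSample / 8);
  return byteLength / bytesPerSecond;
}
```

For example, a 320000-byte recording at the options above is 10 seconds of audio; this is handy for showing recording length in the UI or rejecting clips that are too short to transcribe.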
### Saving PCM as WAV

Use the `WavFileWriter` utility to save PCM data as WAV files:

```ts
import RNFS from 'react-native-fs';
import { WavFileWriter } from 'whisper.rn/utils/WavFileWriter';

const recordFilePath = `${RNFS.DocumentDirectoryPath}/recording.wav`;
const audioOptions = {
  sampleRate: 16000,
  channels: 1,
  bitsPerSample: 16,
};

// Create WAV file writer
const wavWriter = new WavFileWriter(RNFS, recordFilePath, audioOptions);
await wavWriter.initialize();

// Append PCM data
await wavWriter.appendAudioData(recordedData);

// Finalize WAV file
await wavWriter.finalize();
console.log('WAV file saved:', recordFilePath);

// Now transcribe the WAV file
const { promise } = context.transcribe(recordFilePath, {
  language: 'en',
});
const { result } = await promise;
console.log('Result:', result);
```
### ArrayBuffer via JSI

For maximum performance, use ArrayBuffer via JSI bindings:

```ts
import { Buffer } from 'buffer';

// Your PCM audio data
const pcmData = new Uint8Array(/* ... */);

// Convert to base64 (JSI handles conversion internally)
const base64Data = Buffer.from(pcmData).toString('base64');

// Use transcribeData - JSI optimizes ArrayBuffer transfers
const { promise } = context.transcribeData(base64Data, {
  language: 'en',
});
const { result } = await promise;
console.log('Result:', result);
```
> **Note:** `transcribeData()` uses JSI bindings for efficient memory transfer, avoiding JSON serialization overhead.
## Audio Requirements

For best results, ensure your audio meets these requirements:

- **Sample Rate**: 16kHz (required by the Whisper model)
- **Channels**: mono (1 channel); stereo will be converted
- **Bit Depth**: 16-bit PCM (signed integer)
- **Format**: WAV for files, PCM for raw data
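If source audio is stereo, it has to be reduced to one channel before Whisper sees it. A common approach is to average each interleaved left/right sample pair; here is a sketch over 16-bit samples (an illustrative helper, not part of whisper.rn, which may perform conversion differently):

```typescript
// Downmix interleaved 16-bit stereo samples [L, R, L, R, ...] to mono
// by averaging each left/right pair.
function stereoToMono(stereo: Int16Array): Int16Array {
  const mono = new Int16Array(stereo.length / 2);
  for (let i = 0; i < mono.length; i++) {
    mono[i] = Math.round((stereo[2 * i] + stereo[2 * i + 1]) / 2);
  }
  return mono;
}
```

The resulting `Int16Array` can be viewed as bytes with `new Uint8Array(mono.buffer)` and then base64-encoded for `transcribeData()`.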
## Complete Recording Example

Here’s a complete example showing recording and transcription:
```tsx
import React, { useCallback, useEffect, useRef, useState } from 'react';
import { View, Text, Button, ScrollView } from 'react-native';
import RNFS from 'react-native-fs';
import LiveAudioStream from '@fugood/react-native-audio-pcm-stream';
import { Buffer } from 'buffer';
import { initWhisper } from 'whisper.rn';
import type { WhisperContext } from 'whisper.rn';
import { WavFileWriter } from 'whisper.rn/utils/WavFileWriter';

const recordFile = `${RNFS.DocumentDirectoryPath}/recording.wav`;

const audioOptions = {
  sampleRate: 16000,
  channels: 1,
  bitsPerSample: 16,
  audioSource: 6,
  wavFile: recordFile,
  bufferSize: 16 * 1024,
};

export default function RecordAndTranscribe() {
  const contextRef = useRef<WhisperContext | null>(null);
  const recordedDataRef = useRef<Uint8Array | null>(null);
  const [logs, setLogs] = useState<string[]>([]);
  const [result, setResult] = useState<string | null>(null);
  const [isRecording, setIsRecording] = useState(false);

  const log = useCallback((...messages: any[]) => {
    setLogs((prev) => [...prev, messages.join(' ')]);
  }, []);

  useEffect(() => {
    return () => {
      contextRef.current?.release();
    };
  }, []);

  const initialize = async () => {
    log('Initializing context...');
    const ctx = await initWhisper({
      filePath: require('../assets/ggml-base.bin'),
    });
    contextRef.current = ctx;
    log('Context initialized');
  };

  const startRecording = async () => {
    try {
      recordedDataRef.current = null;
      LiveAudioStream.init(audioOptions);
      LiveAudioStream.on('data', (data: string) => {
        const newData = new Uint8Array(Buffer.from(data, 'base64'));
        if (!recordedDataRef.current) {
          recordedDataRef.current = newData;
        } else {
          const combined = new Uint8Array(
            recordedDataRef.current.length + newData.length
          );
          combined.set(recordedDataRef.current);
          combined.set(newData, recordedDataRef.current.length);
          recordedDataRef.current = combined;
        }
      });
      LiveAudioStream.start();
      setIsRecording(true);
      log('Recording started...');
    } catch (error) {
      log('Error starting recording:', error);
    }
  };

  const stopRecording = async () => {
    try {
      await LiveAudioStream.stop();
      setIsRecording(false);
      log('Recording stopped');
      if (!recordedDataRef.current) {
        log('No recorded data');
        return;
      }
      if (!contextRef.current) {
        log('Context not initialized');
        return;
      }

      // Save as WAV file
      const wavWriter = new WavFileWriter(RNFS, recordFile, audioOptions);
      await wavWriter.initialize();
      await wavWriter.appendAudioData(recordedDataRef.current);
      await wavWriter.finalize();
      log(`Saved ${recordedDataRef.current.length} bytes as WAV`);

      // Transcribe using base64 data
      const base64Data = Buffer.from(recordedDataRef.current).toString('base64');
      log('Starting transcription...');
      const startTime = Date.now();
      const { promise } = contextRef.current.transcribeData(base64Data, {
        language: 'en',
        onProgress: (progress) => {
          log(`Progress: ${progress}%`);
        },
      });
      const { result } = await promise;
      const endTime = Date.now();
      setResult(
        `Result: ${result}\n` +
        `Transcribed in ${endTime - startTime}ms`
      );
      log('Transcription complete');
    } catch (error) {
      log('Error:', error);
    }
  };

  return (
    <ScrollView style={{ padding: 20 }}>
      <Button title="Initialize" onPress={initialize} />
      <View style={{ marginTop: 10 }}>
        <Button
          title={isRecording ? 'Stop Recording' : 'Start Recording'}
          onPress={isRecording ? stopRecording : startRecording}
          disabled={!contextRef.current}
        />
      </View>
      <View style={{ marginTop: 20 }}>
        <Text>Logs:</Text>
        {logs.map((log, i) => (
          <Text key={i}>{log}</Text>
        ))}
      </View>
      {result && (
        <View style={{ marginTop: 20 }}>
          <Text>Result:</Text>
          <Text>{result}</Text>
        </View>
      )}
    </ScrollView>
  );
}
```
## Model File Handling

Model files can also be handled in different ways:

### Bundled Asset

```ts
const context = await initWhisper({
  filePath: require('../assets/ggml-base.bin'),
});
```

**Pros**: No download required
**Cons**: Increases app bundle size

### Downloaded

```ts
import RNFS from 'react-native-fs';

const modelUrl = 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin';
const modelPath = `${RNFS.DocumentDirectoryPath}/ggml-base.bin`;

// Check if already downloaded
if (!(await RNFS.exists(modelPath))) {
  await RNFS.downloadFile({
    fromUrl: modelUrl,
    toFile: modelPath,
  }).promise;
}

const context = await initWhisper({ filePath: modelPath });
```

**Pros**: Smaller app bundle
**Cons**: Requires internet on first run

### Core ML (iOS)

```ts
// Place the .mlmodelc directory next to the model file:
// - ggml-tiny.en.bin
// - ggml-tiny.en-encoder.mlmodelc/
//   - model.mil
//   - coremldata.bin
//   - weights/weight.bin

const context = await initWhisper({
  filePath: require('../assets/ggml-tiny.en.bin'),
  useCoreMLIos: true, // Enable Core ML acceleration
});
```

**Pros**: 2-3x faster on iOS
**Cons**: Larger file size, iOS only
## Tips

- **Audio Format**: Always use 16kHz mono audio. Converting from other formats adds processing overhead.
- **File Size**: Base64 encoding increases data size by ~33%. Use file paths when possible.
- **JSI Optimization**: `transcribeData()` uses JSI bindings for efficient ArrayBuffer transfers without JSON serialization.
- **Model Caching**: Download models once and cache them. Check whether files exist before downloading.
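The ~33% base64 overhead mentioned above follows from base64 emitting 4 output characters for every 3 input bytes. A quick way to estimate the encoded size of a recording (an illustrative helper, not a whisper.rn API):

```typescript
// Size in characters of the base64 encoding of n bytes (no line breaks).
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}
```

For one second of 16kHz mono 16-bit audio (32000 bytes), this gives 42668 characters, roughly a 33% increase over the raw PCM.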
## Next Steps

- **Basic Transcription**: learn basic transcription workflows
- **Realtime Streaming**: implement live transcription
- **VAD Detection**: detect speech in audio files
- **API Reference**: full API documentation