The useTextToSpeech hook converts text into natural-sounding speech using the Kokoro TTS model. It supports both complete audio generation and streaming playback for real-time applications.
Basic Usage
import React from 'react';
import { View, Text, Button } from 'react-native';
import { useTextToSpeech } from 'react-native-executorch';

function TextReader() {
  const { forward, isReady, error } = useTextToSpeech({
    model: {
      type: 'kokoro',
      durationPredictorSource: require('./models/duration-predictor.pte'),
      synthesizerSource: require('./models/synthesizer.pte'),
    },
    voice: {
      lang: 'en-us',
      voiceSource: require('./voices/en-us-voice.bin'),
      extra: {
        taggerSource: require('./models/tagger.pte'),
        lexiconSource: require('./models/lexicon.bin'),
      },
    },
  });

  const speak = async () => {
    if (!isReady) return;
    const audio = await forward({
      text: 'Hello, this is a text to speech demo.',
      speed: 1.0,
    });
    // Play the audio using your audio player
    console.log('Generated audio samples:', audio.length);
  };

  return (
    <View>
      {error && <Text>Error: {error.message}</Text>}
      <Button onPress={speak} title="Speak" disabled={!isReady} />
    </View>
  );
}
Hook Signature
useTextToSpeech(props)
function useTextToSpeech(props: TextToSpeechProps): TextToSpeechType;
Parameters
model (KokoroConfig, required)
Kokoro TTS model configuration.
- type: Model type identifier. Currently only 'kokoro' is supported.
- durationPredictorSource: Location of the duration predictor .pte file. Can be a URL (string), local file (require), or resource ID (number).
- synthesizerSource: Location of the synthesizer .pte file. Can be a URL (string), local file (require), or resource ID (number).

voice (VoiceConfig, required)
Voice configuration including language and embeddings.
- lang (TextToSpeechLanguage, required): Speaker's language. Currently supports 'en-us' (American English) or 'en-gb' (British English).
- voiceSource: Location of the voice embedding binary file.
- extra (KokoroVoiceExtras, optional): Additional Kokoro-specific voice resources.
  - taggerSource: Location of the phoneme tagger model binary.
  - lexiconSource: Location of the pronunciation lexicon binary.

preventLoad (boolean, optional)
Prevent automatic model loading on mount. Useful for lazy loading scenarios.
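Since every source field accepts a URL string as well as a require() result, a remote-hosted configuration looks like the sketch below (the URLs are placeholders, not real hosted assets):

```typescript
// Placeholder URLs: substitute the actual locations of your exported files.
const tts = useTextToSpeech({
  model: {
    type: 'kokoro',
    durationPredictorSource: 'https://example.com/models/duration-predictor.pte',
    synthesizerSource: 'https://example.com/models/synthesizer.pte',
  },
  voice: {
    lang: 'en-us',
    voiceSource: 'https://example.com/voices/en-us-voice.bin',
    extra: {
      taggerSource: 'https://example.com/models/tagger.pte',
      lexiconSource: 'https://example.com/models/lexicon.bin',
    },
  },
});
```

Remote sources are downloaded on first use and cached (see Resource Caching under Best Practices).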
Returns
error
Contains error details if model loading or generation fails.
isReady
Indicates whether the model has loaded successfully and is ready for synthesis.
isGenerating
Indicates whether audio generation is currently in progress.
downloadProgress
Download progress as a value between 0 and 1.
forward
(input: TextToSpeechInput) => Promise<Float32Array>
Generate complete audio for the given text in a single pass. Returns 22,050 Hz mono audio.
stream
(input: TextToSpeechStreamingInput) => Promise<void>
Generate audio incrementally with callbacks for real-time playback. Best suited to long text.
streamStop
Stop the current streaming generation process.
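The download progress value can drive a simple loading indicator while model files are fetched. A sketch, assuming the field is exposed as downloadProgress (as in other react-native-executorch hooks; treat the name as an assumption):

```
const { isReady, downloadProgress } = useTextToSpeech({ model, voice });

// Show fetch/load progress until the model is ready.
if (!isReady) {
  return <Text>Loading model… {Math.round(downloadProgress * 100)}%</Text>;
}
```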
Generation Methods
Complete Audio Generation
Generate the entire audio at once:
const { forward, isReady } = useTextToSpeech({ model, voice });

const audio = await forward({
  text: 'Welcome to React Native ExecuTorch.',
  speed: 1.0, // Normal speed
});

// audio is a Float32Array of 22,050 Hz mono samples
console.log('Sample rate: 22050 Hz');
console.log('Duration:', audio.length / 22050, 'seconds');
Streaming Audio Generation
Generate and play audio incrementally:
const { stream, isReady } = useTextToSpeech({ model, voice });

await stream({
  text: 'This is a longer text that will be synthesized in chunks.',
  speed: 1.2, // 20% faster
  onBegin: async () => {
    console.log('Starting audio generation...');
    // Initialize audio player
  },
  onNext: async (audioChunk: Float32Array) => {
    console.log('Received chunk:', audioChunk.length, 'samples');
    // Play chunk immediately
    await audioPlayer.playChunk(audioChunk);
  },
  onEnd: async () => {
    console.log('Audio generation complete');
    // Cleanup
  },
});
Types
TextToSpeechInput
Input for audio generation:
interface TextToSpeechInput {
  text: string; // Text to synthesize
  speed?: number; // Speed multiplier (default: 1.0)
}
TextToSpeechStreamingInput
Input for streaming generation with lifecycle callbacks:
interface TextToSpeechStreamingInput extends TextToSpeechInput {
  onBegin?: () => void | Promise<void>; // Called when generation starts
  onNext?: (audio: Float32Array) => void | Promise<void>; // Called for each chunk
  onEnd?: () => void | Promise<void>; // Called when generation completes
}
TextToSpeechLanguage
Supported language codes:
type TextToSpeechLanguage =
  | 'en-us' // American English
  | 'en-gb'; // British English
VoiceConfig
Voice configuration structure:
interface VoiceConfig {
  lang: TextToSpeechLanguage;
  voiceSource: ResourceSource;
  extra?: KokoroVoiceExtras;
}
KokoroVoiceExtras
Kokoro-specific voice resources:
interface KokoroVoiceExtras {
  taggerSource: ResourceSource; // Phoneme tagger model
  lexiconSource: ResourceSource; // Pronunciation lexicon
}
KokoroConfig
Kokoro TTS model configuration:
interface KokoroConfig {
  type: 'kokoro';
  durationPredictorSource: ResourceSource;
  synthesizerSource: ResourceSource;
}
Audio Format
The generated audio has the following characteristics:
- Sample rate: 22,050 Hz (22 kHz)
- Channels: mono (single channel)
- Data type: Float32Array
- Value range: -1.0 to 1.0 (normalized)
- Buffer layout: contiguous samples in time order
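These characteristics are all you need to post-process the output. As a sketch, two hypothetical helpers (not part of the library): one converts the normalized samples to 16-bit PCM, as required by most WAV encoders, and one computes playback duration from the fixed sample rate:

```typescript
// Convert normalized Float32 samples ([-1.0, 1.0]) to signed 16-bit PCM.
function float32ToInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = Math.round(s < 0 ? s * 32768 : s * 32767);
  }
  return out;
}

// Playback duration in seconds follows directly from the 22,050 Hz rate.
function durationSeconds(samples: Float32Array): number {
  return samples.length / 22050;
}
```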
Playing Generated Audio
Example using a typical audio player:
import { Audio } from 'expo-av';

const { forward } = useTextToSpeech({ model, voice });

const speakText = async (text: string) => {
  // Generate 22,050 Hz mono samples
  const audioData = await forward({ text, speed: 1.0 });

  // Encode the Float32Array into a playable file (e.g. WAV) and get its URI.
  // convertToAudioBuffer is a placeholder for your own encoding helper.
  const audioUri = convertToAudioBuffer(audioData, 22050);

  // Play audio
  const sound = new Audio.Sound();
  await sound.loadAsync({ uri: audioUri });
  await sound.playAsync();
};
Advanced Usage
Speed Control
Adjust speech rate for different contexts:
// Slower speech for clarity (0.8x speed)
await forward ({ text: 'Important instructions here.' , speed: 0.8 });
// Normal speed (1.0x)
await forward ({ text: 'Regular conversation.' , speed: 1.0 });
// Faster speech for quick playback (1.5x speed)
await forward ({ text: 'Quick summary.' , speed: 1.5 });
Streaming with Progress Tracking
function TTSWithProgress() {
  const [progress, setProgress] = useState(0);
  const [totalChunks, setTotalChunks] = useState(0);
  const { stream } = useTextToSpeech({ model, voice });

  const speakWithTracking = async (text: string) => {
    let chunkCount = 0;
    await stream({
      text,
      onBegin: async () => {
        setProgress(0);
        setTotalChunks(0);
      },
      onNext: async (audioChunk) => {
        chunkCount++;
        setTotalChunks(chunkCount);
        setProgress((prev) => prev + audioChunk.length);
        // Play chunk
        await playAudioChunk(audioChunk);
      },
      onEnd: async () => {
        console.log(`Completed ${chunkCount} chunks`);
      },
    });
  };

  return (
    <View>
      <Text>Chunks: {totalChunks}</Text>
      <Text>Samples: {progress}</Text>
    </View>
  );
}
Multiple Voices
Switch between different voice configurations:
const americanVoice: VoiceConfig = {
  lang: 'en-us',
  voiceSource: require('./voices/en-us-male.bin'),
  extra: {
    taggerSource: require('./models/tagger.pte'),
    lexiconSource: require('./models/en-us-lexicon.bin'),
  },
};

const britishVoice: VoiceConfig = {
  lang: 'en-gb',
  voiceSource: require('./voices/en-gb-female.bin'),
  extra: {
    taggerSource: require('./models/tagger.pte'),
    lexiconSource: require('./models/en-gb-lexicon.bin'),
  },
};

// Use different hooks for different voices
const american = useTextToSpeech({ model, voice: americanVoice });
const british = useTextToSpeech({ model, voice: britishVoice });
Interrupting Playback
const { stream, streamStop } = useTextToSpeech({ model, voice });

// Start streaming
const speakPromise = stream({
  text: 'This is a very long text that will take time to synthesize...',
  onNext: async (chunk) => {
    await playAudioChunk(chunk);
  },
});

// Stop mid-stream
const handleStop = () => {
  streamStop(); // Interrupts generation
  stopAudioPlayback(); // Stop playing audio
};
Error Handling
const { forward, error, isReady } = useTextToSpeech({ model, voice });

if (error) {
  console.error('TTS Error:', error.message);
  // Handle specific error codes
}

try {
  const audio = await forward({ text: 'Hello world' });
} catch (err: any) {
  if (err.code === 'MODULE_NOT_LOADED') {
    console.error('Model not ready yet');
  } else if (err.code === 'MODEL_GENERATING') {
    console.error('Already generating audio');
  } else {
    console.error('Generation failed:', err.message);
  }
}
Best Practices
Text Length : For long text, use streaming mode to start playback sooner and reduce memory usage.
Speed Range : Keep speed between 0.5 and 2.0 for natural-sounding speech. Extreme values may degrade quality.
Memory Management : Clear audio buffers after playback to free memory, especially for long content.
Readiness Checks: Always check isReady before calling forward() or stream().
Concurrent Requests : The hook prevents concurrent generation. Wait for completion or use streamStop() before starting new generation.
Text Preprocessing : Clean up text (remove special characters, normalize numbers) for better pronunciation.
Resource Caching : Models and voices are cached after first download. Reuse the same sources to avoid re-downloading.
Streaming vs. Complete : Use streaming for text longer than a few sentences to reduce perceived latency.
Chunk Processing : Process audio chunks asynchronously to maintain smooth playback.
Preload Models : Set preventLoad: false (default) to load models on component mount.
Voice Selection : Choose appropriate voice embeddings for your use case (male/female, accent, etc.).
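The Text Preprocessing tip above can be sketched as a minimal helper. This is hypothetical, not part of the library, and a real pipeline would also expand numbers and abbreviations ("3rd" to "third", "Dr." to "Doctor"):

```typescript
// Minimal text cleanup before synthesis: strip markup characters the
// model may mispronounce and collapse runs of whitespace.
function normalizeForTTS(text: string): string {
  return text
    .replace(/[*_#`~<>|]/g, ' ') // drop common markup characters
    .replace(/\s+/g, ' ')        // collapse whitespace and newlines
    .trim();
}
```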
Common Use Cases
Audio Book Reader
function AudioBookReader({ chapters }: { chapters: string[] }) {
  const { stream, isReady } = useTextToSpeech({ model, voice });
  const [currentChapter, setCurrentChapter] = useState(0);

  const readChapter = async (chapterText: string) => {
    await stream({
      text: chapterText,
      speed: 1.1, // Slightly faster for continuous listening
      onNext: async (chunk) => {
        await playAudioChunk(chunk);
      },
      onEnd: async () => {
        // Auto-advance to next chapter
        if (currentChapter < chapters.length - 1) {
          setCurrentChapter((prev) => prev + 1);
        }
      },
    });
  };

  return <AudioPlayer onPlay={() => readChapter(chapters[currentChapter])} />;
}
Accessibility Screen Reader
function ScreenReader({ content }: { content: string }) {
  const { forward, isReady } = useTextToSpeech({ model, voice });

  const speak = async () => {
    const audio = await forward({
      text: content,
      speed: 1.0,
    });
    await playAudio(audio);
  };

  return (
    <TouchableOpacity onPress={speak} disabled={!isReady}>
      <Text>{content}</Text>
    </TouchableOpacity>
  );
}