Overview
TextToSpeechModule provides a class-based interface for Text-to-Speech (TTS) functionality. It supports single-shot synthesis and streaming audio generation with models such as Kokoro.
When to Use
Use TextToSpeechModule when:
- You need manual control over TTS lifecycle
- You’re working outside React components
- You need streaming audio generation
- You want to integrate speech synthesis into non-React code
Use useTextToSpeech hook when:
- Building React components
- You want automatic lifecycle management
- You prefer declarative state management
- You need React state integration
Constructor
Creates a new text-to-speech module instance.
Example
import { TextToSpeechModule } from 'react-native-executorch';
const tts = new TextToSpeechModule();
Methods
load()
async load(
config: TextToSpeechConfig,
onDownloadProgressCallback?: (progress: number) => void
): Promise<void>
Loads the TTS model and voice assets.
Parameters
config
TextToSpeechConfig
required
Configuration object containing:
- model: Model configuration (e.g., { type: 'kokoro', durationPredictorSource, synthesizerSource })
- voice: Voice configuration including language and voice data sources
onDownloadProgressCallback
(progress: number) => void
Optional callback to monitor download progress (value between 0 and 1).
Example
await tts.load(
{
model: {
type: 'kokoro',
durationPredictorSource: 'https://example.com/duration.pte',
synthesizerSource: 'https://example.com/synthesizer.pte'
},
voice: {
lang: 'en',
voiceSource: 'https://example.com/voice_en.bin',
extra: {
taggerSource: 'https://example.com/tagger.bin',
lexiconSource: 'https://example.com/lexicon.txt'
}
}
},
(progress) => {
console.log(`Download: ${(progress * 100).toFixed(1)}%`);
}
);
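For reference, the shape of the object passed to load() can be written out as a TypeScript type. This is a sketch inferred from the fields shown above, not the library's exported type definitions:

```typescript
// Assumed config shape, reconstructed from the fields documented above.
// The actual exported types in react-native-executorch may differ.
type KokoroModelConfig = {
  type: 'kokoro';
  durationPredictorSource: string; // URL or local path to the duration predictor
  synthesizerSource: string;       // URL or local path to the synthesizer
};

type VoiceConfig = {
  lang: string;        // e.g. 'en'
  voiceSource: string; // voice data for the selected language
  extra?: {
    taggerSource: string;
    lexiconSource: string;
  };
};

type TextToSpeechConfigSketch = {
  model: KokoroModelConfig;
  voice: VoiceConfig;
};

const exampleConfig: TextToSpeechConfigSketch = {
  model: {
    type: 'kokoro',
    durationPredictorSource: 'https://example.com/duration.pte',
    synthesizerSource: 'https://example.com/synthesizer.pte'
  },
  voice: { lang: 'en', voiceSource: 'https://example.com/voice_en.bin' }
};
```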
forward()
async forward(
text: string,
speed?: number
): Promise<Float32Array>
Synthesizes the provided text into speech audio.
Parameters
text
string
required
The input text to be synthesized.
speed
number
Optional speed multiplier for the speech synthesis. Values > 1.0 are faster, < 1.0 are slower.
Returns
A promise resolving to the synthesized audio waveform as a Float32Array.
Example
const audio = await tts.forward('Hello, how are you?', 1.0);
console.log('Audio samples:', audio.length);
// Play the audio (implementation depends on your audio library)
await playAudio(audio);
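Because Kokoro produces mono audio at 24 kHz (see the audio format notes later on this page), playback duration can be estimated directly from the sample count. A small helper, with the sample rate treated as an assumption:

```typescript
// Assumed sample rate, per the audio format notes (24 kHz mono for Kokoro).
const SAMPLE_RATE = 24_000;

// Estimate playback duration (in seconds) of a synthesized waveform.
function playbackDurationSeconds(
  audio: Float32Array,
  sampleRate: number = SAMPLE_RATE
): number {
  return audio.length / sampleRate;
}
```

For example, a waveform of 48,000 samples corresponds to 2 seconds of audio at 24 kHz.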
stream()
async *stream(input: TextToSpeechStreamingInput): AsyncGenerator<Float32Array>
Starts a streaming synthesis session. Yields audio chunks as they are generated.
Parameters
input
TextToSpeechStreamingInput
required
Input object containing:
- text: The text to synthesize
- speed: Optional speed multiplier (default: 1.0)
Returns
An async generator yielding Float32Array audio chunks.
Example
const audioChunks: Float32Array[] = [];
for await (const chunk of tts.stream({ text: 'Hello world', speed: 1.0 })) {
console.log('Received chunk:', chunk.length, 'samples');
audioChunks.push(chunk);
// Or play chunk immediately for real-time playback
await playAudioChunk(chunk);
}
console.log('Streaming complete, received', audioChunks.length, 'chunks');
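Once streaming finishes, the collected chunks can be merged into a single contiguous buffer, e.g. for saving or replaying the full utterance. A minimal sketch:

```typescript
// Merge streamed audio chunks into one contiguous Float32Array.
function concatChunks(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const merged = new Float32Array(total);
  let offset = 0;
  for (const c of chunks) {
    merged.set(c, offset); // copy chunk into place
    offset += c.length;
  }
  return merged;
}
```

This pairs naturally with the loop above: collect chunks during streaming for low-latency playback, then concatenate them afterwards if you also need the complete waveform.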
streamStop()
Stops the ongoing streaming session, if any.
Example
tts.streamStop();
delete()
Unloads the model from memory.
Example
tts.delete();
Complete Example: Single-shot Synthesis
import { TextToSpeechModule } from 'react-native-executorch';
import AudioPlayer from 'react-native-audio-player';
class VoiceSynthesizer {
private tts: TextToSpeechModule;
constructor() {
this.tts = new TextToSpeechModule();
}
async initialize(language: string = 'en') {
console.log(`Loading TTS model for ${language}...`);
await this.tts.load(
{
model: {
type: 'kokoro',
durationPredictorSource: `https://example.com/duration_${language}.pte`,
synthesizerSource: `https://example.com/synthesizer_${language}.pte`
},
voice: {
lang: language,
voiceSource: `https://example.com/voice_${language}.bin`,
extra: {
taggerSource: `https://example.com/tagger_${language}.bin`,
lexiconSource: `https://example.com/lexicon_${language}.txt`
}
}
},
(progress) => {
console.log(`Loading: ${(progress * 100).toFixed(0)}%`);
}
);
console.log('TTS ready!');
}
async speak(text: string, speed: number = 1.0) {
console.log(`Synthesizing: "${text}"`);
const audio = await this.tts.forward(text, speed);
console.log(`Generated ${audio.length} audio samples`);
// Play the audio
await AudioPlayer.play(audio);
}
cleanup() {
this.tts.delete();
}
}
// Usage
const synthesizer = new VoiceSynthesizer();
await synthesizer.initialize('en');
await synthesizer.speak('Hello, welcome to text to speech!', 1.0);
await synthesizer.speak('This is faster speech.', 1.5);
await synthesizer.speak('This is slower speech.', 0.8);
synthesizer.cleanup();
Complete Example: Streaming Synthesis
import { TextToSpeechModule } from 'react-native-executorch';
class StreamingVoiceSynthesizer {
private tts: TextToSpeechModule;
private audioQueue: Float32Array[] = [];
constructor() {
this.tts = new TextToSpeechModule();
}
async initialize() {
await this.tts.load({
model: {
type: 'kokoro',
durationPredictorSource: 'https://example.com/duration.pte',
synthesizerSource: 'https://example.com/synthesizer.pte'
},
voice: {
lang: 'en',
voiceSource: 'https://example.com/voice.bin',
extra: {
taggerSource: 'https://example.com/tagger.bin',
lexiconSource: 'https://example.com/lexicon.txt'
}
}
});
}
async streamSpeak(
text: string,
onChunk: (chunk: Float32Array) => void,
speed: number = 1.0
) {
console.log(`Streaming synthesis for: "${text}"`);
for await (const chunk of this.tts.stream({ text, speed })) {
console.log(`Received audio chunk: ${chunk.length} samples`);
onChunk(chunk);
}
console.log('Streaming complete');
}
stop() {
this.tts.streamStop();
}
cleanup() {
this.tts.delete();
}
}
// Usage
const streamingSynth = new StreamingVoiceSynthesizer();
await streamingSynth.initialize();
// Stream with real-time playback
await streamingSynth.streamSpeak(
'This is a long sentence that will be synthesized in chunks.',
(chunk) => {
// Play chunk immediately for low-latency playback
playAudioChunk(chunk);
},
1.0
);
streamingSynth.cleanup();
Multi-Language Support
class MultiLanguageTTS {
private ttsModules: Map<string, TextToSpeechModule> = new Map();
async loadLanguage(lang: string) {
const tts = new TextToSpeechModule();
await tts.load({
model: {
type: 'kokoro',
durationPredictorSource: `https://example.com/duration_${lang}.pte`,
synthesizerSource: `https://example.com/synthesizer_${lang}.pte`
},
voice: {
lang,
voiceSource: `https://example.com/voice_${lang}.bin`,
extra: {
taggerSource: `https://example.com/tagger_${lang}.bin`,
lexiconSource: `https://example.com/lexicon_${lang}.txt`
}
}
});
this.ttsModules.set(lang, tts);
console.log(`Loaded ${lang} TTS`);
}
async speak(text: string, lang: string, speed: number = 1.0) {
const tts = this.ttsModules.get(lang);
if (!tts) {
throw new Error(`Language ${lang} not loaded`);
}
return await tts.forward(text, speed);
}
cleanupAll() {
this.ttsModules.forEach(tts => tts.delete());
this.ttsModules.clear();
}
}
// Usage
const multiTTS = new MultiLanguageTTS();
// Load multiple languages
await multiTTS.loadLanguage('en');
await multiTTS.loadLanguage('es');
await multiTTS.loadLanguage('fr');
// Speak in different languages
const englishAudio = await multiTTS.speak('Hello world', 'en');
const spanishAudio = await multiTTS.speak('Hola mundo', 'es');
const frenchAudio = await multiTTS.speak('Bonjour le monde', 'fr');
multiTTS.cleanupAll();
Speed Control Examples
// Normal speed
await tts.forward('Normal speed speech', 1.0);
// Fast speech (1.5x)
await tts.forward('Fast speech', 1.5);
// Slow speech (0.75x)
await tts.forward('Slow speech', 0.75);
// Very fast (2x)
await tts.forward('Very fast speech', 2.0);
// Very slow (0.5x)
await tts.forward('Very slow speech', 0.5);
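The supported range for the speed multiplier isn't stated here; the examples above span 0.5x to 2.0x. If you accept user-supplied values, a hypothetical guard clamping to that demonstrated range might look like:

```typescript
// Clamp a requested speed to the range demonstrated above (0.5x-2.0x).
// The bounds are an assumption, not documented limits; adjust as needed.
function clampSpeed(speed: number, min = 0.5, max = 2.0): number {
  return Math.min(max, Math.max(min, speed));
}
```

Usage: `await tts.forward(text, clampSpeed(userSpeed));`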
Batch Synthesis
class BatchTTS {
private tts: TextToSpeechModule;
constructor() {
this.tts = new TextToSpeechModule();
}
async initialize() {
await this.tts.load(/* config */);
}
async synthesizeMultiple(texts: string[]): Promise<Float32Array[]> {
const results: Float32Array[] = [];
for (const text of texts) {
console.log(`Synthesizing: "${text}"`);
const audio = await this.tts.forward(text);
results.push(audio);
}
return results;
}
cleanup() {
this.tts.delete();
}
}
// Usage
const batchTTS = new BatchTTS();
await batchTTS.initialize();
const sentences = [
'First sentence.',
'Second sentence.',
'Third sentence.'
];
const audioFiles = await batchTTS.synthesizeMultiple(sentences);
console.log(`Generated ${audioFiles.length} audio files`);
batchTTS.cleanup();
Audio Format
The synthesized audio is returned as:
- Format: Float32Array
- Sample rate: 24kHz (24,000 Hz) for Kokoro
- Channels: Mono (single channel)
- Values: Normalized float values (-1.0 to 1.0)
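Many playback paths and file formats (e.g. WAV) expect 16-bit PCM rather than normalized floats. A sketch of the conversion, assuming samples stay within the -1.0 to 1.0 range described above (out-of-range values are clamped):

```typescript
// Convert normalized float samples (-1.0..1.0) to 16-bit signed PCM.
// Values outside the range are clamped to avoid integer wraparound.
function floatTo16BitPCM(audio: Float32Array): Int16Array {
  const pcm = new Int16Array(audio.length);
  for (let i = 0; i < audio.length; i++) {
    const s = Math.max(-1, Math.min(1, audio[i]));
    // Negative samples scale to -32768, positive to 32767.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```

The resulting Int16Array can be wrapped in a WAV header (24,000 Hz, mono, 16-bit) or fed to any player that accepts raw PCM.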
Supported Models
Currently supports:
- Kokoro: High-quality neural TTS with multiple language support
Performance Notes
- Synthesis is relatively fast (typically < 1 second for short sentences)
- Streaming mode provides lower latency for long texts
- Speed parameter doesn’t significantly affect generation time
- Always call delete() when done to free resources
- Consider caching synthesized audio for repeated phrases
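To make the "always call delete()" advice harder to forget, cleanup can be wrapped in a try/finally helper. The sketch below is typed against a minimal interface rather than the library's actual class, so the interface name is an assumption:

```typescript
// Minimal interface covering the cleanup method documented above.
interface Deletable {
  delete(): void;
}

// Run async work with a module and guarantee delete() runs,
// even if the work throws.
async function withModule<T extends Deletable, R>(
  module: T,
  work: (m: T) => Promise<R>
): Promise<R> {
  try {
    return await work(module);
  } finally {
    module.delete();
  }
}
```

Usage would look like `await withModule(tts, (m) => m.forward('Hello'))`, with load() called beforehand as usual.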
See Also