
Overview

TextToSpeechModule provides a class-based interface for Text-to-Speech (TTS) functionalities. It supports single-shot synthesis and streaming audio generation with models like Kokoro.

When to Use

Use TextToSpeechModule when:
  • You need manual control over TTS lifecycle
  • You’re working outside React components
  • You need streaming audio generation
  • You want to integrate speech synthesis into non-React code
Use useTextToSpeech hook when:
  • Building React components
  • You want automatic lifecycle management
  • You prefer declarative state management
  • You need React state integration

Constructor

new TextToSpeechModule()
Creates a new text-to-speech module instance.

Example

import { TextToSpeechModule } from 'react-native-executorch';

const tts = new TextToSpeechModule();

Methods

load()

async load(
  config: TextToSpeechConfig,
  onDownloadProgressCallback?: (progress: number) => void
): Promise<void>
Loads the TTS model and voice assets.

Parameters

config
TextToSpeechConfig
required
Configuration object containing:
  • model: Model configuration (e.g., { type: 'kokoro', durationPredictorSource, synthesizerSource })
  • voice: Voice configuration including language and voice data sources
onDownloadProgressCallback
(progress: number) => void
Optional callback to monitor download progress (value between 0 and 1).

Example

await tts.load(
  {
    model: {
      type: 'kokoro',
      durationPredictorSource: 'https://example.com/duration.pte',
      synthesizerSource: 'https://example.com/synthesizer.pte'
    },
    voice: {
      lang: 'en',
      voiceSource: 'https://example.com/voice_en.bin',
      extra: {
        taggerSource: 'https://example.com/tagger.bin',
        lexiconSource: 'https://example.com/lexicon.txt'
      }
    }
  },
  (progress) => {
    console.log(`Download: ${(progress * 100).toFixed(1)}%`);
  }
);

forward()

async forward(
  text: string,
  speed?: number
): Promise<Float32Array>
Synthesizes the provided text into speech audio.

Parameters

text
string
required
The input text to be synthesized.
speed
number
default: 1.0
Optional speed multiplier for the speech synthesis. Values > 1.0 are faster, < 1.0 are slower.

Returns

A promise resolving to the synthesized audio waveform as a Float32Array.

Example

const audio = await tts.forward('Hello, how are you?', 1.0);
console.log('Audio samples:', audio.length);

// Play the audio (implementation depends on your audio library)
await playAudio(audio);

stream()

async *stream(input: TextToSpeechStreamingInput): AsyncGenerator<Float32Array>
Starts a streaming synthesis session. Yields audio chunks as they are generated.

Parameters

input
TextToSpeechStreamingInput
required
Input object containing:
  • text: The text to synthesize
  • speed: Optional speed multiplier (default: 1.0)

Returns

An async generator yielding Float32Array audio chunks.

Example

const audioChunks: Float32Array[] = [];

for await (const chunk of tts.stream({ text: 'Hello world', speed: 1.0 })) {
  console.log('Received chunk:', chunk.length, 'samples');
  audioChunks.push(chunk);
  
  // Or play chunk immediately for real-time playback
  await playAudioChunk(chunk);
}

console.log('Streaming complete, received', audioChunks.length, 'chunks');
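After streaming completes, the collected chunks can be merged into a single contiguous waveform for export or later playback. A minimal sketch in plain TypeScript (no library assumptions):

```typescript
// Merge streamed Float32Array chunks into one contiguous waveform.
function concatChunks(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const merged = new Float32Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset); // copy chunk into place
    offset += chunk.length;
  }
  return merged;
}
```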

streamStop()

streamStop(): void
Stops the streaming session if one is in progress.

Example

tts.streamStop();

delete()

delete(): void
Unloads the model from memory.

Example

tts.delete();

Complete Example: Single-shot Synthesis

import { TextToSpeechModule } from 'react-native-executorch';
import AudioPlayer from 'react-native-audio-player';

class VoiceSynthesizer {
  private tts: TextToSpeechModule;

  constructor() {
    this.tts = new TextToSpeechModule();
  }

  async initialize(language: string = 'en') {
    console.log(`Loading TTS model for ${language}...`);
    await this.tts.load(
      {
        model: {
          type: 'kokoro',
          durationPredictorSource: `https://example.com/duration_${language}.pte`,
          synthesizerSource: `https://example.com/synthesizer_${language}.pte`
        },
        voice: {
          lang: language,
          voiceSource: `https://example.com/voice_${language}.bin`,
          extra: {
            taggerSource: `https://example.com/tagger_${language}.bin`,
            lexiconSource: `https://example.com/lexicon_${language}.txt`
          }
        }
      },
      (progress) => {
        console.log(`Loading: ${(progress * 100).toFixed(0)}%`);
      }
    );
    console.log('TTS ready!');
  }

  async speak(text: string, speed: number = 1.0) {
    console.log(`Synthesizing: "${text}"`);
    const audio = await this.tts.forward(text, speed);
    console.log(`Generated ${audio.length} audio samples`);
    
    // Play the audio
    await AudioPlayer.play(audio);
  }

  cleanup() {
    this.tts.delete();
  }
}

// Usage
const synthesizer = new VoiceSynthesizer();
await synthesizer.initialize('en');

await synthesizer.speak('Hello, welcome to text to speech!', 1.0);
await synthesizer.speak('This is faster speech.', 1.5);
await synthesizer.speak('This is slower speech.', 0.8);

synthesizer.cleanup();

Complete Example: Streaming Synthesis

import { TextToSpeechModule } from 'react-native-executorch';

class StreamingVoiceSynthesizer {
  private tts: TextToSpeechModule;
  private audioQueue: Float32Array[] = [];

  constructor() {
    this.tts = new TextToSpeechModule();
  }

  async initialize() {
    await this.tts.load({
      model: {
        type: 'kokoro',
        durationPredictorSource: 'https://example.com/duration.pte',
        synthesizerSource: 'https://example.com/synthesizer.pte'
      },
      voice: {
        lang: 'en',
        voiceSource: 'https://example.com/voice.bin',
        extra: {
          taggerSource: 'https://example.com/tagger.bin',
          lexiconSource: 'https://example.com/lexicon.txt'
        }
      }
    });
  }

  async streamSpeak(
    text: string,
    onChunk: (chunk: Float32Array) => void,
    speed: number = 1.0
  ) {
    console.log(`Streaming synthesis for: "${text}"`);
    
    for await (const chunk of this.tts.stream({ text, speed })) {
      console.log(`Received audio chunk: ${chunk.length} samples`);
      onChunk(chunk);
    }
    
    console.log('Streaming complete');
  }

  stop() {
    this.tts.streamStop();
  }

  cleanup() {
    this.tts.delete();
  }
}

// Usage
const streamingSynth = new StreamingVoiceSynthesizer();
await streamingSynth.initialize();

// Stream with real-time playback
await streamingSynth.streamSpeak(
  'This is a long sentence that will be synthesized in chunks.',
  (chunk) => {
    // Play chunk immediately for low-latency playback
    playAudioChunk(chunk);
  },
  1.0
);

streamingSynth.cleanup();

Multi-Language Support

class MultiLanguageTTS {
  private ttsModules: Map<string, TextToSpeechModule> = new Map();

  async loadLanguage(lang: string) {
    const tts = new TextToSpeechModule();
    await tts.load({
      model: {
        type: 'kokoro',
        durationPredictorSource: `https://example.com/duration_${lang}.pte`,
        synthesizerSource: `https://example.com/synthesizer_${lang}.pte`
      },
      voice: {
        lang,
        voiceSource: `https://example.com/voice_${lang}.bin`,
        extra: {
          taggerSource: `https://example.com/tagger_${lang}.bin`,
          lexiconSource: `https://example.com/lexicon_${lang}.txt`
        }
      }
    });
    this.ttsModules.set(lang, tts);
    console.log(`Loaded ${lang} TTS`);
  }

  async speak(text: string, lang: string, speed: number = 1.0) {
    const tts = this.ttsModules.get(lang);
    if (!tts) {
      throw new Error(`Language ${lang} not loaded`);
    }
    return await tts.forward(text, speed);
  }

  cleanupAll() {
    this.ttsModules.forEach(tts => tts.delete());
    this.ttsModules.clear();
  }
}

// Usage
const multiTTS = new MultiLanguageTTS();

// Load multiple languages
await multiTTS.loadLanguage('en');
await multiTTS.loadLanguage('es');
await multiTTS.loadLanguage('fr');

// Speak in different languages
const englishAudio = await multiTTS.speak('Hello world', 'en');
const spanishAudio = await multiTTS.speak('Hola mundo', 'es');
const frenchAudio = await multiTTS.speak('Bonjour le monde', 'fr');

multiTTS.cleanupAll();

Speed Control Examples

// Normal speed
await tts.forward('Normal speed speech', 1.0);

// Fast speech (1.5x)
await tts.forward('Fast speech', 1.5);

// Slow speech (0.75x)
await tts.forward('Slow speech', 0.75);

// Very fast (2x)
await tts.forward('Very fast speech', 2.0);

// Very slow (0.5x)
await tts.forward('Very slow speech', 0.5);
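Because the output is mono audio at a fixed sample rate (24 kHz for Kokoro, see Audio Format below), the playback duration of any result can be computed directly from the sample count — a quick way to check that the speed parameter had the intended effect. A small sketch:

```typescript
// Estimate playback duration (in seconds) of a synthesized waveform,
// assuming Kokoro's 24 kHz mono output.
const SAMPLE_RATE = 24000;

function durationSeconds(audio: Float32Array): number {
  return audio.length / SAMPLE_RATE;
}
```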

Batch Synthesis

class BatchTTS {
  private tts: TextToSpeechModule;

  constructor() {
    this.tts = new TextToSpeechModule();
  }

  async initialize() {
    await this.tts.load(/* config */);
  }

  async synthesizeMultiple(texts: string[]): Promise<Float32Array[]> {
    const results: Float32Array[] = [];
    
    for (const text of texts) {
      console.log(`Synthesizing: "${text}"`);
      const audio = await this.tts.forward(text);
      results.push(audio);
    }
    
    return results;
  }

  cleanup() {
    this.tts.delete();
  }
}

// Usage
const batchTTS = new BatchTTS();
await batchTTS.initialize();

const sentences = [
  'First sentence.',
  'Second sentence.',
  'Third sentence.'
];

const audioFiles = await batchTTS.synthesizeMultiple(sentences);
console.log(`Generated ${audioFiles.length} audio files`);

batchTTS.cleanup();

Audio Format

The synthesized audio is returned as:
  • Format: Float32Array
  • Sample rate: 24kHz (24,000 Hz) for Kokoro
  • Channels: Mono (single channel)
  • Values: Normalized float values (-1.0 to 1.0)
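If you need to export the waveform rather than play it directly, the raw samples can be wrapped in a standard WAV header. The sketch below assumes the 24 kHz mono format above and converts the clamped floats to 16-bit PCM; it is illustrative, not part of the library's API:

```typescript
// Convert a mono Float32Array into a 16-bit PCM WAV byte buffer.
function floatToWav(samples: Float32Array, sampleRate = 24000): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte RIFF/WAVE header
  const view = new DataView(buffer);

  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // audio format: PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp floats to [-1, 1] and scale to signed 16-bit integers.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}
```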

Supported Models

Currently supports:
  • Kokoro: High-quality neural TTS with support for multiple languages

Performance Considerations

  • Synthesis is relatively fast (typically < 1 second for short sentences)
  • Streaming mode provides lower latency for long texts
  • Speed parameter doesn’t significantly affect generation time
  • Always call delete() when done to free resources
  • Consider caching synthesized audio for repeated phrases
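The caching suggestion above can be sketched as a thin wrapper keyed on text and speed. `CachedTTS` and its synthesize callback are hypothetical names for illustration, not part of react-native-executorch; in practice you would pass `(text, speed) => tts.forward(text, speed)`:

```typescript
// Hypothetical cache wrapper: reuses synthesized waveforms for repeated phrases.
class CachedTTS {
  private cache = new Map<string, Float32Array>();

  constructor(
    private synthesize: (text: string, speed: number) => Promise<Float32Array>
  ) {}

  async speak(text: string, speed = 1.0): Promise<Float32Array> {
    const key = `${speed}|${text}`; // speed changes the waveform, so it is part of the key
    const hit = this.cache.get(key);
    if (hit) return hit;
    const audio = await this.synthesize(text, speed);
    this.cache.set(key, audio);
    return audio;
  }
}
```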
