
Overview

Streaming TTS generates speech incrementally, emitting audio chunks as they are produced. This enables:
  • Lower latency: Start playing audio before generation completes
  • Real-time playback: Play while generating for interactive experiences
  • Progress tracking: Show generation progress to users
  • Memory efficiency: Process long texts without loading all audio into memory
Use streaming TTS when:
  • You need low time-to-first-audio
  • You’re building interactive voice assistants
  • You want to play audio while it’s being generated
  • You’re processing very long texts
Use batch TTS when:
  • You need the complete audio buffer
  • You’re saving to files
  • You need timestamps (use generateSpeechWithTimestamps)
  • Voice cloning with Zipvoice (Zipvoice does not support streaming with voice cloning)

Quick Start

import { createStreamingTTS } from 'react-native-sherpa-onnx/tts';

// Create streaming TTS engine
const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/sherpa-onnx-vits-piper-en' },
  modelType: 'vits',
});

// Generate with streaming callbacks
const controller = await tts.generateSpeechStream(
  'Hello, this is streaming text-to-speech.',
  { sid: 0, speed: 1.0 },
  {
    onChunk: (chunk) => {
      console.log('Received chunk:', chunk.samples.length, 'samples');
      console.log('Progress:', (chunk.progress * 100).toFixed(1), '%');
      console.log('Is final:', chunk.isFinal);
      
      // Play chunk immediately
      playPcmSamples(chunk.samples, chunk.sampleRate);
    },
    onEnd: (event) => {
      if (event.cancelled) {
        console.log('Generation was cancelled');
      } else {
        console.log('Generation complete');
      }
    },
    onError: (event) => {
      console.error('TTS error:', event.message);
    },
  }
);

// Optional: cancel generation
// await controller.cancel();

// Clean up
await tts.destroy();

API Reference

createStreamingTTS(options)

Creates a streaming TTS engine.
src/tts/streaming.ts
export async function createStreamingTTS(
  options: TTSInitializeOptions | ModelPathConfig
): Promise<StreamingTtsEngine>;
Accepts the same options as createTTS(). See Text-to-Speech for details.
Streaming vs Batch Engines:
  • Use createStreamingTTS() for streaming generation (generateSpeechStream)
  • Use createTTS() for batch generation (generateSpeech, generateSpeechWithTimestamps)
They share the same native TTS instance but provide different JS interfaces.

StreamingTtsEngine: generateSpeechStream(text, options, handlers)

Starts streaming generation with chunk callbacks.
const controller = await tts.generateSpeechStream(
  text,
  options,
  handlers
);
Parameters:
text
string
required
Text to synthesize.
options
TtsGenerationOptions
Generation options (same as batch TTS):
  • sid: Speaker ID (default: 0)
  • speed: Speech speed multiplier (default: 1.0)
  • silenceScale: Scale factor for silence/pause durations
  • referenceAudio: Reference audio for voice cloning (Pocket; not supported for Zipvoice streaming)
  • referenceText: Transcript of reference audio
  • numSteps: Flow-matching steps
  • extra: Model-specific options
handlers
TtsStreamHandlers
required
Callbacks for chunks, completion, and errors:
  • onChunk?: (chunk: TtsStreamChunk) => void
  • onEnd?: (event: TtsStreamEnd) => void
  • onError?: (event: TtsStreamError) => void
Returns: Promise<TtsStreamController> - Controller to cancel or unsubscribe.
Only one stream per engine can be active at a time. Starting another stream before the first finishes will reject with TTS_STREAM_ERROR.
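If a second start may race with an active stream, you can catch the rejection instead of letting it propagate. A minimal sketch (startStreamSafely is a hypothetical wrapper, not an SDK API):

```typescript
// Catch the TTS_STREAM_ERROR rejection from a concurrent start and report it
// instead of crashing. Returns null when the stream could not be started.
async function startStreamSafely<T>(start: () => Promise<T>): Promise<T | null> {
  try {
    return await start();
  } catch (e) {
    console.warn('Stream not started:', (e as Error).message);
    return null;
  }
}

// Usage:
// const controller = await startStreamSafely(() =>
//   tts.generateSpeechStream(text, undefined, handlers)
// );
```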

TtsStreamHandlers

Callbacks for streaming events.

onChunk(chunk)

Called for each generated audio chunk.
onChunk: (chunk) => {
  // chunk.samples: number[] - Float PCM in [-1, 1]
  // chunk.sampleRate: number - Sample rate in Hz
  // chunk.progress: number - Progress 0..1
  // chunk.isFinal: boolean - True for last chunk
  
  playPcmSamples(chunk.samples, chunk.sampleRate);
}
TtsStreamChunk:
interface TtsStreamChunk {
  instanceId?: string;  // Engine instance (for routing)
  requestId?: string;   // Request ID (for concurrent streams)
  samples: number[];    // Float PCM samples [-1, 1]
  sampleRate: number;   // Sample rate in Hz
  progress: number;     // Progress 0..1
  isFinal: boolean;     // True for last chunk
}
Keep onChunk lightweight. Forward audio to native playback quickly. Heavy processing can cause stuttering.
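One way to keep onChunk light is to buffer samples inside the callback and do any heavy work (saving, analysis) from a timer or idle callback. A sketch (ChunkQueue is illustrative, not an SDK API):

```typescript
// Buffer incoming chunks with O(1) work per callback; a separate consumer
// drains the queue and does the expensive processing off the callback path.
class ChunkQueue {
  private pending: number[][] = [];

  // Called from onChunk: just store the samples, no processing.
  push(samples: number[]): void {
    this.pending.push(samples);
  }

  // Called from a timer or idle callback: hand over everything buffered.
  drain(): number[][] {
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}
```

Inside onChunk, forward the audio to native playback and then just call `queue.push(chunk.samples)`.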

onEnd(event)

Called when generation finishes or is cancelled. Listeners are auto-removed after this.
onEnd: (event) => {
  if (event.cancelled) {
    console.log('User cancelled');
  } else {
    console.log('Generation complete');
  }
}
TtsStreamEnd:
interface TtsStreamEnd {
  instanceId?: string;
  requestId?: string;
  cancelled: boolean;  // True if cancelled
}

onError(event)

Called on generation errors. Listeners are auto-removed after this.
onError: (event) => {
  console.error('TTS error:', event.message);
}
TtsStreamError:
interface TtsStreamError {
  instanceId?: string;
  requestId?: string;
  message: string;
}

TtsStreamController

Returned by generateSpeechStream(). Use to cancel or unsubscribe.
interface TtsStreamController {
  cancel(): Promise<void>;      // Stop generation and unsubscribe
  unsubscribe(): void;          // Remove listeners only
}
Methods:
  • cancel(): Stops generation and removes event listeners
  • unsubscribe(): Removes event listeners only (call it if you discard the controller before onEnd or onError fires)
Listeners are automatically removed when onEnd or onError is called. Call unsubscribe() manually only if you discard the controller early (e.g., navigation away).

StreamingTtsEngine: cancelSpeechStream()

Cancel the currently active stream.
await tts.cancelSpeechStream();

Native PCM Player

The SDK provides a native PCM player for low-latency audio playback.

startPcmPlayer(sampleRate, channels)

Start the native PCM player.
const sampleRate = await tts.getSampleRate();
await tts.startPcmPlayer(sampleRate, 1);  // Mono

writePcmChunk(samples)

Write PCM samples to the player. Call from onChunk.
onChunk: async (chunk) => {
  await tts.writePcmChunk(chunk.samples);
}
writePcmChunk() expects float PCM samples in [-1.0, 1.0]. Values outside this range will clip.
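If your samples may fall outside [-1, 1] (for example after mixing or applying gain), clamp them before writing. A sketch (clampPcm is a hypothetical helper, not an SDK API):

```typescript
// Clamp float PCM samples to [-1, 1] so out-of-range values do not clip
// audibly in the native player.
function clampPcm(samples: number[]): number[] {
  return samples.map((s) => Math.max(-1, Math.min(1, s)));
}

// Usage inside onChunk:
// await tts.writePcmChunk(clampPcm(chunk.samples));
```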

stopPcmPlayer()

Stop and release the PCM player.
await tts.stopPcmPlayer();

Complete Example: Streaming with Native Playback

import { createStreamingTTS } from 'react-native-sherpa-onnx/tts';

// Create engine
const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper-en' },
  numThreads: 2,
});

// Start native player
const sampleRate = await tts.getSampleRate();
await tts.startPcmPlayer(sampleRate, 1);

// Accumulate chunks for optional save
const allChunks: number[] = [];

// Start streaming generation
const controller = await tts.generateSpeechStream(
  'This is a longer text that will be generated in chunks.',
  { speed: 1.0 },
  {
    onChunk: async (chunk) => {
      // Play immediately
      if (chunk.samples.length > 0) {
        await tts.writePcmChunk(chunk.samples);
      }
      
      // Optionally accumulate
      allChunks.push(...chunk.samples);
      
      // Update UI
      console.log('Progress:', (chunk.progress * 100).toFixed(1) + '%');
    },
    onEnd: async (event) => {
      await tts.stopPcmPlayer();
      
      if (!event.cancelled && allChunks.length > 0) {
        // Optionally save accumulated audio
        const audio = { samples: allChunks, sampleRate };
        await saveAudioToFile(audio, '/path/output.wav');
      }
    },
    onError: async (event) => {
      await tts.stopPcmPlayer();
      console.error('Error:', event.message);
    },
  }
);

// To cancel mid-generation:
// await controller.cancel();

// Later: clean up engine
await tts.destroy();

Voice Cloning with Streaming

Pocket TTS (Supported)

Pocket TTS supports streaming with voice cloning:
const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/pocket-tts' },
  modelType: 'pocket',
});

const refAudio = loadReferenceAudio();  // Your function

const controller = await tts.generateSpeechStream(
  'Target text to speak in reference voice.',
  {
    referenceAudio: {
      samples: refAudio.samples,
      sampleRate: 22050,
    },
    referenceText: 'Transcript of reference audio.',
    numSteps: 20,
    speed: 1.0,
    extra: {
      temperature: '0.7',
      chunk_size: '15',
    },
  },
  {
    onChunk: (chunk) => playPcmSamples(chunk.samples, chunk.sampleRate),
    onEnd: () => console.log('Done'),
    onError: (e) => console.error(e.message),
  }
);

Zipvoice (Not Supported)

Zipvoice does not support streaming with voice cloning. Use batch mode (createTTS() + generateSpeech()) for voice cloning with Zipvoice.
import { createTTS } from 'react-native-sherpa-onnx/tts';

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/zipvoice-zh-en' },
  modelType: 'zipvoice',
});

const audio = await tts.generateSpeech('Text', {
  referenceAudio: { samples: refSamples, sampleRate: 24000 },
  referenceText: 'Transcript',
});

Multiple Requests

Only one stream can be active per engine at a time.

Sequential Requests

Wait for the previous stream to finish:
const tts = await createStreamingTTS({ /* ... */ });

// First request
await tts.generateSpeechStream('First text', undefined, handlers);
// Wait for the onEnd callback before continuing; the awaited promise
// resolves when the stream starts, not when it finishes

// Second request
await tts.generateSpeechStream('Second text', undefined, handlers);
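Since the awaited call resolves when the stream starts, one way to sequence requests is to wrap the handlers in a Promise that settles on onEnd/onError. A sketch (speakAndWait and the minimal interfaces are illustrative, not SDK APIs):

```typescript
// Minimal structural types matching the handler shapes described above.
interface ChunkLike { samples: number[]; sampleRate: number; progress: number; isFinal: boolean }
interface EngineLike {
  generateSpeechStream(
    text: string,
    options: unknown,
    handlers: {
      onChunk?: (chunk: ChunkLike) => void;
      onEnd?: (event: { cancelled: boolean }) => void;
      onError?: (event: { message: string }) => void;
    }
  ): Promise<unknown>;
}

// Resolve when the stream ends, reject on error (or on a failed start,
// e.g. TTS_STREAM_ERROR), so callers can simply `await speakAndWait(...)`.
function speakAndWait(
  tts: EngineLike,
  text: string,
  onChunk?: (chunk: ChunkLike) => void
): Promise<{ cancelled: boolean }> {
  return new Promise((resolve, reject) => {
    tts.generateSpeechStream(text, undefined, {
      onChunk,
      onEnd: (event) => resolve(event),
      onError: (event) => reject(new Error(event.message)),
    }).catch(reject);
  });
}

// Usage:
// await speakAndWait(tts, 'First text');
// await speakAndWait(tts, 'Second text');
```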

Concurrent Requests (Multiple Engines)

Create multiple engines for concurrent streams:
const tts1 = await createStreamingTTS({ /* ... */ });
const tts2 = await createStreamingTTS({ /* ... */ });

// Both can run concurrently
const controller1 = await tts1.generateSpeechStream('Text 1', undefined, handlers1);
const controller2 = await tts2.generateSpeechStream('Text 2', undefined, handlers2);

// Events are tagged with instanceId and requestId for routing

Cancellation

Cancel via Controller

const controller = await tts.generateSpeechStream(text, undefined, handlers);

// User taps "Stop" button
await controller.cancel();  // Stops generation and unsubscribes

Cancel via Engine

await tts.cancelSpeechStream();

Recording Streamed Audio

Accumulate chunks to save the complete audio:
const chunks: number[] = [];
let sampleRate = 0;

const controller = await tts.generateSpeechStream(
  longText,
  { speed: 1.0 },
  {
    onChunk: (chunk) => {
      sampleRate = chunk.sampleRate;
      chunks.push(...chunk.samples);
      
      // Optionally play while recording
      playPcmSamples(chunk.samples, chunk.sampleRate);
    },
    onEnd: async () => {
      if (chunks.length > 0) {
        const audio = { samples: chunks, sampleRate };
        await saveAudioToFile(audio, '/path/output.wav');
      }
    },
    onError: () => {
      // Handle error
    },
  }
);
Memory Warning: Accumulating very long audio in JS can exhaust memory. For very long texts, consider:
  • Saving chunks incrementally to native storage
  • Splitting long texts into smaller segments
  • Using batch mode with file output
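Splitting a long text can be as simple as cutting on sentence boundaries. A sketch (splitText is a hypothetical helper, assuming sentence-final punctuation):

```typescript
// Split text into sentence-bounded segments of at most maxChars characters,
// so each segment can be streamed (and its audio discarded) independently.
function splitText(text: string, maxChars = 500): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const segments: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    // Flush the current segment when adding this sentence would exceed the cap.
    if (current && current.length + sentence.length > maxChars) {
      segments.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) segments.push(current.trim());
  return segments;
}
```

Each segment can then be passed to generateSpeechStream in sequence, keeping only one segment's audio in memory at a time.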

Performance Tips

Reduce Latency

  1. Use native PCM player (avoid JS audio bridge overhead)
  2. Keep onChunk lightweight (no heavy processing)
  3. Increase numThreads for faster generation
  4. Use hardware acceleration when available
const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper-en' },
  numThreads: 4,
  provider: 'coreml',  // iOS: Core ML
});

Balance Chunk Size

The maxNumSentences option controls chunk size:
const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper-en' },
  maxNumSentences: 2,  // Larger chunks = fewer callbacks
});
  • Smaller chunks (1 sentence): Lower latency, more callbacks
  • Larger chunks (2+ sentences): Higher latency, fewer callbacks

Avoid Memory Issues

  • Don’t accumulate all chunks for very long sessions
  • Use native-side streaming-to-file if possible
  • Split long texts into smaller generation requests

Common Use Cases

Voice Assistant

const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper-en' },
});

const sampleRate = await tts.getSampleRate();
await tts.startPcmPlayer(sampleRate, 1);

async function speak(text: string) {
  const controller = await tts.generateSpeechStream(
    text,
    { speed: 1.0 },
    {
      onChunk: async (chunk) => {
        await tts.writePcmChunk(chunk.samples);
      },
      onEnd: () => {
        console.log('Finished speaking');
      },
      onError: (e) => {
        console.error('Speech error:', e.message);
      },
    }
  );
  
  return controller;  // Allow caller to cancel
}

// Use
const controller = await speak('Hello, how can I help you?');

// Cancel if needed
// await controller.cancel();

Progress Indicator

const [progress, setProgress] = useState(0);

const controller = await tts.generateSpeechStream(
  longText,
  undefined,
  {
    onChunk: (chunk) => {
      setProgress(chunk.progress * 100);
      playPcmSamples(chunk.samples, chunk.sampleRate);
    },
    onEnd: () => {
      setProgress(100);
    },
    onError: () => {
      setProgress(0);
    },
  }
);

// UI: <ProgressBar progress={progress} />

Text-to-Speech Button with Cancel

const [isSpeaking, setIsSpeaking] = useState(false);
const [controller, setController] = useState<TtsStreamController | null>(null);

async function handleSpeak() {
  if (isSpeaking && controller) {
    // Cancel
    await controller.cancel();
    await tts.stopPcmPlayer();
    setIsSpeaking(false);
    setController(null);
  } else {
    // Start
    setIsSpeaking(true);
    const sampleRate = await tts.getSampleRate();
    await tts.startPcmPlayer(sampleRate, 1);
    
    const ctrl = await tts.generateSpeechStream(
      text,
      { speed: 1.0 },
      {
        onChunk: async (chunk) => {
          await tts.writePcmChunk(chunk.samples);
        },
        onEnd: async () => {
          await tts.stopPcmPlayer();
          setIsSpeaking(false);
          setController(null);
        },
        onError: async () => {
          await tts.stopPcmPlayer();
          setIsSpeaking(false);
          setController(null);
        },
      }
    );
    
    setController(ctrl);
  }
}

// UI: <Button title={isSpeaking ? 'Stop' : 'Speak'} onPress={handleSpeak} />

Troubleshooting

"Only one stream can be active" error

Only one stream per engine can be active. Wait for the previous stream to finish or cancel it:
await previousController.cancel();
// Now start new stream

Choppy or stuttering playback

  • Keep onChunk lightweight (avoid heavy processing)
  • Use native PCM player instead of JS audio APIs
  • Increase numThreads for faster generation
  • Reduce audio bridge overhead by writing larger chunks

Slow generation or high latency

  • Use hardware acceleration (provider: 'coreml' on iOS)
  • Increase numThreads
  • Reduce maxNumSentences for smaller chunks

High memory usage

  • Don’t accumulate all chunks in JS
  • Split long texts into smaller requests
  • Use batch mode for very long texts

Voice cloning issues

  • Pocket TTS: Voice cloning is supported in streaming
  • Zipvoice: Voice cloning is not supported in streaming; use batch mode (createTTS() + generateSpeech())

Next Steps

Text-to-Speech

Batch TTS generation and configuration

Model Setup

Learn how to bundle and load models

Speech-to-Text

Transcribe audio to text

Streaming STT

Real-time speech recognition
