
Overview

Streaming TTS generates speech incrementally, emitting audio chunks as they are produced. This enables:
  • Lower latency: Start playing audio before generation completes
  • Real-time playback: Play while generating for interactive experiences
  • Progress tracking: Show generation progress to users
  • Memory efficiency: Process long texts without loading all audio into memory
Use streaming TTS when:
  • You need low time-to-first-audio
  • You’re building interactive voice assistants
  • You want to play audio while it’s being generated
  • You’re processing very long texts
Use batch TTS when:
  • You need the complete audio buffer
  • You’re saving to files
  • You need timestamps (use generateSpeechWithTimestamps)
  • Voice cloning with Zipvoice (Zipvoice does not support streaming with voice cloning)

Quick Start

import { createStreamingTTS } from 'react-native-sherpa-onnx/tts';

// Create streaming TTS engine
const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/sherpa-onnx-vits-piper-en' },
  modelType: 'vits',
});

// Generate with streaming callbacks
const controller = await tts.generateSpeechStream(
  'Hello, this is streaming text-to-speech.',
  { sid: 0, speed: 1.0 },
  {
    onChunk: (chunk) => {
      console.log('Received chunk:', chunk.samples.length, 'samples');
      console.log('Progress:', (chunk.progress * 100).toFixed(1), '%');
      console.log('Is final:', chunk.isFinal);
      
      // Play chunk immediately
      playPcmSamples(chunk.samples, chunk.sampleRate);
    },
    onEnd: (event) => {
      if (event.cancelled) {
        console.log('Generation was cancelled');
      } else {
        console.log('Generation complete');
      }
    },
    onError: (event) => {
      console.error('TTS error:', event.message);
    },
  }
);

// Optional: cancel generation
// await controller.cancel();

// Clean up
await tts.destroy();

API Reference

createStreamingTTS(options)

Creates a streaming TTS engine.
src/tts/streaming.ts
export async function createStreamingTTS(
  options: TTSInitializeOptions | ModelPathConfig
): Promise<StreamingTtsEngine>;
Accepts the same options as createTTS(). See Text-to-Speech for details.
Streaming vs Batch Engines:
  • Use createStreamingTTS() for streaming generation (generateSpeechStream)
  • Use createTTS() for batch generation (generateSpeech, generateSpeechWithTimestamps)
They share the same native TTS instance but provide different JS interfaces.

StreamingTtsEngine: generateSpeechStream(text, options, handlers)

Starts streaming generation with chunk callbacks.
const controller = await tts.generateSpeechStream(
  text,
  options,
  handlers
);
Parameters:
text
string
required
Text to synthesize.
options
TtsGenerationOptions
Generation options (same as batch TTS):
  • sid: Speaker ID (default: 0)
  • speed: Speech speed multiplier (default: 1.0)
  • silenceScale: Scale factor for silence/pause durations
  • referenceAudio: Reference audio for voice cloning (Pocket; not supported for Zipvoice streaming)
  • referenceText: Transcript of reference audio
  • numSteps: Flow-matching steps
  • extra: Model-specific options
handlers
TtsStreamHandlers
required
Callbacks for chunks, completion, and errors:
  • onChunk?: (chunk: TtsStreamChunk) => void
  • onEnd?: (event: TtsStreamEnd) => void
  • onError?: (event: TtsStreamError) => void
Returns: Promise<TtsStreamController> - Controller to cancel or unsubscribe.
Only one stream per engine can be active at a time. Starting another stream before the first finishes will reject with TTS_STREAM_ERROR.
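If a second start may race with an active stream, you can catch the rejection instead of letting it propagate. A minimal sketch (startStreamSafely is a hypothetical wrapper, not an SDK API):

```typescript
// Catch the TTS_STREAM_ERROR rejection from a concurrent start and report it
// instead of crashing. Returns null when the stream could not be started.
async function startStreamSafely<T>(start: () => Promise<T>): Promise<T | null> {
  try {
    return await start();
  } catch (e) {
    console.warn('Stream not started:', (e as Error).message);
    return null;
  }
}

// Usage:
// const controller = await startStreamSafely(() =>
//   tts.generateSpeechStream(text, undefined, handlers)
// );
```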

TtsStreamHandlers

Callbacks for streaming events.

onChunk(chunk)

Called for each generated audio chunk.
onChunk: (chunk) => {
  // chunk.samples: number[] - Float PCM in [-1, 1]
  // chunk.sampleRate: number - Sample rate in Hz
  // chunk.progress: number - Progress 0..1
  // chunk.isFinal: boolean - True for last chunk
  
  playPcmSamples(chunk.samples, chunk.sampleRate);
}
TtsStreamChunk:
interface TtsStreamChunk {
  instanceId?: string;  // Engine instance (for routing)
  requestId?: string;   // Request ID (for concurrent streams)
  samples: number[];    // Float PCM samples [-1, 1]
  sampleRate: number;   // Sample rate in Hz
  progress: number;     // Progress 0..1
  isFinal: boolean;     // True for last chunk
}
Keep onChunk lightweight. Forward audio to native playback quickly. Heavy processing can cause stuttering.
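One way to keep onChunk light is to buffer samples inside the callback and do any heavy work (saving, analysis) from a timer or idle callback. A sketch (ChunkQueue is illustrative, not an SDK API):

```typescript
// Buffer incoming chunks with O(1) work per callback; a separate consumer
// drains the queue and does the expensive processing off the callback path.
class ChunkQueue {
  private pending: number[][] = [];

  // Called from onChunk: just store the samples, no processing.
  push(samples: number[]): void {
    this.pending.push(samples);
  }

  // Called from a timer or idle callback: hand over everything buffered.
  drain(): number[][] {
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}
```

Inside onChunk, forward the audio to native playback and then just call `queue.push(chunk.samples)`.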

onEnd(event)

Called when generation finishes or is cancelled. Listeners are auto-removed after this.
onEnd: (event) => {
  if (event.cancelled) {
    console.log('User cancelled');
  } else {
    console.log('Generation complete');
  }
}
TtsStreamEnd:
interface TtsStreamEnd {
  instanceId?: string;
  requestId?: string;
  cancelled: boolean;  // True if cancelled
}

onError(event)

Called on generation errors. Listeners are auto-removed after this.
onError: (event) => {
  console.error('TTS error:', event.message);
}
TtsStreamError:
interface TtsStreamError {
  instanceId?: string;
  requestId?: string;
  message: string;
}

TtsStreamController

Returned by generateSpeechStream(). Use to cancel or unsubscribe.
interface TtsStreamController {
  cancel(): Promise<void>;      // Stop generation and unsubscribe
  unsubscribe(): void;          // Remove listeners only
}
Methods:
  • cancel(): Stops generation and removes event listeners
  • unsubscribe(): Removes event listeners only (call it if you discard the controller before onEnd or onError fires)
Listeners are automatically removed when onEnd or onError is called. Call unsubscribe() manually only if you discard the controller early (e.g., navigation away).

StreamingTtsEngine: cancelSpeechStream()

Cancel the currently active stream.
await tts.cancelSpeechStream();

Native PCM Player

The SDK provides a native PCM player for low-latency audio playback.

startPcmPlayer(sampleRate, channels)

Start the native PCM player.
const sampleRate = await tts.getSampleRate();
await tts.startPcmPlayer(sampleRate, 1);  // Mono

writePcmChunk(samples)

Write PCM samples to the player. Call from onChunk.
onChunk: async (chunk) => {
  await tts.writePcmChunk(chunk.samples);
}
writePcmChunk() expects float PCM samples in [-1.0, 1.0]. Values outside this range will clip.
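If your samples may fall outside [-1, 1] (for example after mixing or applying gain), clamp them before writing. A sketch (clampPcm is a hypothetical helper, not an SDK API):

```typescript
// Clamp float PCM samples to [-1, 1] so out-of-range values do not clip
// audibly in the native player.
function clampPcm(samples: number[]): number[] {
  return samples.map((s) => Math.max(-1, Math.min(1, s)));
}

// Usage inside onChunk:
// await tts.writePcmChunk(clampPcm(chunk.samples));
```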

stopPcmPlayer()

Stop and release the PCM player.
await tts.stopPcmPlayer();

Complete Example: Streaming with Native Playback

import { createStreamingTTS } from 'react-native-sherpa-onnx/tts';

// Create engine
const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper-en' },
  numThreads: 2,
});

// Start native player
const sampleRate = await tts.getSampleRate();
await tts.startPcmPlayer(sampleRate, 1);

// Accumulate chunks for optional save
const allChunks: number[] = [];

// Start streaming generation
const controller = await tts.generateSpeechStream(
  'This is a longer text that will be generated in chunks.',
  { speed: 1.0 },
  {
    onChunk: async (chunk) => {
      // Play immediately
      if (chunk.samples.length > 0) {
        await tts.writePcmChunk(chunk.samples);
      }
      
      // Optionally accumulate
      allChunks.push(...chunk.samples);
      
      // Update UI
      console.log('Progress:', (chunk.progress * 100).toFixed(1) + '%');
    },
    onEnd: async (event) => {
      await tts.stopPcmPlayer();
      
      if (!event.cancelled && allChunks.length > 0) {
        // Optionally save accumulated audio
        const audio = { samples: allChunks, sampleRate };
        await saveAudioToFile(audio, '/path/output.wav');
      }
    },
    onError: async (event) => {
      await tts.stopPcmPlayer();
      console.error('Error:', event.message);
    },
  }
);

// To cancel mid-generation:
// await controller.cancel();

// Later: clean up engine
await tts.destroy();

Voice Cloning with Streaming

Pocket TTS (Supported)

Pocket TTS supports streaming with voice cloning:
const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/pocket-tts' },
  modelType: 'pocket',
});

const refAudio = loadReferenceAudio();  // Your function

const controller = await tts.generateSpeechStream(
  'Target text to speak in reference voice.',
  {
    referenceAudio: {
      samples: refAudio.samples,
      sampleRate: 22050,
    },
    referenceText: 'Transcript of reference audio.',
    numSteps: 20,
    speed: 1.0,
    extra: {
      temperature: '0.7',
      chunk_size: '15',
    },
  },
  {
    onChunk: (chunk) => playPcmSamples(chunk.samples, chunk.sampleRate),
    onEnd: () => console.log('Done'),
    onError: (e) => console.error(e.message),
  }
);

Zipvoice (Not Supported)

Zipvoice does not support streaming with voice cloning. Use batch mode (createTTS() + generateSpeech()) for voice cloning with Zipvoice.
import { createTTS } from 'react-native-sherpa-onnx/tts';

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/zipvoice-zh-en' },
  modelType: 'zipvoice',
});

const audio = await tts.generateSpeech('Text', {
  referenceAudio: { samples: refSamples, sampleRate: 24000 },
  referenceText: 'Transcript',
});

Multiple Requests

Only one stream can be active per engine at a time.

Sequential Requests

Wait for the previous stream to finish:
const tts = await createStreamingTTS({ /* ... */ });

// First request
await tts.generateSpeechStream('First text', undefined, handlers);
// Wait for the onEnd callback before continuing; the awaited promise
// resolves when the stream starts, not when it finishes

// Second request
await tts.generateSpeechStream('Second text', undefined, handlers);
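Since the awaited call resolves when the stream starts, one way to sequence requests is to wrap the handlers in a Promise that settles on onEnd/onError. A sketch (speakAndWait and the minimal interfaces are illustrative, not SDK APIs):

```typescript
// Minimal structural types matching the handler shapes described above.
interface ChunkLike { samples: number[]; sampleRate: number; progress: number; isFinal: boolean }
interface EngineLike {
  generateSpeechStream(
    text: string,
    options: unknown,
    handlers: {
      onChunk?: (chunk: ChunkLike) => void;
      onEnd?: (event: { cancelled: boolean }) => void;
      onError?: (event: { message: string }) => void;
    }
  ): Promise<unknown>;
}

// Resolve when the stream ends, reject on error (or on a failed start,
// e.g. TTS_STREAM_ERROR), so callers can simply `await speakAndWait(...)`.
function speakAndWait(
  tts: EngineLike,
  text: string,
  onChunk?: (chunk: ChunkLike) => void
): Promise<{ cancelled: boolean }> {
  return new Promise((resolve, reject) => {
    tts.generateSpeechStream(text, undefined, {
      onChunk,
      onEnd: (event) => resolve(event),
      onError: (event) => reject(new Error(event.message)),
    }).catch(reject);
  });
}

// Usage:
// await speakAndWait(tts, 'First text');
// await speakAndWait(tts, 'Second text');
```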

Concurrent Requests (Multiple Engines)

Create multiple engines for concurrent streams:
const tts1 = await createStreamingTTS({ /* ... */ });
const tts2 = await createStreamingTTS({ /* ... */ });

// Both can run concurrently
const controller1 = await tts1.generateSpeechStream('Text 1', undefined, handlers1);
const controller2 = await tts2.generateSpeechStream('Text 2', undefined, handlers2);

// Events are tagged with instanceId and requestId for routing

Cancellation

Cancel via Controller

const controller = await tts.generateSpeechStream(text, undefined, handlers);

// User taps "Stop" button
await controller.cancel();  // Stops generation and unsubscribes

Cancel via Engine

await tts.cancelSpeechStream();

Recording Streamed Audio

Accumulate chunks to save the complete audio:
const chunks: number[] = [];
let sampleRate = 0;

const controller = await tts.generateSpeechStream(
  longText,
  { speed: 1.0 },
  {
    onChunk: (chunk) => {
      sampleRate = chunk.sampleRate;
      chunks.push(...chunk.samples);
      
      // Optionally play while recording
      playPcmSamples(chunk.samples, chunk.sampleRate);
    },
    onEnd: async () => {
      if (chunks.length > 0) {
        const audio = { samples: chunks, sampleRate };
        await saveAudioToFile(audio, '/path/output.wav');
      }
    },
    onError: () => {
      // Handle error
    },
  }
);
Memory Warning: Accumulating very long audio in JS can exhaust memory. For very long texts, consider:
  • Saving chunks incrementally to native storage
  • Splitting long texts into smaller segments
  • Using batch mode with file output
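Splitting a long text can be as simple as cutting on sentence boundaries. A sketch (splitText is a hypothetical helper, assuming sentence-final punctuation):

```typescript
// Split text into sentence-bounded segments of at most maxChars characters,
// so each segment can be streamed (and its audio discarded) independently.
function splitText(text: string, maxChars = 500): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const segments: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    // Flush the current segment when adding this sentence would exceed the cap.
    if (current && current.length + sentence.length > maxChars) {
      segments.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) segments.push(current.trim());
  return segments;
}
```

Each segment can then be passed to generateSpeechStream in sequence, keeping only one segment's audio in memory at a time.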

Performance Tips

Reduce Latency

  1. Use native PCM player (avoid JS audio bridge overhead)
  2. Keep onChunk lightweight (no heavy processing)
  3. Increase numThreads for faster generation
  4. Use hardware acceleration when available
const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper-en' },
  numThreads: 4,
  provider: 'coreml',  // iOS: Core ML
});

Balance Chunk Size

The maxNumSentences option controls chunk size:
const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper-en' },
  maxNumSentences: 2,  // Larger chunks = fewer callbacks
});
  • Smaller chunks (1 sentence): Lower latency, more callbacks
  • Larger chunks (2+ sentences): Higher latency, fewer callbacks

Avoid Memory Issues

  • Don’t accumulate all chunks for very long sessions
  • Use native-side streaming-to-file if possible
  • Split long texts into smaller generation requests

Common Use Cases

Voice Assistant

const tts = await createStreamingTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper-en' },
});

const sampleRate = await tts.getSampleRate();
await tts.startPcmPlayer(sampleRate, 1);

async function speak(text: string) {
  const controller = await tts.generateSpeechStream(
    text,
    { speed: 1.0 },
    {
      onChunk: async (chunk) => {
        await tts.writePcmChunk(chunk.samples);
      },
      onEnd: () => {
        console.log('Finished speaking');
      },
      onError: (e) => {
        console.error('Speech error:', e.message);
      },
    }
  );
  
  return controller;  // Allow caller to cancel
}

// Use
const controller = await speak('Hello, how can I help you?');

// Cancel if needed
// await controller.cancel();

Progress Indicator

const [progress, setProgress] = useState(0);

const controller = await tts.generateSpeechStream(
  longText,
  undefined,
  {
    onChunk: (chunk) => {
      setProgress(chunk.progress * 100);
      playPcmSamples(chunk.samples, chunk.sampleRate);
    },
    onEnd: () => {
      setProgress(100);
    },
    onError: () => {
      setProgress(0);
    },
  }
);

// UI: <ProgressBar progress={progress} />

Text-to-Speech Button with Cancel

const [isSpeaking, setIsSpeaking] = useState(false);
const [controller, setController] = useState<TtsStreamController | null>(null);

async function handleSpeak() {
  if (isSpeaking && controller) {
    // Cancel
    await controller.cancel();
    await tts.stopPcmPlayer();
    setIsSpeaking(false);
    setController(null);
  } else {
    // Start
    setIsSpeaking(true);
    const sampleRate = await tts.getSampleRate();
    await tts.startPcmPlayer(sampleRate, 1);
    
    const ctrl = await tts.generateSpeechStream(
      text,
      { speed: 1.0 },
      {
        onChunk: async (chunk) => {
          await tts.writePcmChunk(chunk.samples);
        },
        onEnd: async () => {
          await tts.stopPcmPlayer();
          setIsSpeaking(false);
          setController(null);
        },
        onError: async () => {
          await tts.stopPcmPlayer();
          setIsSpeaking(false);
          setController(null);
        },
      }
    );
    
    setController(ctrl);
  }
}

// UI: <Button title={isSpeaking ? 'Stop' : 'Speak'} onPress={handleSpeak} />

Troubleshooting

"Only one stream can be active" error

Only one stream per engine can be active. Wait for the previous stream to finish or cancel it:
await previousController.cancel();
// Now start new stream

Choppy or stuttering playback

  • Keep onChunk lightweight (avoid heavy processing)
  • Use native PCM player instead of JS audio APIs
  • Increase numThreads for faster generation
  • Reduce audio bridge overhead by writing larger chunks

Slow generation or high latency

  • Use hardware acceleration (provider: 'coreml' on iOS)
  • Increase numThreads
  • Reduce maxNumSentences for smaller chunks

High memory usage

  • Don’t accumulate all chunks in JS
  • Split long texts into smaller requests
  • Use batch mode for very long texts

Voice cloning issues

  • Pocket TTS: Voice cloning is supported in streaming
  • Zipvoice: Voice cloning is not supported in streaming; use batch mode (createTTS() + generateSpeech())

Next Steps

Text-to-Speech

Batch TTS generation and configuration

Model Setup

Learn how to bundle and load models

Speech-to-Text

Transcribe audio to text

Streaming STT

Real-time speech recognition
