Complete TypeScript type definitions for the TTS API.

Audio Types

GeneratedAudio

Generated audio data from TTS synthesis.
interface GeneratedAudio {
  samples: number[];
  sampleRate: number;
}
samples
number[]
Audio samples as an array of float values in the range [-1.0, 1.0]. This is raw PCM audio data that can be:
  • Saved to a WAV file using saveAudioToFile()
  • Played using an audio library
  • Processed further for effects or analysis
sampleRate
number
Sample rate of the generated audio in Hz. Common values:
  • 16000 - 16 kHz (voice/telephony)
  • 22050 - 22.05 kHz (common for TTS)
  • 44100 - 44.1 kHz (CD quality)
  • 48000 - 48 kHz (professional audio)
Example:
const audio: GeneratedAudio = await tts.generateSpeech('Hello');
console.log(`Generated ${audio.samples.length} samples`);
console.log(`Sample rate: ${audio.sampleRate}Hz`);
console.log(`Duration: ${audio.samples.length / audio.sampleRate}s`);
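
The samples are floats, while WAV files and many playback APIs expect 16-bit integers. The following is a minimal sketch of that conversion, assuming only the documented [-1.0, 1.0] range. `toInt16Pcm` and `durationSeconds` are hypothetical helper names, not part of the library, and the interface is redeclared locally so the snippet stands alone.

```typescript
// Local copy of the documented shape so this sketch is self-contained.
interface GeneratedAudio {
  samples: number[];
  sampleRate: number;
}

// Hypothetical helper: convert float samples to signed 16-bit PCM.
// Out-of-range values are clamped to [-1.0, 1.0] before scaling.
function toInt16Pcm(audio: GeneratedAudio): Int16Array {
  const pcm = new Int16Array(audio.samples.length);
  audio.samples.forEach((value, i) => {
    const clamped = Math.max(-1, Math.min(1, value));
    // Negative values scale by 32768 and positive by 32767,
    // so both ends of the int16 range are reachable.
    pcm[i] = Math.round(clamped < 0 ? clamped * 32768 : clamped * 32767);
  });
  return pcm;
}

// Hypothetical helper: clip length in seconds, guarding against a zero sample rate.
function durationSeconds(audio: GeneratedAudio): number {
  return audio.sampleRate > 0 ? audio.samples.length / audio.sampleRate : 0;
}
```

Clamping before scaling keeps a stray sample slightly outside the range from wrapping around when stored as an integer.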

GeneratedAudioWithTimestamps

Generated audio plus per-word or per-token timestamp metadata, suitable for building subtitles.
interface GeneratedAudioWithTimestamps extends GeneratedAudio {
  subtitles: TtsSubtitleItem[];
  estimated: boolean;
}
subtitles
TtsSubtitleItem[]
Array of subtitle/timestamp entries for each word or token. See TtsSubtitleItem below.
estimated
boolean
Whether timestamps are estimated rather than model-provided.
  • true - Timestamps are estimated/calculated
  • false - Timestamps come directly from the TTS model
Example:
const result = await tts.generateSpeechWithTimestamps('Hello world');

console.log('Subtitles:');
result.subtitles.forEach((item) => {
  console.log(`  ${item.text}: ${item.start.toFixed(2)}s - ${item.end.toFixed(2)}s`);
});

if (result.estimated) {
  console.log('Note: Timestamps are estimated');
}

TtsSubtitleItem

Subtitle/timestamp entry for a word or token in synthesized speech.
interface TtsSubtitleItem {
  text: string;
  start: number;
  end: number;
}
text
string
Text token for this time range (word, phoneme, or character).
start
number
Start time in seconds.
end
number
End time in seconds.
Example:
const item: TtsSubtitleItem = {
  text: 'Hello',
  start: 0.0,
  end: 0.5,
};
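
Because `start` and `end` are plain seconds, a list of items maps directly onto a subtitle file. The following sketch renders items as SubRip (SRT) text. `formatSrtTime` and `toSrt` are hypothetical helpers, not library functions, and the interface is redeclared locally so the snippet stands alone.

```typescript
// Local copy of the documented shape so this sketch is self-contained.
interface TtsSubtitleItem {
  text: string;
  start: number;
  end: number;
}

// Left-pad a non-negative integer with zeros to the given width.
function zeroPad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = '0' + s;
  return s;
}

// Hypothetical helper: seconds -> "HH:MM:SS,mmm" as used by SRT.
function formatSrtTime(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${zeroPad(h, 2)}:${zeroPad(m, 2)}:${zeroPad(s, 2)},${zeroPad(ms, 3)}`;
}

// Hypothetical helper: one numbered SRT cue per subtitle item.
function toSrt(items: TtsSubtitleItem[]): string {
  return items
    .map((item, i) =>
      `${i + 1}\n${formatSrtTime(item.start)} --> ${formatSrtTime(item.end)}\n${item.text}\n`
    )
    .join('\n');
}
```

Word-level cues are usually too short to read comfortably, so a real subtitle exporter would probably merge adjacent items into phrases first.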

Model Info Types

TTSModelInfo

Information about TTS model capabilities.
interface TTSModelInfo {
  sampleRate: number;
  numSpeakers: number;
}
sampleRate
number
Sample rate that the model generates audio at (in Hz).
numSpeakers
number
Number of speakers/voices available in the model.
  • 0 or 1 = Single-speaker model
  • > 1 = Multi-speaker model
Example:
const info = await tts.getModelInfo();

if (info.numSpeakers > 1) {
  console.log(`Multi-speaker model with ${info.numSpeakers} voices`);
  // Can use sid parameter to select different speakers
} else {
  console.log('Single-speaker model');
}
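
When a speaker choice comes from user input or saved settings, it may be worth validating it against `numSpeakers` before passing it on. The sketch below assumes speaker IDs are zero-based integers below `numSpeakers`; this page does not state that, so treat it as an assumption. `resolveSpeakerId` is a hypothetical helper, not part of the library.

```typescript
// Local copy of the documented shape so this sketch is self-contained.
interface TTSModelInfo {
  sampleRate: number;
  numSpeakers: number;
}

// Hypothetical helper: return a usable speaker ID, falling back to 0.
// ASSUMPTION: valid IDs are the integers 0 .. numSpeakers - 1.
function resolveSpeakerId(info: TTSModelInfo, requested: number): number {
  // Single-speaker models (numSpeakers of 0 or 1) only have the default voice.
  if (info.numSpeakers <= 1) return 0;
  const valid =
    Number.isInteger(requested) && requested >= 0 && requested < info.numSpeakers;
  return valid ? requested : 0;
}
```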

Engine Interfaces

TtsEngine

Batch TTS engine interface returned by createTTS().
interface TtsEngine {
  readonly instanceId: string;
  
  generateSpeech(
    text: string,
    options?: TtsGenerationOptions
  ): Promise<GeneratedAudio>;
  
  generateSpeechWithTimestamps(
    text: string,
    options?: TtsGenerationOptions
  ): Promise<GeneratedAudioWithTimestamps>;
  
  updateParams(options: TtsUpdateOptions): Promise<{
    success: boolean;
    detectedModels: Array<{ type: string; modelDir: string }>;
  }>;
  
  getModelInfo(): Promise<TTSModelInfo>;
  getSampleRate(): Promise<number>;
  getNumSpeakers(): Promise<number>;
  
  destroy(): Promise<void>;
}
See createTTS() documentation for detailed method descriptions.

StreamingTtsEngine

Streaming TTS engine interface returned by createStreamingTTS().
interface StreamingTtsEngine {
  readonly instanceId: string;
  
  generateSpeechStream(
    text: string,
    options: TtsGenerationOptions | undefined,
    handlers: TtsStreamHandlers
  ): Promise<TtsStreamController>;
  
  cancelSpeechStream(): Promise<void>;
  
  startPcmPlayer(sampleRate: number, channels: number): Promise<void>;
  writePcmChunk(samples: number[]): Promise<void>;
  stopPcmPlayer(): Promise<void>;
  
  getModelInfo(): Promise<TTSModelInfo>;
  getSampleRate(): Promise<number>;
  getNumSpeakers(): Promise<number>;
  
  destroy(): Promise<void>;
}
See createStreamingTTS() documentation for detailed method descriptions.
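
The two engine interfaces share `instanceId`, the model info getters, and `destroy()`, but only the streaming one declares `generateSpeechStream()`. Code that accepts either kind can use that difference to narrow the type. The sketch below uses trimmed local stand-ins rather than the real interfaces, and `isStreamingEngine` is a hypothetical helper.

```typescript
// Trimmed local stand-ins for the two engine interfaces; only the members
// needed for the check are declared here.
interface BatchEngineLike {
  readonly instanceId: string;
  generateSpeech(text: string): Promise<unknown>;
}

interface StreamingEngineLike {
  readonly instanceId: string;
  generateSpeechStream(
    text: string,
    options: undefined,
    handlers: object
  ): Promise<unknown>;
}

// Hypothetical type guard: streaming engines expose generateSpeechStream().
function isStreamingEngine(
  engine: BatchEngineLike | StreamingEngineLike
): engine is StreamingEngineLike {
  return (
    'generateSpeechStream' in engine &&
    typeof engine.generateSpeechStream === 'function'
  );
}
```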

Streaming Types

TtsStreamHandlers

Event handlers for streaming TTS generation.
interface TtsStreamHandlers {
  onChunk?: (chunk: TtsStreamChunk) => void;
  onEnd?: (event: TtsStreamEnd) => void;
  onError?: (event: TtsStreamError) => void;
}
onChunk
(chunk: TtsStreamChunk) => void
Called for each audio chunk as it’s generated.
onEnd
(event: TtsStreamEnd) => void
Called when generation completes or is cancelled.
onError
(event: TtsStreamError) => void
Called when an error occurs during generation.
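
The three callbacks together describe the lifecycle of one stream. As a minimal sketch, the handlers can be bundled with a small status tracker. `createStreamTracker` and `StreamStatus` are hypothetical names, the event shapes are trimmed local copies, and the choice to keep a failure sticky if `onEnd` also fires afterwards is an assumption rather than documented behavior.

```typescript
// Trimmed local copies of the documented event shapes.
interface TtsStreamChunk {
  samples: number[];
  sampleRate: number;
  progress: number;
  isFinal: boolean;
}
interface TtsStreamEnd { cancelled: boolean; }
interface TtsStreamError { message: string; }
interface TtsStreamHandlers {
  onChunk?: (chunk: TtsStreamChunk) => void;
  onEnd?: (event: TtsStreamEnd) => void;
  onError?: (event: TtsStreamError) => void;
}

type StreamStatus = 'running' | 'completed' | 'cancelled' | 'failed';

interface StreamTracker {
  handlers: TtsStreamHandlers;
  getStatus(): StreamStatus;
  getProgress(): number;
  getError(): string | undefined;
}

// Hypothetical helper: handlers that record progress and the final outcome.
function createStreamTracker(): StreamTracker {
  let status: StreamStatus = 'running';
  let progress = 0;
  let error: string | undefined;
  return {
    handlers: {
      onChunk: (chunk) => { progress = chunk.progress; },
      onEnd: (event) => {
        // ASSUMPTION: a reported error should win over a later end event.
        if (status !== 'failed') status = event.cancelled ? 'cancelled' : 'completed';
      },
      onError: (event) => { status = 'failed'; error = event.message; },
    },
    getStatus: () => status,
    getProgress: () => progress,
    getError: () => error,
  };
}
```

The `handlers` object has the shape expected as the third argument of `generateSpeechStream()`.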

TtsStreamChunk

Audio chunk received during streaming generation.
interface TtsStreamChunk {
  instanceId?: string;
  requestId?: string;
  samples: number[];
  sampleRate: number;
  progress: number;
  isFinal: boolean;
}
instanceId
string
Instance ID of the TTS engine that generated this chunk. Optional.
requestId
string
Request ID for this specific generation (distinguishes concurrent streams). Optional.
samples
number[]
Float PCM audio samples in range [-1.0, 1.0].
sampleRate
number
Sample rate of the audio in Hz.
progress
number
Generation progress as a float between 0.0 and 1.0.
isFinal
boolean
true if this is the last chunk in the stream.
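
If the whole utterance is needed after streaming (for example, to save it), the chunks can be buffered and joined. The sketch below is one way to do that. `concatChunks` is a hypothetical helper, the interface is a local copy, and rejecting mixed sample rates is a defensive choice rather than documented behavior.

```typescript
// Local copy of the documented shape so this sketch is self-contained.
interface TtsStreamChunk {
  instanceId?: string;
  requestId?: string;
  samples: number[];
  sampleRate: number;
  progress: number;
  isFinal: boolean;
}

// Hypothetical helper: join buffered chunks into one { samples, sampleRate } object.
// Returns an empty result with sampleRate 0 when no chunks were collected.
function concatChunks(
  chunks: TtsStreamChunk[]
): { samples: number[]; sampleRate: number } {
  const first = chunks[0];
  if (first === undefined) return { samples: [], sampleRate: 0 };

  const samples: number[] = [];
  for (const chunk of chunks) {
    if (chunk.sampleRate !== first.sampleRate) {
      throw new Error(
        `Mixed sample rates: ${first.sampleRate} and ${chunk.sampleRate}`
      );
    }
    // Push in a loop rather than spreading, which can overflow the
    // argument limit on very large chunks.
    for (const sample of chunk.samples) samples.push(sample);
  }
  return { samples, sampleRate: first.sampleRate };
}
```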

TtsStreamEnd

Event emitted when streaming generation ends.
interface TtsStreamEnd {
  instanceId?: string;
  requestId?: string;
  cancelled: boolean;
}
instanceId
string
Instance ID of the TTS engine. Optional.
requestId
string
Request ID for this generation. Optional.
cancelled
boolean
true if generation was cancelled, false if it completed normally.

TtsStreamError

Event emitted when an error occurs during streaming.
interface TtsStreamError {
  instanceId?: string;
  requestId?: string;
  message: string;
}
instanceId
string
Instance ID of the TTS engine. Optional.
requestId
string
Request ID for this generation. Optional.
message
string
Error message describing what went wrong.

TtsStreamController

Controller returned by generateSpeechStream() for managing the stream.
interface TtsStreamController {
  cancel(): Promise<void>;
  unsubscribe(): void;
}
cancel
() => Promise<void>
Cancels the ongoing TTS generation.
unsubscribe
() => void
Removes event listeners. This is called automatically on end or error, and can also be called manually.
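
UI code often has several places that might stop a stream (a stop button, a screen unmount), so a stop function that is safe to call more than once can be convenient. The sketch below wraps a controller that way. `createStopOnce` is a hypothetical helper, and it assumes that calling `unsubscribe()` again after an automatic call is harmless, which this page does not confirm.

```typescript
// Local copy of the documented shape so this sketch is self-contained.
interface TtsStreamController {
  cancel(): Promise<void>;
  unsubscribe(): void;
}

// Hypothetical wrapper: the returned stop() cancels and unsubscribes at most
// once; repeated calls return the same pending promise.
// ASSUMPTION: a redundant unsubscribe() call is a no-op.
function createStopOnce(controller: TtsStreamController): () => Promise<void> {
  let pending: Promise<void> | undefined;
  return () => {
    if (pending === undefined) {
      pending = controller.cancel().then(
        () => controller.unsubscribe(),
        (err) => {
          // Still detach listeners if cancellation fails, then re-throw.
          controller.unsubscribe();
          throw err;
        }
      );
    }
    return pending;
  };
}
```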

Configuration Types

See the Configuration page for detailed documentation of:
  • TTSInitializeOptions
  • TtsModelOptions
  • TtsVitsModelOptions
  • TtsMatchaModelOptions
  • TtsKokoroModelOptions
  • TtsKittenModelOptions
  • TtsPocketModelOptions
  • TtsUpdateOptions
  • TtsGenerationOptions
  • TTSModelType

Type Imports

Import all TTS types

import type {
  // Engine interfaces
  TtsEngine,
  StreamingTtsEngine,
  
  // Configuration
  TTSInitializeOptions,
  TTSModelType,
  TtsModelOptions,
  TtsVitsModelOptions,
  TtsMatchaModelOptions,
  TtsKokoroModelOptions,
  TtsKittenModelOptions,
  TtsPocketModelOptions,
  TtsUpdateOptions,
  TtsGenerationOptions,
  
  // Audio types
  GeneratedAudio,
  GeneratedAudioWithTimestamps,
  TtsSubtitleItem,
  TTSModelInfo,
  
  // Streaming types
  TtsStreamController,
  TtsStreamHandlers,
  TtsStreamChunk,
  TtsStreamEnd,
  TtsStreamError,
} from 'react-native-sherpa-onnx';

Import runtime constants

import { TTS_MODEL_TYPES } from 'react-native-sherpa-onnx';

console.log('Supported model types:', TTS_MODEL_TYPES);
// ['vits', 'matcha', 'kokoro', 'kitten', 'pocket', 'zipvoice', 'auto']
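
A runtime list like this is handy for validating untrusted strings, such as a model type read from a config file. The sketch below uses a local stand-in array copied from the comment above, so the exact type and contents of the real `TTS_MODEL_TYPES` export are an assumption. `isModelType` and `ModelType` are hypothetical names.

```typescript
// Local stand-in for the exported constant, copied from the comment above.
// In an app, import TTS_MODEL_TYPES from 'react-native-sherpa-onnx' instead.
const MODEL_TYPES = [
  'vits', 'matcha', 'kokoro', 'kitten', 'pocket', 'zipvoice', 'auto',
] as const;

// Union of the string literals in the list.
type ModelType = (typeof MODEL_TYPES)[number];

// Hypothetical type guard: true when the string is one of the known model types.
function isModelType(value: string): value is ModelType {
  return (MODEL_TYPES as readonly string[]).indexOf(value) !== -1;
}
```

After a successful check, TypeScript narrows the value to the literal union, so it can be passed where a model type is expected without a cast.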
