
TTSInitializeOptions

Configuration for initializing a TTS engine (batch or streaming).
interface TTSInitializeOptions {
  modelPath: ModelPathConfig;
  modelType?: TTSModelType;
  provider?: string;
  numThreads?: number;
  debug?: boolean;
  modelOptions?: TtsModelOptions;
  ruleFsts?: string;
  ruleFars?: string;
  maxNumSentences?: number;
  silenceScale?: number;
}

Parameters

modelPath
ModelPathConfig
required
Path to the TTS model directory. Can be:
  • { type: 'asset', path: 'models/vits-piper-en' } - Asset bundled with app
  • { type: 'file', path: '/absolute/path/to/model' } - File system path
  • { type: 'auto', path: 'models/...' } - Auto-detect location
modelType
TTSModelType
default:"'auto'"
Model type to use. If not specified or 'auto', the type will be auto-detected based on files in the model directory. Supported types:
  • 'vits' - VITS models (Piper, Coqui, MeloTTS, MMS)
  • 'matcha' - Matcha models (acoustic + vocoder)
  • 'kokoro' - Kokoro models (multi-speaker, multi-language)
  • 'kitten' - KittenTTS models (lightweight, multi-speaker)
  • 'pocket' - Pocket TTS models
  • 'zipvoice' - Zipvoice models (voice cloning)
  • 'auto' - Auto-detect (default)
provider
string
default:"'cpu'"
Execution provider for ONNX inference. Common values:
  • 'cpu' - CPU execution (default, always available)
  • 'coreml' - Apple CoreML (iOS/macOS, check with getCoreMlSupport())
  • 'xnnpack' - XNNPACK (mobile optimized)
  • 'nnapi' - Android NNAPI
  • 'qnn' - Qualcomm AI Engine
Use the support detection functions to check availability before using hardware acceleration.
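The fallback to 'cpu' described above can be sketched as a small pure function. This is only an illustration: `pickProvider` and the `available` set are hypothetical names, and how you fill that set (for example from the support detection functions) is left to the app.

```typescript
type Provider = 'cpu' | 'coreml' | 'xnnpack' | 'nnapi' | 'qnn';

// Hypothetical helper, not part of the library.
// Returns the first preferred provider reported as available and falls back
// to 'cpu', which the docs describe as always available.
function pickProvider(
  preferred: readonly Provider[],
  available: ReadonlySet<string>,
): Provider {
  for (const candidate of preferred) {
    if (candidate === 'cpu' || available.has(candidate)) {
      return candidate;
    }
  }
  return 'cpu';
}
```

The result can then be passed as `provider` in TTSInitializeOptions.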
numThreads
number
default:"2"
Number of threads for inference.
  • More threads = faster processing but higher CPU usage
  • Typical values: 2-4
  • Not used when hardware accelerators (CoreML, NNAPI) are active
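One way to stay inside the typical range is to clamp a core count before passing it as `numThreads`. The helper below is a hypothetical sketch; the cap of 4 and the fallback of 2 are taken from the typical values and default listed above, not from any library logic.

```typescript
// Hypothetical helper, not part of the library.
// Clamps a reported core count into [1, maxThreads]; returns the documented
// default of 2 when the core count is unknown or invalid.
function chooseNumThreads(cores: number, maxThreads = 4): number {
  if (!Number.isFinite(cores) || cores < 1) {
    return 2;
  }
  return Math.max(1, Math.min(maxThreads, Math.floor(cores)));
}
```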
debug
boolean
default:"false"
Enable debug logging from the native TTS engine.
modelOptions
TtsModelOptions
Model-specific options (noise scale, length scale, etc.). Only the options for the loaded model type are applied. For example, when modelType is 'vits', only modelOptions.vits is used. See TtsModelOptions below.
ruleFsts
string
Path(s) to rule FSTs (Finite State Transducers) for text normalization/ITN (Inverse Text Normalization).
ruleFars
string
Path(s) to rule FARs (Finite-state Archive) for text normalization/ITN.
maxNumSentences
number
default:"1"
Maximum number of sentences per streaming callback.
silenceScale
number
default:"0.2"
Silence scale at configuration level (global silence padding). Can also be set per-generation via TtsGenerationOptions.silenceScale.

Example

import { createTTS } from 'react-native-sherpa-onnx';

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/vits-piper-en' },
  modelType: 'vits',
  provider: 'cpu',
  numThreads: 4,
  debug: false,
  modelOptions: {
    vits: {
      noiseScale: 0.667,
      lengthScale: 1.0,
    },
  },
  silenceScale: 0.2,
});

TtsModelOptions

Model-specific configuration options. Only the block for the loaded model type is applied.
interface TtsModelOptions {
  vits?: TtsVitsModelOptions;
  matcha?: TtsMatchaModelOptions;
  kokoro?: TtsKokoroModelOptions;
  kitten?: TtsKittenModelOptions;
  pocket?: TtsPocketModelOptions;
}
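The "only the block for the loaded model type is applied" rule can be pictured as a keyed lookup. The sketch below uses a local copy of the interface shapes documented on this page; `effectiveOptions` is a hypothetical helper and does not describe how the native engine reads the options.

```typescript
// Local mirror of the documented shapes, for illustration only.
interface TtsModelOptions {
  vits?: { noiseScale?: number; noiseScaleW?: number; lengthScale?: number };
  matcha?: { noiseScale?: number; lengthScale?: number };
  kokoro?: { lengthScale?: number };
  kitten?: { lengthScale?: number };
  pocket?: Record<string, never>;
}

// Hypothetical helper: returns only the block matching the model type.
// Blocks for other model types are ignored, as the docs describe.
function effectiveOptions<K extends keyof TtsModelOptions>(
  modelType: K,
  options: TtsModelOptions,
): TtsModelOptions[K] {
  return options[modelType];
}
```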

TtsVitsModelOptions

Options for VITS models (Piper, Coqui, MeloTTS, MMS variants).
interface TtsVitsModelOptions {
  noiseScale?: number;
  noiseScaleW?: number;
  lengthScale?: number;
}
noiseScale
number
Noise scale parameter. Controls voice variation/expressiveness. If omitted, the model default (from model.json) is used.
noiseScaleW
number
Noise scale W parameter. Controls additional voice characteristics. If omitted, the model default is used.
lengthScale
number
Length scale parameter. Controls speech duration/speed.
  • < 1.0 = faster speech
  • 1.0 = normal speed
  • > 1.0 = slower speech
If omitted, model default is used.
Example:
modelOptions: {
  vits: {
    noiseScale: 0.667,
    noiseScaleW: 0.8,
    lengthScale: 1.0,
  },
}
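Since a lengthScale below 1.0 speeds speech up and above 1.0 slows it down, it behaves roughly like the reciprocal of a speed multiplier. The sketch below assumes that reciprocal relationship; whether the library applies exactly this mapping internally is not confirmed here, and `lengthScaleForSpeed` is a hypothetical helper.

```typescript
// Hypothetical helper, not part of the library.
// Assumes lengthScale is approximately 1 / speed:
// speed 2.0 -> lengthScale 0.5 (faster), speed 0.5 -> lengthScale 2.0 (slower).
function lengthScaleForSpeed(speed: number): number {
  if (!(speed > 0) || !Number.isFinite(speed)) {
    throw new RangeError('speed must be a finite number greater than 0');
  }
  return 1 / speed;
}
```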

TtsMatchaModelOptions

Options for Matcha models (acoustic model + vocoder).
interface TtsMatchaModelOptions {
  noiseScale?: number;
  lengthScale?: number;
}
noiseScale
number
Noise scale parameter.
lengthScale
number
Length scale parameter.

TtsKokoroModelOptions

Options for Kokoro models (multi-speaker, multi-language).
interface TtsKokoroModelOptions {
  lengthScale?: number;
}
lengthScale
number
Length scale parameter.

TtsKittenModelOptions

Options for KittenTTS models (lightweight, multi-speaker).
interface TtsKittenModelOptions {
  lengthScale?: number;
}
lengthScale
number
Length scale parameter.

TtsPocketModelOptions

Options for Pocket TTS models. Currently has no init-time configuration.
interface TtsPocketModelOptions {}
Voice cloning for Pocket TTS is configured at generation time via TtsGenerationOptions.referenceAudio.

TtsUpdateOptions

Options for updating TTS model parameters at runtime without reloading the model.
interface TtsUpdateOptions {
  modelType?: TTSModelType;
  modelOptions?: TtsModelOptions;
}
modelType
TTSModelType
Model type currently loaded. When omitted or 'auto', the SDK uses the model type from the last successful initialization. After calling destroy(), pass modelType explicitly until initialized again.
modelOptions
TtsModelOptions
Model-specific options to update. Only the block for the effective model type is used (e.g., modelOptions.vits when type is 'vits').
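The resolution rule for the effective model type can be sketched as follows. `resolveModelType` is a hypothetical helper written from the description above; throwing when no type is known is an assumption of this sketch, not documented SDK behavior.

```typescript
type TTSModelType =
  | 'vits' | 'matcha' | 'kokoro' | 'kitten' | 'pocket' | 'zipvoice' | 'auto';
type ConcreteModelType = Exclude<TTSModelType, 'auto'>;

// Hypothetical helper, not part of the library.
// An explicit type wins; otherwise fall back to the type from the last
// successful initialization (undefined after destroy()).
function resolveModelType(
  requested: TTSModelType | undefined,
  lastInitialized: ConcreteModelType | undefined,
): ConcreteModelType {
  if (requested !== undefined && requested !== 'auto') {
    return requested;
  }
  if (lastInitialized !== undefined) {
    return lastInitialized;
  }
  // Assumption: treat "nothing known" as an error in this sketch.
  throw new Error('No model type known: pass modelType explicitly');
}
```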

Example

// Update VITS model parameters
await tts.updateParams({
  modelType: 'vits',
  modelOptions: {
    vits: {
      noiseScale: 0.8,
      lengthScale: 1.1,
    },
  },
});

TtsGenerationOptions

Options for TTS speech generation (both batch and streaming).
interface TtsGenerationOptions {
  sid?: number;
  speed?: number;
  silenceScale?: number;
  referenceAudio?: { samples: number[]; sampleRate: number };
  referenceText?: string;
  numSteps?: number;
  extra?: Record<string, string>;
}
sid
number
default:"0"
Speaker ID for multi-speaker models.
  • For single-speaker models, this is ignored
  • Use getNumSpeakers() to check how many speakers are available
  • Typically ranges from 0 to numSpeakers - 1
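A speaker ID can be kept inside the valid range before it is passed as `sid`. The helper below is a hypothetical sketch that takes the speaker count as a plain number (for example the value obtained from getNumSpeakers()); it is not part of the library.

```typescript
// Hypothetical helper, not part of the library.
// Clamps sid into [0, numSpeakers - 1]; returns 0 for single-speaker models
// and for non-finite input.
function clampSpeakerId(sid: number, numSpeakers: number): number {
  if (numSpeakers <= 1) {
    return 0;
  }
  const whole = Math.trunc(sid);
  if (!Number.isFinite(whole)) {
    return 0;
  }
  return Math.min(Math.max(whole, 0), numSpeakers - 1);
}
```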
speed
number
default:"1.0"
Speech speed multiplier.
  • 1.0 = normal speed
  • 0.5 = half speed (slower)
  • 2.0 = double speed (faster)
  • Typical range: 0.5 to 2.0
silenceScale
number
Silence scale for this generation (overrides config-level silenceScale). Controls the amount of silence/pauses in the generated speech.
referenceAudio
{ samples: number[]; sampleRate: number }
Reference audio for voice cloning.
  • Only used by Pocket TTS - other model types ignore this
  • samples - Mono float PCM samples in range [-1.0, 1.0]
  • sampleRate - Sample rate in Hz (e.g., 22050, 44100)
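Many decoders return 16-bit integer PCM rather than floats. A minimal sketch of the conversion to the expected [-1.0, 1.0] range, assuming mono 16-bit input; `int16ToFloatSamples` is a hypothetical helper, not a library function.

```typescript
// Hypothetical helper, not part of the library.
// Converts mono 16-bit PCM to float samples in [-1.0, 1.0).
// Dividing by 32768 maps -32768 to -1.0 and 32767 to just under 1.0.
function int16ToFloatSamples(pcm: Int16Array): number[] {
  return Array.from(pcm, (sample) => sample / 32768);
}
```

The resulting array can be used as `referenceAudio.samples`, with `sampleRate` set to the rate of the source recording.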
referenceText
string
Transcript text of the reference audio.
  • Required for Pocket TTS when referenceAudio is provided
  • Ignored by other model types
numSteps
number
Number of generation steps (e.g., flow-matching steps). Used by models like Pocket TTS. Higher values generally improve quality at the cost of slower generation.
extra
Record<string, string>
Extra model-specific options as key-value pairs. Examples for Pocket TTS:
  • temperature - Controls randomness
  • chunk_size - Generation chunk size
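Because `extra` is typed as Record<string, string>, numeric settings have to be passed as strings. The sketch below shows one way to do that; `toExtra` is a hypothetical helper, and the values used with it are placeholders rather than recommended settings.

```typescript
// Hypothetical helper, not part of the library.
// Stringifies every value so the result satisfies Record<string, string>.
function toExtra(
  values: Record<string, string | number | boolean>,
): Record<string, string> {
  const extra: Record<string, string> = {};
  for (const [key, value] of Object.entries(values)) {
    extra[key] = String(value);
  }
  return extra;
}
```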

Examples

Basic generation with speed:
const audio = await tts.generateSpeech('Hello world', {
  speed: 1.2,
});
Multi-speaker model:
const numSpeakers = await tts.getNumSpeakers();
if (numSpeakers > 1) {
  const audio = await tts.generateSpeech('Hello', {
    sid: 1, // Use speaker 1
    speed: 1.0,
  });
}
Voice cloning (Pocket TTS):
import { Audio } from 'expo-av';

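// Load the reference audio and decode it to mono float PCM in [-1.0, 1.0].
// Decoding depends on your audio stack and is not shown here.
const referenceUri = 'path/to/reference.wav';
declare const referenceSamples: number[]; // samples decoded from referenceUri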

const audio = await tts.generateSpeech('Clone my voice', {
  referenceAudio: {
    samples: referenceSamples,
    sampleRate: 22050,
  },
  referenceText: 'This is the reference text',
  numSteps: 10,
  extra: {
    temperature: '0.7',
  },
});

TTSModelType

Supported TTS model types.
type TTSModelType =
  | 'vits'
  | 'matcha'
  | 'kokoro'
  | 'kitten'
  | 'pocket'
  | 'zipvoice'
  | 'auto';
'vits'
TTSModelType
VITS models - includes Piper, Coqui, MeloTTS, MMS variants
'matcha'
TTSModelType
Matcha models - acoustic model + vocoder
'kokoro'
TTSModelType
Kokoro models - multi-speaker, multi-language
'kitten'
TTSModelType
KittenTTS models - lightweight, multi-speaker
'pocket'
TTSModelType
Pocket TTS models - supports voice cloning
'zipvoice'
TTSModelType
Zipvoice models - voice cloning capable
'auto'
TTSModelType
Auto-detect model type based on files present (default)

Runtime type list

import { TTS_MODEL_TYPES } from 'react-native-sherpa-onnx';

console.log('Supported model types:', TTS_MODEL_TYPES);
// ['vits', 'matcha', 'kokoro', 'kitten', 'pocket', 'zipvoice', 'auto']
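A runtime list like this is useful for validating untrusted input, such as a model type read from settings. The sketch below uses a local copy of the array so it stands alone; in an app you would presumably use the imported TTS_MODEL_TYPES instead. `isTTSModelType` is a hypothetical helper.

```typescript
// Local copy of the documented list, for illustration only.
const MODEL_TYPES = [
  'vits', 'matcha', 'kokoro', 'kitten', 'pocket', 'zipvoice', 'auto',
] as const;
type ModelType = (typeof MODEL_TYPES)[number];

// Hypothetical type guard: narrows an unknown value to a known model type.
function isTTSModelType(value: unknown): value is ModelType {
  return (
    typeof value === 'string' &&
    (MODEL_TYPES as readonly string[]).includes(value)
  );
}
```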
