TTSInitializeOptions
Configuration for initializing a TTS engine (batch or streaming).
Parameters
Path to the TTS model directory. Can be:
- { type: 'asset', path: 'models/vits-piper-en' } - Asset bundled with the app
- { type: 'file', path: '/absolute/path/to/model' } - File system path
- { type: 'auto', path: 'models/...' } - Auto-detect location
Model type to use. If not specified or 'auto', the type is auto-detected from the files in the model directory. Supported types:
- 'vits' - VITS models (Piper, Coqui, MeloTTS, MMS)
- 'matcha' - Matcha models (acoustic + vocoder)
- 'kokoro' - Kokoro models (multi-speaker, multi-language)
- 'kitten' - KittenTTS models (lightweight, multi-speaker)
- 'pocket' - Pocket TTS models
- 'zipvoice' - Zipvoice models (voice cloning)
- 'auto' - Auto-detect (default)
Execution provider for ONNX inference. Common values:
- 'cpu' - CPU execution (default, always available)
- 'coreml' - Apple CoreML (iOS/macOS, check with getCoreMlSupport())
- 'xnnpack' - XNNPACK (mobile optimized)
- 'nnapi' - Android NNAPI
- 'qnn' - Qualcomm AI Engine
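One way to act on the provider list above is to fall back to 'cpu', which is described as always available, when the preferred accelerator is not supported on the device. The sketch below is only an illustration: chooseProvider and its supported flag are hypothetical and not part of the SDK. The flag stands in for the result of a capability check such as getCoreMlSupport(), whose return shape is not described in this section.

```typescript
// Provider names taken from the list above.
type Provider = 'cpu' | 'coreml' | 'xnnpack' | 'nnapi' | 'qnn';

// Hypothetical helper (not an SDK function): keep the preferred provider
// only when the caller has confirmed it is supported, otherwise use 'cpu'.
function chooseProvider(preferred: Provider, supported: boolean): Provider {
  return supported ? preferred : 'cpu';
}

// Example: CoreML was requested but the device check came back negative.
const provider = chooseProvider('coreml', false);
console.log(provider);
```

Keeping the fallback in one place means the rest of the initialization code never has to special-case a missing accelerator.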
Number of threads for inference.
- More threads generally mean faster processing at the cost of higher CPU usage
- Typical values: 2-4
- Not used when hardware accelerators (CoreML, NNAPI) are active
Enable debug logging from the native TTS engine.
Model-specific options (noise scale, length scale, etc.). Only the options for the loaded model type are applied. For example, when modelType is 'vits', only modelOptions.vits is used. See TtsModelOptions below.
Path(s) to rule FSTs (Finite State Transducers) for text normalization/ITN (Inverse Text Normalization).
Path(s) to rule FARs (Finite-state Archive) for text normalization/ITN.
Maximum number of sentences per streaming callback.
Silence scale at configuration level (global silence padding). Can also be set per generation via TtsGenerationOptions.silenceScale.
Example
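The original sample for this section is not reproduced here, so the following is a rough sketch of what an initialization object could look like. Only modelType, modelOptions and silenceScale are named in the text above. Every other field name (modelPath, provider, numThreads, debug, lengthScale), the InitOptionsSketch interface itself, and all values are assumptions made for illustration and should be checked against the SDK's type definitions.

```typescript
// Local stand-in type. Field names marked "assumed" are NOT confirmed by
// this section and may differ in the real TTSInitializeOptions.
interface InitOptionsSketch {
  modelPath: { type: 'asset' | 'file' | 'auto'; path: string }; // assumed name
  modelType?: 'vits' | 'matcha' | 'kokoro' | 'kitten' | 'pocket' | 'zipvoice' | 'auto';
  provider?: 'cpu' | 'coreml' | 'xnnpack' | 'nnapi' | 'qnn'; // assumed name
  numThreads?: number; // assumed name
  debug?: boolean; // assumed name
  modelOptions?: { vits?: { lengthScale?: number } }; // lengthScale is an assumed name
  silenceScale?: number;
}

const initOptions: InitOptionsSketch = {
  modelPath: { type: 'asset', path: 'models/vits-piper-en' },
  modelType: 'vits',
  provider: 'cpu', // documented as the default and always available
  numThreads: 2, // within the typical 2-4 range
  debug: false,
  modelOptions: { vits: { lengthScale: 1.0 } }, // 1.0 = normal speed
  silenceScale: 0.2, // illustrative value only
};

console.log(JSON.stringify(initOptions, null, 2));
```

An object like this would be passed to createTTS() or createStreamingTTS(); the exact call signature is not shown in this section.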
TtsModelOptions
Model-specific configuration options. Only the block for the loaded model type is applied.
TtsVitsModelOptions
Options for VITS models (Piper, Coqui, MeloTTS, MMS variants).
Noise scale parameter. Controls voice variation/expressiveness. If omitted, the model default (from model.json) is used.
Noise scale W parameter. Controls additional voice characteristics. If omitted, the model default is used.
Length scale parameter. Controls speech duration/speed.
- < 1.0 = faster speech
- 1.0 = normal speed
- > 1.0 = slower speech
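The length scale runs in the opposite direction to a speed multiplier: values below 1.0 shorten the speech, values above 1.0 lengthen it. A minimal sketch of converting between the two follows, assuming the length scale is exactly the reciprocal of the desired speed. That reciprocal relationship is an assumption consistent with the directions listed above, not something this section states, and lengthScaleForSpeed is a hypothetical helper.

```typescript
// Hypothetical helper: derive a length scale from a desired speed multiplier,
// ASSUMING lengthScale = 1 / speed (not confirmed by this documentation).
function lengthScaleForSpeed(speed: number): number {
  // Rejects zero, negative values and NaN.
  if (!(speed > 0)) {
    throw new RangeError('speed must be a positive number');
  }
  return 1 / speed;
}

console.log(lengthScaleForSpeed(2)); // a length scale below 1.0, i.e. faster speech
console.log(lengthScaleForSpeed(0.5)); // a length scale above 1.0, i.e. slower speech
```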
TtsMatchaModelOptions
Options for Matcha models (acoustic model + vocoder).
Noise scale parameter.
Length scale parameter.
TtsKokoroModelOptions
Options for Kokoro models (multi-speaker, multi-language).
Length scale parameter.
TtsKittenModelOptions
Options for KittenTTS models (lightweight, multi-speaker).
Length scale parameter.
TtsPocketModelOptions
Options for Pocket TTS models. Currently has no init-time configuration. Reference audio for voice cloning is supplied per generation via TtsGenerationOptions.referenceAudio.
TtsUpdateOptions
Options for updating TTS model parameters at runtime without reloading the model.
Model type currently loaded. When omitted or 'auto', the SDK uses the model type from the last successful initialization. After calling destroy(), pass modelType explicitly until the engine is initialized again.
Model-specific options to update. Only the block for the effective model type is used (e.g., modelOptions.vits when the type is 'vits').
Example
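The original sample is not reproduced in this section. As a stand-in, the sketch below models the rule described above for choosing the effective model type during an update. resolveEffectiveModelType is a hypothetical helper, not an SDK function, and throwing when no type can be resolved is an assumption: this section does not say what the SDK does in that case.

```typescript
type ConcreteModelType = 'vits' | 'matcha' | 'kokoro' | 'kitten' | 'pocket' | 'zipvoice';
type ModelType = ConcreteModelType | 'auto';

// Hypothetical helper mirroring the documented rule:
// - an explicit, non-'auto' modelType always wins
// - otherwise the type from the last successful initialization is used
// - after destroy() (modelled here as lastInitialized === null) an explicit
//   type is required; this sketch throws, which is an assumption
function resolveEffectiveModelType(
  requested: ModelType | undefined,
  lastInitialized: ConcreteModelType | null,
): ConcreteModelType {
  if (requested !== undefined && requested !== 'auto') {
    return requested;
  }
  if (lastInitialized === null) {
    throw new Error('No initialized model: pass modelType explicitly');
  }
  return lastInitialized;
}

// With a VITS model loaded, omitting modelType resolves to 'vits',
// so only modelOptions.vits would be applied by the update.
console.log(resolveEffectiveModelType(undefined, 'vits'));
```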
TtsGenerationOptions
Options for TTS speech generation (both batch and streaming).
Speaker ID for multi-speaker models.
- For single-speaker models, this is ignored
- Use getNumSpeakers() to check how many speakers are available
- Typically ranges from 0 to numSpeakers - 1
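The speaker range above can be checked before generation. The following is a small sketch with a hypothetical helper, isValidSpeakerId. It takes the speaker count as a plain number because the return shape of getNumSpeakers() (synchronous or asynchronous) is not described in this section.

```typescript
// Hypothetical helper: true when sid is an integer in [0, numSpeakers - 1].
// numSpeakers is assumed to come from a prior getNumSpeakers() call.
function isValidSpeakerId(sid: number, numSpeakers: number): boolean {
  return Number.isInteger(sid) && sid >= 0 && sid < numSpeakers;
}

// For a model that reports 3 speakers, IDs 0, 1 and 2 pass the check.
console.log([0, 1, 2, 3].map((sid) => isValidSpeakerId(sid, 3)));
```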
Speech speed multiplier.
- 1.0 = normal speed
- 0.5 = half speed (slower)
- 2.0 = double speed (faster)
- Typical range: 0.5 to 2.0
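To make the multiplier concrete, the sketch below estimates how long a clip would last at a given speed. estimateDurationSeconds is a hypothetical helper, and it assumes duration scales exactly inversely with the multiplier; a real engine may only approximate this.

```typescript
// Hypothetical helper: estimated clip length at a given speed multiplier,
// ASSUMING duration = (duration at speed 1.0) / speed.
function estimateDurationSeconds(normalSpeedSeconds: number, speed: number): number {
  // Rejects zero, negative values and NaN.
  if (!(speed > 0)) {
    throw new RangeError('speed must be a positive number');
  }
  return normalSpeedSeconds / speed;
}

// A clip that takes 10 s at normal speed:
console.log(estimateDurationSeconds(10, 2.0)); // double speed, shorter clip
console.log(estimateDurationSeconds(10, 0.5)); // half speed, longer clip
```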
Silence scale for this generation (overrides the config-level silenceScale). Controls the amount of silence/pauses in the generated speech.
Reference audio for voice cloning.
- Only used by Pocket TTS; other model types ignore this
- samples - Mono float PCM samples in range [-1.0, 1.0]
- sampleRate - Sample rate in Hz (e.g., 22050, 44100)
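Recorded audio often arrives as 16-bit integer PCM, sometimes interleaved stereo, whereas the fields above expect mono float samples in [-1.0, 1.0]. The sketch below shows one way to convert. toMonoFloat is a hypothetical helper, and representing samples as number[] is an assumption, since the exact container type is not stated in this section.

```typescript
// Hypothetical helper: convert interleaved 16-bit PCM to mono floats.
// Channels are averaged per frame, then scaled by 32768 so the result
// lies within [-1.0, 1.0).
function toMonoFloat(pcm: Int16Array, channels: number): number[] {
  if (!Number.isInteger(channels) || channels < 1) {
    throw new RangeError('channels must be a positive integer');
  }
  const frames = Math.floor(pcm.length / channels);
  const out: number[] = [];
  for (let i = 0; i < frames; i++) {
    let sum = 0;
    for (let c = 0; c < channels; c++) {
      sum += pcm[i * channels + c];
    }
    out.push(sum / channels / 32768);
  }
  return out;
}

// Shape described above: samples plus sampleRate in Hz.
const referenceAudio = {
  samples: toMonoFloat(new Int16Array([16384, -16384]), 1),
  sampleRate: 22050,
};
console.log(referenceAudio.samples.length, referenceAudio.sampleRate);
```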
Transcript text of the reference audio.
- Required for Pocket TTS when referenceAudio is provided
- Ignored by other model types
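The pairing rule above lends itself to a check before generation. The sketch below is illustrative only: findReferenceProblem is a hypothetical helper, referenceText is an assumed name for the transcript field (its real name is not given in this section), and treating a blank transcript as missing is a further assumption.

```typescript
interface ReferenceAudioSketch {
  samples: number[];
  sampleRate: number;
}

interface GenerationOptionsSketch {
  referenceAudio?: ReferenceAudioSketch;
  referenceText?: string; // assumed field name for the transcript
}

// Hypothetical helper: returns a problem description, or null when the
// options satisfy the documented rule.
function findReferenceProblem(
  modelType: string,
  options: GenerationOptionsSketch,
): string | null {
  // Other model types ignore reference audio and transcript entirely.
  if (modelType !== 'pocket') return null;
  if (options.referenceAudio === undefined) return null;
  const text = options.referenceText;
  if (text === undefined || text.trim() === '') {
    return 'Pocket TTS needs a transcript when referenceAudio is provided';
  }
  return null;
}

const audio: ReferenceAudioSketch = { samples: [0, 0.1, -0.1], sampleRate: 22050 };
console.log(findReferenceProblem('pocket', { referenceAudio: audio }));
```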
Number of generation steps (e.g., flow-matching steps). Used by models like Pocket TTS. Higher values give better quality but slower generation.
Extra model-specific options as key-value pairs. Examples for Pocket TTS:
- temperature - Controls randomness
- chunk_size - Generation chunk size
Examples
Basic generation with speed:
TTSModelType
Supported TTS model types.
- 'vits' - VITS models (includes Piper, Coqui, MeloTTS, MMS variants)
- 'matcha' - Matcha models (acoustic model + vocoder)
- 'kokoro' - Kokoro models (multi-speaker, multi-language)
- 'kitten' - KittenTTS models (lightweight, multi-speaker)
- 'pocket' - Pocket TTS models (supports voice cloning)
- 'zipvoice' - Zipvoice models (voice cloning capable)
- 'auto' - Auto-detect model type based on files present (default)
Runtime type list
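The content under this heading is not reproduced in this section. As a hedged illustration of what a runtime list of model types can look like, the sketch below keeps the values in a constant array and derives both a union type and a type guard from it. TTS_MODEL_TYPES_SKETCH, TtsModelTypeSketch and isTtsModelType are hypothetical names; the SDK's actual export may be named and shaped differently.

```typescript
// Values taken from the supported model types documented above.
const TTS_MODEL_TYPES_SKETCH = [
  'vits',
  'matcha',
  'kokoro',
  'kitten',
  'pocket',
  'zipvoice',
  'auto',
] as const;

// Union type derived from the array, so the two cannot drift apart.
type TtsModelTypeSketch = (typeof TTS_MODEL_TYPES_SKETCH)[number];

// Hypothetical type guard for validating untyped input (e.g. user settings).
function isTtsModelType(value: string): value is TtsModelTypeSketch {
  return (TTS_MODEL_TYPES_SKETCH as readonly string[]).indexOf(value) !== -1;
}

const candidate: string = 'kokoro';
if (isTtsModelType(candidate)) {
  // Inside this branch, candidate is narrowed to the union type.
  const modelType: TtsModelTypeSketch = candidate;
  console.log(modelType);
}
```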
See Also
- createTTS() - Batch TTS engine
- createStreamingTTS() - Streaming TTS engine
- Types - All TypeScript type definitions