This page lists all supported model types for Speech-to-Text (STT) and Text-to-Speech (TTS), including required files and download links.

Speech-to-Text (STT) Models

For real-time (streaming) recognition from a microphone or audio stream, use a streaming-capable model type: transducer, paraformer, zipformer2_ctc, nemo_ctc, or tone_ctc. See the Streaming STT documentation for details.
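As a minimal sketch, the streaming-capable list above can be turned into a guard that rejects unsupported types before a streaming session starts. The names STREAMING_STT_TYPES and isStreamingSttType are hypothetical helpers for illustration, not part of the SDK.

```typescript
// Streaming-capable STT model types, copied from the list above.
const STREAMING_STT_TYPES = [
  'transducer',
  'paraformer',
  'zipformer2_ctc',
  'nemo_ctc',
  'tone_ctc',
] as const;

type StreamingSttType = (typeof STREAMING_STT_TYPES)[number];

// Hypothetical helper (not an SDK function): narrows a string to a
// streaming-capable model type.
function isStreamingSttType(modelType: string): modelType is StreamingSttType {
  return (STREAMING_STT_TYPES as readonly string[]).includes(modelType);
}

console.log(isStreamingSttType('transducer')); // true
console.log(isStreamingSttType('whisper')); // false
```

A check like this lets an app fail early with a clear message instead of discovering the mismatch at recognition time.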

Zipformer/Transducer

Model Type: transducer
Required Files:
  • encoder.onnx
  • decoder.onnx
  • joiner.onnx
  • tokens.txt
Download: Offline Transducer Models

Paraformer

Model Type: paraformer
Required Files:
  • model.onnx (or model.int8.onnx)
  • tokens.txt
Download: Offline Paraformer Models

NeMo CTC

Model Type: nemo_ctc
Required Files:
  • model.onnx (or model.int8.onnx)
  • tokens.txt
Download: NeMo CTC Models

Whisper

Model Type: whisper
Required Files:
  • encoder.onnx
  • decoder.onnx
  • tokens.txt
Download: Whisper Models

WeNet CTC

Model Type: wenet_ctc
Required Files:
  • model.onnx (or model.int8.onnx)
  • tokens.txt
Download: WeNet CTC Models

SenseVoice

Model Type: sense_voice
Required Files:
  • model.onnx (or model.int8.onnx)
  • tokens.txt
Download: SenseVoice Models

FunASR Nano

Model Type: funasr_nano
Required Files:
  • encoder_adaptor.onnx
  • llm.onnx
  • embedding.onnx
  • tokenizer/ directory
Download: FunASR Nano Models

Tone CTC (t-one)

Model Type: tone_ctc
Required Files:
  • model.onnx
  • tokens.txt
Note: The folder name usually contains t-one, t_one, or tone.
Download: Online CTC Models
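The required-file lists above lend themselves to a quick pre-flight check before a model directory is handed to the SDK. The sketch below is illustrative only: STT_REQUIRED_FILES and missingSttFiles are hypothetical names, the table is transcribed from this page, and the trailing slash on tokenizer/ is just a convention chosen here to mark a directory.

```typescript
// Required entries per STT model type, transcribed from the lists above.
// Each inner array holds the acceptable alternatives for one required entry.
const INT8_OR_FULL = ['model.onnx', 'model.int8.onnx'];

const STT_REQUIRED_FILES: Record<string, string[][]> = {
  transducer: [['encoder.onnx'], ['decoder.onnx'], ['joiner.onnx'], ['tokens.txt']],
  paraformer: [INT8_OR_FULL, ['tokens.txt']],
  nemo_ctc: [INT8_OR_FULL, ['tokens.txt']],
  whisper: [['encoder.onnx'], ['decoder.onnx'], ['tokens.txt']],
  wenet_ctc: [INT8_OR_FULL, ['tokens.txt']],
  sense_voice: [INT8_OR_FULL, ['tokens.txt']],
  funasr_nano: [['encoder_adaptor.onnx'], ['llm.onnx'], ['embedding.onnx'], ['tokenizer/']],
  tone_ctc: [['model.onnx'], ['tokens.txt']],
};

// Hypothetical helper (not an SDK function): returns a description of each
// required entry that has no matching name in `present`.
function missingSttFiles(modelType: string, present: string[]): string[] {
  const required = STT_REQUIRED_FILES[modelType];
  if (!required) {
    throw new Error(`Unknown STT model type: ${modelType}`);
  }
  const have = new Set(present);
  return required
    .filter((alternatives) => !alternatives.some((name) => have.has(name)))
    .map((alternatives) => alternatives.join(' or '));
}

console.log(missingSttFiles('transducer', ['encoder.onnx', 'decoder.onnx', 'tokens.txt']));
// [ 'joiner.onnx' ]
console.log(missingSttFiles('paraformer', ['model.int8.onnx', 'tokens.txt']));
// []
```

How the directory listing in `present` is obtained depends on the platform's file API, so it is left to the caller.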

Text-to-Speech (TTS) Models

For streaming TTS (incremental generation, low latency), use createStreamingTTS() with a supported model type. See the Streaming TTS documentation for details.

VITS

Model Type: vits
Description: Fast, high-quality TTS. Includes Piper, Coqui, MeloTTS, and MMS variants.
Required Files:
  • model.onnx
  • tokens.txt
Download: TTS Models Release

Matcha

Model Type: matcha
Description: High-quality acoustic model plus vocoder
Required Files:
  • acoustic_model.onnx
  • vocoder.onnx
  • tokens.txt
Download: Matcha Models

Kokoro

Model Type: kokoro
Description: Multi-speaker, multi-language
Required Files:
  • model.onnx
  • voices.bin
  • tokens.txt
  • espeak-ng-data/ directory
Download: TTS Models Release

KittenTTS

Model Type: kitten
Description: Lightweight, multi-speaker
Required Files:
  • model.onnx
  • voices.bin
  • tokens.txt
  • espeak-ng-data/ directory
Download: TTS Models Release

Zipvoice

Model Type: zipvoice
Description: Capable of voice cloning
Required Files:
  • encoder.onnx
  • decoder.onnx
  • vocoder.onnx
  • tokens.txt
Download: Zipvoice Models

Pocket

Model Type: pocket
Description: Flow-matching TTS
Required Files:
  • lm_flow.onnx
  • lm_main.onnx
  • encoder.onnx
  • decoder.onnx
  • text_conditioner.onnx
  • vocab.json
  • token_scores.json
Download: TTS Models Release

Model Quantization

The SDK automatically detects quantized (int8) models and can prefer them when available. For example, if both model.onnx and model.int8.onnx exist in a model directory, the preferInt8 option in init determines which one the library loads.
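The selection rule can be sketched roughly as follows. This is an illustration, not the SDK's implementation: pickModelFile is a hypothetical helper, preferInt8 is assumed to be a boolean, and the fallback to whichever single file exists is an assumption not stated on this page.

```typescript
// Hypothetical helper illustrating the selection rule described above.
function pickModelFile(files: string[], preferInt8: boolean): string | undefined {
  const hasFull = files.includes('model.onnx');
  const hasInt8 = files.includes('model.int8.onnx');

  // Both variants present: the preferInt8 option decides.
  if (hasFull && hasInt8) {
    return preferInt8 ? 'model.int8.onnx' : 'model.onnx';
  }
  // Assumption: with only one variant present, use it regardless of the option.
  if (hasInt8) return 'model.int8.onnx';
  if (hasFull) return 'model.onnx';
  return undefined;
}

const files = ['model.onnx', 'model.int8.onnx', 'tokens.txt'];
console.log(pickModelFile(files, true)); // model.int8.onnx
console.log(pickModelFile(files, false)); // model.onnx
```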

Auto-Detection

The library detects model types from the files present in each model directory. Folder and file names do not need to follow any fixed convention. To use auto-detection:
const stt = await createSTT({
  modelPath: { type: 'asset', path: 'models/your-model-folder' },
  modelType: 'auto', // Auto-detect from files
});
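To give a feel for file-based detection, here is a minimal sketch that matches a directory listing against the file sets of a few STT model types from this page. It is not the library's detection algorithm: SIGNATURES and candidateTypes are hypothetical names, and ranking the most specific match first is a choice made only for this example.

```typescript
// File sets for a few STT model types, transcribed from this page.
// The trailing slash on 'tokenizer/' marks a directory (a convention used here).
const SIGNATURES: Record<string, string[]> = {
  transducer: ['encoder.onnx', 'decoder.onnx', 'joiner.onnx', 'tokens.txt'],
  whisper: ['encoder.onnx', 'decoder.onnx', 'tokens.txt'],
  funasr_nano: ['encoder_adaptor.onnx', 'llm.onnx', 'embedding.onnx', 'tokenizer/'],
};

// Hypothetical helper: returns every type whose required files are all
// present, with the largest (most specific) file set first.
function candidateTypes(present: string[]): string[] {
  const have = new Set(present);
  return Object.entries(SIGNATURES)
    .filter(([, required]) => required.every((name) => have.has(name)))
    .sort(([, a], [, b]) => b.length - a.length)
    .map(([type]) => type);
}

console.log(candidateTypes(['encoder.onnx', 'decoder.onnx', 'joiner.onnx', 'tokens.txt']));
// [ 'transducer', 'whisper' ]
console.log(candidateTypes(['encoder.onnx', 'decoder.onnx', 'tokens.txt']));
// [ 'whisper' ]
```

Several types on this page share the same file set (for example model.onnx plus tokens.txt), so files alone cannot always separate them. When auto-detection picks the wrong type, passing an explicit modelType is the safer option.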

See Also

  • Model Setup: Learn how to bundle, download, and manage models
  • Download Manager: Download models in-app with progress tracking
  • STT API: Speech-to-Text API reference
  • TTS API: Text-to-Speech API reference
