Matcha Models

Matcha is a high-quality text-to-speech (TTS) model that uses an acoustic model + vocoder pipeline to produce very natural-sounding speech. It is designed for applications where audio quality matters more than generation speed.

Model Architecture

Matcha uses a two-stage architecture, backed by a token vocabulary file:

Acoustic Model (acoustic_model.onnx) – Generates a mel-spectrogram from the input text
Vocoder (vocoder.onnx) – Converts the mel-spectrogram into an audio waveform
Tokens (tokens.txt) – Token vocabulary used to encode the input text

This separation allows for high-quality synthesis by using specialized neural networks for each stage.
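The data flow between the two stages can be pictured as simple function composition. The sketch below is a toy illustration only: the type names, the character-level `tokenize` helper, and the stub stages are all hypothetical and stand in for what the ONNX models do internally. It is not how the library is implemented.

```typescript
// Toy sketch of the two-stage data flow. All names here are hypothetical.
type MelSpectrogram = number[][]; // frames x mel bins
type AcousticModel = (tokenIds: number[]) => MelSpectrogram;
type Vocoder = (mel: MelSpectrogram) => Float32Array;

// Map each character to an id from a vocabulary (stand-in for tokens.txt).
// Real models typically use a richer text front end than per-character lookup.
function tokenize(text: string, vocab: { [token: string]: number }): number[] {
  const ids: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const id = vocab[text.charAt(i)];
    if (id !== undefined) ids.push(id); // skip characters missing from the vocabulary
  }
  return ids;
}

// Stage 1 (text -> mel-spectrogram) feeds stage 2 (mel-spectrogram -> waveform).
function synthesize(
  text: string,
  vocab: { [token: string]: number },
  acoustic: AcousticModel,
  vocoder: Vocoder,
): Float32Array {
  return vocoder(acoustic(tokenize(text, vocab)));
}

// Placeholder stages: one 4-bin frame per token, 256 silent samples per frame.
const stubAcoustic: AcousticModel = (ids) => ids.map(() => [0, 0, 0, 0]);
const stubVocoder: Vocoder = (mel) => new Float32Array(mel.length * 256);

const waveform = synthesize('ab', { a: 1, b: 2 }, stubAcoustic, stubVocoder);
console.log(waveform.length); // 2 tokens -> 2 frames -> 512 samples
```

Because each stage only depends on the shape of the data it receives, either stage could in principle be swapped independently, which is the flexibility the two-stage design is meant to provide.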

When to Use

High-Quality Audio

When naturalness and quality are more important than speed

Audiobook Narration

Professional-quality narration for long-form content

Content Creation

Voiceovers for videos, podcasts, and media

Expressive Speech

Natural prosody and intonation

Supported Languages

Matcha models are available for:

English (primary focus)
Some multilingual variants

Check the download page for available languages.

Performance Characteristics

Aspect	Rating	Notes
Streaming	✅ Supported	Streaming generation available
Quality	⭐⭐⭐⭐⭐	Excellent, very natural-sounding
Speed	⭐⭐⭐⭐	Fast, but slower than VITS
Memory	⭐⭐⭐	Moderate (two models: acoustic + vocoder)
Model Size	Medium	Typically 50-100 MB (acoustic + vocoder)

Download Links

Matcha Models

Browse and download pretrained Matcha models

Configuration Example

Basic TTS

import { createTTS } from 'react-native-sherpa-onnx/tts';

const tts = await createTTS({
  modelPath: {
    type: 'asset',
    path: 'models/matcha-icefall-en'
  },
  modelType: 'matcha', // or 'auto'
  numThreads: 2,
});

const audio = await tts.generateSpeech('Hello, world!');
console.log('Generated:', audio.samples.length, 'samples at', audio.sampleRate, 'Hz');

await tts.destroy();
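The sample count and sample rate logged above are enough to work out how long the generated clip is. A minimal sketch, assuming mono audio and an object with the same `samples` and `sampleRate` fields as in the example; `GeneratedAudioLike` and `audioDurationSeconds` are hypothetical names, not part of the library:

```typescript
// Hypothetical shape: only the two fields used in the example above.
interface GeneratedAudioLike {
  samples: ArrayLike<number>;
  sampleRate: number;
}

// Duration in seconds = number of samples / samples per second (mono assumed).
function audioDurationSeconds(audio: GeneratedAudioLike): number {
  if (audio.sampleRate <= 0) {
    throw new RangeError('sampleRate must be positive');
  }
  return audio.samples.length / audio.sampleRate;
}

const demoAudio: GeneratedAudioLike = {
  samples: new Float32Array(44100),
  sampleRate: 22050,
};
console.log(audioDurationSeconds(demoAudio).toFixed(2), 'seconds'); // prints "2.00 seconds"
```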

With Model Options

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/matcha-en' },
  modelType: 'matcha',
  modelOptions: {
    matcha: {
      noiseScale: 0.667,   // Voice variation
      lengthScale: 1.0,    // Speech speed
    }
  },
  numThreads: 2,
});

Streaming TTS

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/matcha-en' },
  modelType: 'matcha',
});

const sampleRate = await tts.getSampleRate();
await tts.startPcmPlayer(sampleRate, 1);

await tts.generateSpeechStream('High quality streaming speech', { sid: 0, speed: 1.0 }, {
  onChunk: async (chunk) => {
    await tts.writePcmChunk(chunk.samples);
  },
  onEnd: async () => {
    await tts.stopPcmPlayer();
  },
});

await tts.destroy();
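Streaming delivers audio in pieces. If the complete waveform is also needed once generation ends, one option is to keep each chunk inside `onChunk` and join them afterwards. A minimal sketch, assuming `chunk.samples` is an array-like of numeric samples; `concatChunks` is a hypothetical helper, not a library function:

```typescript
// Join streamed chunks into one contiguous Float32Array, preserving order.
function concatChunks(chunks: ArrayLike<number>[]): Float32Array {
  let total = 0;
  for (let i = 0; i < chunks.length; i++) {
    total += chunks[i].length;
  }
  const out = new Float32Array(total);
  let offset = 0;
  for (let i = 0; i < chunks.length; i++) {
    out.set(chunks[i], offset); // copy this chunk at its position in the output
    offset += chunks[i].length;
  }
  return out;
}

// Usage: push chunk.samples into `collected` inside onChunk, then join at the end.
const collected: ArrayLike<number>[] = [[0.5, 0.25], new Float32Array([-1])];
const joined = concatChunks(collected);
console.log(joined.length); // 3
```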

Save to File

import { createTTS, saveAudioToFile } from 'react-native-sherpa-onnx/tts';

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/matcha-en' },
  modelType: 'matcha',
});

const audio = await tts.generateSpeech('Save this to a file');
await saveAudioToFile(audio, '/path/to/output.wav');

await tts.destroy();

Model Options

Matcha models support two tuning parameters:

Option	Type	Default	Description
`noiseScale`	`number`	0.667	Controls voice variation and expressiveness. Range: 0.0-1.0
`lengthScale`	`number`	1.0	Speech speed. < 1.0 = faster, > 1.0 = slower
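When these values come from user input (for example a settings slider), it can help to sanitize them before passing them on. The following is a sketch under stated assumptions: the defaults and the 0.0-1.0 range for `noiseScale` come from the table above, while clamping out-of-range values and rejecting a non-positive `lengthScale` are policy choices made for this example. `normalizeMatchaTuning` is a hypothetical helper.

```typescript
interface MatchaTuning {
  noiseScale?: number;
  lengthScale?: number;
}

// Fill in documented defaults, clamp noiseScale to [0, 1], reject invalid lengthScale.
function normalizeMatchaTuning(
  opts: MatchaTuning = {},
): { noiseScale: number; lengthScale: number } {
  const noiseScale = opts.noiseScale === undefined ? 0.667 : opts.noiseScale;
  const lengthScale = opts.lengthScale === undefined ? 1.0 : opts.lengthScale;
  if (!isFinite(noiseScale) || !isFinite(lengthScale)) {
    throw new TypeError('noiseScale and lengthScale must be finite numbers');
  }
  if (lengthScale <= 0) {
    // Assumption for this sketch: a non-positive duration scale is not meaningful.
    throw new RangeError('lengthScale must be greater than 0');
  }
  return {
    noiseScale: Math.min(1, Math.max(0, noiseScale)),
    lengthScale: lengthScale,
  };
}

console.log(normalizeMatchaTuning({ noiseScale: 1.5 }));
// noiseScale is clamped to 1, lengthScale falls back to the default 1
```

The result can then be placed under `modelOptions.matcha` in the configuration.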

Tuning Examples

// Clear, fast speech
const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/matcha-en' },
  modelType: 'matcha',
  modelOptions: {
    matcha: {
      noiseScale: 0.4,    // Less variation
      lengthScale: 0.9,   // Slightly faster
    }
  },
});

// Expressive, natural speech
const tts2 = await createTTS({
  modelPath: { type: 'asset', path: 'models/matcha-en' },
  modelType: 'matcha',
  modelOptions: {
    matcha: {
      noiseScale: 0.8,    // More expressive
      lengthScale: 1.1,   // Slightly slower
    }
  },
});

Runtime Updates

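// Create the engine as in the examples above
const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/matcha-en' },
  modelType: 'matcha',
});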

await tts.updateParams({
  modelOptions: {
    matcha: {
      noiseScale: 0.7,
      lengthScale: 1.2,
    }
  },
});

Model Detection

Matcha models are detected automatically from the contents of the model directory:

Detection is based on the presence of both acoustic_model.onnx and vocoder.onnx
No folder name pattern is required

Expected files:

acoustic_model.onnx
vocoder.onnx
tokens.txt
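Before loading a model, the expected files can be checked against a directory listing to produce a clearer error message. A minimal sketch: `missingMatchaFiles` is a hypothetical helper that takes file names obtained by whatever file-system API the app already uses. It only covers the three files listed above.

```typescript
const REQUIRED_MATCHA_FILES = ['acoustic_model.onnx', 'vocoder.onnx', 'tokens.txt'];

// Return the required file names that are absent from the given listing.
function missingMatchaFiles(fileNames: string[]): string[] {
  return REQUIRED_MATCHA_FILES.filter((name) => fileNames.indexOf(name) === -1);
}

const missing = missingMatchaFiles(['acoustic_model.onnx', 'tokens.txt']);
if (missing.length > 0) {
  console.log('Missing model files:', missing.join(', ')); // Missing model files: vocoder.onnx
}
```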

Performance Tips

Optimize Thread Count

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/matcha-en' },
  numThreads: 4, // More threads can speed up generation on multi-core devices
});
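The best thread count depends on the device. One simple heuristic is to cap `numThreads` at the number of available cores and at a fixed upper limit. This is a sketch, not a library recommendation: `pickNumThreads` is hypothetical, the cap of 4 mirrors the example above, and the core count is assumed to come from a device-info source of your choice.

```typescript
// Choose a thread count between 1 and maxThreads, never exceeding the core count.
function pickNumThreads(coreCount: number, maxThreads: number = 4): number {
  if (!isFinite(coreCount)) {
    return 1; // unknown core count: fall back to a single thread
  }
  return Math.max(1, Math.min(Math.floor(coreCount), maxThreads));
}

console.log(pickNumThreads(8)); // 4
console.log(pickNumThreads(2)); // 2
```

Using more threads than cores rarely helps, so it is worth measuring generation time on target devices before raising the cap.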

Use Streaming for Long Text

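For long text, streaming improves perceived performance because playback can start before generation finishes:

// Assumes `tts` exists and the PCM player was started, as in the Streaming TTS example
const longText = 'Long audiobook paragraph...';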

await tts.generateSpeechStream(longText, { sid: 0, speed: 1.0 }, {
  onChunk: async (chunk) => {
    await tts.writePcmChunk(chunk.samples); // Play while generating
  },
});

Hardware Acceleration

const tts = await createTTS({
  modelPath: { type: 'asset', path: 'models/matcha-en' },
  provider: 'nnapi', // Android NNAPI
  // provider: 'xnnpack', // XNNPACK
});

Streaming Support

Streaming: ✅ Yes

Matcha models support streaming generation. Use generateSpeechStream() for incremental audio generation and low-latency playback.

Advantages

Excellent Quality: Very natural-sounding speech
Natural Prosody: Good intonation and rhythm
Streaming: Supports incremental generation
Acoustic Model + Vocoder: Flexible two-stage architecture
Multi-Speaker: Some models support multiple speakers

Limitations

Slower than VITS: Two-stage architecture is slightly slower
Larger Size: Requires both acoustic model and vocoder
No Voice Cloning: Cannot synthesize custom voices
Limited Languages: Primarily English-focused

Use Cases

Audiobook Narration

Professional-quality long-form narration

Content Production

Voiceovers for videos and media

E-Learning

High-quality educational content

Podcasts

Natural-sounding podcast narration

Common Issues

Model not loading

Verify both acoustic_model.onnx and vocoder.onnx are present
Check that tokens.txt exists
Ensure sufficient device memory for both models

Slow generation

Increase numThreads on multi-core devices
Use hardware acceleration (provider: 'nnapi')
Consider using VITS for faster generation
Ensure no other heavy apps are running

Audio quality issues

Adjust noiseScale for more/less expressiveness
Try different lengthScale values
Ensure correct sample rate for playback
Check that vocoder output is not being resampled incorrectly

Comparison with Other Models

Feature	Matcha	VITS	Zipvoice	Kokoro
Quality	Very High	High	Very High	High
Speed	Fast	Very Fast	Medium	Fast
Streaming	Yes	Yes	No	Yes
Voice Cloning	No	No	Yes	No
Model Size	Medium	Small	Large	Small
Architecture	Acoustic + Vocoder	End-to-End	Encoder + Decoder + Vocoder	End-to-End
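The comparison can also be used programmatically, for example to shortlist model families that meet an app's hard requirements. The sketch below simply encodes the Streaming, Voice Cloning, and Model Size rows of the table; `MODEL_TRAITS` and `modelsMatching` are hypothetical names for illustration.

```typescript
interface ModelTraits {
  name: string;
  streaming: boolean;
  voiceCloning: boolean;
  size: 'Small' | 'Medium' | 'Large';
}

// Values copied from the comparison table above.
const MODEL_TRAITS: ModelTraits[] = [
  { name: 'Matcha', streaming: true, voiceCloning: false, size: 'Medium' },
  { name: 'VITS', streaming: true, voiceCloning: false, size: 'Small' },
  { name: 'Zipvoice', streaming: false, voiceCloning: true, size: 'Large' },
  { name: 'Kokoro', streaming: true, voiceCloning: false, size: 'Small' },
];

// Keep only models that satisfy every requirement that was specified.
function modelsMatching(req: { streaming?: boolean; voiceCloning?: boolean }): string[] {
  return MODEL_TRAITS.filter(
    (m) =>
      (req.streaming === undefined || m.streaming === req.streaming) &&
      (req.voiceCloning === undefined || m.voiceCloning === req.voiceCloning),
  ).map((m) => m.name);
}

console.log(modelsMatching({ streaming: true }));    // [ 'Matcha', 'VITS', 'Kokoro' ]
console.log(modelsMatching({ voiceCloning: true })); // [ 'Zipvoice' ]
```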

Next Steps

TTS API

Detailed API documentation

Streaming TTS

Low-latency streaming guide

Model Setup

How to download and bundle models

Execution Providers

Hardware acceleration options
