Overview
Matcha-TTS provides pre-trained models for both single-speaker and multi-speaker text-to-speech synthesis. Models are automatically downloaded to your user data directory when first used with the CLI or Gradio interface, or can be downloaded manually from the GitHub releases page.
Available Models
Single-Speaker Models
LJ Speech Model
Trained on the LJ Speech dataset (single female speaker):

Model Details:
- Name: matcha_ljspeech
- Download URL: https://github.com/shivammehta25/Matcha-TTS-checkpoints/releases/download/v1.0/matcha_ljspeech.ckpt
- Recommended Vocoder: hifigan_T2_v1
- Recommended Speaking Rate: 0.95
- Dataset: LJ Speech (single speaker, ~24 hours)
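A typical CLI invocation for this model can be sketched as below. The flag names (`--text`, `--model`, `--vocoder`, `--speaking_rate`) are assumptions based on the options described on this page; verify them against `matcha-tts --help` before relying on them.

```python
import subprocess

# Hedged sketch of a matcha-tts CLI call; flag names are assumptions.
cmd = [
    "matcha-tts",
    "--text", "Hello from Matcha TTS.",
    "--model", "matcha_ljspeech",   # pre-trained single-speaker model
    "--vocoder", "hifigan_T2_v1",   # recommended vocoder for LJ Speech
    "--speaking_rate", "0.95",      # recommended rate for this model
]

# Uncomment to actually synthesize (requires matcha-tts to be installed):
# subprocess.run(cmd, check=True)
print(" ".join(cmd))
```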
Multi-Speaker Models
VCTK Model
Trained on the VCTK dataset (108 speakers):

Model Details:
- Name: matcha_vctk
- Download URL: https://github.com/shivammehta25/Matcha-TTS-checkpoints/releases/download/v1.0/matcha_vctk.ckpt
- Recommended Vocoder: hifigan_univ_v1
- Recommended Speaking Rate: 0.85
- Speaker Range: 0-107 (108 total speakers)
- Dataset: VCTK (108 speakers, various accents)
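The two model entries above can be collected into a small registry. The dict layout here is an illustrative assumption (the actual tables in `matcha/cli.py` are structured differently), but the values are the ones documented on this page.

```python
# Illustrative registry of the pre-trained acoustic models listed above.
RELEASE_BASE = (
    "https://github.com/shivammehta25/Matcha-TTS-checkpoints/releases/download/v1.0"
)

PRETRAINED_MODELS = {
    "matcha_ljspeech": {
        "url": f"{RELEASE_BASE}/matcha_ljspeech.ckpt",
        "vocoder": "hifigan_T2_v1",
        "speaking_rate": 0.95,
        "num_speakers": 1,
    },
    "matcha_vctk": {
        "url": f"{RELEASE_BASE}/matcha_vctk.ckpt",
        "vocoder": "hifigan_univ_v1",
        "speaking_rate": 0.85,
        "num_speakers": 108,
    },
}
```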
Vocoder Models
Matcha-TTS uses HiFi-GAN vocoders to convert mel-spectrograms to waveforms.
HiFi-GAN T2 v1
Optimized for LJ Speech:

- Name: hifigan_T2_v1
- Download URL: https://github.com/shivammehta25/Matcha-TTS-checkpoints/releases/download/v1.0/generator_v1
- Use Case: Single-speaker LJ Speech model
- Description: Trained specifically on LJ Speech for optimal quality
HiFi-GAN Universal v1
Universal multi-speaker vocoder:

- Name: hifigan_univ_v1
- Download URL: https://github.com/shivammehta25/Matcha-TTS-checkpoints/releases/download/v1.0/g_02500000
- Use Case: Multi-speaker models and general purpose
- Description: Works across different speakers and datasets
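Picking the recommended vocoder for a given model reduces to a lookup. This is a hypothetical helper mirroring what the CLI does when it auto-selects a vocoder; it is not the actual implementation.

```python
# Hypothetical helper: map each pre-trained model to its recommended vocoder.
MODEL_TO_VOCODER = {
    "matcha_ljspeech": "hifigan_T2_v1",  # trained specifically on LJ Speech
    "matcha_vctk": "hifigan_univ_v1",    # universal multi-speaker vocoder
}

def recommended_vocoder(model_name: str) -> str:
    """Return the recommended vocoder, defaulting to the universal one."""
    return MODEL_TO_VOCODER.get(model_name, "hifigan_univ_v1")
```

For unknown or custom models, defaulting to the universal vocoder is a reasonable choice because it is trained to generalize across speakers and datasets.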
Model Configuration Reference
The model and vocoder configuration used by the CLI is defined in matcha/cli.py:20-34.
Using Pre-trained Models
Automatic Download
The easiest way is to let the CLI download models automatically on first use. Checkpoints are cached in the user data directory (~/.local/share/matcha-tts/ on Linux).
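The cache location can be reproduced with the standard library. Only the Linux path (~/.local/share/matcha-tts/) is documented on this page; the other branches below follow common platform conventions and are assumptions, so treat this as a sketch rather than Matcha-TTS's actual logic.

```python
import os
import sys
from pathlib import Path

def matcha_data_dir() -> Path:
    """Sketch of the user data directory where checkpoints are cached."""
    if sys.platform.startswith("linux"):
        # Documented location: ~/.local/share/matcha-tts/ (XDG data dir)
        base = Path(os.environ.get("XDG_DATA_HOME", Path.home() / ".local" / "share"))
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"  # assumption
    else:
        base = Path(os.environ.get("APPDATA", str(Path.home())))  # assumption
    return base / "matcha-tts"

print(matcha_data_dir())
```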
Manual Download
You can also download models manually from the GitHub releases page and place them in the user data directory.
Using Custom Checkpoints
You can point the CLI at a specific checkpoint file instead of one of the named pre-trained models.
Model Recommendations
For Single Voice
Use the LJ Speech model (matcha_ljspeech).
For Multiple Voices
Use the VCTK model (matcha_vctk) and select a speaker ID between 0 and 107.
For Custom Datasets
Train your own model; see the Training guide.
Model Performance
LJ Speech Model
- Quality: High quality, natural-sounding female voice
- Speed: RTF (Real-Time Factor) typically < 0.1 on GPU
- Use Cases: Audiobooks, assistants, narration
- Recommended Steps: 10 for best quality, 5 for faster synthesis
VCTK Model
- Quality: Natural voices across 108 speakers
- Speed: RTF typically < 0.15 on GPU
- Use Cases: Multi-voice applications, character voices, diverse accents
- Recommended Steps: 10 for best quality, 5 for faster synthesis
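The RTF figures quoted above are synthesis wall-clock time divided by the duration of the generated audio; values below 1.0 mean faster than real time. A quick helper to compute it:

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds

# Example: 0.8 s to synthesize 10 s of speech gives RTF 0.08, within the
# "< 0.1 on GPU" figure quoted above for the LJ Speech model.
print(f"{real_time_factor(0.8, 10.0):.3f}")
```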
Synthesis Parameters
Temperature
Controls variation in the synthesized speech (default: 0.667); higher values produce more varied output.
Speaking Rate
Model-specific recommendations: 0.95 for the LJ Speech model and 0.85 for the VCTK model.
ODE Steps
Number of ODE solver steps (default: 10); more steps generally improve quality at the cost of synthesis speed.
ONNX Export of Pre-trained Models
Export LJ Speech Model
Export VCTK Model
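The export commands themselves were not preserved above. As a hedged sketch, the exporter is invoked as a Python module; the module path (`matcha.onnx.export`) and arguments below are assumptions based on the project's ONNX tooling, so confirm them with `python -m matcha.onnx.export --help`.

```python
import subprocess
import sys

# Hedged sketch of an ONNX export command; module path and arguments
# are assumptions. Swap in matcha_vctk.ckpt to export the VCTK model.
export_cmd = [
    sys.executable, "-m", "matcha.onnx.export",
    "matcha_ljspeech.ckpt",   # checkpoint downloaded from the releases page
    "matcha_ljspeech.onnx",   # output ONNX file
    "--n-timesteps", "5",     # bake a fixed number of ODE steps into the graph
]

# Uncomment to run the export (requires Matcha-TTS and its ONNX extras):
# subprocess.run(export_cmd, check=True)
print(" ".join(export_cmd))
```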
Model Storage Locations
Pre-trained models are stored in the user data directory (~/.local/share/matcha-tts/ on Linux).
Gradio Interface
Use pre-trained models with the web interface, which:
- Downloads required models
- Provides speaker selection for VCTK
- Allows parameter adjustment
- Enables audio playback and download
HuggingFace Demo
Try the pre-trained models in your browser with Matcha-TTS on HuggingFace Spaces; no installation required!
Model Validation
The CLI automatically validates models (matcha/cli.py:71-81):
- Checks if the model exists locally
- Downloads if missing
- Verifies checkpoint integrity
- Selects appropriate vocoder
- Validates speaker IDs for multi-speaker models
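For the VCTK model, speaker-ID validation reduces to a range check over the 108 available speakers. This is a hypothetical helper mirroring that step, not the CLI's actual code.

```python
# Hypothetical helper mirroring the CLI's speaker-ID check for matcha_vctk,
# which exposes speakers 0-107 (108 in total).
VCTK_NUM_SPEAKERS = 108

def validate_speaker_id(spk: int, num_speakers: int = VCTK_NUM_SPEAKERS) -> int:
    """Return spk unchanged if valid, else raise ValueError."""
    if not 0 <= spk < num_speakers:
        raise ValueError(
            f"speaker id {spk} out of range; expected 0-{num_speakers - 1}"
        )
    return spk
```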
Troubleshooting
Download Fails
Issue: Model download times out or fails.
Solutions:
- Check your internet connection
- Try manual download from GitHub releases
- Place downloaded files in the user data directory
Wrong Vocoder
Warning: A mismatched vocoder will still produce audio, but quality degrades noticeably; use the recommended vocoder listed for each model.
Model Not Found
Error: Model checkpoint not found.
Solution: Download the checkpoint manually from the releases page and place it in the user data directory.
Citation
If you use these pre-trained models, please cite the Matcha-TTS paper.
Next Steps
Multi-Speaker Setup
Learn to use the VCTK multi-speaker model
ONNX Export
Export pre-trained models to ONNX format
Training
Train your own custom models
ONNX Inference
Deploy models with ONNX Runtime