Fast neural TTS with conditional flow matching
Matcha-TTS is a probabilistic, non-autoregressive text-to-speech architecture that delivers natural-sounding synthesis with a compact memory footprint and real-time performance.
Quick start
Get started with Matcha-TTS in three simple steps
Install Matcha-TTS
Synthesize your first speech
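The two steps above might look like the following when driven from Python. This is a minimal sketch, not the project's documented API. It assumes the package is installed from PyPI under the name `matcha-tts` (for example with `pip install matcha-tts`) and that it provides a `matcha-tts` command accepting `--text`, `--speaking_rate`, `--temperature`, and `--steps`; confirm the flag names with `matcha-tts --help` before relying on them. The helper names `build_command` and `synthesize` are hypothetical.

```python
import shutil
import subprocess


def build_command(text, speaking_rate=None, temperature=None, steps=None):
    """Build the argument list for the (assumed) `matcha-tts` command-line tool."""
    cmd = ["matcha-tts", "--text", text]
    # Optional flags are appended only when a value is given.
    # The flag names are assumptions; check `matcha-tts --help`.
    if speaking_rate is not None:
        cmd += ["--speaking_rate", str(speaking_rate)]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if steps is not None:
        cmd += ["--steps", str(steps)]
    return cmd


def synthesize(text, **options):
    """Run the CLI if it is on PATH; return True if it ran and exited with status 0."""
    if shutil.which("matcha-tts") is None:
        return False  # not installed yet: install the package first
    return subprocess.run(build_command(text, **options)).returncode == 0
```

Calling `synthesize("Hello from Matcha-TTS.")` returns `False` when the command is not installed, so the sketch is safe to try before the install step has been completed.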
Why Matcha-TTS?
Built for researchers and developers who need fast, high-quality neural TTS
Blazing fast synthesis
Conditional flow matching enables a 10x speedup over diffusion models while maintaining audio quality. Achieve faster-than-real-time synthesis, with a real-time factor (RTF) below 1, on modern GPUs.
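For readers unfamiliar with the metric, RTF is the time spent synthesizing divided by the duration of the audio produced, so a value below 1 means synthesis runs faster than playback. The sketch below shows one way to measure it. The helper names and the 22050 Hz sample rate in the usage note are illustrative assumptions, and `synthesize_fn` stands in for whatever call produces the waveform.

```python
import time


def real_time_factor(synthesis_seconds, num_samples, sample_rate):
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize_fn, sample_rate):
    """Time one call to synthesize_fn (hypothetical) and return its RTF.

    synthesize_fn takes no arguments and returns a sequence of audio samples.
    """
    start = time.perf_counter()
    waveform = synthesize_fn()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(waveform), sample_rate)
```

For example, generating 2 seconds of audio (44100 samples at 22050 Hz) in 0.5 seconds gives `real_time_factor(0.5, 44100, 22050) == 0.25`. When timing GPU inference, make sure the device has finished its work before reading the clock, otherwise the measured time can be too short.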
Natural, expressive speech
A non-autoregressive architecture with probabilistic modeling produces highly natural speech. Control the speaking rate and sampling variance for expressive synthesis.
Multi-speaker support
Train on multi-speaker datasets such as VCTK using speaker embeddings. Pre-trained multi-speaker models support 108 different voices out of the box.
Production-ready deployment
Export models to ONNX for optimized inference across platforms. Embed the vocoder in the exported graph for end-to-end waveform generation in production.
Explore the documentation
Everything you need to build with Matcha-TTS
Core concepts
Training guide
Inference API
CLI reference
ONNX deployment
Model API
Resources
Ready to get started?
Install Matcha-TTS and synthesize your first speech in minutes
View quickstart guide