ICASSP 2024

Fast neural TTS with conditional flow matching

Matcha-TTS is a probabilistic, non-autoregressive text-to-speech architecture that achieves natural-sounding synthesis with a compact memory footprint and real-time performance.

1250+ GitHub stars
10x faster than diffusion
RTF < 1 (real-time synthesis)
ONNX export (production ready)

Quick start

Get started with Matcha-TTS in three simple steps

1. Install Matcha-TTS

Install the package using pip with Python 3.10 or higher:
pip install matcha-tts
Or install from source for the latest development version:
pip install git+https://github.com/shivammehta25/Matcha-TTS.git

2. Synthesize your first speech

Use the CLI to generate speech from text. Pre-trained models will be downloaded automatically:
matcha-tts --text "Welcome to Matcha TTS, a fast text to speech system"
This command downloads the pre-trained LJSpeech model (~100 MB) and writes a waveform file to your current directory.
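
To call the same CLI from a script, a minimal sketch along these lines may help. It only assembles the argument list and runs it if the `matcha-tts` executable is on your PATH. The helper names `build_matcha_command` and `run_matcha` are hypothetical, and the `--speaking_rate`, `--temperature`, and `--steps` flags are assumed from the project README, so confirm them with `matcha-tts --help` before relying on them.

```python
import shutil
import subprocess


def build_matcha_command(text, speaking_rate=None, temperature=None, steps=None):
    """Build the argument list for the matcha-tts CLI.

    The optional flag names are assumptions taken from the project README.
    """
    cmd = ["matcha-tts", "--text", text]
    # Append a flag only when the caller sets it, so the CLI's own
    # defaults apply otherwise.
    if speaking_rate is not None:
        cmd += ["--speaking_rate", str(speaking_rate)]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if steps is not None:
        cmd += ["--steps", str(steps)]
    return cmd


def run_matcha(cmd):
    """Run the command if matcha-tts is installed; return True if it ran."""
    if shutil.which(cmd[0]) is None:
        print("matcha-tts not found on PATH; would run:", " ".join(cmd))
        return False
    subprocess.run(cmd, check=True)
    return True
```

For example, `run_matcha(build_matcha_command("Welcome to Matcha TTS", steps=10))` would request synthesis with 10 solver steps, assuming the flag exists in your installed version.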

3. Try the interactive interface

Launch the Gradio web interface for interactive synthesis:
matcha-tts-app
The app lets you adjust synthesis parameters such as speaking rate, temperature, and the number of ODE solver steps in real time.
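
To give a feel for what "ODE solver steps" controls, here is a toy sketch of fixed-step Euler integration. It is not the Matcha-TTS decoder: `straight_line_field` is a hypothetical stand-in for a trained network, chosen so that every point moves toward a target along a straight line. Each step costs one evaluation of the vector field, so fewer steps means faster synthesis.

```python
def euler_solve(vector_field, x0, n_steps):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t never reaches 1, so the toy field below stays finite
        v = vector_field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def straight_line_field(target):
    """Toy field (an assumption for illustration) that carries any point
    to `target` along a straight line, arriving at t=1."""
    def field(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return field


# Start from a fixed "noise" point and integrate with only two steps.
sample = euler_solve(straight_line_field([1.0, -2.0]), [0.0, 0.0], n_steps=2)
```

With this idealized straight-line field, even a couple of steps land on the target. A learned field is only approximately straight, which is why the step count remains a quality-versus-speed knob in practice.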

Why Matcha-TTS?

Built for researchers and developers who need fast, high-quality neural TTS

Blazing fast synthesis

Conditional flow matching enables a 10x speedup over diffusion models while maintaining audio quality. Achieve real-time synthesis, with a real-time factor (RTF) below 1, on modern GPUs.
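
The real-time factor is synthesis time divided by the duration of the audio produced, so a value below 1 means speech is generated faster than it plays back. The sketch below shows the arithmetic. The function name is hypothetical, and the 22,050 Hz sample rate in the example is the LJSpeech rate, used here as an assumption about the model's output.

```python
def real_time_factor(synthesis_seconds, num_samples, sample_rate):
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


# Example: 0.5 s of compute for 44,100 samples at 22,050 Hz (2 s of audio).
rtf = real_time_factor(0.5, 44100, 22050)  # 0.25, four times faster than real time
```

To measure this on your own hardware, time the synthesis call with `time.perf_counter()` and pass in the length of the returned waveform.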

Natural, expressive speech

The non-autoregressive architecture with probabilistic modeling produces highly natural speech. Control speaking rate and variance for expressive synthesis.

Multi-speaker support

Train on multi-speaker datasets such as VCTK using speaker embeddings. The pre-trained multi-speaker model supports 108 different voices out of the box.

Production-ready deployment

Export models to ONNX for optimized inference across platforms. Embed the vocoder in the exported graph for end-to-end waveform generation in production.

Explore the documentation

Everything you need to build with Matcha-TTS

Core concepts

Learn about the architecture, flow matching, and how Matcha-TTS works under the hood

Training guide

Train Matcha-TTS on your own dataset with custom voices and languages

Inference API

Integrate Matcha-TTS into your Python applications for programmatic synthesis

CLI reference

Complete command-line reference for synthesis, training, and utilities

ONNX deployment

Export and deploy models with ONNX for production workloads

Model API

Python API reference for the MatchaTTS model and components

Ready to get started?

Install Matcha-TTS and synthesize your first speech in minutes

