ICASSP 2024

Fast neural TTS with
conditional flow matching

Matcha-TTS is a probabilistic, non-autoregressive text-to-speech architecture that achieves natural-sounding synthesis with a compact memory footprint and real-time performance.


1250+ GitHub stars

10x faster than diffusion

RTF < 1: real-time synthesis

ONNX: production ready

Quick start

Get started with Matcha-TTS in three simple steps

Install Matcha-TTS

Install the package using pip with Python 3.10 or higher:

pip install matcha-tts

Or install from source for the latest development version:

pip install git+https://github.com/shivammehta25/Matcha-TTS.git

Synthesize your first speech

Use the CLI to generate speech from text. Pre-trained models will be downloaded automatically:

matcha-tts --text "Welcome to Matcha TTS, a fast text to speech system"

This command downloads the pre-trained LJSpeech model (~100MB) on first use and writes a waveform file to your current directory.
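To synthesize several sentences from a script, the same CLI call can be wrapped in Python. The following is a minimal sketch that uses only the `--text` flag shown above; the helper names `build_command` and `synthesize` are hypothetical and not part of Matcha-TTS, and the sketch assumes the `matcha-tts` executable is on your `PATH`.

```python
import shutil
import subprocess


def build_command(text):
    """Return the argument list for one CLI synthesis call (hypothetical helper)."""
    return ["matcha-tts", "--text", text]


def synthesize(text):
    """Run the CLI once for `text`; raises if the CLI is missing or exits with an error."""
    if shutil.which("matcha-tts") is None:
        raise RuntimeError("matcha-tts CLI not found; install it with pip first")
    subprocess.run(build_command(text), check=True)


sentences = [
    "Welcome to Matcha TTS, a fast text to speech system",
    "This sentence is synthesized in a second call",
]

# Build the commands without running them, so they can be inspected first.
commands = [build_command(s) for s in sentences]
for cmd in commands:
    print(cmd)
```

Calling `synthesize(s)` for each sentence then runs one CLI invocation per sentence. Passing the arguments as a list avoids shell quoting problems with punctuation in the text.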

Try the interactive interface

Launch the Gradio web interface for interactive synthesis:

matcha-tts-app

The app lets you adjust synthesis parameters such as speaking rate, temperature, and the number of ODE solver steps in real time.
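To give some intuition for the temperature and ODE solver step settings, here is a toy one-dimensional sketch, not Matcha-TTS code. It assumes a straight-line probability path toward a single known target value and integrates the corresponding velocity field with fixed-step Euler updates. The function `euler_sample` and its parameters are hypothetical; in the real model a neural network predicts the velocity and the state is a mel-spectrogram rather than a scalar.

```python
import random


def euler_sample(target, n_steps, temperature=1.0, seed=0):
    """Integrate dx/dt = (target - x) / (1 - t) from t=0 to t=1 with Euler steps."""
    rng = random.Random(seed)
    # Temperature scales the Gaussian noise the ODE starts from.
    x = temperature * rng.gauss(0.0, 1.0)
    h = 1.0 / n_steps
    for i in range(n_steps):
        t = i * h
        # Velocity pointing from the current state toward the target.
        v = (target - x) / (1.0 - t)
        x += h * v
    return x


for steps in (2, 10, 50):
    print(steps, euler_sample(target=2.0, n_steps=steps))
```

Because the path in this toy is a straight line, Euler integration reaches the target with very few steps. With a learned velocity field the paths are only approximately straight, so the step count becomes a trade-off between speed and quality.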

Why Matcha-TTS?

Built for researchers and developers who need fast, high-quality neural TTS

Blazing fast synthesis

Conditional flow matching enables a 10x speedup over diffusion models while maintaining audio quality. Achieve real-time synthesis, with a real-time factor (RTF) below 1, on modern GPUs.
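The real-time factor is the time spent synthesizing divided by the duration of the audio produced, so any value below 1 means speech is generated faster than it plays back. A small sketch of the arithmetic follows; the timing numbers are made up for illustration and are not measurements of Matcha-TTS.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Return synthesis time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Illustrative numbers only: 0.5 s of compute for 4 s of audio.
rtf = real_time_factor(synthesis_seconds=0.5, audio_seconds=4.0)
print(f"RTF = {rtf:.3f}")  # prints "RTF = 0.125"
print("real-time" if rtf < 1 else "slower than real-time")
```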

Natural, expressive speech

The non-autoregressive architecture with probabilistic modeling produces highly natural speech. Control speaking rate and variance for expressive synthesis.

Multi-speaker support

Train on multi-speaker datasets such as VCTK using speaker embeddings. Pre-trained multi-speaker models support 108 different voices out of the box.

Production-ready deployment

Export models to ONNX for optimized inference across platforms. Embed the vocoder in the exported graph for end-to-end waveform generation in production.

Explore the documentation

Everything you need to build with Matcha-TTS

Core concepts

Learn about the architecture, flow matching, and how Matcha-TTS works under the hood

Training guide

Train Matcha-TTS on your own dataset with custom voices and languages

Inference API

Integrate Matcha-TTS into your Python applications for programmatic synthesis

CLI reference

Complete command-line reference for synthesis, training, and utilities

ONNX deployment

Export and deploy models with ONNX for production workloads

Model API

Python API reference for the MatchaTTS model and components

Ready to get started?

Install Matcha-TTS and synthesize your first speech in minutes

