
Installation

Matcha-TTS can be installed in multiple ways depending on your needs. Choose the method that works best for your workflow.

Requirements

Matcha-TTS requires:
  • Python: 3.9 or higher (Python 3.10 recommended)
  • PyTorch: 2.0 or higher
  • CUDA: Optional, but recommended for GPU acceleration
For optimal performance, we recommend using a GPU with CUDA support. Matcha-TTS can run on CPU, but synthesis will be slower.

Quick Install with pip

The fastest way to get started is to install Matcha-TTS directly from PyPI:
pip install matcha-tts
This will install Matcha-TTS and all its dependencies. Pre-trained models will be automatically downloaded when you first use the CLI or Gradio interface.
Make sure you have PyTorch installed before installing Matcha-TTS. If you need GPU support, install PyTorch with CUDA support first.

Installation Methods

Create an isolated environment to avoid dependency conflicts:
1. Create conda environment
conda create -n matcha-tts python=3.10 -y
conda activate matcha-tts

2. Install Matcha-TTS
pip install matcha-tts

3. Verify installation
matcha-tts --help

Key Dependencies

Matcha-TTS relies on several important libraries:
torch>=2.0.0
torchvision>=0.15.0
lightning>=2.0.0
torchmetrics>=0.11.4
torchaudio
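To see how an existing environment compares with these minimums, something like the following sketch can help. It reads installed versions through the standard-library importlib.metadata module; the MINIMUMS table simply restates the list above, and parse_version and check are illustrative helper names, not part of Matcha-TTS.

```python
import re
from importlib import metadata

# Minimum versions restated from the dependency list above.
# torchaudio has no stated minimum, so it is only checked for presence.
MINIMUMS = {
    "torch": (2, 0, 0),
    "torchvision": (0, 15, 0),
    "lightning": (2, 0, 0),
    "torchmetrics": (0, 11, 4),
    "torchaudio": None,
}


def parse_version(text):
    """Return the leading numeric part of a version string as a 3-tuple.

    Local build tags and pre-release suffixes are ignored, so
    "2.1.0+cu118" is treated as (2, 1, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return (0, 0, 0)
    parts = [int(piece) for piece in match.group(0).split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check(minimums=MINIMUMS):
    """Map each package name to 'missing', 'too old (...)', or its version."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and parse_version(installed) < minimum:
            report[name] = f"too old ({installed})"
        else:
            report[name] = installed
    return report


if __name__ == "__main__":
    for package, status in check().items():
        print(f"{package}: {status}")
```

This simplified comparison ignores pre-release ordering, so treat a borderline result as a hint rather than a verdict.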

ONNX Support (Optional)

For ONNX export and inference:
1. Install ONNX
pip install onnx

2. Install ONNX Runtime
For CPU inference:
pip install onnxruntime
For GPU inference:
pip install onnxruntime-gpu
ONNX export requires PyTorch >= 2.1.0 because the scaled_dot_product_attention operator is not exportable in older versions.
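After installing either runtime package, it can be useful to confirm which execution providers ONNX Runtime actually exposes. The sketch below assumes nothing about Matcha-TTS itself: it only calls onnxruntime.get_available_providers(), and the helper names available_providers and pick_provider are illustrative.

```python
import importlib.util


def available_providers():
    """Return ONNX Runtime's provider list, or [] if it is not installed."""
    # Both onnxruntime and onnxruntime-gpu install the same import name.
    if importlib.util.find_spec("onnxruntime") is None:
        return []
    import onnxruntime

    return list(onnxruntime.get_available_providers())


def pick_provider(providers):
    """Prefer the CUDA provider when present, otherwise fall back to CPU."""
    if "CUDAExecutionProvider" in providers:
        return "CUDAExecutionProvider"
    if "CPUExecutionProvider" in providers:
        return "CPUExecutionProvider"
    return None


if __name__ == "__main__":
    found = available_providers()
    print("providers:", found)
    print("selected:", pick_provider(found))
```

If onnxruntime-gpu is installed but only the CPU provider is listed, the CUDA libraries it expects are probably not visible to the process.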

Verify Installation

After installation, verify that Matcha-TTS is working correctly:

Check CLI Access

matcha-tts --help
You should see the help message with available options.

Test Basic Synthesis

matcha-tts --text "Hello, this is a test of Matcha TTS."
This will:
  1. Download pre-trained models (on first run)
  2. Synthesize the text
  3. Save the output as utterance_001.wav in the current directory
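The same smoke test can be scripted, for example in a CI job. This is a minimal sketch that shells out to the CLI rather than using any Python API; it relies only on the --text option and the utterance_001.wav output name described above, and build_command and synthesize are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(text):
    """Build the argument list for the CLI call shown above."""
    return ["matcha-tts", "--text", text]


def synthesize(text, workdir="."):
    """Run the CLI in workdir and return the output path, or None on failure.

    Returns None when the matcha-tts executable is not on PATH or when the
    expected output file was not produced.
    """
    if shutil.which("matcha-tts") is None:
        return None
    subprocess.run(build_command(text), cwd=workdir, check=True)
    output = Path(workdir) / "utterance_001.wav"
    return output if output.exists() else None
```

Calling synthesize("Hello, this is a test of Matcha TTS.") should return the path to the WAV file on a working installation. The first call may be slow while models download.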

Available Commands

After installation, you'll have access to this command:
# Main CLI for speech synthesis
matcha-tts --text "Your text here"

Pre-trained Models

Pre-trained models are automatically downloaded to:
  • Linux/Mac: ~/.local/share/matcha_tts/
  • Windows: %LOCALAPPDATA%\matcha_tts\
Available pre-trained models:
  • matcha_ljspeech.ckpt: Single-speaker model (LJSpeech dataset)
  • matcha_vctk.ckpt: Multi-speaker model (VCTK dataset, 108 speakers)
  • hifigan_T2_v1: Vocoder for LJSpeech
  • hifigan_univ_v1: Universal vocoder for VCTK
The first run may take a few minutes while the models (~400 MB total) are downloaded.
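To check what has already been downloaded, a short script can list the cache directory. The sketch below takes the locations from the list above at face value; the fallback used when LOCALAPPDATA is unset is an assumption, and expected_model_dir and list_models are illustrative names rather than Matcha-TTS functions.

```python
import os
import platform
from pathlib import Path


def expected_model_dir(system=None, env=None, home=None):
    """Return the model directory documented above for the given platform."""
    system = platform.system() if system is None else system
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if system == "Windows":
        base = env.get("LOCALAPPDATA")
        if base:
            return Path(base) / "matcha_tts"
        # Assumption: the usual default location when LOCALAPPDATA is unset.
        return home / "AppData" / "Local" / "matcha_tts"
    return home / ".local" / "share" / "matcha_tts"


def list_models(directory):
    """Return (file name, size in bytes) pairs, or [] if the directory is absent."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(
        (entry.name, entry.stat().st_size)
        for entry in directory.iterdir()
        if entry.is_file()
    )


if __name__ == "__main__":
    model_dir = expected_model_dir()
    print("model directory:", model_dir)
    for name, size in list_models(model_dir):
        print(f"  {name}: {size / 1e6:.1f} MB")
```

An empty listing before the first synthesis run is expected, since nothing has been downloaded yet.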

GPU Setup

For GPU acceleration, ensure CUDA is properly installed:
# Check if PyTorch detects CUDA
python -c "import torch; print(torch.cuda.is_available())"
If this prints False, you may need to reinstall PyTorch with CUDA support. Visit pytorch.org for platform-specific installation instructions.
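When the one-liner prints False, it helps to know whether the installed PyTorch is a CPU-only build or a CUDA build that cannot see a GPU. The following sketch reports both cases using torch.version.cuda, which is None for CPU-only builds; describe_device is an illustrative name.

```python
import importlib.util


def describe_device():
    """Summarize what PyTorch can see, without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"cuda available: {name} (built with CUDA {torch.version.cuda})"
    if torch.version.cuda is None:
        # A CPU-only wheel: reinstalling with a CUDA build is the likely fix.
        return "cpu only: this PyTorch build has no CUDA support"
    # A CUDA build that cannot reach a GPU usually points at drivers.
    return f"cpu only: built with CUDA {torch.version.cuda}, but no GPU is visible"


if __name__ == "__main__":
    print(describe_device())
```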

Troubleshooting

If you encounter Cython compilation errors during installation:
pip install Cython numpy
pip install matcha-tts
Make sure you have a C compiler installed (gcc on Linux, Xcode on Mac, Visual Studio on Windows).
The phonemizer package requires espeak-ng to be installed on your system:
Ubuntu/Debian:
sudo apt-get install espeak-ng
macOS:
brew install espeak-ng
Windows: Download and install from espeak-ng releases
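A quick way to confirm that the espeak-ng executable is visible is to look it up on PATH. This sketch only checks for the executable; it does not verify that phonemizer can load the espeak-ng library, and find_espeak is an illustrative name.

```python
import shutil


def find_espeak(executable="espeak-ng"):
    """Return the full path of the espeak-ng executable, or None if not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    location = find_espeak()
    if location is None:
        print("espeak-ng was not found on PATH")
    else:
        print("espeak-ng found at", location)
```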
If you encounter CUDA out of memory errors:
  1. Reduce batch size (for training)
  2. Use the minimal memory configuration: python matcha/train.py experiment=ljspeech_min_memory
  3. Use CPU mode: matcha-tts --text "..." --cpu
If you get import errors after installation:
  1. Make sure your conda environment is activated
  2. Try reinstalling: pip install --force-reinstall matcha-tts
  3. Check for conflicting package versions
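Import errors often come from running a different interpreter than the one the package was installed into. The sketch below prints which Python is active and where the package resolves from. It assumes the import name is matcha (inferred from the matcha/train.py path above) and reads CONDA_DEFAULT_ENV, which conda sets on activation; environment_report is an illustrative name.

```python
import importlib.util
import os
import sys


def environment_report(package="matcha"):
    """Collect the details most useful for diagnosing an import error."""
    # Assumption: "matcha" is the import name of the matcha-tts distribution.
    spec = importlib.util.find_spec(package)
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "package_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If package_location is None, or if python points outside the expected conda environment, the package was most likely installed into a different environment.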

Next Steps

Quick Start Guide

Learn how to use Matcha-TTS for speech synthesis with the CLI, Python API, and Gradio interface
