## Overview
Matcha-TTS provides several command-line tools for text-to-speech synthesis, data preprocessing, and model utilities. All commands are available after installing the package.
## Commands

### matcha-tts
Main command for text-to-speech synthesis.

#### Options
- `--model`: Model to use for synthesis. Choices: `matcha_ljspeech`, `matcha_vctk`
- `--checkpoint_path`: Path to a custom model checkpoint (overrides `--model`)
- `--vocoder`: Vocoder to use (defaults to the recommended vocoder for the model). Choices: `hifigan_T2_v1`, `hifigan_univ_v1`
- `--text`: Text to synthesize (single utterance)
- `--file`: Path to a text file with utterances (one per line)
- `--spk`: Speaker ID for multi-speaker models (0-107 for `matcha_vctk`)
- `--temperature`: Variance of the noise; higher values give more diverse output. Range: 0.0-2.0
- `--speaking_rate`: Speaking-rate control; higher values give slower speech. Default: 1.0 for custom models, 0.95 for `matcha_ljspeech`, 0.85 for `matcha_vctk`
- `--steps`: Number of ODE solver steps; more steps improve quality at the cost of speed. Typical range: 4-20
- `--cpu`: Force CPU inference (default: use the GPU if available)
- `--denoiser_strength`: Strength of the vocoder bias denoiser
- `--output_folder`: Output folder for synthesized audio and mel-spectrograms
- `--batched`: Enable batched inference for processing multiple utterances
- `--batch_size`: Batch size for batched inference
#### Examples
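Typical invocations might look like the following. This is a sketch: the flag names are assumptions based on the option descriptions above, so verify them with `matcha-tts --help`.

```shell
# Single utterance with the default model (assumed flags)
matcha-tts --text "The quick brown fox jumped over the lazy dog."

# Multi-speaker model: speaker 10, utterances read from a file,
# batched inference enabled (assumed flags)
matcha-tts --model matcha_vctk --spk 10 --file utterances.txt --batched
```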
Single-speaker synthesis: pass the text to synthesize directly on the command line.

### matcha-data-stats
Compute mel-spectrogram statistics for dataset normalization.

#### Options
- Name of the YAML config file under `configs/data/`
- Batch size for computation (higher is faster)
- Force overwrite of an existing output file
#### Output

Creates a JSON file with mel-spectrogram statistics.

#### Example
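A sketch of a typical run. The flag names and JSON keys here are assumptions; the documentation above only guarantees that mel-spectrogram statistics are written as JSON.

```shell
# Compute statistics for an LJ Speech-style config (assumed flags)
matcha-data-stats -i ljspeech.yaml -b 256

# The resulting JSON holds the normalization statistics, e.g.:
# {"mel_mean": <float>, "mel_std": <float>}
```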
### matcha-tts-get-durations

Extract phoneme durations from a trained model using Monotonic Alignment Search.

#### Options
- Name of the YAML config file under `configs/data/`
- Path to the trained model checkpoint
- Batch size for processing
- Output folder for durations (defaults to `data_path/durations/`)
- Force overwrite of existing duration files
- Use CPU instead of GPU (not recommended)
#### Output

For each audio file, generates:

- `filename.npy`: NumPy array of phoneme durations
- `filename.json`: JSON with phoneme-duration pairs
#### Example
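A sketch of an invocation; the short flags and the checkpoint path are assumptions and placeholders, not confirmed by this page.

```shell
# Extract durations using a trained checkpoint (assumed flags;
# the checkpoint path is a placeholder)
matcha-tts-get-durations -i ljspeech.yaml -c path/to/checkpoint.ckpt -b 32
```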
### matcha-tts-app

Launch an interactive Gradio web interface for synthesis. Features:

- Select pre-trained models
- Choose speakers (for multi-speaker models)
- Adjust synthesis parameters
- Type or paste text
- Generate and play audio
#### Example
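The app takes no required arguments; Gradio prints a local URL to open in a browser (Gradio's default port is 7860).

```shell
# Launch the interactive web interface
matcha-tts-app
```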
## Output Files

### Audio Files

Generated as `utterance_XXX_speaker_YYY.wav` (or `utterance_XXX.wav` for single-speaker models):
- Format: WAV
- Sample rate: 22050 Hz
- Bit depth: 24-bit PCM
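Assuming `XXX` and `YYY` are zero-padded three-digit indices (an assumption based on the placeholder width), the naming scheme corresponds to:

```shell
# Build the output filename for utterance 3 spoken by speaker 7
utt=3
spk=7
printf 'utterance_%03d_speaker_%03d.wav\n' "$utt" "$spk"
# prints: utterance_003_speaker_007.wav
```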
### Mel-Spectrogram Files

Saved alongside the audio as:

- `utterance_XXX.npy`: NumPy array of the mel-spectrogram
- `utterance_XXX.png`: Visualization of the mel-spectrogram
## Environment Variables

Matcha-TTS uses the following directories:

- Model cache: `~/.local/share/matcha_tts/` (Linux/macOS) or `%LOCALAPPDATA%\matcha_tts\` (Windows). Downloaded models are cached here automatically.
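To resolve the cache location on Linux/macOS: the `XDG_DATA_HOME` fallback below is an assumption; the documented default is `~/.local/share/matcha_tts/`.

```shell
# Resolve the model cache directory (Linux/macOS).
# XDG_DATA_HOME handling is an assumption, not confirmed by this page.
cache_dir="${XDG_DATA_HOME:-$HOME/.local/share}/matcha_tts"
echo "$cache_dir"
```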
## Performance Tips

### Speed Optimization

- Reduce ODE steps: use `--steps 4` for faster synthesis (slight quality loss)
- Enable batched inference: use `--batched` for multiple utterances
- GPU acceleration: ensure CUDA is available (much faster than CPU)
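Combining these tips in one command (`--steps` and `--batched` are referenced on this page; `--file` and `--batch_size` are assumed flag names):

```shell
# Fast bulk synthesis: few ODE steps plus batched inference
matcha-tts --file utterances.txt --batched --batch_size 32 --steps 4
```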
### Quality Optimization

- Increase ODE steps: use `--steps 15` to `--steps 20` for better quality
- Adjust temperature: lower values (0.3-0.5) give more consistent output
- Fine-tune the speaking rate: adjust `--speaking_rate` for natural pacing
## Pretrained Models

### matcha_ljspeech
- Dataset: LJ Speech (single female speaker)
- Recommended vocoder: `hifigan_T2_v1`
- Default speaking rate: 0.95
- Language: English
### matcha_vctk
- Dataset: VCTK (108 speakers)
- Recommended vocoder: `hifigan_univ_v1`
- Default speaking rate: 0.85
- Speaker IDs: 0-107
- Language: English (various accents)
## Error Handling

### Common Issues
“Either text or file must be provided”: raised when `matcha-tts` is given neither a text string nor an input file; pass one or the other.

## Source Reference
- CLI implementation: `matcha/cli.py:208`
- Entry points: `setup.py:44`