TTS Models - RCLI

RCLI supports 6 TTS voices across 4 architectures, ranging from lightweight ~60 MB Piper voices to high-quality Kokoro models, the largest of which offers 103 speakers.

Quick Comparison

Voice	Architecture	Size	Speakers	Quality	Languages	License
Piper Lessac ⭐	VITS	60 MB	1	Good	English	MIT
Piper Amy	VITS	60 MB	1	Good	English	MIT
Matcha LJSpeech	Matcha	100 MB	1	Great	English	MIT
KittenTTS Nano	Kitten	90 MB	8	Great	English	Apache 2.0
Kokoro English v0.19 🏆	Kokoro	310 MB	11	Excellent	English	Apache 2.0
Kokoro Multi-lang v1.1	Kokoro	500 MB	103	Excellent	Chinese + English	Apache 2.0

⭐ = Default (ships with rcli setup)
🏆 = Recommended upgrade for best quality
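To make the trade-offs in the table concrete, here is a minimal Python sketch that picks the highest-quality voice fitting a disk budget and a set of required languages. Sizes, quality tiers, and languages are copied from the table above; the `pick_voice` helper and its tie-break rule (prefer the smaller model) are hypothetical and not part of RCLI.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Quality tiers from the Quick Comparison table, ranked low to high.
QUALITY_RANK = {"Good": 1, "Great": 2, "Excellent": 3}

@dataclass(frozen=True)
class Voice:
    name: str
    size_mb: int          # excludes the shared eSpeak-NG data
    speakers: int
    quality: str
    languages: frozenset

VOICES = [
    Voice("Piper Lessac", 60, 1, "Good", frozenset({"English"})),
    Voice("Piper Amy", 60, 1, "Good", frozenset({"English"})),
    Voice("Matcha LJSpeech", 100, 1, "Great", frozenset({"English"})),
    Voice("KittenTTS Nano", 90, 8, "Great", frozenset({"English"})),
    Voice("Kokoro English v0.19", 310, 11, "Excellent", frozenset({"English"})),
    Voice("Kokoro Multi-lang v1.1", 500, 103, "Excellent", frozenset({"Chinese", "English"})),
]

def pick_voice(budget_mb: int, languages: Set[str]) -> Optional[Voice]:
    """Return the best voice that fits the budget and covers every required language."""
    candidates = [v for v in VOICES if v.size_mb <= budget_mb and languages <= v.languages]
    if not candidates:
        return None
    # Highest quality wins; on a tie, the smaller model wins (hypothetical rule).
    return max(candidates, key=lambda v: (QUALITY_RANK[v.quality], -v.size_mb))
```

For example, `pick_voice(100, {"English"})` returns KittenTTS Nano, while a Chinese requirement yields Kokoro Multi-lang v1.1 only once the budget reaches 500 MB.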

Voice Details

Piper Lessac (Default)

Recommended for most users. Fast synthesis, clear English, low latency.

Specifications

Provider: Rhasspy Piper
Architecture: VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech)
Size: ~60 MB
Speakers: 1 (male voice)
Quality: Good
Languages: English (US)
Latency: 150.6 ms average on Apple M3 Max
License: MIT
Download: rcli setup (default)

Model Files

Stored in ~/Library/RCLI/models/piper-voice/:

piper-voice/
├── en_US-lessac-medium.onnx
├── en_US-lessac-medium.onnx.json
└── tokens.txt
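The .onnx.json file next to the Piper model holds the voice's settings. A minimal Python sketch for inspecting it, assuming the standard Piper config layout (`audio.sample_rate`, `espeak.voice`, `num_speakers`, `phoneme_id_map`); which of these fields RCLI itself reads is not documented here.

```python
import json
from pathlib import Path
from typing import Any, Dict

def summarize_piper_config(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Pull playback-relevant fields out of a parsed Piper voice config."""
    return {
        "sample_rate": cfg["audio"]["sample_rate"],          # Hz of the generated audio
        "espeak_voice": cfg.get("espeak", {}).get("voice"),  # eSpeak-NG voice used for phonemes
        "num_speakers": cfg.get("num_speakers", 1),
        "num_phonemes": len(cfg.get("phoneme_id_map", {})),
    }

def read_piper_config(path: Path) -> Dict[str, Any]:
    return summarize_piper_config(json.loads(path.read_text(encoding="utf-8")))

if __name__ == "__main__":
    config_path = Path.home() / "Library/RCLI/models/piper-voice/en_US-lessac-medium.onnx.json"
    if config_path.exists():
        print(read_piper_config(config_path))
```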

Key Features

Fastest TTS synthesis of the six voices (~150 ms latency)
Clear, neutral male voice
Optimized for command responses
Minimal disk footprint (60 MB)
Double-buffered playback (next sentence synthesizes while current plays)
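Double-buffered playback is a producer/consumer pair: a worker thread synthesizes the next sentence while the main thread plays the current one. A minimal Python sketch of the idea, where `synthesize` and `play` are hypothetical stand-ins for the TTS engine and the audio output, not RCLI functions:

```python
import queue
import threading
from typing import Callable, Iterable

def speak_double_buffered(
    sentences: Iterable[str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> None:
    """Play each clip while the next sentence is synthesized on a worker thread."""
    ready: queue.Queue = queue.Queue(maxsize=1)  # at most one finished clip waits
    done = object()  # sentinel: no more clips

    def producer() -> None:
        try:
            for sentence in sentences:
                ready.put(synthesize(sentence))  # blocks while a clip is already waiting
        finally:
            ready.put(done)  # always unblock the consumer, even if synthesis fails

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        clip = ready.get()
        if clip is done:
            break
        play(clip)
    worker.join()
```

The bounded queue keeps memory flat on long responses: the producer stops once a finished clip is waiting and resumes as soon as playback takes it.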

When to Use

Default choice for most users
Fast response times required
Limited disk space
English-only use cases

Piper Amy

Specifications

Provider: Rhasspy Piper
Architecture: VITS
Size: ~60 MB
Speakers: 1 (female voice)
Quality: Good
Languages: English (US)
License: MIT
Download: rcli voices

Description

Warm female alternative to Piper Lessac. Similar speed (~155 ms vs. ~150 ms average latency) and the same quality tier, with a different tone.

KittenTTS Nano

Specifications

Provider: KittenML
Architecture: Kitten (custom architecture)
Size: ~90 MB
Speakers: 8 (4 male, 4 female)
Quality: Great
Languages: English
License: Apache 2.0
Download: rcli voices

Model Files

Stored in ~/Library/RCLI/models/kitten-nano-en-v0_1-fp16/:

kitten-nano-en-v0_1-fp16/
├── model.fp16.onnx
├── tokens.txt
└── voices.bin

Key Features

8 distinct voices — Choose from 4 male and 4 female speakers
Lightweight (90 MB for all 8 voices)
Good quality for size
Speaker ID (0-7) selectable via config

Speaker Selection

Switch speakers in the TUI or via config:

# ~/Library/RCLI/config
tts_model=kitten-nano
tts_speaker=3  # Speaker IDs: 0-7
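Assuming the config file is plain `key=value` lines with `#` comments, as in the snippet above, here is a minimal Python sketch that reads it and checks the speaker ID against the ranges listed under Speaker IDs. The helper names are hypothetical, and only the two model IDs that appear in this page's config examples are included.

```python
from typing import Dict

# Speaker counts from the Speaker IDs section; models not listed are left unchecked.
SPEAKER_COUNTS = {"kitten-nano": 8, "kokoro-en": 11}

def parse_config(text: str) -> Dict[str, str]:
    """Parse key=value lines, ignoring blank lines and # comments."""
    config: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, including trailing ones
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config

def validate_speaker(config: Dict[str, str]) -> int:
    """Return tts_speaker as an int, raising ValueError if it is out of range."""
    model = config.get("tts_model", "")
    speaker = int(config.get("tts_speaker", "0"))
    count = SPEAKER_COUNTS.get(model)
    if count is not None and not 0 <= speaker < count:
        raise ValueError(f"{model}: speaker {speaker} is outside 0-{count - 1}")
    return speaker
```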

Matcha LJSpeech

Specifications

Provider: Matcha-TTS
Architecture: Matcha (flow-matching)
Size: ~100 MB (model + HiFi-GAN vocoder)
Speakers: 1 (female voice)
Quality: Great
Languages: English
License: MIT
Download: rcli voices

Model Files

Stored in ~/Library/RCLI/models/matcha-icefall-en_US-ljspeech/:

matcha-icefall-en_US-ljspeech/
├── model-steps-3.onnx
├── hifigan_v2.onnx
└── tokens.txt

Description

Fast synthesis with a clear female voice. The Matcha acoustic model generates a mel-spectrogram, which the separate HiFi-GAN vocoder (hifigan_v2.onnx) converts into the final audio waveform.

Kokoro English v0.19 (Recommended)

Best English quality. 11 natural-sounding voices with excellent prosody.

Specifications

Provider: Hexgrad
Architecture: Kokoro (82M parameters)
Size: ~310 MB
Speakers: 11 (various voices)
Quality: Excellent
Languages: English
License: Apache 2.0
Download: rcli voices

Model Files

Stored in ~/Library/RCLI/models/kokoro-en-v0_19/:

kokoro-en-v0_19/
├── model.onnx
├── tokens.txt
└── voices.bin

Key Features

11 unique voices — Wide variety of tones and styles
Best prosody and naturalness
Excellent for extended conversations
Clear pronunciation and intonation
Slightly higher latency (~180-200 ms) because the model is larger

When to Use

Best audio quality required
Extended voice conversations
Professional use cases
Disk space not a constraint (310 MB)

Kokoro Multi-lang v1.1

Specifications

Provider: Hexgrad
Architecture: Kokoro (82M parameters)
Size: ~500 MB
Speakers: 103 (multilingual)
Quality: Excellent
Languages: Chinese + English
License: Apache 2.0
Download: rcli voices

Model Files

Stored in ~/Library/RCLI/models/kokoro-multi-lang-v1_1/:

kokoro-multi-lang-v1_1/
├── model.onnx
├── tokens.txt
├── voices.bin
└── lexicon-us-en.txt

Key Features

103 speakers — Largest voice library
Chinese and English support
Lexicon-based pronunciation (US English)
Best for multilingual use cases

When to Use

Multilingual conversations (Chinese/English)
Need many voice options
Maximum voice variety
Disk space not a constraint (500 MB)

Quality Comparison

Naturalness & Prosody

Voice	Naturalness	Prosody	Clarity	Best For
Kokoro English	Excellent	Excellent	Excellent	Extended conversations
Kokoro Multi-lang	Excellent	Excellent	Excellent	Multilingual use
KittenTTS Nano	Great	Great	Good	Voice variety (8 speakers)
Matcha LJSpeech	Great	Good	Excellent	Clear female voice
Piper Amy	Good	Good	Good	Warm female tone
Piper Lessac	Good	Good	Good	Fast, neutral male voice

Latency Benchmarks

Measured on Apple M3 Max:

Voice	Avg Latency	Synthesis Speed	Real-time Factor
Piper Lessac ⭐	150.6 ms	Fast	~0.3x
Piper Amy	155 ms	Fast	~0.3x
KittenTTS Nano	165 ms	Fast	~0.35x
Matcha LJSpeech	170 ms	Fast	~0.38x
Kokoro English	190 ms	Medium	~0.42x
Kokoro Multi-lang	210 ms	Medium	~0.48x

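Real-time factor (RTF) is synthesis time divided by the duration of the audio produced; an RTF below 1.0 means synthesis is faster than playback. All six models stay well below 1.0 (~0.3-0.48), so all are fast enough for real-time conversation.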
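RTF and the "x real-time" figure printed by rcli bench are reciprocals of each other. A small Python sketch of the arithmetic (function names are illustrative):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds

def speed_multiple(rtf: float) -> float:
    """How many times faster than real time synthesis runs (1 / RTF)."""
    return 1.0 / rtf
```

For example, spending 0.3125 s to synthesize 1 s of audio gives an RTF of 0.3125 (the ~0.3x listed for Piper Lessac) and a speed of 3.2x real-time, matching the benchmark output in the Benchmarks section.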

Switching Voices

Interactive Voice Browser

rcli voices  # Browse, download, and switch TTS voices

Use the arrow keys to select a voice, then press Enter to download or activate it.

Check Active Voice

rcli info  # Shows active TTS voice

Manual Configuration

Edit the config file directly:

# ~/Library/RCLI/config
tts_model=kokoro-en      # Voice ID
tts_speaker=5            # Speaker ID (for multi-speaker models)

Changes take effect immediately (no restart required).
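If you script config changes instead of editing by hand, here is a minimal Python sketch that updates one key while leaving the other lines and their comments intact. It assumes the `key=value` format shown above; `set_config_value` is a hypothetical helper, not part of RCLI.

```python
import re

def set_config_value(text: str, key: str, value: str) -> str:
    """Return config text with key=value set, preserving all other lines.

    Rewrites the first existing `key=` line (keeping its trailing # comment)
    or appends a new line when the key is absent."""
    lines = text.splitlines()
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=")  # exact key, not a prefix of another
    for i, line in enumerate(lines):
        if pattern.match(line):
            comment = "  #" + line.split("#", 1)[1] if "#" in line else ""
            lines[i] = f"{key}={value}{comment}"
            break
    else:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

Write the result back to the config path with `Path.write_text`.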

Architecture Details

VITS (Piper)

Full name: Variational Inference with adversarial learning for end-to-end TTS
Approach: End-to-end neural TTS with GAN-based training
Strengths: Fast inference, small model size
Phonemizer: eSpeak-NG (built-in)

Kokoro

Parameters: 82M
Approach: Transformer-based TTS with multi-speaker embeddings
Strengths: Best naturalness, excellent prosody
Phonemizer: eSpeak-NG + lexicon
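One common way to combine a lexicon with eSpeak-NG is to look each word up in the lexicon and fall back to eSpeak-NG only for words it lacks. A minimal Python sketch; the `word ph1 ph2 ...` line format and the `fallback` callable (standing in for an eSpeak-NG call) are assumptions, and RCLI's actual lookup order is not documented here.

```python
from typing import Callable, Dict, List

def load_lexicon(text: str) -> Dict[str, List[str]]:
    """Parse lexicon lines of the assumed form `word ph1 ph2 ...`."""
    lexicon: Dict[str, List[str]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            lexicon.setdefault(parts[0].lower(), parts[1:])  # first entry for a word wins
    return lexicon

def phonemize(words: List[str], lexicon: Dict[str, List[str]],
              fallback: Callable[[str], List[str]]) -> List[str]:
    """Use lexicon pronunciations when available, otherwise ask the fallback."""
    phones: List[str] = []
    for word in words:
        phones.extend(lexicon.get(word.lower()) or fallback(word))
    return phones
```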

Kitten

Architecture: Custom lightweight TTS
Approach: Fast synthesis with speaker embeddings
Strengths: 8 voices in one model, lightweight

Matcha

Architecture: Conditional flow matching, which needs only a few ODE solver steps at inference
Vocoder: HiFi-GAN v2
Strengths: Fast synthesis, clear output
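Flow matching produces a mel-spectrogram by integrating a learned velocity field from noise (t = 0) to data (t = 1); the Matcha-TTS paper uses a fixed-step Euler solver for this. A plain-Python sketch of that loop, with a toy velocity function in place of the neural network. Reading the 3 in model-steps-3.onnx as the number of solver steps is an assumption.

```python
from typing import Callable, List

Vector = List[float]

def euler_flow_sample(
    x0: Vector,
    velocity: Callable[[Vector, float], Vector],
    steps: int,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 in `steps` equal Euler steps."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    x = list(x0)
    h = 1.0 / steps
    for k in range(steps):
        v = velocity(x, k * h)  # one network evaluation per step in a real model
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x
```

With straight-line paths the velocity toward a target is constant, so three Euler steps from [0.0, 1.0] with velocity [3.0, -3.0] land exactly on [3.0, -2.0]. A trained network's velocity field is not constant, which is why the step count trades speed against accuracy.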

eSpeak Phonemizer

All TTS models use eSpeak-NG for text-to-phoneme conversion:

Size: ~8 MB (eSpeak-NG data)
Location: ~/Library/RCLI/models/espeak-ng-data/
Languages: 100+ languages supported
Shared: All TTS models use the same eSpeak data
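After eSpeak-NG produces phonemes, each model's tokens.txt presumably maps those symbols to the integer IDs its ONNX model consumes. A minimal Python sketch, assuming one `<symbol> <id>` pair per line with the ID after the last space; the helper names are hypothetical.

```python
from typing import Dict, List

def load_tokens(text: str) -> Dict[str, int]:
    """Map each symbol in tokens.txt to its integer ID.

    Splitting on the last space keeps a symbol that is itself a space
    (a line such as "  3") intact."""
    table: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cut = line.rfind(" ")
        if cut < 0:
            raise ValueError(f"malformed token line: {line!r}")
        symbol = line[:cut] or " "  # assumed: a bare " <id>" line is the space token
        table[symbol] = int(line[cut + 1:])
    return table

def encode(phonemes: List[str], table: Dict[str, int]) -> List[int]:
    """Convert phoneme symbols to model input IDs, skipping unknown symbols."""
    return [table[p] for p in phonemes if p in table]
```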

Benchmarks

Run TTS benchmarks:

rcli bench --suite tts            # Benchmark active voice
rcli bench --all-tts --suite tts  # Compare all installed voices

Example output:

=== TTS Benchmark ===
Voice: Piper Lessac
  Average latency: 150.6 ms
  Synthesis speed: 3.2x real-time
  Memory usage: 180 MB
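To track these numbers across runs or voices, here is a minimal Python sketch that extracts the numeric fields from output shaped like the example above. The regexes are written against that sample only; the real output format may differ between RCLI versions.

```python
import re
from typing import Dict

def parse_tts_bench(output: str) -> Dict[str, float]:
    """Extract latency, speed, and memory from `rcli bench --suite tts` text output."""
    patterns = {
        "latency_ms": r"Average latency:\s*([\d.]+)\s*ms",
        "speed_x": r"Synthesis speed:\s*([\d.]+)x",
        "memory_mb": r"Memory usage:\s*([\d.]+)\s*MB",
    }
    result: Dict[str, float] = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, output)
        if match:  # fields missing from the output are simply omitted
            result[key] = float(match.group(1))
    return result
```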

Storage Requirements

Setup	Total Size	Voices Installed
Default	~68 MB	Piper Lessac + eSpeak
Default + Kokoro English	~378 MB	Piper Lessac + Kokoro English + eSpeak
Default + both Kokoro models	~878 MB	Piper Lessac + Kokoro English + Kokoro Multi-lang + eSpeak
All voices	~1.1 GB	All 6 voices + eSpeak

Multiple voices can be installed at once; the tts_model setting in the config file determines which one is active.
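The storage totals follow from the per-voice sizes plus the ~8 MB of eSpeak-NG data, which all voices share and which is counted once. A Python sketch of that arithmetic, using the sizes from this page (the helper name is illustrative):

```python
from typing import Iterable

ESPEAK_MB = 8  # shared eSpeak-NG data, needed by every voice
VOICE_MB = {
    "Piper Lessac": 60,
    "Piper Amy": 60,
    "Matcha LJSpeech": 100,
    "KittenTTS Nano": 90,
    "Kokoro English": 310,
    "Kokoro Multi-lang": 500,
}

def install_size_mb(voices: Iterable[str]) -> int:
    """Total disk use in MB; eSpeak-NG data is counted once however many voices are installed."""
    return ESPEAK_MB + sum(VOICE_MB[name] for name in set(voices))
```

All six voices come to 1,128 MB, which the table rounds to ~1.1 GB.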

Speaker IDs

KittenTTS Nano (8 speakers)

Speaker ID	Gender	Description
0	Male	Deep, calm
1	Male	Neutral
2	Male	Warm
3	Male	Energetic
4	Female	Soft
5	Female	Neutral
6	Female	Warm
7	Female	Clear

Kokoro English (11 speakers)

Speaker IDs 0-10 with varied tones and styles. Browse them in the TUI for previews.

Kokoro Multi-lang (103 speakers)

Speaker IDs 0-102 covering Chinese and English voices. Use the TUI to explore them.

Next Steps

Voice Browser: Browse and download TTS voices
Benchmarks: Measure TTS performance on your Mac
Model Browser: Manage all models (LLM, STT, TTS)
Switch Models: Hot-swap voices without restart
