Quick Comparison
| Voice | Architecture | Size | Speakers | Quality | Languages | License |
|---|---|---|---|---|---|---|
| Piper Lessac ⭐ | VITS | 60 MB | 1 | Good | English | MIT |
| Piper Amy | VITS | 60 MB | 1 | Good | English | MIT |
| Matcha LJSpeech | Matcha | 100 MB | 1 | Great | English | MIT |
| KittenTTS Nano | Kitten | 90 MB | 8 | Great | English | Apache 2.0 |
| Kokoro English v0.19 🏆 | Kokoro | 310 MB | 11 | Excellent | English | Apache 2.0 |
| Kokoro Multi-lang v1.1 | Kokoro | 500 MB | 103 | Excellent | Chinese + English | Apache 2.0 |
⭐ = Default (ships with rcli setup)
🏆 = Recommended upgrade for best quality
Voice Details
Piper Lessac (Default)
Recommended for most users. Fast synthesis, clear English, low latency.
Specifications
- Provider: Rhasspy Piper
- Architecture: VITS (Variational Inference TTS)
- Size: ~60 MB
- Speakers: 1 (male voice)
- Quality: Good
- Languages: English (US)
- Latency: 150.6 ms average on M3 Max
- License: MIT
- Download: rcli setup (default)
Model Files
Stored in ~/Library/RCLI/models/piper-voice/:
Key Features
- Fastest TTS synthesis (~150ms latency)
- Clear, neutral male voice
- Optimized for command responses
- Minimal disk footprint (60 MB)
- Double-buffered playback (next sentence synthesizes while current plays)
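The double-buffered playback mentioned in the last item can be pictured as a producer/consumer pair: one thread synthesizes the next sentence while the main thread plays the current one. The sketch below is only an illustration of that idea, not RCLI's implementation; `synthesize`, `play`, and `speak` are hypothetical stand-ins that simulate work with short sleeps.

```python
import queue
import threading
import time


def synthesize(sentence: str) -> str:
    """Hypothetical stand-in for TTS synthesis; returns a fake audio handle."""
    time.sleep(0.01)  # pretend synthesis takes a little time
    return f"audio({sentence})"


def play(audio: str, played: list[str]) -> None:
    """Hypothetical stand-in for audio playback; records what was played."""
    time.sleep(0.02)  # pretend playback takes longer than synthesis
    played.append(audio)


def speak(sentences: list[str]) -> list[str]:
    """Play sentences in order while the next one is synthesized in the background."""
    # One-slot queue: synthesis runs ahead of playback, but put() blocks
    # once a finished sentence is already waiting, which throttles the producer.
    buffer: queue.Queue = queue.Queue(maxsize=1)
    played: list[str] = []

    def producer() -> None:
        for sentence in sentences:
            buffer.put(synthesize(sentence))
        buffer.put(None)  # sentinel: nothing more to play

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while (audio := buffer.get()) is not None:
        play(audio, played)
    worker.join()
    return played


if __name__ == "__main__":
    print(speak(["Opening Safari.", "Done."]))
```

Because synthesis overlaps with playback, the listener only waits for the first sentence to be synthesized; later sentences are usually ready by the time the previous one finishes.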
When to Use
- Default choice for most users
- Fast response times required
- Limited disk space
- English-only use cases
Piper Amy
Specifications
- Provider: Rhasspy Piper
- Architecture: VITS
- Size: ~60 MB
- Speakers: 1 (female voice)
- Quality: Good
- Languages: English (US)
- License: MIT
- Download: rcli voices
Description
Warm female alternative to Piper Lessac. Same speed and quality, different tone.
KittenTTS Nano
Specifications
- Provider: KittenML
- Architecture: Kitten (custom architecture)
- Size: ~90 MB
- Speakers: 8 (4 male, 4 female)
- Quality: Great
- Languages: English
- License: Apache 2.0
- Download: rcli voices
Model Files
Stored in ~/Library/RCLI/models/kitten-nano-en-v0_1-fp16/:
Key Features
- 8 distinct voices — Choose from 4 male and 4 female speakers
- Lightweight (90 MB for all 8 voices)
- Good quality for size
- Voice ID selectable via config
Speaker Selection
Switch speakers in the TUI or via config:
Matcha LJSpeech
Specifications
- Provider: Matcha-TTS
- Architecture: Matcha (flow-matching)
- Size: ~100 MB (model + HiFi-GAN vocoder)
- Speakers: 1 (female voice)
- Quality: Great
- Languages: English
- License: MIT
- Download: rcli voices
Model Files
Stored in ~/Library/RCLI/models/matcha-icefall-en_US-ljspeech/:
Description
Fast synthesis with a clear female voice. Uses a HiFi-GAN vocoder for high-quality audio generation.
Kokoro English v0.19 (Recommended)
Best English quality. 11 natural-sounding voices with excellent prosody.
Specifications
- Provider: Hexgrad
- Architecture: Kokoro (82M parameters)
- Size: ~310 MB
- Speakers: 11 (various voices)
- Quality: Excellent
- Languages: English
- License: Apache 2.0
- Download: rcli voices
Model Files
Stored in ~/Library/RCLI/models/kokoro-en-v0_19/:
Key Features
- 11 unique voices — Wide variety of tones and styles
- Best prosody and naturalness
- Excellent for extended conversations
- Clear pronunciation and intonation
- Slightly higher latency (~180-200 ms), the trade-off for higher quality
When to Use
- Best audio quality required
- Extended voice conversations
- Professional use cases
- Disk space not a constraint (310 MB)
Kokoro Multi-lang v1.1
Specifications
- Provider: Hexgrad
- Architecture: Kokoro (82M parameters)
- Size: ~500 MB
- Speakers: 103 (multilingual)
- Quality: Excellent
- Languages: Chinese + English
- License: Apache 2.0
- Download: rcli voices
Model Files
Stored in ~/Library/RCLI/models/kokoro-multi-lang-v1_1/:
Key Features
- 103 speakers — Largest voice library
- Chinese and English support
- Lexicon-based pronunciation (US English)
- Best for multilingual use cases
When to Use
- Multilingual conversations (Chinese/English)
- Need many voice options
- Maximum voice variety
- Disk space not a constraint (500 MB)
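The "When to Use" guidance across the voices can be condensed into a small selection helper. The sketch below is illustrative only: the sizes, quality tiers, and languages are copied from the Quick Comparison table, while `Voice`, `VOICES`, and `pick_voice` are hypothetical names and the ranking rule (highest quality first, then smallest size) is an assumption, not RCLI behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Voice:
    name: str
    size_mb: int
    speakers: int
    quality: int  # 1 = Good, 2 = Great, 3 = Excellent
    languages: tuple[str, ...]


# Values taken from the Quick Comparison table.
VOICES = [
    Voice("Piper Lessac", 60, 1, 1, ("English",)),
    Voice("Piper Amy", 60, 1, 1, ("English",)),
    Voice("Matcha LJSpeech", 100, 1, 2, ("English",)),
    Voice("KittenTTS Nano", 90, 8, 2, ("English",)),
    Voice("Kokoro English v0.19", 310, 11, 3, ("English",)),
    Voice("Kokoro Multi-lang v1.1", 500, 103, 3, ("Chinese", "English")),
]


def pick_voice(max_size_mb: int, language: str = "English") -> Optional[Voice]:
    """Return the best voice that fits the disk budget and supports the language."""
    candidates = [
        v for v in VOICES
        if v.size_mb <= max_size_mb and language in v.languages
    ]
    if not candidates:
        return None
    # Assumed ranking: higher quality wins; ties go to the smaller download.
    return max(candidates, key=lambda v: (v.quality, -v.size_mb))


if __name__ == "__main__":
    for budget in (70, 150, 400):
        choice = pick_voice(budget)
        print(budget, "MB ->", choice.name if choice else "no match")
```

A different ranking (for example, preferring more speakers) only requires changing the `key` function.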
Quality Comparison
Naturalness & Prosody
| Voice | Naturalness | Prosody | Clarity | Best For |
|---|---|---|---|---|
| Kokoro English | Excellent | Excellent | Excellent | Extended conversations |
| Kokoro Multi-lang | Excellent | Excellent | Excellent | Multilingual use |
| KittenTTS Nano | Great | Great | Good | Voice variety (8 speakers) |
| Matcha LJSpeech | Great | Good | Excellent | Clear female voice |
| Piper Amy | Good | Good | Good | Warm female tone |
| Piper Lessac | Good | Good | Good | Fast, neutral male voice |
Latency Benchmarks
Measured on Apple M3 Max:
| Voice | Avg Latency | Synthesis Speed | Real-time Factor |
|---|---|---|---|
| Piper Lessac ⭐ | 150.6 ms | Fast | ~0.3x |
| Piper Amy | 155 ms | Fast | ~0.3x |
| KittenTTS Nano | 165 ms | Fast | ~0.35x |
| Matcha LJSpeech | 170 ms | Fast | ~0.38x |
| Kokoro English | 190 ms | Medium | ~0.42x |
| Kokoro Multi-lang | 210 ms | Medium | ~0.48x |
Real-time factor <1.0 means synthesis is faster than playback. All models are fast enough for real-time conversation.
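To make the real-time factor concrete, the sketch below assumes the conventional definition (synthesis time divided by the duration of the audio produced), which the table does not spell out. The function names and the sample timings are hypothetical, chosen only to reproduce a value near the ~0.3x reported for Piper.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def keeps_up_with_playback(rtf: float) -> bool:
    """True when audio is generated faster than it is played back."""
    return rtf < 1.0


if __name__ == "__main__":
    # Hypothetical measurement: 0.6 s of compute for a 2.0 s utterance.
    rtf = real_time_factor(0.6, 2.0)
    print(f"RTF = {rtf:.2f}, real-time capable: {keeps_up_with_playback(rtf)}")
```

Under this definition, an RTF of 0.48x (the slowest voice in the table) still leaves roughly half of each utterance's duration as headroom.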
Switching Voices
Interactive Voice Browser
Check Active Voice
Manual Configuration
Edit config file directly:
Architecture Details
VITS (Piper)
- Full name: Variational Inference with adversarial learning for end-to-end TTS
- Approach: End-to-end neural TTS with GAN-based training
- Strengths: Fast inference, small model size
- Phonemizer: eSpeak-NG (built-in)
Kokoro
- Parameters: 82M
- Approach: Transformer-based TTS with multi-speaker embeddings
- Strengths: Best naturalness, excellent prosody
- Phonemizer: eSpeak-NG + lexicon
Kitten
- Architecture: Custom lightweight TTS
- Approach: Fast synthesis with speaker embeddings
- Strengths: 8 voices in one model, lightweight
Matcha
- Architecture: Flow-matching for fast TTS
- Vocoder: HiFi-GAN v2
- Strengths: Fast synthesis, clear output
eSpeak Phonemizer
All TTS models use eSpeak-NG for text-to-phoneme conversion:
- Size: ~8 MB (eSpeak-NG data)
- Location: ~/Library/RCLI/models/espeak-ng-data/
- Languages: 100+ languages supported
- Shared: All TTS models use the same eSpeak data
Benchmarks
Run TTS benchmarks:
Storage Requirements
| Setup | Total Size | Voices Installed |
|---|---|---|
| Default | ~68 MB | Piper Lessac + eSpeak |
| + Kokoro English | ~378 MB | Piper + Kokoro EN |
| + Kokoro Multi | ~878 MB | Piper + Both Kokoro |
| All voices | ~1.1 GB | All 6 voices + eSpeak |
Multiple voices can be installed simultaneously. RCLI selects the active voice from config.
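The totals in the table follow from adding each voice's approximate size to the ~8 MB of shared eSpeak-NG data. A quick check, using the sizes listed in this document (`SIZES_MB`, `ESPEAK_MB`, and `total_mb` are illustrative names, not part of RCLI):

```python
# Approximate sizes in MB, as listed in the Quick Comparison table.
SIZES_MB = {
    "Piper Lessac": 60,
    "Piper Amy": 60,
    "Matcha LJSpeech": 100,
    "KittenTTS Nano": 90,
    "Kokoro English v0.19": 310,
    "Kokoro Multi-lang v1.1": 500,
}
ESPEAK_MB = 8  # shared eSpeak-NG data, counted once


def total_mb(voices: list[str]) -> int:
    """Disk footprint of the given voices plus the shared eSpeak-NG data."""
    return ESPEAK_MB + sum(SIZES_MB[name] for name in voices)


if __name__ == "__main__":
    print(total_mb(["Piper Lessac"]))                          # default setup
    print(total_mb(["Piper Lessac", "Kokoro English v0.19"]))  # + Kokoro English
    print(total_mb(list(SIZES_MB)))                            # all six voices
```

Installing all six voices comes to 1128 MB, which the table rounds to ~1.1 GB.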
Speaker IDs
KittenTTS Nano (8 speakers)
| Speaker ID | Gender | Description |
|---|---|---|
| 0 | Male | Deep, calm |
| 1 | Male | Neutral |
| 2 | Male | Warm |
| 3 | Male | Energetic |
| 4 | Female | Soft |
| 5 | Female | Neutral |
| 6 | Female | Warm |
| 7 | Female | Clear |
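When a speaker ID is set by hand, it can help to validate it against the table above before writing it to config. The following is a minimal sketch that only mirrors the KittenTTS Nano table; `KITTEN_SPEAKERS`, `describe_speaker`, and `ids_for_gender` are hypothetical helpers, not part of RCLI.

```python
# Mirror of the KittenTTS Nano speaker table: ID -> (gender, description).
KITTEN_SPEAKERS = {
    0: ("Male", "Deep, calm"),
    1: ("Male", "Neutral"),
    2: ("Male", "Warm"),
    3: ("Male", "Energetic"),
    4: ("Female", "Soft"),
    5: ("Female", "Neutral"),
    6: ("Female", "Warm"),
    7: ("Female", "Clear"),
}


def describe_speaker(speaker_id: int) -> str:
    """Return a readable label for a speaker ID, rejecting IDs outside 0-7."""
    if speaker_id not in KITTEN_SPEAKERS:
        raise ValueError(f"speaker ID must be 0-7, got {speaker_id}")
    gender, description = KITTEN_SPEAKERS[speaker_id]
    return f"{gender}: {description}"


def ids_for_gender(gender: str) -> list[int]:
    """List the speaker IDs with the given gender, in ascending order."""
    return sorted(i for i, (g, _) in KITTEN_SPEAKERS.items() if g == gender)


if __name__ == "__main__":
    print(describe_speaker(6))
    print(ids_for_gender("Male"))
```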
Kokoro English (11 speakers)
Speaker IDs 0-10 with varied tones and styles. Browse in the TUI for previews.
Kokoro Multi-lang (103 speakers)
Speaker IDs 0-102 covering Chinese and English voices. Use the TUI to explore.
Next Steps
- Voice Browser: Browse and download TTS voices
- Benchmarks: Measure TTS performance on your Mac
- Model Browser: Manage all models (LLM, STT, TTS)
- Switch Models: Hot-swap voices without restart