RCLI supports 6 TTS voices across 4 architectures, ranging from lightweight single-speaker Piper voices to high-quality Kokoro models with up to 103 speakers.

Quick Comparison

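| Voice | Architecture | Size | Speakers | Quality | Languages | License |
|---|---|---|---|---|---|---|
| Piper Lessac ⭐ | VITS | 60 MB | 1 | Good | English | MIT |
| Piper Amy | VITS | 60 MB | 1 | Good | English | MIT |
| Matcha LJSpeech | Matcha | 100 MB | 1 | Great | English | MIT |
| KittenTTS Nano | Kitten | 90 MB | 8 | Great | English | Apache 2.0 |
| Kokoro English v0.19 🏆 | Kokoro | 310 MB | 11 | Excellent | English | Apache 2.0 |
| Kokoro Multi-lang v1.1 | Kokoro | 500 MB | 103 | Excellent | Chinese + English | Apache 2.0 |
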
⭐ = Default (ships with rcli setup)
🏆 = Recommended upgrade for best quality
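
The trade-off between size, quality, and language coverage in the comparison can be expressed as a small selection helper. The following is a minimal sketch, not part of RCLI: the `Voice` record, the `pick_voice` function, and the numeric quality ranking are hypothetical names introduced for illustration, and the data is copied from the comparison above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str
    size_mb: int
    speakers: int
    quality: str
    languages: tuple


# Ordering of the quality labels used in the comparison (illustrative ranking).
QUALITY_RANK = {"Good": 0, "Great": 1, "Excellent": 2}

VOICES = [
    Voice("Piper Lessac", 60, 1, "Good", ("English",)),
    Voice("Piper Amy", 60, 1, "Good", ("English",)),
    Voice("Matcha LJSpeech", 100, 1, "Great", ("English",)),
    Voice("KittenTTS Nano", 90, 8, "Great", ("English",)),
    Voice("Kokoro English v0.19", 310, 11, "Excellent", ("English",)),
    Voice("Kokoro Multi-lang v1.1", 500, 103, "Excellent", ("Chinese", "English")),
]


def pick_voice(max_size_mb, language="English"):
    """Return the highest-quality voice that fits the disk budget, or None."""
    candidates = [
        v for v in VOICES
        if v.size_mb <= max_size_mb and language in v.languages
    ]
    if not candidates:
        return None
    # Highest quality first; among equal quality, prefer the smaller download.
    return min(candidates, key=lambda v: (-QUALITY_RANK[v.quality], v.size_mb))
```

For example, `pick_voice(100)` selects KittenTTS Nano, since it shares the "Great" rating with Matcha LJSpeech but is 10 MB smaller.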

Voice Details

Piper Lessac (Default)

Recommended for most users. Fast synthesis, clear English, low latency.

Specifications

  • Provider: Rhasspy Piper
  • Architecture: VITS (Variational Inference TTS)
  • Size: ~60 MB
  • Speakers: 1 (male voice)
  • Quality: Good
  • Languages: English (US)
  • Latency: 150.6 ms average on M3 Max
  • License: MIT
  • Download: rcli setup (default)

Model Files

Stored in ~/Library/RCLI/models/piper-voice/:
piper-voice/
├── en_US-lessac-medium.onnx
├── en_US-lessac-medium.onnx.json
└── tokens.txt

Key Features

  • Fastest TTS synthesis (~150ms latency)
  • Clear, neutral male voice
  • Optimized for command responses
  • Minimal disk footprint (60 MB)
  • Double-buffered playback (next sentence synthesizes while current plays)
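
The double-buffered playback mentioned in the last feature can be pictured as a producer/consumer pair: one thread synthesizes the next sentence while the main thread plays the current one. The sketch below illustrates the general technique only and is not RCLI's implementation; `speak`, `synthesize`, and `play` are hypothetical names, with the last two standing in for a real TTS engine and audio output.

```python
import queue
import threading


def speak(sentences, synthesize, play):
    """Play sentences in order while the next one is synthesized in the background."""
    buffer = queue.Queue(maxsize=1)  # at most one finished sentence waits here
    done = object()                  # sentinel marking the end of input

    def producer():
        try:
            for text in sentences:
                # put() blocks while the slot is occupied, so synthesis
                # stays only slightly ahead of playback.
                buffer.put(synthesize(text))
        finally:
            buffer.put(done)         # always unblock the consumer

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    while True:
        audio = buffer.get()
        if audio is done:
            break
        play(audio)                  # synthesis of the next sentence overlaps this call
    worker.join()
```

With stand-in callables, `speak(["a", "b"], str.upper, print)` plays the results in the original sentence order.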

When to Use

  • Default choice for most users
  • Fast response times required
  • Limited disk space
  • English-only use cases

Piper Amy

Specifications

  • Provider: Rhasspy Piper
  • Architecture: VITS
  • Size: ~60 MB
  • Speakers: 1 (female voice)
  • Quality: Good
  • Languages: English (US)
  • License: MIT
  • Download: rcli voices

Description

Warm female alternative to Piper Lessac. Same speed and quality, different tone.

KittenTTS Nano

Specifications

  • Provider: KittenML
  • Architecture: Kitten (custom architecture)
  • Size: ~90 MB
  • Speakers: 8 (4 male, 4 female)
  • Quality: Great
  • Languages: English
  • License: Apache 2.0
  • Download: rcli voices

Model Files

Stored in ~/Library/RCLI/models/kitten-nano-en-v0_1-fp16/:
kitten-nano-en-v0_1-fp16/
├── model.fp16.onnx
├── tokens.txt
└── voices.bin

Key Features

  • 8 distinct voices — Choose from 4 male and 4 female speakers
  • Lightweight (90 MB for all 8 voices)
  • Good quality for size
  • Voice ID selectable via config

Speaker Selection

Switch speakers in the TUI or via config:
# ~/Library/RCLI/config
tts_model=kitten-nano
tts_speaker=3  # Speaker IDs: 0-7

Matcha LJSpeech

Specifications

  • Provider: Matcha-TTS
  • Architecture: Matcha (flow-matching)
  • Size: ~100 MB (model + HiFi-GAN vocoder)
  • Speakers: 1 (female voice)
  • Quality: Great
  • Languages: English
  • License: MIT
  • Download: rcli voices

Model Files

Stored in ~/Library/RCLI/models/matcha-icefall-en_US-ljspeech/:
matcha-icefall-en_US-ljspeech/
├── model-steps-3.onnx
├── hifigan_v2.onnx
└── tokens.txt

Description

Fast synthesis with a clear female voice. Uses a HiFi-GAN vocoder for high-quality audio generation.

Kokoro English v0.19

Best English quality. 11 natural-sounding voices with excellent prosody.

Specifications

  • Provider: Hexgrad
  • Architecture: Kokoro (82M parameters)
  • Size: ~310 MB
  • Speakers: 11 (various voices)
  • Quality: Excellent
  • Languages: English
  • License: Apache 2.0
  • Download: rcli voices

Model Files

Stored in ~/Library/RCLI/models/kokoro-en-v0_19/:
kokoro-en-v0_19/
├── model.onnx
├── tokens.txt
└── voices.bin

Key Features

  • 11 unique voices — Wide variety of tones and styles
  • Best prosody and naturalness
  • Excellent for extended conversations
  • Clear pronunciation and intonation
  • Slightly higher latency (~180-200 ms) as the cost of higher quality

When to Use

  • Best audio quality required
  • Extended voice conversations
  • Professional use cases
  • Disk space not a constraint (310 MB)

Kokoro Multi-lang v1.1

Specifications

  • Provider: Hexgrad
  • Architecture: Kokoro (82M parameters)
  • Size: ~500 MB
  • Speakers: 103 (multilingual)
  • Quality: Excellent
  • Languages: Chinese + English
  • License: Apache 2.0
  • Download: rcli voices

Model Files

Stored in ~/Library/RCLI/models/kokoro-multi-lang-v1_1/:
kokoro-multi-lang-v1_1/
├── model.onnx
├── tokens.txt
├── voices.bin
└── lexicon-us-en.txt
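
If a voice fails to load, one quick check is whether every file from the Model Files listings is present on disk. The following is a minimal sketch under that assumption: the `EXPECTED_FILES` table and the `missing_files` helper are hypothetical and simply mirror the directory listings on this page.

```python
from pathlib import Path

# File names copied from the "Model Files" listings for each voice directory.
EXPECTED_FILES = {
    "piper-voice": [
        "en_US-lessac-medium.onnx",
        "en_US-lessac-medium.onnx.json",
        "tokens.txt",
    ],
    "kitten-nano-en-v0_1-fp16": ["model.fp16.onnx", "tokens.txt", "voices.bin"],
    "matcha-icefall-en_US-ljspeech": [
        "model-steps-3.onnx",
        "hifigan_v2.onnx",
        "tokens.txt",
    ],
    "kokoro-en-v0_19": ["model.onnx", "tokens.txt", "voices.bin"],
    "kokoro-multi-lang-v1_1": [
        "model.onnx",
        "tokens.txt",
        "voices.bin",
        "lexicon-us-en.txt",
    ],
}


def missing_files(models_dir, voice_dir):
    """Return the expected files that are absent from models_dir/voice_dir."""
    root = Path(models_dir).expanduser() / voice_dir
    return [name for name in EXPECTED_FILES[voice_dir] if not (root / name).is_file()]
```

Calling `missing_files("~/Library/RCLI/models", "kokoro-multi-lang-v1_1")` returns an empty list when the download is complete.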

Key Features

  • 103 speakers — Largest voice library
  • Chinese and English support
  • Lexicon-based pronunciation (US English)
  • Best for multilingual use cases

When to Use

  • Multilingual conversations (Chinese/English)
  • Need many voice options
  • Maximum voice variety
  • Disk space not a constraint (500 MB)

Quality Comparison

Naturalness & Prosody

| Voice | Naturalness | Prosody | Clarity | Best For |
|---|---|---|---|---|
| Kokoro English | Excellent | Excellent | Excellent | Extended conversations |
| Kokoro Multi-lang | Excellent | Excellent | Excellent | Multilingual use |
| KittenTTS Nano | Great | Great | Good | Voice variety (8 speakers) |
| Matcha LJSpeech | Great | Good | Excellent | Clear female voice |
| Piper Amy | Good | Good | Good | Warm female tone |
| Piper Lessac | Good | Good | Good | Fast, neutral male voice |

Latency Benchmarks

Measured on Apple M3 Max:

| Voice | Avg Latency | Synthesis Speed | Real-time Factor |
|---|---|---|---|
| Piper Lessac | 150.6 ms | Fast | ~0.3x |
| Piper Amy | 155 ms | Fast | ~0.3x |
| KittenTTS Nano | 165 ms | Fast | ~0.35x |
| Matcha LJSpeech | 170 ms | Fast | ~0.38x |
| Kokoro English | 190 ms | Medium | ~0.42x |
| Kokoro Multi-lang | 210 ms | Medium | ~0.48x |

Real-time factor <1.0 means synthesis is faster than playback. All models are fast enough for real-time conversation.
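
The real-time factor is the ratio of synthesis time to the duration of the audio produced, and its reciprocal is the "x real-time" speed figure. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of RCLI.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """Synthesis time divided by audio duration; below 1.0 is faster than playback."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def speed_multiple(rtf):
    """Convert a real-time factor into an 'N x real-time' speed figure."""
    if rtf <= 0:
        raise ValueError("real-time factor must be positive")
    return 1.0 / rtf
```

For instance, a voice that takes 1.0 s to synthesize 3.2 s of audio has a real-time factor of 0.3125, which corresponds to a synthesis speed of 3.2x real-time.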

Switching Voices

Interactive Voice Browser

rcli voices  # Browse, download, and switch TTS voices
Use arrow keys to select a voice, press Enter to download/activate.

Check Active Voice

rcli info  # Shows active TTS voice

Manual Configuration

Edit config file directly:
# ~/Library/RCLI/config
tts_model=kokoro-en      # Voice ID
tts_speaker=5            # Speaker ID (for multi-speaker models)
Changes take effect immediately (no restart required).
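
For scripts that need to read the active voice, the config shown above can be parsed in a few lines. This sketch assumes the file is plain `key=value` text with `#` comments, as in the snippet, and that values never contain a `#`; `parse_config` is a hypothetical helper, not an RCLI API.

```python
def parse_config(text):
    """Parse key=value lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop full-line and trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings
```

Values are returned as strings, so a caller would convert `tts_speaker` with `int(...)` before using it as a speaker ID.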

Architecture Details

VITS (Piper)

  • Full name: Variational Inference with adversarial learning for end-to-end TTS
  • Approach: End-to-end neural TTS with GAN-based training
  • Strengths: Fast inference, small model size
  • Phonemizer: eSpeak-NG (built-in)

Kokoro

  • Parameters: 82M
  • Approach: Transformer-based TTS with multi-speaker embeddings
  • Strengths: Best naturalness, excellent prosody
  • Phonemizer: eSpeak-NG + lexicon

Kitten

  • Architecture: Custom lightweight TTS
  • Approach: Fast synthesis with speaker embeddings
  • Strengths: 8 voices in one model, lightweight

Matcha

  • Architecture: Flow-matching for fast TTS
  • Vocoder: HiFi-GAN v2
  • Strengths: Fast synthesis, clear output

eSpeak Phonemizer

All TTS models use eSpeak-NG for text-to-phoneme conversion:
  • Size: ~8 MB (eSpeak-NG data)
  • Location: ~/Library/RCLI/models/espeak-ng-data/
  • Languages: 100+ languages supported
  • Shared: All TTS models use the same eSpeak data

Benchmarks

Run TTS benchmarks:
rcli bench --suite tts            # Benchmark active voice
rcli bench --all-tts --suite tts  # Compare all installed voices
Example output:
=== TTS Benchmark ===
Voice: Piper Lessac
  Average latency: 150.6 ms
  Synthesis speed: 3.2x real-time
  Memory usage: 180 MB

Storage Requirements

| Setup | Total Size | Voices Installed |
|---|---|---|
| Default | ~68 MB | Piper Lessac + eSpeak |
| + Kokoro English | ~378 MB | Piper + Kokoro EN |
| + Kokoro Multi | ~878 MB | Piper + Both Kokoro |
| All voices | ~1.1 GB | All 6 voices + eSpeak |

Multiple voices can be installed simultaneously. RCLI selects the active voice from config.
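
The totals above follow from adding the per-voice sizes to the shared ~8 MB of eSpeak-NG data. The sketch below reproduces that arithmetic using the approximate sizes listed on this page; `SIZES_MB` and `total_mb` are illustrative names.

```python
# Approximate sizes in MB, taken from the voice specifications.
SIZES_MB = {
    "Piper Lessac": 60,
    "Piper Amy": 60,
    "Matcha LJSpeech": 100,
    "KittenTTS Nano": 90,
    "Kokoro English": 310,
    "Kokoro Multi-lang": 500,
}
ESPEAK_MB = 8  # shared eSpeak-NG data, counted once


def total_mb(voices):
    """Disk footprint in MB for the given voices plus the shared eSpeak data."""
    return ESPEAK_MB + sum(SIZES_MB[name] for name in voices)
```

Installing every voice gives `total_mb(SIZES_MB)`, which is 1128 MB, or roughly 1.1 GB.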

Speaker IDs

KittenTTS Nano (8 speakers)

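| Speaker ID | Gender | Description |
|---|---|---|
| 0 | Male | Deep, calm |
| 1 | Male | Neutral |
| 2 | Male | Warm |
| 3 | Male | Energetic |
| 4 | Female | Soft |
| 5 | Female | Neutral |
| 6 | Female | Warm |
| 7 | Female | Clear |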

Kokoro English (11 speakers)

Speaker IDs 0-10 with varied tones and styles. Browse in TUI for previews.

Kokoro Multi-lang (103 speakers)

Speaker IDs 0-102 covering Chinese and English voices. Use TUI to explore.
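
Because speaker IDs are zero-based, the highest valid ID is always one less than the speaker count. A small range check along these lines can catch an out-of-range `tts_speaker` before it reaches the config; `SPEAKER_COUNTS` and `is_valid_speaker` are hypothetical names, and how RCLI itself handles an invalid ID is not covered here.

```python
# Speaker counts for the multi-speaker voices listed above.
SPEAKER_COUNTS = {
    "KittenTTS Nano": 8,
    "Kokoro English": 11,
    "Kokoro Multi-lang": 103,
}


def is_valid_speaker(voice, speaker_id):
    """True if speaker_id falls in the zero-based range for the given voice."""
    return 0 <= speaker_id < SPEAKER_COUNTS[voice]
```

For example, `is_valid_speaker("Kokoro English", 10)` is true, while ID 11 is rejected because the range ends at 10.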

Next Steps

Voice Browser

Browse and download TTS voices

Benchmarks

Measure TTS performance on your Mac

Model Browser

Manage all models (LLM, STT, TTS)

Switch Models

Hot-swap voices without restart
