Model Architecture
RCLI uses 5 model types working together in a multi-threaded pipeline:
STT (Speech-to-Text)
Converts voice input to text using Zipformer (streaming) or Whisper/Parakeet (offline)
LLM (Language Model)
Processes requests and generates responses using Qwen3 or Liquid LFM2 models
TTS (Text-to-Speech)
Synthesizes natural speech output using Piper, Kokoro, or KittenTTS voices
VAD (Voice Activity Detection)
Detects speech vs. silence in real-time using Silero VAD (0.6 MB)
Embeddings
Generates text embeddings for RAG using Snowflake Arctic Embed S (34 MB)
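One way to picture a multi-threaded pipeline like this is as a chain of worker threads joined by queues, with each stage handing its output to the next. The sketch below is only an illustration under that assumption, not RCLI's actual implementation: the stage functions `vad`, `stt`, `llm`, and `tts` are hypothetical stubs that operate on strings, and the embeddings model (used for RAG) is left out.

```python
import queue
import threading

SENTINEL = object()  # unique end-of-stream marker


def stage(fn, inbox, outbox):
    """Apply fn to each item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown downstream
            break
        result = fn(item)
        if result is not None:  # a stage may drop an item (e.g. silence)
            outbox.put(result)


# Hypothetical stand-ins for the real models.
def vad(chunk):
    return chunk if chunk.strip() else None  # drop "silent" chunks


def stt(chunk):
    return chunk.lower()  # pretend transcription


def llm(text):
    return f"reply to: {text}"  # pretend generation


def tts(text):
    return f"<audio:{text}>"  # pretend synthesis


def run_pipeline(chunks):
    fns = [vad, stt, llm, tts]
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    for chunk in chunks:
        queues[0].put(chunk)
    queues[0].put(SENTINEL)

    outputs = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        outputs.append(item)
    for t in threads:
        t.join()
    return outputs
```

Because every stage runs in its own thread, a slow stage does not block the ones before it from accepting new input, and the single-consumer queues keep items in order.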
Storage Location
All models are stored in a single directory on your Mac:
Total default install size: ~1 GB. Models are downloaded once during rcli setup and persist across sessions.
Default Models
The default model set installed by rcli setup is optimized for speed, quality, and disk efficiency:
| Component | Default Model | Size | Key Feature |
|---|---|---|---|
| LLM | Liquid LFM2 1.2B Tool | 731 MB | Excellent tool calling (~180 t/s) |
| STT (Streaming) | Zipformer | 50 MB | Real-time streaming for live mic |
| STT (Offline) | Whisper base.en | 140 MB | ~5% WER, fast batch transcription |
| TTS | Piper Lessac | 60 MB | Fast synthesis, clear English voice |
| VAD | Silero VAD | 0.6 MB | Real-time speech detection |
| Embeddings | Snowflake Arctic S | 34 MB | High-quality text embeddings for RAG |
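As a quick check on the ~1 GB install figure, the sizes in the table can be summed directly. The dictionary below simply copies the table values. The variable names are illustrative.

```python
# Sizes in MB, copied from the default models table.
DEFAULT_MODEL_SIZES_MB = {
    "Liquid LFM2 1.2B Tool": 731,
    "Zipformer": 50,
    "Whisper base.en": 140,
    "Piper Lessac": 60,
    "Silero VAD": 0.6,
    "Snowflake Arctic S": 34,
}

total_mb = sum(DEFAULT_MODEL_SIZES_MB.values())
print(f"{total_mb:.1f} MB")  # 1015.6 MB, i.e. roughly 1 GB
```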
Model Inference Engines
RCLI uses optimized inference engines for each model type:
- llama.cpp: LLM + embedding inference with Metal GPU, KV caching, Flash Attention
- sherpa-onnx: STT, TTS, and VAD via ONNX Runtime with Metal acceleration
- USearch: HNSW vector index for fast hybrid retrieval (~4 ms)
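To make the role of the vector index concrete: an HNSW index returns approximate nearest neighbours of a query embedding, and the exact search it approximates is a brute-force cosine-similarity ranking. The sketch below shows that exact version in plain Python with toy 2-dimensional vectors. It does not use the USearch API, covers only the vector side of hybrid retrieval, and the `cosine` and `top_k` helpers are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, docs, k=2):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy embeddings; real ones come from the embedding model.
docs = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
nearest = top_k([1.0, 0.1], docs)
```

Brute force compares the query against every stored vector, so its cost grows linearly with the collection. A graph index such as HNSW trades a small amount of recall for much faster lookups.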
Context Windows
LLM models support different context sizes:
| Model Family | Default Context | Max Context |
|---|---|---|
| Qwen3 / Qwen3.5 | 4,096 tokens | 32K - 262K tokens |
| Liquid LFM2 | 4,096 tokens | 128K tokens |
Set the context size with the --ctx-size <n> flag.
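A context window is a fixed token budget shared by the conversation history and the reply. The sketch below shows one common way to stay within it: keep the most recent messages that fit and drop older ones. This is an illustration rather than RCLI's behaviour. `count_tokens` is a crude whitespace-based stand-in for a real, model-specific tokenizer, and `reserve_for_reply` is an assumed parameter.

```python
DEFAULT_CTX_TOKENS = 4096  # default context size from the table above


def count_tokens(text):
    # Crude stand-in: real tokenizers are model-specific.
    return len(text.split())


def fit_to_context(messages, ctx_tokens=DEFAULT_CTX_TOKENS, reserve_for_reply=512):
    """Keep the newest messages whose total token count fits the budget."""
    budget = ctx_tokens - reserve_for_reply
    kept = []
    used = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

For example, with `ctx_tokens=10` and `reserve_for_reply=4`, the history `["a b c", "d e", "f g h i"]` is trimmed to `["d e", "f g h i"]`. Raising the context size leaves room for more history at the cost of more memory for the KV cache.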
Model Selection Persistence
Active model choices persist across sessions in:
Next Steps
LLM Models
Compare all 9 language models with specs and benchmarks
STT Models
Explore speech-to-text options (Zipformer, Whisper, Parakeet)
TTS Models
Browse 6 TTS voices with quality ratings
Switching Models
Learn how to hot-swap models without restart