RCLI runs a complete AI pipeline entirely on your Mac using Apple Silicon with Metal GPU acceleration. All models are stored locally and run without any cloud dependency.

Model Architecture

RCLI uses 5 model types working together in a multi-threaded pipeline:

STT (Speech-to-Text)

Converts voice input to text using Zipformer (streaming) or Whisper/Parakeet (offline)

LLM (Language Model)

Processes requests and generates responses using Qwen3 or Liquid LFM2 models

TTS (Text-to-Speech)

Synthesizes natural speech output using Piper, Kokoro, or KittenTTS voices

VAD (Voice Activity)

Detects speech vs. silence in real-time using Silero VAD (0.6 MB)

Embeddings

Generates text embeddings for RAG using Snowflake Arctic Embed S (34 MB)
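The five stages above can be chained as a multi-threaded pipeline. The sketch below is illustrative only: the stage functions are placeholders, not the real RCLI model calls.

```python
# Minimal, illustrative sketch of the multi-threaded pipeline shape described
# above. The stage functions are placeholders, NOT the real RCLI model APIs.
import queue
import threading

STOP = object()  # sentinel that shuts each stage down in order

def stage(fn, inbox, outbox):
    """Consume items from inbox, apply fn, and forward results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        result = fn(item)
        if result is not None:  # drop filtered items (e.g. silence at the VAD)
            outbox.put(result)

# Placeholder stages (assumptions for illustration only)
vad = lambda frame: None if frame == "silence" else frame   # speech detection
stt = lambda frame: frame.upper()                           # "transcription"
llm = lambda text: f"reply({text})"                         # "generation"
tts = lambda text: f"audio[{text}]"                         # "synthesis"

q_in, q1, q2, q3, q_out = (queue.Queue() for _ in range(5))
for fn, a, b in [(vad, q_in, q1), (stt, q1, q2), (llm, q2, q3), (tts, q3, q_out)]:
    threading.Thread(target=stage, args=(fn, a, b), daemon=True).start()

for frame in ["hello", "silence"]:
    q_in.put(frame)
q_in.put(STOP)

results = []
while (item := q_out.get()) is not STOP:
    results.append(item)
print(results)  # ['audio[reply(HELLO)]'] — the silence frame was dropped
```

Because each stage runs on its own thread, the VAD can keep classifying new audio frames while the LLM is still generating a reply to an earlier utterance.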

Storage Location

All models are stored in a single directory on your Mac:
~/Library/RCLI/models/
├── lfm2-1.2b-tool-q4_k_m.gguf          # LLM (731 MB)
├── zipformer/                           # Streaming STT (~50 MB)
├── whisper-base.en/                     # Offline STT (~140 MB)
├── piper-voice/                         # TTS voice (~60 MB)
├── silero_vad.onnx                      # VAD (0.6 MB)
└── snowflake-arctic-embed-s-q8_0.gguf  # Embeddings (34 MB)
Total default install size: ~1 GB. Models are downloaded once during rcli setup and persist across sessions.
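The "~1 GB" figure follows directly from the per-model sizes in the tree above:

```python
# Quick sanity check of the "~1 GB" total from the sizes listed above (in MB)
sizes_mb = {
    "lfm2-1.2b-tool-q4_k_m.gguf": 731,
    "zipformer": 50,
    "whisper-base.en": 140,
    "piper-voice": 60,
    "silero_vad.onnx": 0.6,
    "snowflake-arctic-embed-s-q8_0.gguf": 34,
}
total_mb = sum(sizes_mb.values())
print(f"{total_mb} MB ≈ {total_mb / 1024:.2f} GB")  # 1015.6 MB ≈ 0.99 GB
```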

Default Models

The default model set installed by rcli setup is optimized for speed, quality, and disk efficiency:
Component        Default Model           Size      Key Feature
LLM              Liquid LFM2 1.2B Tool   731 MB    Excellent tool calling (~180 t/s)
STT (Streaming)  Zipformer               50 MB     Real-time streaming for live mic
STT (Offline)    Whisper base.en         140 MB    ~5% WER, fast batch transcription
TTS              Piper Lessac            60 MB     Fast synthesis, clear English voice
VAD              Silero VAD              0.6 MB    Real-time speech detection
Embeddings       Snowflake Arctic S      34 MB     High-quality text embeddings for RAG

Model Inference Engines

RCLI uses optimized inference engines for each model type:
  • llama.cpp — LLM + embedding inference with Metal GPU, KV caching, Flash Attention
  • sherpa-onnx — STT, TTS, and VAD via ONNX Runtime with Metal acceleration
  • USearch — HNSW vector index for fast hybrid retrieval (~4ms)
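To illustrate what the embedding model and vector index do together, here is a brute-force nearest-neighbor sketch. This is not USearch's HNSW implementation (HNSW avoids scanning every vector), and the toy two-dimensional "embeddings" stand in for the real Arctic Embed vectors:

```python
# Brute-force cosine-similarity retrieval — a conceptual stand-in for the
# embedding + vector-index pairing, not the actual USearch HNSW code.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embeddings"; in RCLI these come from Snowflake Arctic Embed S
docs = {"apple": [1.0, 0.1], "car": [0.1, 1.0]}
query = [0.9, 0.2]

best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # apple — the document whose vector points the same way
```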

Context Windows

LLM models support different context sizes:
Model Family     Default Context  Max Context
Qwen3 / Qwen3.5  4,096 tokens     32K–262K tokens
Liquid LFM2      4,096 tokens     128K tokens
Context size can be adjusted with the --ctx-size <n> flag.
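A small illustration of the relationship between the flag and the table above (the maxima are the upper bounds from the table; whether rcli clamps or rejects an oversized value is not specified here):

```python
# Illustrative only: cap a requested --ctx-size at a model family's maximum.
MAX_CTX = {"qwen3": 262_144, "lfm2": 131_072}  # upper bounds from the table

def effective_ctx(family, requested, default=4_096):
    """Return the context size that would actually be usable."""
    return min(requested or default, MAX_CTX[family])

print(effective_ctx("lfm2", 262_144))  # 131072 — capped at LFM2's 128K max
```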

Model Selection Persistence

Active model choices persist across sessions in:
~/Library/RCLI/config
Example config file:
model=qwen3.5-4b
tts_model=kokoro-en
stt_model=parakeet-tdt
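The config uses a flat key=value format, so it is trivial to read programmatically. A minimal parser sketch (illustrative; RCLI's own parsing may handle more cases):

```python
# Minimal parser for the key=value config format shown above (illustrative).
def parse_config(text):
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):   # skip blanks and comments
            key, _, value = line.partition("=")
            cfg[key.strip()] = value.strip()
    return cfg

example = "model=qwen3.5-4b\ntts_model=kokoro-en\nstt_model=parakeet-tdt\n"
cfg = parse_config(example)
print(cfg["model"])  # qwen3.5-4b
```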
You can switch models at any time using:
rcli models          # Interactive model browser
rcli upgrade-llm     # Guided LLM upgrade
rcli upgrade-stt     # Upgrade to Parakeet TDT
rcli voices          # Switch TTS voices

Next Steps

LLM Models

Compare all 9 language models with specs and benchmarks

STT Models

Explore speech-to-text options (Zipformer, Whisper, Parakeet)

TTS Models

Browse 6 TTS voices with quality ratings

Switching Models

Learn how to hot-swap models without restart
