RCLI runs a complete AI pipeline entirely on your Mac using Apple Silicon with Metal GPU acceleration. All models are stored locally and run without any cloud dependency.

Model Architecture

RCLI uses 5 model types working together in a multi-threaded pipeline:

STT (Speech-to-Text)

Converts voice input to text using Zipformer (streaming) or Whisper/Parakeet (offline)

LLM (Language Model)

Processes requests and generates responses using Qwen3 or Liquid LFM2 models

TTS (Text-to-Speech)

Synthesizes natural speech output using Piper, Kokoro, or KittenTTS voices

VAD (Voice Activity)

Detects speech vs. silence in real-time using Silero VAD (0.6 MB)

Embeddings

Generates text embeddings for RAG using Snowflake Arctic Embed S (34 MB)
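The five stages above can be chained as a multi-threaded pipeline. The sketch below is illustrative only: the stage functions are placeholders, not the real RCLI model calls.

```python
# Minimal, illustrative sketch of the multi-threaded pipeline shape described
# above. The stage functions are placeholders, NOT the real RCLI model APIs.
import queue
import threading

STOP = object()  # sentinel that shuts each stage down in order

def stage(fn, inbox, outbox):
    """Consume items from inbox, apply fn, and forward results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        result = fn(item)
        if result is not None:  # drop filtered items (e.g. silence at the VAD)
            outbox.put(result)

# Placeholder stages (assumptions for illustration only)
vad = lambda frame: None if frame == "silence" else frame   # speech detection
stt = lambda frame: frame.upper()                           # "transcription"
llm = lambda text: f"reply({text})"                         # "generation"
tts = lambda text: f"audio[{text}]"                         # "synthesis"

q_in, q1, q2, q3, q_out = (queue.Queue() for _ in range(5))
for fn, a, b in [(vad, q_in, q1), (stt, q1, q2), (llm, q2, q3), (tts, q3, q_out)]:
    threading.Thread(target=stage, args=(fn, a, b), daemon=True).start()

for frame in ["hello", "silence"]:
    q_in.put(frame)
q_in.put(STOP)

results = []
while (item := q_out.get()) is not STOP:
    results.append(item)
print(results)  # ['audio[reply(HELLO)]'] — the silence frame was dropped
```

Because each stage runs on its own thread, the VAD can keep classifying new audio frames while the LLM is still generating a reply to an earlier utterance.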

Storage Location

All models are stored in a single directory on your Mac:
~/Library/RCLI/models/
├── lfm2-1.2b-tool-q4_k_m.gguf          # LLM (731 MB)
├── zipformer/                           # Streaming STT (~50 MB)
├── whisper-base.en/                     # Offline STT (~140 MB)
├── piper-voice/                         # TTS voice (~60 MB)
├── silero_vad.onnx                      # VAD (0.6 MB)
└── snowflake-arctic-embed-s-q8_0.gguf  # Embeddings (34 MB)
Total default install size: ~1 GB. Models are downloaded once during rcli setup and persist across sessions.
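The "~1 GB" figure follows directly from the per-model sizes in the tree above:

```python
# Quick sanity check of the "~1 GB" total from the sizes listed above (in MB)
sizes_mb = {
    "lfm2-1.2b-tool-q4_k_m.gguf": 731,
    "zipformer": 50,
    "whisper-base.en": 140,
    "piper-voice": 60,
    "silero_vad.onnx": 0.6,
    "snowflake-arctic-embed-s-q8_0.gguf": 34,
}
total_mb = sum(sizes_mb.values())
print(f"{total_mb} MB ≈ {total_mb / 1024:.2f} GB")  # 1015.6 MB ≈ 0.99 GB
```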

Default Models

The default model set installed by rcli setup is optimized for speed, quality, and disk efficiency:
Component        Default Model           Size      Key Feature
LLM              Liquid LFM2 1.2B Tool   731 MB    Excellent tool calling (~180 t/s)
STT (Streaming)  Zipformer               50 MB     Real-time streaming for live mic
STT (Offline)    Whisper base.en         140 MB    ~5% WER, fast batch transcription
TTS              Piper Lessac            60 MB     Fast synthesis, clear English voice
VAD              Silero VAD              0.6 MB    Real-time speech detection
Embeddings       Snowflake Arctic S      34 MB     High-quality text embeddings for RAG

Model Inference Engines

RCLI uses optimized inference engines for each model type:
  • llama.cpp — LLM + embedding inference with Metal GPU, KV caching, Flash Attention
  • sherpa-onnx — STT, TTS, and VAD via ONNX Runtime with Metal acceleration
  • USearch — HNSW vector index for fast hybrid retrieval (~4ms)
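To illustrate what the embedding model and vector index do together, here is a brute-force nearest-neighbor sketch. This is not USearch's HNSW implementation (HNSW avoids scanning every vector), and the toy two-dimensional "embeddings" stand in for the real Arctic Embed vectors:

```python
# Brute-force cosine-similarity retrieval — a conceptual stand-in for the
# embedding + vector-index pairing, not the actual USearch HNSW code.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embeddings"; in RCLI these come from Snowflake Arctic Embed S
docs = {"apple": [1.0, 0.1], "car": [0.1, 1.0]}
query = [0.9, 0.2]

best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # apple — the document whose vector points the same way
```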

Context Windows

LLM models support different context sizes:
Model Family     Default Context  Max Context
Qwen3 / Qwen3.5  4,096 tokens     32K–262K tokens
Liquid LFM2      4,096 tokens     128K tokens
Context size can be adjusted with the --ctx-size <n> flag.
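A small illustration of the relationship between the flag and the table above (the maxima are the upper bounds from the table; whether rcli clamps or rejects an oversized value is not specified here):

```python
# Illustrative only: cap a requested --ctx-size at a model family's maximum.
MAX_CTX = {"qwen3": 262_144, "lfm2": 131_072}  # upper bounds from the table

def effective_ctx(family, requested, default=4_096):
    """Return the context size that would actually be usable."""
    return min(requested or default, MAX_CTX[family])

print(effective_ctx("lfm2", 262_144))  # 131072 — capped at LFM2's 128K max
```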

Model Selection Persistence

Active model choices persist across sessions in:
~/Library/RCLI/config
Example config file:
model=qwen3.5-4b
tts_model=kokoro-en
stt_model=parakeet-tdt
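The config uses a flat key=value format, so it is trivial to read programmatically. A minimal parser sketch (illustrative; RCLI's own parsing may handle more cases):

```python
# Minimal parser for the key=value config format shown above (illustrative).
def parse_config(text):
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):   # skip blanks and comments
            key, _, value = line.partition("=")
            cfg[key.strip()] = value.strip()
    return cfg

example = "model=qwen3.5-4b\ntts_model=kokoro-en\nstt_model=parakeet-tdt\n"
cfg = parse_config(example)
print(cfg["model"])  # qwen3.5-4b
```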
You can switch models at any time using:
rcli models          # Interactive model browser
rcli upgrade-llm     # Guided LLM upgrade
rcli upgrade-stt     # Upgrade to Parakeet TDT
rcli voices          # Switch TTS voices

Next Steps

LLM Models

Compare all 9 language models with specs and benchmarks

STT Models

Explore speech-to-text options (Zipformer, Whisper, Parakeet)

TTS Models

Browse 6 TTS voices with quality ratings

Switching Models

Learn how to hot-swap models without restart
