Usage
How It Works
- Always Listening — VAD (Silero) detects speech activity
- Streaming STT — Zipformer transcribes live audio
- Endpoint Detection — Detects silence, finalizes transcript
- LLM Processing — Classifies intent, executes actions, generates response
- TTS Playback — Speaks the result via Piper/Kokoro
- Loop — Returns to listening state
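The loop above can be pictured as a small state machine. The following is a minimal sketch only: the state names and event strings are hypothetical labels chosen for illustration, not identifiers taken from RCLI.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names; RCLI's own labels may differ.
    LISTENING = auto()     # VAD is waiting for speech
    TRANSCRIBING = auto()  # streaming STT is producing partial text
    THINKING = auto()      # LLM is classifying intent and responding
    SPEAKING = auto()      # TTS is playing the response


# Each event moves the pipeline one step forward; SPEAKING loops back.
TRANSITIONS = {
    (State.LISTENING, "speech_started"): State.TRANSCRIBING,
    (State.TRANSCRIBING, "endpoint_detected"): State.THINKING,
    (State.THINKING, "response_ready"): State.SPEAKING,
    (State.SPEAKING, "playback_finished"): State.LISTENING,
}


def next_state(state: State, event: str) -> State:
    """Return the next state, or stay put if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = State.LISTENING
    for event in ["speech_started", "endpoint_detected",
                  "response_ready", "playback_finished"]:
        state = next_state(state, event)
        print(event, "->", state.name)
```

Keeping the transitions in a table makes it easy to see that every path ends back in the listening state.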
Voice Pipeline States
The terminal displays an animated dog indicating the current state.
Supported Commands
macOS Actions
Conversational Queries
RAG Queries (with --rag)
Options
Models directory path
GPU layers for LLM (99 = all, 0 = CPU only)
Disable TTS audio output (text only)
Load RAG index for document-grounded answers
Show debug logs from engines
Voice Activity Detection (VAD)
Listen mode uses Silero VAD to:
- Filter background noise
- Detect speech start/end
- Trigger transcription only when speaking
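To illustrate how speech start and end can be derived from a VAD, here is a rough sketch that turns per-frame speech probabilities into segments. It assumes the VAD emits one probability per audio frame. The thresholds and the `min_silence_frames` value are illustrative assumptions, not RCLI or Silero defaults.

```python
def detect_segments(probs, start_threshold=0.5, end_threshold=0.35,
                    min_silence_frames=3):
    """Return (start, end) frame index pairs; end is exclusive.

    Speech starts when a frame reaches start_threshold and ends after
    min_silence_frames consecutive frames fall below end_threshold.
    """
    segments = []
    start = None   # index of the first frame of the current segment
    silence = 0    # consecutive quiet frames seen inside a segment
    for i, p in enumerate(probs):
        if start is None:
            if p >= start_threshold:
                start = i
                silence = 0
        elif p < end_threshold:
            silence += 1
            if silence >= min_silence_frames:
                # The segment ended at the first frame of this quiet run.
                segments.append((start, i - silence + 1))
                start = None
                silence = 0
        else:
            silence = 0
    if start is not None:
        # Audio ended mid-speech; drop any trailing quiet frames.
        segments.append((start, len(probs) - silence))
    return segments


if __name__ == "__main__":
    frames = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.1]
    print(detect_segments(frames))  # [(1, 5)]
```

The short dip at frame 3 does not end the segment, which is why a run of quiet frames is required before speech counts as finished.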
STT Models
Listen mode uses two STT models in parallel:
Zipformer (Streaming)
- Purpose — Real-time transcription during speech
- Speed — ~50ms latency
- Accuracy — Good for live feedback
- Size — ~50 MB
Whisper/Parakeet (Offline)
- Purpose — Final accurate transcription after speech ends
- Speed — ~40ms for Whisper base.en, ~60ms for Parakeet
- Accuracy — Higher (Whisper ~5% WER, Parakeet ~1.9% WER)
- Size — 140 MB (Whisper), 640 MB (Parakeet)
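The accuracy figures above are word error rates (WER). As a reference for how that number is usually computed, here is a small sketch using word-level edit distance. The sample sentences in the usage lines are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of four reference words.
    print(word_error_rate("open the safari browser", "open safari browser"))  # 0.25
```

A WER of about 1.9% corresponds to roughly two word errors per hundred reference words.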
Performance Metrics
After each interaction, listen mode prints:
- STT: Transcription time
- LLM: Tokens, throughput, time-to-first-token
- TTS: Synthesis time, real-time factor
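Two of these metrics are simple ratios. The sketch below assumes the common definitions: throughput as generated tokens per second, and real-time factor as synthesis time divided by the duration of the audio produced. RCLI may compute or report them differently.

```python
def tokens_per_second(num_tokens: int, generation_seconds: float) -> float:
    """LLM throughput: generated tokens divided by generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return num_tokens / generation_seconds


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """TTS real-time factor; values below 1.0 mean faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


if __name__ == "__main__":
    print(tokens_per_second(120, 1.5))  # 80.0
    print(real_time_factor(0.5, 2.0))   # 0.25
```

Under this definition, a real-time factor of 0.25 means two seconds of speech took half a second to synthesize.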
Stopping Listen Mode
Press Ctrl+C to stop listen mode gracefully.
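For readers curious what a graceful stop typically involves, the sketch below shows one common pattern: the Ctrl+C signal sets a flag, and the loop exits after finishing the current turn. This is a generic illustration, not RCLI's implementation. `run_loop` and `process_one_turn` are hypothetical names.

```python
import signal
import threading
import time

stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: ask the loop to finish instead of killing the process."""
    stop_requested.set()


def run_loop(process_one_turn, max_turns=None):
    """Run turns until a stop is requested; return the number completed."""
    turns = 0
    while not stop_requested.is_set():
        if max_turns is not None and turns >= max_turns:
            break
        process_one_turn()
        turns += 1
    return turns


if __name__ == "__main__":
    signal.signal(signal.SIGINT, request_stop)  # Ctrl+C sends SIGINT
    completed = run_loop(lambda: time.sleep(0.5))
    print(f"Stopped after {completed} turns")
```

Because the flag is only checked between turns, an in-progress turn always runs to completion before the loop exits.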
Example Session
Troubleshooting
No Speech Detected
If RCLI doesn’t respond:
- Check microphone permission: System Settings > Privacy & Security > Microphone
- Test mic levels: rcli mic-test
- Speak louder or closer: VAD threshold is 0.003 RMS
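To make the 0.003 RMS figure concrete, here is a small sketch that computes the RMS level of a buffer and compares it with that threshold. It assumes samples are floats normalized to the range -1.0 to 1.0. The source does not say how RCLI scales its audio, so treat the scaling as an assumption.

```python
import math

VAD_RMS_THRESHOLD = 0.003  # figure quoted in the troubleshooting list


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loud_enough(samples, threshold=VAD_RMS_THRESHOLD):
    """True when the buffer's RMS level reaches the threshold."""
    return rms(samples) >= threshold


if __name__ == "__main__":
    quiet = [0.001, -0.001] * 100  # RMS 0.001, below the threshold
    voiced = [0.01, -0.01] * 100   # RMS 0.01, above the threshold
    print(loud_enough(quiet), loud_enough(voiced))  # False True
```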
Slow Response
If time-to-first-token (TTFT) exceeds 100 ms:
- Use a smaller LLM: rcli models → select Qwen3 0.6B or LFM2 350M
- Increase GPU layers: --gpu-layers 99 (default)
- Check system load: Close other GPU-heavy apps
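TTFT is the delay between sending a request and receiving the first generated token. The sketch below measures it around any token iterator. `measure_ttft` and `fake_stream` are hypothetical helpers for illustration, and the 100 ms budget is the figure from the list above.

```python
import time

TTFT_BUDGET_SECONDS = 0.100  # the 100 ms figure from the text


def measure_ttft(token_stream, clock=time.perf_counter):
    """Consume a token iterator; return (ttft_seconds, tokens).

    ttft_seconds is None when the stream yields no tokens.
    """
    start = clock()
    ttft = None
    tokens = []
    for token in token_stream:
        if ttft is None:
            ttft = clock() - start  # time until the first token arrived
        tokens.append(token)
    return ttft, tokens


def fake_stream(first_delay, words):
    """Stand-in for an LLM stream: waits, then yields words one by one."""
    time.sleep(first_delay)
    for word in words:
        yield word


if __name__ == "__main__":
    ttft, tokens = measure_ttft(fake_stream(0.15, ["Opening", "Safari"]))
    status = "slow" if ttft > TTFT_BUDGET_SECONDS else "ok"
    print(f"TTFT {ttft * 1000:.0f} ms ({status}), {len(tokens)} tokens")
```

Passing the clock as a parameter keeps the measurement testable without real delays.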
Incorrect Transcription
If STT accuracy is low:
- Upgrade STT model: rcli upgrade-stt (Parakeet TDT, ~1.9% WER)
- Speak clearly: Avoid background noise
- Check mic quality: Built-in MacBook mic is sufficient
Advanced Usage
Custom System Prompt
Modify ~/Library/RCLI/config/system_prompt.txt to change LLM behavior.
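If you prefer to script the change, the following sketch writes a new prompt to that path and keeps a backup of the previous file. It assumes the file is plain UTF-8 text. The helper name, the `.bak` suffix, and the example prompt are illustrative choices, not part of RCLI.

```python
import shutil
from pathlib import Path
from typing import Optional

# Path taken from the text above.
PROMPT_PATH = Path.home() / "Library" / "RCLI" / "config" / "system_prompt.txt"


def write_system_prompt(prompt: str, path: Path = PROMPT_PATH) -> Optional[Path]:
    """Write a new prompt; return the backup path if a previous file existed."""
    backup = None
    if path.exists():
        # Keep the old prompt next to the new one (hypothetical .bak suffix).
        backup = path.with_name(path.name + ".bak")
        shutil.copy2(path, backup)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(prompt.strip() + "\n", encoding="utf-8")
    return backup


# Example call (uncomment to modify the real file):
# write_system_prompt("You are a concise voice assistant. Keep answers short.")
```

Restoring the previous behavior is then a matter of copying the backup file over the edited one.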