Common Issues
Microphone Permissions
Symptom: No audio captured, STT returns empty transcripts
Cause: macOS requires explicit microphone permissions for terminal apps.
Fix:
- Open System Settings → Privacy & Security → Microphone
- Enable microphone access for:
  - Terminal.app (if running from Terminal)
  - iTerm.app (if using iTerm2)
  - Your terminal emulator
- Restart terminal and try again:
If you installed via Homebrew, the rcli binary is in /opt/homebrew/bin/. macOS prompts for permissions on first microphone access.
Out of Memory (OOM)
Symptom: rcli crashes with “Killed: 9” or “malloc failed”
Cause: Model too large for available RAM, or GPU layers too high.
Fix:
- Reduce GPU layers:
- Use smaller model:
- Reduce context size:
- Disable mlock:
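Before choosing among the fixes above, it can help to compare the model file's size with installed RAM. The sketch below is a rough heuristic and not part of rcli: `model_fits` is a hypothetical helper, the 75% budget is an arbitrary illustration, and the check ignores KV cache and other runtime allocations. `sysctl -n hw.memsize` is the standard macOS way to read physical memory in bytes.

```shell
# Hypothetical helper (not an rcli command): succeed if a model file
# is no larger than a percentage of physical RAM.
# Usage: model_fits <model_file> [ram_bytes] [percent]
model_fits() {
  model_file=$1
  # Default to physical RAM on macOS; callers may pass a value instead.
  ram_bytes=${2:-$(sysctl -n hw.memsize 2>/dev/null)}
  percent=${3:-75}   # assumed budget, adjust to taste

  [ -f "$model_file" ] || { echo "no such file: $model_file" >&2; return 2; }
  [ -n "$ram_bytes" ] || { echo "could not determine RAM size" >&2; return 2; }

  model_bytes=$(wc -c < "$model_file" | tr -d '[:space:]')
  # Integer math; divide before multiplying to keep values small.
  budget=$(( ram_bytes / 100 * percent ))
  echo "model: $model_bytes bytes, budget: $budget bytes"
  [ "$model_bytes" -le "$budget" ]
}
```

Call it with the path to the model file you are loading; a non-zero exit status suggests trying a smaller model or a smaller context first.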
Slow Inference
Symptom: LLM generates <50 tok/s, high latency
Cause: CPU-only inference, or insufficient GPU layers.
Fix:
- Check GPU layers:
- Enable Metal GPU:
- Reduce thread count (if GPU-enabled):
- Enable Flash Attention:
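To compare a run against the 50 tok/s figure in the symptom, throughput is simply generated tokens divided by elapsed seconds. The helper below is a minimal sketch; `tok_per_sec` is a hypothetical name, and the token count and timing have to come from wherever you measure them (how rcli reports them is not assumed here).

```shell
# tok_per_sec <tokens> <seconds>: print throughput with one decimal.
tok_per_sec() {
  awk -v t="$1" -v s="$2" 'BEGIN {
    if (s <= 0) { print "seconds must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", t / s
  }'
}

# Example: 256 tokens generated in 8 seconds.
tok_per_sec 256 8    # prints 32.0
```

Measuring before and after each change in the list above shows which fix actually helped.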
STT Transcription Errors
Symptom: STT returns gibberish or empty text
Possible Causes:
- Background noise: VAD filters out speech
- Silence threshold too low:
- Wrong sample rate:
- Model mismatch: Try switching STT models:
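When a recorded WAV file is the input, one way to rule out the sample-rate cause is to read the rate straight from the file header. This is a sketch under assumptions: `wav_sample_rate` is a hypothetical helper, it expects a canonical PCM WAV header with the "fmt " chunk first, and it relies on a little-endian host (true for both Intel and Apple Silicon Macs). The rate your STT model expects is not stated in this section, so compare the result against that model's documentation.

```shell
# wav_sample_rate <file.wav>: print the sample rate in Hz.
# In a canonical WAV header the rate is a 4-byte little-endian
# unsigned integer at byte offset 24.
wav_sample_rate() {
  # -An: no offsets, -t u4: unsigned 32-bit, -j: skip bytes, -N: read count
  od -An -t u4 -j 24 -N 4 "$1" | tr -d '[:space:]'
  echo
}
```

On macOS, `afinfo <file>` reports the same information in a more readable form.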
TTS Audio Glitches
Symptom: Choppy playback, crackling, or silence
Possible Causes:
- Ring buffer underrun: Playback faster than synthesis
- CPU throttling: Check Activity Monitor
  - Quit other apps to free CPU/GPU
- Sample rate mismatch:
- Voice model corruption:
Tool Calling Failures
Symptom: LLM doesn’t execute actions, or parses wrong tool
Possible Causes:
- Model doesn’t support tool calling: Use LFM2 1.2B Tool or Qwen3.5 2B+
- Tool definitions not loaded:
- Action disabled:
- Parse errors: Enable tool trace to debug:
RAG Retrieval Errors
Symptom: rcli ask --rag returns “Index not found”
Fix:
- Build index first:
- Check index path:
- Specify index explicitly:
Model Download Failures
Symptom: rcli setup or rcli models fails to download models
Possible Causes:
- Network timeout:
- Disk space:
- Hugging Face rate limit: Wait 5-10 min and retry
- Corrupted download:
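Two of the causes above can be checked with general-purpose tools before retrying. The sketch below uses only `df` and `curl`, not rcli; `free_gb` and `hf_reachable` are hypothetical helper names, and the directory you pass should be wherever your models are stored (that location is not assumed here).

```shell
# free_gb [dir]: free space, in GiB, on the volume that holds dir.
free_gb() {
  # -P gives one POSIX-format line per filesystem, -k reports KiB.
  df -Pk "${1:-.}" | awk 'NR == 2 { printf "%.1f\n", $4 / 1048576 }'
}

# hf_reachable: print the HTTP status returned by huggingface.co,
# giving up after 10 seconds.
hf_reachable() {
  curl -sS -o /dev/null -m 10 -w '%{http_code}\n' https://huggingface.co
}
```

A status of 429 from `hf_reachable` generally indicates rate limiting; a timeout or connection error points at the network instead.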
Debugging
Enable Verbose Logging
Debug Logging
- Memory pool allocations
- Ring buffer read/write operations
- KV cache hits/misses
- Tool call parsing steps
Redirect Logs to File
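Standard shell redirection is enough to capture output for later inspection. The wrapper below is a sketch, not an rcli feature: `run_logged` is a hypothetical helper that works with any command, so substitute whichever rcli invocation and logging options you are using.

```shell
# run_logged <logfile> <command...>: run a command, showing its output
# on screen while appending both stdout and stderr to a log file.
# Note: the exit status is that of tee, not of the wrapped command.
run_logged() {
  logfile=$1
  shift
  "$@" 2>&1 | tee -a "$logfile"
}

# Example: run_logged rcli.log rcli --version
```

Piping through `tee` may interfere with a full-screen TUI, so this suits one-shot commands better than interactive sessions.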
Tool Call Trace
In the TUI, press T to toggle the tool call trace. Shows:
Inspect Memory Pool
Profile Performance
Error Messages
MemoryPool: out of memory
Cause: Pre-allocated memory pool exhausted.
Fix:
Or reduce audio buffer sizes:
Failed to init LLM model
Possible Causes:
- Corrupted model file:
- Incompatible GGUF version:
- GPU layers too high:
Audio init failed
Cause: CoreAudio device not found or permissions denied.
Fix:
- Check microphone permissions (see above)
- Test microphone:
- Check audio device:
- Restart CoreAudio:
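For the last two steps, stock macOS tools are usually sufficient. The sketch below uses general macOS commands rather than anything rcli-specific; the function names are hypothetical, and each is guarded so that it does nothing harmful on other systems.

```shell
# list_audio_devices: show the input and output devices macOS knows about.
list_audio_devices() {
  if command -v system_profiler >/dev/null 2>&1; then
    system_profiler SPAudioDataType
  else
    echo "system_profiler not available (not macOS?)"
  fi
}

# restart_coreaudio: stop the CoreAudio daemon; launchd restarts it
# automatically. Requires administrator rights.
restart_coreaudio() {
  if [ "$(uname -s)" = "Darwin" ]; then
    sudo killall coreaudiod
  else
    echo "coreaudiod only exists on macOS"
  fi
}
```

If the expected microphone is missing from the `list_audio_devices` output, the problem is at the device level rather than in rcli.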
VAD init failed (will process all audio)
Impact: Non-fatal warning. Pipeline continues without VAD.
Effect: All audio (including silence) sent to STT. May produce phantom transcripts.
Fix:
Offline STT init failed (will use streaming STT)
Impact: Non-fatal warning. Falls back to Zipformer.
Effect: Lower STT accuracy (Zipformer ~8% WER vs Whisper ~5% WER).
Fix:
KV cache full, clearing...
Cause: Context window exceeded. Conversation history too long.
Effect: KV cache cleared, next response slower (no cache reuse).
Fix:
Platform-Specific Issues
macOS Ventura (13.0+)
Issue: “Operation not permitted” errors
Fix: Grant Full Disk Access to Terminal:
- System Settings → Privacy & Security → Full Disk Access
- Add Terminal.app or iTerm.app
- Restart terminal
macOS Sonoma (14.0+)
Issue: Metal shader compilation warnings
Fix: Update to latest macOS patch (Metal shader cache rebuild)
Apple Silicon Rosetta
Issue: Running x86_64 build on ARM64
Fix: Install ARM64 native build:
Performance Debugging
High CPU Usage
Symptom: CPU at 100%, fan spinning
Possible Causes:
- CPU-only inference: Enable GPU layers
- Too many threads:
- Large batch size:
High Memory Usage
Symptom: Memory pressure, swap usage
Possible Causes:
- Large KV cache:
- mlock enabled:
- Multiple models loaded:
- Large memory pool:
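To confirm the symptom itself (swap in use, memory under pressure) before working through the causes above, macOS ships two small tools. This is a sketch using general system commands, not rcli; `memory_snapshot` is a hypothetical helper name.

```shell
# memory_snapshot: print swap usage and page-level memory statistics.
memory_snapshot() {
  if [ "$(uname -s)" = "Darwin" ]; then
    sysctl vm.swapusage      # total, used, and free swap
    vm_stat | head -n 8      # free, active, inactive, wired pages
  else
    echo "vm.swapusage and vm_stat are macOS-only"
  fi
}
```

Taking one snapshot before starting rcli and another while it is running shows how much of the pressure it accounts for.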
GPU Not Utilized
Symptom: GPU idle, slow inference
Check:
Ring Buffer Overruns
Symptom: Choppy audio, dropped samples
Check logs:
Crash Reports
Generate Crash Report
If rcli crashes, macOS generates a crash report:
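Per-user crash reports on macOS are written to ~/Library/Logs/DiagnosticReports, with file names that begin with the process name. The sketch below lists them newest first; `list_crash_reports` is a hypothetical helper, and the default name `rcli` is an assumption based on the binary name.

```shell
# list_crash_reports [name] [dir]: newest-first list of crash reports
# whose file name starts with the given process name.
list_crash_reports() {
  name=${1:-rcli}
  dir=${2:-"$HOME/Library/Logs/DiagnosticReports"}
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  # ls -t sorts by modification time, newest first.
  ls -t "$dir" | grep "^$name"
}
```

The function exits non-zero when no matching report exists, which is a quick way to tell a crash from a clean exit.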
Useful Info for Bug Reports
- macOS version: sw_vers
- RCLI version: rcli --version
- Hardware: sysctl hw.model
- Crash logs (if applicable)
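The three commands in the list above can be gathered into one file to attach to a report. This is a minimal sketch: `collect_bug_info` is a hypothetical helper, and each command is guarded so the script still produces output when a tool is missing.

```shell
# collect_bug_info: print the version and hardware details listed above.
collect_bug_info() {
  echo "== macOS version =="
  if command -v sw_vers >/dev/null 2>&1; then
    sw_vers
  else
    echo "sw_vers not found"
  fi

  echo "== RCLI version =="
  if command -v rcli >/dev/null 2>&1; then
    rcli --version
  else
    echo "rcli not found on PATH"
  fi

  echo "== Hardware =="
  if [ "$(uname -s)" = "Darwin" ]; then
    sysctl hw.model
  else
    echo "sysctl hw.model is macOS-only"
  fi
}

# Example: collect_bug_info > bug-report.txt
```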
Reset to Defaults
Clear All Configuration
Remove All Models
Clear RAG Index
Full Reset
Getting Help
Built-in Help
Community Support
GitHub Issues
Report bugs or request features
Contributing Guide
Build from source, architecture docs
Diagnostic Commands
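One check worth scripting is whether the installed binary matches the machine's architecture, which relates to the Rosetta issue described under Platform-Specific Issues. The sketch below relies on standard macOS tools only; `arch_report` is a hypothetical helper name. Note that /opt/homebrew is the Homebrew prefix on Apple Silicon, while Intel installs use /usr/local.

```shell
# arch_report: show the machine architecture, whether the current
# shell is translated by Rosetta 2, and what the rcli binary contains.
arch_report() {
  # Prints arm64 for a native shell on Apple Silicon.
  echo "machine: $(uname -m)"

  # 1 = running under Rosetta 2, 0 = native; the key is absent on
  # Intel Macs and on other systems.
  translated=$(sysctl -n sysctl.proc_translated 2>/dev/null || echo "n/a")
  echo "rosetta: $translated"

  bin=$(command -v rcli 2>/dev/null)
  if [ -n "$bin" ]; then
    echo "binary: $bin"
    file "$bin"   # reports x86_64, arm64, or both for a universal binary
  else
    echo "binary: rcli not found on PATH"
  fi
}
```

An x86_64-only binary on an arm64 machine is the case the Rosetta fix addresses.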
Next Steps
Architecture
Deep dive into pipeline design and threading
Performance
Benchmark results and optimization tips