RCLI supports 20+ AI models across LLM, STT, and TTS modalities. Use these commands to manage your local model collection.

Commands Overview

rcli models              # Interactive model browser (all modalities)
rcli models llm          # Jump to LLM management
rcli models stt          # Jump to STT management
rcli models tts          # Jump to TTS (same as `rcli voices`)
rcli voices              # Manage TTS voices
rcli upgrade-llm         # Guided LLM upgrade
rcli upgrade-stt         # Upgrade to Parakeet TDT
rcli cleanup             # Remove unused models
rcli info                # Show active models and engine info

Interactive Model Browser

rcli models
Launches a full-screen TUI that groups models by modality:
  • LLM Models — 9 models (350M to 4B parameters)
  • STT Models — 2 offline models (Whisper, Parakeet)
  • TTS Voices — 6 voice models (1 to 103 speakers)
Keyboard controls:
  • Up/Down — Navigate list
  • Enter — Select or download model
  • ESC — Close panel

Model States

  • Active — Currently loaded (green checkmark)
  • Installed — Available locally
  • Not installed — Available for download (grayed out)
  • Default — Included in rcli setup
  • Recommended — Best for most users

Switching Models

Press Enter on a model to switch to it. LLM changes apply immediately; STT and TTS changes apply on the next launch.

LLM Hot-Swap (Runtime)

# In TUI: M → select Qwen3.5 2B → Enter
# Model switches immediately without restart

Switching to Qwen3.5 2B...
Switched to Qwen3.5 2B
The LLM is hot-swapped at runtime:
  1. Unloads current model
  2. Loads new model to Metal GPU
  3. Re-detects model profile (Qwen3/LFM2/etc.)
  4. Re-caches system prompt with correct tool format
  5. Persists selection to ~/.rcli/config/model_selection.json
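The ordering of these steps can be pictured with a small sketch. Nothing here is RCLI's actual code: `hot_swap` and `detect_profile` are hypothetical names, and matching the profile on the model name prefix is an assumption made purely for illustration.

```python
def detect_profile(model_name):
    """Hypothetical rule: infer the profile family from the display name."""
    name = model_name.lower()
    if name.startswith("qwen3"):
        return "Qwen3"
    if name.startswith("lfm2"):
        return "LFM2"
    return "generic"


def hot_swap(current, new):
    """Return the hot-swap steps in the order listed above (no real loading)."""
    profile = detect_profile(new)
    return [
        f"unload {current}",
        f"load {new} on Metal GPU",
        f"detect profile: {profile}",
        f"re-cache system prompt ({profile} tool format)",
        "persist selection to model_selection.json",
    ]


if __name__ == "__main__":
    for step in hot_swap("LFM2 1.2B Tool", "Qwen3.5 2B"):
        print(step)
```

The point of the sketch is the sequence: the prompt cache depends on the detected profile, so detection has to come after the load and before the re-cache.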

STT/TTS Selection (Next Launch)

# In TUI: M → select Parakeet TDT → Enter

Selected: Parakeet TDT. Restart RCLI to apply.
STT and TTS require a restart to take effect.

Downloading Models

Press Enter on an uninstalled model to download:
Downloading Qwen3.5 2B (1200 MB)...
[====================] 100%
Download complete!
Models are downloaded from Hugging Face via curl.
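To fetch a model file by hand, a roughly equivalent curl call can be scripted as below. This is a sketch, not RCLI's own download code: the URL is a placeholder, the function names are hypothetical, and the flags are simply standard curl options for following redirects and resuming partial files.

```python
import subprocess
from pathlib import Path


def build_curl_command(url, dest):
    """Build a curl command that follows redirects and resumes partial files."""
    return [
        "curl",
        "--fail",               # exit non-zero on HTTP errors instead of saving an error page
        "--location",           # follow redirects
        "--continue-at", "-",   # resume from where a previous partial download stopped
        "--output", str(dest),
        url,
    ]


def download(url, dest):
    """Run curl; raises subprocess.CalledProcessError if curl exits non-zero."""
    dest = Path(dest).expanduser()
    dest.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_curl_command(url, dest), check=True)


if __name__ == "__main__":
    # Placeholder URL and file name: substitute the real model location.
    print(build_curl_command("https://example.com/model.gguf", "/tmp/model.gguf"))
```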

LLM Models

Model                 Size    Speed     License     Features
LFM2 1.2B Tool        731 MB  ~180 t/s  LFM Open    Tool calling, default
LFM2 350M             219 MB  ~350 t/s  LFM Open    Fastest, 128K ctx
LFM2.5 1.2B Instruct  731 MB  ~180 t/s  LFM Open    128K ctx
LFM2 2.6B             1.5 GB  ~120 t/s  LFM Open    Better conversational
Qwen3 0.6B            456 MB  ~250 t/s  Apache 2.0  Ultra-fast
Qwen3.5 0.8B          600 MB  ~220 t/s  Apache 2.0  Qwen3.5 generation
Qwen3.5 2B            1.2 GB  ~150 t/s  Apache 2.0  Recommended
Qwen3 4B              2.5 GB  ~80 t/s   Apache 2.0  Smart reasoning
Qwen3.5 4B            2.7 GB  ~75 t/s   Apache 2.0  Best small model, 262K ctx
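One way to read the table is as a trade-off between disk size and generation speed. The sketch below encodes the sizes and speeds listed above and picks the largest model that fits a disk budget and a minimum speed. Treating download size as a proxy for capability is an assumption, and `largest_model_within` is a hypothetical helper, not an RCLI command.

```python
# (name, size in MB, approximate tokens per second), copied from the table above.
LLM_MODELS = [
    ("LFM2 1.2B Tool", 731, 180),
    ("LFM2 350M", 219, 350),
    ("LFM2.5 1.2B Instruct", 731, 180),
    ("LFM2 2.6B", 1500, 120),
    ("Qwen3 0.6B", 456, 250),
    ("Qwen3.5 0.8B", 600, 220),
    ("Qwen3.5 2B", 1200, 150),
    ("Qwen3 4B", 2500, 80),
    ("Qwen3.5 4B", 2700, 75),
]


def largest_model_within(budget_mb, min_speed=0):
    """Return the name of the largest model that fits both constraints, or None."""
    candidates = [
        m for m in LLM_MODELS if m[1] <= budget_mb and m[2] >= min_speed
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m[1])[0]


if __name__ == "__main__":
    print(largest_model_within(1300))                 # 1.3 GB budget
    print(largest_model_within(1300, min_speed=200))  # same budget, at least 200 t/s
```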

Upgrade LLM

rcli upgrade-llm
An interactive wizard guides you through upgrading to a larger LLM:
  Upgrade LLM

  Current: LFM2 1.2B Tool (731 MB)

  Recommended upgrades:

    1. Qwen3.5 2B       1200 MB   Better reasoning
    2. Qwen3.5 4B       2700 MB   Best small model, 262K context
    3. LFM2 2.6B        1500 MB   Stronger conversational

  Select an option (1-3) or q to quit: 1

  Downloading Qwen3.5 2B (1200 MB)...
  [====================] 100%
  Download complete!

  Switch to Qwen3.5 2B now? (y/n): y
  Switched to Qwen3.5 2B.

STT Models

RCLI uses two STT models in parallel:

Zipformer (Streaming)

  • Purpose — Real-time transcription during speech
  • Accuracy — Good (~5% WER)
  • Speed — ~50ms latency
  • Size — ~50 MB
  • Included in rcli setup (always active)

Offline STT Models

Model                 Size    Accuracy   License    Features
Whisper base.en       140 MB  ~5% WER    MIT        English, default
Parakeet TDT 0.6B v3  640 MB  ~1.9% WER  CC-BY-4.0  25 languages, auto-punctuation
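The gap between the two WER figures can be made concrete with a little arithmetic. The sketch below uses only the approximate numbers from the table; the function names are illustrative.

```python
def error_ratio(baseline_wer, new_wer):
    """How many times more errors the baseline makes than the new model."""
    return baseline_wer / new_wer


def relative_error_reduction(baseline_wer, new_wer):
    """Fraction of the baseline's errors that the new model removes."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    whisper, parakeet = 5.0, 1.9  # approximate WER percentages from the table
    print(f"error ratio: {error_ratio(whisper, parakeet):.1f}x")
    print(f"errors removed: {relative_error_reduction(whisper, parakeet):.0%}")
```

With these inputs the ratio works out to about 2.6x, so the "3x better accuracy" shown by the upgrade wizard is best read as a rounded figure.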

Upgrade STT

rcli upgrade-stt
Upgrades to Parakeet TDT (best accuracy):
  Upgrade STT

  Current: Whisper base.en (~5% WER, 140 MB)
  Upgrade: Parakeet TDT 0.6B v3 (~1.9% WER, 640 MB)

  Parakeet offers:
    • 3x better accuracy (~1.9% WER vs ~5%)
    • 25 languages (vs English-only)
    • Auto-punctuation
    • Slightly slower (~60ms vs ~40ms)

  Download Parakeet TDT? (y/n): y

  Downloading Parakeet TDT (640 MB)...
  [====================] 100%
  Download complete!

  Restart RCLI to use Parakeet TDT.

TTS Voices

rcli voices
Lists all TTS voices:
  Voices  (auto-detect)

  #  Voice                          Size      Arch      Speakers    Status
  1  Piper Lessac (default)         60 MB     Piper     1           * active
  2  Piper Amy                      60 MB     Piper     1           installed
  3  KittenTTS Nano                 90 MB     Kitten    8           not installed
  4  Matcha LJSpeech                100 MB    Matcha    1           not installed
  5  Kokoro English v0.19           310 MB    Kokoro    11          not installed
  6  Kokoro Multi-lang v1.1         500 MB    Kokoro    103         not installed

  Tip: Run `rcli voices` to switch voices.
Voice                   Size    Speakers  License     Features
Piper Lessac            60 MB   1         MIT         Clear English, default
Piper Amy               60 MB   1         MIT         Warm female voice
KittenTTS Nano          90 MB   8         Apache 2.0  4M/4F voices
Matcha LJSpeech         100 MB  1         MIT         HiFi-GAN vocoder
Kokoro English v0.19    310 MB  11        Apache 2.0  Best English quality
Kokoro Multi-lang v1.1  500 MB  103       Apache 2.0  Chinese + English

Multi-Speaker Voices

KittenTTS and Kokoro support multiple speakers. Select one by setting speaker_id in ~/.rcli/config/tts.json:
{
  "model": "kokoro-en-v0_19",
  "speaker_id": 3
}
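If you script this file, it helps to validate the speaker id before writing. The sketch below assumes speaker ids are zero-based (so an 11-speaker voice accepts 0 through 10), which the page does not state. `write_tts_config` is a hypothetical helper.

```python
import json
from pathlib import Path


def write_tts_config(config_dir, model, speaker_id, speaker_count):
    """Write tts.json after checking the speaker id is in range.

    Assumption: ids are zero-based, i.e. valid values are 0..speaker_count-1.
    """
    if not 0 <= speaker_id < speaker_count:
        raise ValueError(
            f"speaker_id {speaker_id} is outside 0..{speaker_count - 1}"
        )
    path = Path(config_dir).expanduser() / "tts.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"model": model, "speaker_id": speaker_id}
    path.write_text(json.dumps(payload, indent=2) + "\n")
    return path


if __name__ == "__main__":
    import tempfile

    # Written to a scratch directory here; point it at ~/.rcli/config to apply.
    with tempfile.TemporaryDirectory() as scratch:
        written = write_tts_config(scratch, "kokoro-en-v0_19", 3, speaker_count=11)
        print(written.read_text())
```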

Cleanup Unused Models

rcli cleanup
Opens an interactive TUI that lists all installed models:
  Model Cleanup
  Arrow keys to navigate, ENTER to delete, ESC to close

   > Qwen3 0.6B  [LLM]  456 MB
     Whisper base.en  [STT]  140 MB
     Piper Amy  [TTS]  60 MB
     LFM2 1.2B Tool  [LLM]  731 MB (active)

  [Up/Down] navigate  [Enter] delete  [ESC] close
Press Enter to delete the selected model:
  • Active models — Cannot be deleted (switch first)
  • Inactive models — Deleted immediately
Selection preferences are updated automatically.

Engine Info

rcli info
Displays active models and hardware:
  RCLI Engine Info

  Version: 0.4.0

  Models:
    LLM: Qwen3.5 2B (1200 MB)
    STT: Whisper base.en (offline) | Zipformer (streaming)
    TTS: Piper Lessac
    Embeddings: Snowflake Arctic Embed S (34 MB)

  Hardware:
    Chip: Apple M3 Max
    CPU: 14 cores (10P+4E)
    GPU: 30 cores
    RAM: 36 GB
    ANE: 16-core Neural Engine

  Paths:
    Models: ~/Library/RCLI/models
    Config: ~/.rcli/config
    Index: ~/Library/RCLI/index

Model Storage Locations

~/Library/RCLI/models/
  ├── qwen3.5-2b-q4_k_m.gguf            # LLM
  ├── lfm2-1.2b-tool-q4_k_m.gguf        # LLM
  ├── whisper-base-en/                  # STT
  │   ├── encoder.onnx
  │   ├── decoder.onnx
  │   └── tokens.txt
  ├── parakeet-tdt-0.6b-v3/             # STT
  ├── zipformer-streaming/              # STT (streaming)
  ├── piper-lessac-medium/              # TTS
  │   ├── model.onnx
  │   └── config.json
  ├── kokoro-en-v0_19/                  # TTS
  ├── silero-vad.onnx                   # VAD
  └── arctic-embed-s.gguf               # Embeddings
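To see how much disk each entry in this directory uses, outside of rcli cleanup, a short script is enough. This is a sketch that only assumes the layout shown above (single files for some models, directories for others). `list_models` is a hypothetical helper.

```python
from pathlib import Path


def entry_size_bytes(path):
    """Size of a file, or the total size of all files under a directory."""
    if path.is_file():
        return path.stat().st_size
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def list_models(models_dir):
    """Return (name, size_in_bytes) for each top-level entry, largest first."""
    root = Path(models_dir).expanduser()
    if not root.is_dir():
        return []
    rows = [(entry.name, entry_size_bytes(entry)) for entry in root.iterdir()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, size in list_models("~/Library/RCLI/models"):
        print(f"{size / 1_000_000:8.1f} MB  {name}")
```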

Model Selection Persistence

User preferences are saved to:
~/.rcli/config/model_selection.json
{
  "llm": "qwen3.5-2b",
  "stt": "parakeet-tdt-0.6b-v3",
  "tts": "piper-lessac-medium"
}
To reset to defaults:
rm ~/.rcli/config/model_selection.json
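Deleting the file works as a reset because a missing file means "use the defaults". The sketch below mimics that fallback when reading the file from a script. The default ids for `llm` and `stt` are assumptions inferred from the storage listing, not values confirmed by this page, and `load_selection` is a hypothetical helper.

```python
import json
from pathlib import Path

# Assumed default ids: LFM2 1.2B Tool, Whisper base.en, and Piper Lessac are
# the documented defaults, but the exact id strings for llm/stt are guesses.
DEFAULTS = {
    "llm": "lfm2-1.2b-tool",
    "stt": "whisper-base-en",
    "tts": "piper-lessac-medium",
}


def load_selection(path):
    """Read model_selection.json, falling back to defaults for missing keys."""
    try:
        saved = json.loads(Path(path).expanduser().read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        saved = {}
    if not isinstance(saved, dict):
        saved = {}
    known = {key: value for key, value in saved.items() if key in DEFAULTS}
    return {**DEFAULTS, **known}


if __name__ == "__main__":
    print(load_selection("~/.rcli/config/model_selection.json"))
```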

Benchmarking Models

Compare All LLMs

rcli bench --all-llm --suite llm

# Output:
--- LLM Benchmark (All Models) ---
  Qwen3 0.6B:      TTFT 18ms   250 tok/s
  Qwen3.5 0.8B:    TTFT 20ms   220 tok/s
  Qwen3.5 2B:      TTFT 25ms   150 tok/s
  LFM2 1.2B Tool:  TTFT 22ms   180 tok/s
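To compare runs over time, the benchmark lines can be parsed into structured records. The sketch below assumes the output keeps exactly the layout shown above (`<model>: TTFT <n>ms <n> tok/s`); if the real format differs, the pattern needs adjusting.

```python
import re

# Matches lines such as "  Qwen3.5 2B:      TTFT 25ms   150 tok/s".
BENCH_LINE = re.compile(
    r"^\s*(?P<model>.+?):\s+TTFT\s+(?P<ttft>\d+)ms\s+(?P<tps>\d+)\s+tok/s\s*$"
)


def parse_llm_bench(output):
    """Return one dict per benchmark line; header and other lines are skipped."""
    results = []
    for line in output.splitlines():
        match = BENCH_LINE.match(line)
        if match:
            results.append(
                {
                    "model": match["model"],
                    "ttft_ms": int(match["ttft"]),
                    "tok_per_s": int(match["tps"]),
                }
            )
    return results


if __name__ == "__main__":
    sample = """--- LLM Benchmark (All Models) ---
  Qwen3 0.6B:      TTFT 18ms   250 tok/s
  Qwen3.5 2B:      TTFT 25ms   150 tok/s
"""
    for row in parse_llm_bench(sample):
        print(row)
```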

Compare All TTS

rcli bench --all-tts --suite tts

# Output:
--- TTS Benchmark (All Voices) ---
  Piper Lessac:    142ms   0.8x RT
  Piper Amy:       138ms   0.7x RT
  Kokoro English:  189ms   1.1x RT

Troubleshooting

Model Download Fails

Error: Failed to download model
curl: (56) Recv failure: Connection reset by peer
Solution: Check your internet connection, then retry the download.
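Connection resets are often transient, so a scripted download can simply retry with a growing delay. This is a generic sketch, not something RCLI is documented to do. `retry` is a hypothetical helper that wraps any callable, such as a function that shells out to curl.

```python
import time


def retry(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action(); on failure wait base_delay, 2x, 4x... and try again.

    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_download():
        # Stand-in for a real download: fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("Connection reset by peer")
        return "done"

    print(retry(flaky_download, base_delay=0.1))
```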

Out of Disk Space

Error: Not enough disk space (need 1.2 GB, have 500 MB)
Solution: Run rcli cleanup to remove unused models and free space.
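You can check free space yourself before starting a large download. The sketch below uses Python's `shutil.disk_usage` and treats 1 MB as 1,000,000 bytes, which is an assumption about how the sizes on this page are measured. `has_room_for` is a hypothetical helper.

```python
import shutil


def has_room_for(path, required_mb, usage=shutil.disk_usage):
    """True if the filesystem holding `path` has at least required_mb free.

    `path` must exist; use the home directory if the models folder is not
    created yet. The `usage` parameter exists so the check can be tested.
    """
    free_bytes = usage(path).free
    return free_bytes >= required_mb * 1_000_000


if __name__ == "__main__":
    from pathlib import Path

    # 1200 MB is the size listed for Qwen3.5 2B.
    print(has_room_for(Path.home(), 1200))
```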

Model Not Found After Download

Error: Model file not found at ~/Library/RCLI/models/qwen3.5-2b.gguf
Solution: Re-run rcli models and download the model again.

LLM Switch Fails

Failed to switch to Qwen3.5 2B
Error: llama_model_load: failed to load model
Solution: The model file may be corrupted. Delete it and download it again.
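A quick sanity check before re-downloading is to confirm the file at least starts like a GGUF file, since GGUF files begin with the four ASCII bytes `GGUF`. This only catches empty files, HTML error pages saved in place of the model, and similar gross failures. It cannot detect corruption further into the file. `looks_like_gguf` is a hypothetical helper.

```python
from pathlib import Path


def looks_like_gguf(path):
    """True if the file exists and begins with the GGUF magic bytes."""
    file_path = Path(path).expanduser()
    if not file_path.is_file():
        return False
    with open(file_path, "rb") as handle:
        return handle.read(4) == b"GGUF"


if __name__ == "__main__":
    print(looks_like_gguf("~/Library/RCLI/models/qwen3.5-2b-q4_k_m.gguf"))
```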
