Model Management

RCLI supports 20+ AI models across LLM, STT, and TTS modalities. Use these commands to manage your local model collection.

Commands Overview

rcli models              # Interactive model browser (all modalities)
rcli models llm          # Jump to LLM management
rcli models stt          # Jump to STT management
rcli models tts          # Jump to TTS (same as `rcli voices`)
rcli voices              # Manage TTS voices
rcli upgrade-llm         # Guided LLM upgrade
rcli upgrade-stt         # Upgrade to Parakeet TDT
rcli cleanup             # Remove unused models
rcli info                # Show active models and engine info

Interactive Model Browser

rcli models

Launches a full-screen TUI with:

LLM Models — 9 models (350M to 4B parameters)
STT Models — 2 offline models (Whisper, Parakeet)
TTS Voices — 6 voice models (1 to 103 speakers per model)

Navigation

Up/Down — Navigate list
Enter — Select or download model
ESC — Close panel

Model States

Active — Currently loaded (green checkmark)
Installed — Available locally
Not installed — Available for download (grayed out)
Default — Included in rcli setup
Recommended — Best for most users

Switching Models

Press Enter on an installed model to switch to it. What happens next depends on the modality:

LLM Hot-Swap (Runtime)

# In TUI: M → select Qwen3.5 2B → Enter
# Model switches immediately without restart

Switching to Qwen3.5 2B...
Switched to Qwen3.5 2B

The LLM is hot-swapped at runtime:

Unloads current model
Loads new model to Metal GPU
Re-detects model profile (Qwen3/LFM2/etc.)
Re-caches system prompt with correct tool format
Persists selection to ~/.rcli/config/model_selection.json

STT/TTS Selection (Next Launch)

# In TUI: M → select Parakeet TDT → Enter

Selected: Parakeet TDT. Restart RCLI to apply.

STT and TTS selections take effect only after RCLI is restarted.

Downloading Models

Press Enter on an uninstalled model to download:

Downloading Qwen3.5 2B (1200 MB)...
[====================] 100%
Download complete!

Models are downloaded from Hugging Face via curl.
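
As a rough illustration of a curl-based download from Hugging Face, the sketch below only builds the command line and does not run it. The repository id and file name are placeholders, and the exact flags RCLI passes to curl are not documented here, so treat the flag choice as an assumption. The `/resolve/main/` URL layout is the standard Hugging Face path for raw repository files.

```python
from urllib.parse import quote


def build_curl_command(repo_id: str, filename: str, dest_path: str) -> list[str]:
    """Build a curl invocation for one file in a Hugging Face repository."""
    # Hugging Face serves raw files under /<repo>/resolve/<revision>/<file>.
    url = f"https://huggingface.co/{repo_id}/resolve/main/{quote(filename)}"
    return [
        "curl",
        "--location",           # follow the redirect to the storage backend
        "--fail",               # exit non-zero on HTTP errors instead of saving an error page
        "--continue-at", "-",   # resume a partially downloaded file
        "--output", dest_path,
        url,
    ]


if __name__ == "__main__":
    # Placeholder repository and file names, not real RCLI download sources.
    cmd = build_curl_command(
        "example-org/example-model-GGUF",
        "example-model-q4_k_m.gguf",
        "/tmp/example-model-q4_k_m.gguf",
    )
    print(" ".join(cmd))
```

Resuming with `--continue-at -` is useful for multi-gigabyte files, since an interrupted transfer does not have to start over.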

LLM Models

Model	Size	Speed	License	Features
LFM2 1.2B Tool	731 MB	~180 t/s	LFM Open	Tool calling, default
LFM2 350M	219 MB	~350 t/s	LFM Open	Fastest, 128K ctx
LFM2.5 1.2B Instruct	731 MB	~180 t/s	LFM Open	128K ctx
LFM2 2.6B	1.5 GB	~120 t/s	LFM Open	Better conversational
Qwen3 0.6B	456 MB	~250 t/s	Apache 2.0	Ultra-fast
Qwen3.5 0.8B	600 MB	~220 t/s	Apache 2.0	Qwen3.5 generation
Qwen3.5 2B	1.2 GB	~150 t/s	Apache 2.0	Recommended
Qwen3 4B	2.5 GB	~80 t/s	Apache 2.0	Smart reasoning
Qwen3.5 4B	2.7 GB	~75 t/s	Apache 2.0	Best small model, 262K ctx
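
The size and speed columns above can be used to shortlist a model for a given disk budget. The following is a small sketch of that selection, with the figures copied from the table (GB values converted at 1 GB = 1000 MB). The function name and the "largest model that fits" rule are illustrative assumptions, not part of RCLI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LlmModel:
    name: str
    size_mb: int
    tokens_per_sec: int


# Figures taken from the LLM Models table.
MODELS = [
    LlmModel("LFM2 1.2B Tool", 731, 180),
    LlmModel("LFM2 350M", 219, 350),
    LlmModel("LFM2.5 1.2B Instruct", 731, 180),
    LlmModel("LFM2 2.6B", 1500, 120),
    LlmModel("Qwen3 0.6B", 456, 250),
    LlmModel("Qwen3.5 0.8B", 600, 220),
    LlmModel("Qwen3.5 2B", 1200, 150),
    LlmModel("Qwen3 4B", 2500, 80),
    LlmModel("Qwen3.5 4B", 2700, 75),
]


def largest_within_budget(budget_mb: int, min_speed: int = 0) -> Optional[LlmModel]:
    """Return the largest model that fits the disk budget and speed floor."""
    candidates = [
        m for m in MODELS
        if m.size_mb <= budget_mb and m.tokens_per_sec >= min_speed
    ]
    # max() keeps the first entry on ties, so table order breaks ties.
    return max(candidates, key=lambda m: m.size_mb, default=None)


if __name__ == "__main__":
    print(largest_within_budget(1300))                 # Qwen3.5 2B
    print(largest_within_budget(1300, min_speed=200))  # Qwen3.5 0.8B
```

Using size as a proxy for quality is a simplification. The Features column is still the better guide when two models are close in size.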

Upgrade LLM

rcli upgrade-llm

An interactive wizard guides you through upgrading to a larger LLM:

  Upgrade LLM

  Current: LFM2 1.2B Tool (731 MB)

  Recommended upgrades:

    1. Qwen3.5 2B       1200 MB   Better reasoning
    2. Qwen3.5 4B       2700 MB   Best small model, 262K context
    3. LFM2 2.6B        1500 MB   Stronger conversational

  Select an option (1-3) or q to quit: 1

  Downloading Qwen3.5 2B (1200 MB)...
  [====================] 100%
  Download complete!

  Switch to Qwen3.5 2B now? (y/n): y
  Switched to Qwen3.5 2B.

STT Models

RCLI uses two STT models in parallel:

Zipformer (Streaming)

Purpose — Real-time transcription during speech
Accuracy — Good (~5% WER)
Speed — ~50ms latency
Size — ~50 MB
Included in — rcli setup (always active)

Offline STT Models

Model	Size	Accuracy	License	Features
Whisper base.en	140 MB	~5% WER	MIT	English, default
Parakeet TDT 0.6B v3	640 MB	~1.9% WER	CC-BY-4.0	25 languages, auto-punctuation

Upgrade STT

rcli upgrade-stt

Upgrades to Parakeet TDT (best accuracy):

  Upgrade STT

  Current: Whisper base.en (~5% WER, 140 MB)
  Upgrade: Parakeet TDT 0.6B v3 (~1.9% WER, 640 MB)

  Parakeet offers:
    • 3x better accuracy (~1.9% WER vs ~5%)
    • 25 languages (vs English-only)
    • Auto-punctuation
    • Slightly slower (~60ms vs ~40ms)

  Download Parakeet TDT? (y/n): y

  Downloading Parakeet TDT (640 MB)...
  [====================] 100%
  Download complete!

  Restart RCLI to use Parakeet TDT.
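
To put the accuracy figures in concrete terms, the short calculation below compares the two word error rates quoted above. It is plain arithmetic on the published numbers: going from ~5% to ~1.9% WER is roughly a 2.6x reduction in errors, which the wizard rounds to 3x.

```python
def error_ratio(baseline_wer: float, new_wer: float) -> float:
    """How many times fewer errors the new model makes."""
    return baseline_wer / new_wer


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline errors that the new model removes."""
    return (baseline_wer - new_wer) / baseline_wer


def expected_errors(wer_percent: float, words: int) -> float:
    """Expected number of word errors in a transcript of the given length."""
    return wer_percent / 100 * words


if __name__ == "__main__":
    whisper_wer, parakeet_wer = 5.0, 1.9  # percentages from the tables above
    print(round(error_ratio(whisper_wer, parakeet_wer), 2))
    print(round(relative_error_reduction(whisper_wer, parakeet_wer), 2))
    print(expected_errors(whisper_wer, 1000), expected_errors(parakeet_wer, 1000))
```

For a 1000-word transcript this works out to about 50 expected word errors with Whisper base.en versus about 19 with Parakeet TDT.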

TTS Voices

rcli voices

Lists all TTS voices:

  Voices  (auto-detect)

  #  Voice                          Size      Arch      Speakers    Status
  1  Piper Lessac (default)         60 MB     Piper     1           * active
  2  Piper Amy                      60 MB     Piper     1           installed
  3  KittenTTS Nano                 90 MB     Kitten    8           not installed
  4  Matcha LJSpeech                100 MB    Matcha    1           not installed
  5  Kokoro English v0.19           310 MB    Kokoro    11          not installed
  6  Kokoro Multi-lang v1.1         500 MB    Kokoro    103         not installed

  Tip: Run `rcli voices` to switch voices.

Voice	Size	Speakers	License	Features
Piper Lessac	60 MB	1	MIT	Clear English, default
Piper Amy	60 MB	1	MIT	Warm female voice
KittenTTS Nano	90 MB	8	Apache 2.0	4M/4F voices
Matcha LJSpeech	100 MB	1	MIT	HiFi-GAN vocoder
Kokoro English v0.19	310 MB	11	Apache 2.0	Best English quality
Kokoro Multi-lang v1.1	500 MB	103	Apache 2.0	Chinese + English

Multi-Speaker Voices

KittenTTS and Kokoro support multiple speakers. Configure via ~/.rcli/config/tts.json:

{
  "model": "kokoro-en-v0_19",
  "speaker_id": 3
}
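
If you prefer to generate this file from a script, a minimal sketch might look like the following. It assumes speaker ids are zero-based and that the file contains only the two keys shown above. Neither assumption is confirmed by the RCLI documentation, and `write_tts_config` is a hypothetical helper.

```python
import json
from pathlib import Path

# Speaker count for the one model id that appears in this document.
# Ids for the other voices are not documented here, so they are not listed.
SPEAKER_COUNTS = {"kokoro-en-v0_19": 11}


def write_tts_config(config_dir: Path, model: str, speaker_id: int) -> Path:
    """Write tts.json into config_dir and return the path of the file."""
    count = SPEAKER_COUNTS.get(model)
    # Assumption: valid speaker ids run from 0 to count - 1.
    if count is not None and not 0 <= speaker_id < count:
        raise ValueError(f"{model} has {count} speakers; got speaker_id={speaker_id}")
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "tts.json"
    payload = {"model": model, "speaker_id": speaker_id}
    path.write_text(json.dumps(payload, indent=2) + "\n")
    return path


if __name__ == "__main__":
    target = Path.home() / ".rcli" / "config"
    print(write_tts_config(target, "kokoro-en-v0_19", 3))
```

Validating the speaker id before writing avoids discovering a bad value only at the next launch.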

Cleanup Unused Models

rcli cleanup

An interactive TUI lists all installed models:

  Model Cleanup
  Arrow keys to navigate, ENTER to delete, ESC to close

   > Qwen3 0.6B  [LLM]  456 MB
     Whisper base.en  [STT]  140 MB
     Piper Amy  [TTS]  60 MB
     LFM2 1.2B Tool  [LLM]  731 MB (active)

  [Up/Down] navigate  [Enter] delete  [ESC] close

Press Enter to delete the selected model:

Active models — Cannot be deleted (switch first)
Inactive models — Deleted immediately

Selection preferences are updated automatically.

Engine Info

rcli info

Displays active models and hardware:

  RCLI Engine Info

  Version: 0.4.0

  Models:
    LLM: Qwen3.5 2B (1200 MB)
    STT: Whisper base.en (offline) | Zipformer (streaming)
    TTS: Piper Lessac
    Embeddings: Snowflake Arctic Embed S (34 MB)

  Hardware:
    Chip: Apple M3 Max
    CPU: 14 cores (10P+4E)
    GPU: 30 cores
    RAM: 36 GB
    ANE: 16-core Neural Engine

  Paths:
    Models: ~/Library/RCLI/models
    Config: ~/.rcli/config
    Index: ~/Library/RCLI/index

Model Storage Locations

~/Library/RCLI/models/
  ├── qwen3.5-2b-q4_k_m.gguf            # LLM
  ├── lfm2-1.2b-tool-q4_k_m.gguf        # LLM
  ├── whisper-base-en/                  # STT
  │   ├── encoder.onnx
  │   ├── decoder.onnx
  │   └── tokens.txt
  ├── parakeet-tdt-0.6b-v3/             # STT
  ├── zipformer-streaming/              # STT (streaming)
  ├── piper-lessac-medium/              # TTS
  │   ├── model.onnx
  │   └── config.json
  ├── kokoro-en-v0_19/                  # TTS
  ├── silero-vad.onnx                   # VAD
  └── arctic-embed-s.gguf               # Embeddings
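
To see how much disk space each entry in this directory uses, a short script can walk the tree. This is a sketch that assumes the layout shown above, where single-file models sit next to per-model directories. It only reads file sizes and does not depend on any RCLI API.

```python
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Total size of all regular files below root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def summarize_models(models_dir: Path) -> dict[str, int]:
    """Map each top-level entry (file or directory) to its size in bytes."""
    summary = {}
    for entry in sorted(models_dir.iterdir()):
        if entry.is_file():
            summary[entry.name] = entry.stat().st_size
        else:
            summary[entry.name] = directory_size_bytes(entry)
    return summary


if __name__ == "__main__":
    models_dir = Path.home() / "Library" / "RCLI" / "models"
    if models_dir.is_dir():
        for name, size in summarize_models(models_dir).items():
            print(f"{size / 1_000_000:10.1f} MB  {name}")
    else:
        print(f"{models_dir} does not exist")
```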

Model Selection Persistence

User preferences are saved to:

~/.rcli/config/model_selection.json

{
  "llm": "qwen3.5-2b",
  "stt": "parakeet-tdt-0.6b-v3",
  "tts": "piper-lessac-medium"
}

To reset to defaults:

rm ~/.rcli/config/model_selection.json
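
For scripts that need to know the current selection, the file can be read with a fallback to the defaults. The sketch below assumes the default identifiers follow the same naming as the example above (`lfm2-1.2b-tool` and `whisper-base-en` are guesses based on the storage layout). How RCLI itself handles a missing or malformed file is not documented here.

```python
import json
from pathlib import Path

# Assumed identifiers for the default models; only "piper-lessac-medium"
# appears verbatim in the example selection file.
DEFAULTS = {
    "llm": "lfm2-1.2b-tool",
    "stt": "whisper-base-en",
    "tts": "piper-lessac-medium",
}


def load_selection(path: Path) -> dict:
    """Read model_selection.json, falling back to DEFAULTS for anything missing."""
    try:
        saved = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(DEFAULTS)
    if not isinstance(saved, dict):
        return dict(DEFAULTS)
    return {key: saved.get(key, default) for key, default in DEFAULTS.items()}


if __name__ == "__main__":
    selection_file = Path.home() / ".rcli" / "config" / "model_selection.json"
    print(load_selection(selection_file))
```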

Benchmarking Models

Compare All LLMs

rcli bench --all-llm --suite llm

# Output:
--- LLM Benchmark (All Models) ---
  Qwen3 0.6B:      TTFT 18ms   250 tok/s
  Qwen3.5 0.8B:    TTFT 20ms   220 tok/s
  Qwen3.5 2B:      TTFT 25ms   150 tok/s
  LFM2 1.2B Tool:  TTFT 22ms   180 tok/s
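
If you want to compare runs over time, the text output can be parsed into records. The sketch below assumes the line format matches the sample output above exactly. The real output may differ between versions, so the regular expression is an assumption to adjust as needed.

```python
import re

# Matches lines such as "  Qwen3 0.6B:      TTFT 18ms   250 tok/s".
BENCH_LINE = re.compile(
    r"^\s*(?P<name>.+?):\s+TTFT\s+(?P<ttft>\d+)ms\s+(?P<tps>\d+)\s+tok/s\s*$"
)


def parse_llm_bench(output: str) -> list[dict]:
    """Extract model name, time to first token, and throughput from each result line."""
    rows = []
    for line in output.splitlines():
        match = BENCH_LINE.match(line)
        if match:
            rows.append({
                "model": match["name"],
                "ttft_ms": int(match["ttft"]),
                "tok_per_s": int(match["tps"]),
            })
    return rows


if __name__ == "__main__":
    sample = """--- LLM Benchmark (All Models) ---
  Qwen3 0.6B:      TTFT 18ms   250 tok/s
  Qwen3.5 0.8B:    TTFT 20ms   220 tok/s
  Qwen3.5 2B:      TTFT 25ms   150 tok/s
  LFM2 1.2B Tool:  TTFT 22ms   180 tok/s
"""
    rows = parse_llm_bench(sample)
    fastest = max(rows, key=lambda r: r["tok_per_s"])
    print(fastest["model"])
```

Header lines and anything else that does not match the pattern are skipped rather than raising an error.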

Compare All TTS

rcli bench --all-tts --suite tts

# Output:
--- TTS Benchmark (All Voices) ---
  Piper Lessac:    142ms   0.8x RT
  Piper Amy:       138ms   0.7x RT
  Kokoro English:  189ms   1.1x RT

Troubleshooting

Model Download Fails

Error: Failed to download model
curl: (56) Recv failure: Connection reset by peer

Solution: Check your internet connection, then retry the download.

Out of Disk Space

Error: Not enough disk space (need 1.2 GB, have 500 MB)

Solution: Run rcli cleanup to remove unused models and free space, then retry the download.
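
To check free space before starting a large download, a script can query the filesystem directly. This is a minimal sketch using Python's standard `shutil.disk_usage`. The helper name and the headroom parameter are illustrative and not part of RCLI.

```python
import shutil


def has_room_for(path: str, required_bytes: int, headroom_bytes: int = 0) -> bool:
    """Return True if the filesystem holding path has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes + headroom_bytes


if __name__ == "__main__":
    needed = 1_200_000_000  # about 1.2 GB, the size quoted in the error above
    if has_room_for(".", needed, headroom_bytes=100_000_000):
        print("Enough space for the download")
    else:
        print("Not enough space; consider running rcli cleanup first")
```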

Model Not Found After Download

Error: Model file not found at ~/Library/RCLI/models/qwen3.5-2b.gguf

Solution: Re-run rcli models and download the model again.

LLM Switch Fails

Failed to switch to Qwen3.5 2B
Error: llama_model_load: failed to load model

Solution: The model file may be corrupted. Delete it and download it again.
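
A quick way to spot an obviously broken LLM file is to check its header, since GGUF files begin with the four ASCII bytes `GGUF`. The sketch below checks only that magic value. A file that was truncated partway through would still pass, so a positive result is a hint rather than a guarantee.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def looks_like_gguf(path: Path) -> bool:
    """Return True if the file exists and starts with the GGUF magic bytes."""
    try:
        with path.open("rb") as handle:
            return handle.read(4) == GGUF_MAGIC
    except OSError:
        # Missing or unreadable files are treated as invalid.
        return False


if __name__ == "__main__":
    models_dir = Path.home() / "Library" / "RCLI" / "models"
    for candidate in sorted(models_dir.glob("*.gguf")):
        status = "ok" if looks_like_gguf(candidate) else "suspect"
        print(f"{status:8} {candidate.name}")
```

An HTML error page saved under a `.gguf` name, a common result of a failed download, is caught by this check.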
