RCLI supports global options that modify behavior across all commands.

Global Options

--models
string
default:"~/Library/RCLI/models"
Specify models directory path.
rcli --models /path/to/models
rcli ask --models ~/custom-models "open Safari"
Useful for testing different model sets or managing multiple configurations.
--rag
string
Load RAG index for document-grounded answers.
rcli --rag ~/Library/RCLI/index
rcli ask --rag ~/Documents/my-index "summarize the project"
When enabled, all queries are augmented with retrieved context from the indexed documents.
--gpu-layers
integer
default:"99"
Number of LLM layers to offload to GPU.
rcli --gpu-layers 0        # CPU-only inference
rcli --gpu-layers 20       # Partial GPU offload
rcli --gpu-layers 99       # All layers on GPU (default)
When to adjust:
  • Set to 0 for CPU-only testing
  • Reduce if experiencing memory pressure
  • Default 99 uses all available GPU layers (optimal for Apple Silicon)
--ctx-size
integer
default:"4096"
LLM context window size (tokens).
rcli --ctx-size 8192       # Larger context for long conversations
rcli --ctx-size 2048       # Smaller context for speed
Tradeoffs:
  • Larger context: more conversation history, higher memory usage
  • Smaller context: faster inference, less memory
--no-speak
boolean
Disable TTS audio playback (text output only).
rcli listen --no-speak
rcli ask --no-speak "what time is it?"
Use cases:
  • Silent operation (text-only responses)
  • Faster benchmarking (skip TTS synthesis)
  • Headless environments
--verbose
boolean
Enable debug logging.
rcli --verbose
rcli -v bench --suite llm
Shows detailed logs from llama.cpp, sherpa-onnx, and RCLI internals.
--help
boolean
Display help information.
rcli --help
rcli -h
rcli listen --help
rcli bench --help

Benchmark Options

These options are specific to rcli bench:
--suite
string
default:"all"
Benchmark suite to run.
Available suites:
  • all - All benchmarks (default)
  • stt - Speech-to-text
  • llm - Language model generation
  • tts - Text-to-speech synthesis
  • e2e - End-to-end pipeline (voice in → audio out)
  • tools - Tool-calling accuracy and latency
  • rag - RAG retrieval performance
  • memory - Memory usage profiling
rcli bench --suite llm
rcli bench --suite tools
rcli bench --suite "llm,tts,e2e"    # Comma-separated
--runs
integer
default:"3"
Number of measured runs per benchmark.
rcli bench --runs 5
rcli bench --suite llm --runs 10
More runs produce more stable results, at the cost of longer total runtime.
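To see why extra runs stabilize the reported numbers, here is an illustrative Python sketch (the throughput samples are synthetic, not real RCLI output): the standard error of the mean shrinks roughly with the square root of the run count.

```python
import math
import statistics

# Synthetic per-run throughput samples (tok/s); real values come from rcli bench.
samples = [158.2, 161.0, 159.6, 160.1, 159.0, 158.8, 160.5, 159.9, 159.2, 160.3]

for n in (3, 5, 10):
    runs = samples[:n]
    # Standard error of the mean: uncertainty in the averaged result.
    sem = statistics.stdev(runs) / math.sqrt(n)
    print(f"{n} runs: mean={statistics.mean(runs):.1f} tok/s, +/-{sem:.2f}")
```

The default of 3 runs is usually a reasonable balance; bump `--runs` when two benchmark invocations disagree by more than you expect.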
--output
string
Export benchmark results to JSON file.
rcli bench --output results.json
rcli bench --suite llm --output llm-perf.json
Output format:
{
  "suite": "llm",
  "runs": 3,
  "llm": {
    "model": "lfm2-1.2b-tool-q4_k_m",
    "tokens": 156,
    "tokens_per_sec": 159.6,
    "ttft_ms": 22.5,
    "total_ms": 977
  }
}
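A results file in this format is easy to post-process. The sketch below (using only the field names from the sample above) loads the JSON and prints a one-line summary:

```python
import json

# Sample results matching the schema shown above (values are illustrative).
raw = """
{
  "suite": "llm",
  "runs": 3,
  "llm": {
    "model": "lfm2-1.2b-tool-q4_k_m",
    "tokens": 156,
    "tokens_per_sec": 159.6,
    "ttft_ms": 22.5,
    "total_ms": 977
  }
}
"""

results = json.loads(raw)
llm = results["llm"]
print(f"{llm['model']}: {llm['tokens_per_sec']} tok/s, "
      f"TTFT {llm['ttft_ms']} ms over {results['runs']} runs")
```

In practice you would read the file written by `--output` (e.g. `json.load(open("results.json"))`) instead of an inline string.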
--all-llm
boolean
Benchmark all installed LLM models.
rcli bench --all-llm --suite llm
rcli bench --all-llm --suite tools
Compares performance across all downloaded language models.
--all-tts
boolean
Benchmark all installed TTS voices.
rcli bench --all-tts --suite tts
Compares synthesis speed and quality across all downloaded voices.
--llm
string
Specify LLM model for benchmark (overrides active selection).
rcli bench --llm qwen3-0.6b --suite llm
--tts
string
Specify TTS voice for benchmark (overrides active selection).
rcli bench --tts piper-amy --suite tts
--stt
string
Specify STT model for benchmark (overrides active selection).
rcli bench --stt parakeet-tdt-0.6b --suite stt

Examples

# Interactive mode with RAG enabled
rcli --rag ~/Library/RCLI/index

# Ask with custom models directory
rcli ask --models ~/test-models "what time is it?"

# Listen mode without TTS playback
rcli listen --no-speak

Option Precedence

When the same option is set in more than one place, the value is resolved in this order:
  1. Command-line flags (highest priority)
  2. Config file settings
  3. Default values (lowest priority)
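The resolution order above can be sketched as a layered merge. This is an illustrative Python snippet, not RCLI's actual implementation; the option keys and the `resolve` helper are hypothetical:

```python
# Built-in defaults (lowest priority); keys are hypothetical examples.
DEFAULTS = {"gpu_layers": 99, "ctx_size": 4096}

def resolve(cli_flags, config_file):
    """Merge option sources: CLI flags beat config-file settings, which beat defaults."""
    merged = dict(DEFAULTS)
    merged.update({k: v for k, v in config_file.items() if v is not None})
    merged.update({k: v for k, v in cli_flags.items() if v is not None})
    return merged

# A config file sets ctx_size; a command-line flag overrides gpu_layers.
opts = resolve(cli_flags={"gpu_layers": 0}, config_file={"ctx_size": 8192})
print(opts)  # {'gpu_layers': 0, 'ctx_size': 8192}
```

Later layers only override keys they actually set, so a config file can pin one option while everything else keeps its default.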

Performance Tips

# Maximize throughput
rcli --gpu-layers 99 --ctx-size 2048 --no-speak
  • Full GPU offload (--gpu-layers 99)
  • Smaller context window (--ctx-size 2048)
  • Skip TTS (--no-speak)
# Minimize memory usage
rcli --gpu-layers 0 --ctx-size 2048
  • CPU-only inference (--gpu-layers 0)
  • Small context (--ctx-size 2048)
  • Use smaller models via rcli models
# Best quality responses
rcli --ctx-size 8192 --rag ~/Library/RCLI/index
  • Large context window (--ctx-size 8192)
  • RAG enabled for grounded answers
  • Use larger models (Qwen3.5 4B) via rcli upgrade-llm

