Global Options
Specify the models directory path. Useful for testing different model sets or managing multiple configurations.
Load the RAG index for document-grounded answers. When enabled, all queries are augmented with retrieved context from the indexed documents.
Number of LLM layers to offload to the GPU. When to adjust:
- Set to `0` for CPU-only testing
- Reduce if experiencing memory pressure
- Default `99` uses all available GPU layers (optimal for Apple Silicon)
LLM context window size (tokens). Tradeoffs:
- Larger context: more conversation history, higher memory usage
- Smaller context: faster inference, less memory
Disable TTS audio playback (text output only). Use cases:
- Silent operation (text-only responses)
- Faster benchmarking (skip TTS synthesis)
- Headless environments
Enable debug logging. Shows detailed logs from llama.cpp, sherpa-onnx, and RCLI internals.
Display help information.
Benchmark Options
These options are specific to `rcli bench`:
Benchmark suite to run. Available suites:
- `all` - All benchmarks (default)
- `stt` - Speech-to-text
- `llm` - Language model generation
- `tts` - Text-to-speech synthesis
- `e2e` - End-to-end pipeline (voice in → audio out)
- `tools` - Tool-calling accuracy and latency
- `rag` - RAG retrieval performance
- `memory` - Memory usage profiling
Number of measured runs per benchmark. More runs = more stable results, longer total runtime.
Export benchmark results to a JSON file.
Benchmark all installed LLM models.Compares performance across all downloaded language models.
Benchmark all installed TTS voices.Compares synthesis speed and quality across all downloaded voices.
Specify LLM model for benchmark (overrides active selection).
Specify TTS voice for benchmark (overrides active selection).
Specify STT model for benchmark (overrides active selection).
Examples
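A few illustrative invocations combining the flags documented above (these only show flag usage; exact output depends on your installed models):

```shell
# Fast, text-only benchmark run: full GPU offload, small context, no TTS
rcli bench --gpu-layers 99 --ctx-size 2048 --no-speak

# CPU-only run, useful when diagnosing memory pressure
rcli bench --gpu-layers 0
```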
Option Precedence
When the same option is specified multiple times, the effective value is resolved in this order:
1. Command-line flags (highest priority)
2. Config file settings
3. Default values (lowest priority)
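The precedence rules above can be sketched as a small resolver. This is an illustrative Python sketch, not RCLI internals; the function name `resolve_option` and the default `ctx-size` value are assumptions for the example (the `gpu-layers` default of `99` is documented above):

```python
def resolve_option(name, cli_flags, config, defaults):
    """Return the effective value for an option, checking sources in
    precedence order: CLI flag > config file > built-in default."""
    for source in (cli_flags, config, defaults):
        if name in source and source[name] is not None:
            return source[name]
    raise KeyError(f"unknown option: {name}")

defaults = {"ctx-size": 4096, "gpu-layers": 99}  # assumed defaults for illustration
config = {"ctx-size": 8192}                      # e.g. from a config file
cli_flags = {"ctx-size": 2048}                   # e.g. --ctx-size 2048

print(resolve_option("ctx-size", cli_flags, config, defaults))    # 2048 (flag wins)
print(resolve_option("gpu-layers", cli_flags, config, defaults))  # 99 (default)
```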
Performance Tips
Optimize for Speed
- Full GPU offload (`--gpu-layers 99`)
- Smaller context window (`--ctx-size 2048`)
- Skip TTS (`--no-speak`)
Optimize for Memory
- CPU-only inference (`--gpu-layers 0`)
- Small context (`--ctx-size 2048`)
- Use smaller models via `rcli models`
Optimize for Quality
- Large context window (`--ctx-size 8192`)
- RAG enabled for grounded answers
- Use larger models (Qwen3.5 4B) via `rcli upgrade-llm`
Related
- Commands - Complete command reference
- Environment - Environment variables and config files