RCLI supports nine LLMs across two families: Qwen3/Qwen3.5 (Alibaba) and Liquid LFM2 (Liquid AI). All models run locally on Apple Silicon with Metal GPU acceleration.

Quick Comparison

| Model | Provider | Size | Speed | Tool Calling | Context |
|---|---|---|---|---|---|
| LFM2 350M | Liquid AI | 219 MB | ~350 t/s | Basic | 128K |
| Qwen3 0.6B | Alibaba Qwen | 456 MB | ~250 t/s | Basic | 32K |
| Qwen3.5 0.8B | Alibaba Qwen | 600 MB | ~220 t/s | Basic | 32K |
| LFM2 1.2B Tool ⭐ | Liquid AI | 731 MB | ~180 t/s | Excellent | 128K |
| LFM2.5 1.2B | Liquid AI | 731 MB | ~180 t/s | Good | 128K |
| Qwen3.5 2B | Alibaba Qwen | 1.2 GB | ~150 t/s | Good | 32K |
| LFM2 2.6B | Liquid AI | 1.5 GB | ~120 t/s | Good | 128K |
| Qwen3 4B | Alibaba Qwen | 2.5 GB | ~80 t/s | Good | 32K |
| Qwen3.5 4B 🏆 | Alibaba Qwen | 2.7 GB | ~75 t/s | Excellent | 262K |
⭐ = Default model (ships with rcli setup)
🏆 = Recommended upgrade for best quality

Model Details

Liquid AI LFM2 Family

Liquid AI’s LFM (Liquid Foundation Models) use a novel architecture optimized for efficiency and long context.

LFM2 1.2B Tool (Default)

Best for: Tool calling + conversation. Excellent balance of speed and accuracy.
  • Size: 731 MB (Q4_K_M quantization)
  • Speed: ~180 tokens/sec on Apple M3 Max
  • Tool Calling: Excellent (native <|tool_call_start|> format)
  • Context: 128K tokens
  • License: LFM Open
  • Download: rcli setup (default) or rcli models
Key Features:
  • Trained specifically for function calling
  • Uses native tool format: <|tool_call_start|>[func(arg="val")]<|tool_call_end|>
  • High accuracy on macOS actions (43 supported actions)
  • Fast inference with long context support

LFM2 350M

  • Size: 219 MB
  • Speed: ~350 tokens/sec (fastest model)
  • Tool Calling: Basic
  • Context: 128K tokens
  • Best for: Ultra-fast responses, resource-constrained devices

LFM2.5 1.2B Instruct

  • Size: 731 MB
  • Speed: ~180 tokens/sec
  • Tool Calling: Good (chat template format)
  • Context: 128K tokens
  • Best for: Improved conversational quality; the next-generation Liquid instruct model

LFM2 2.6B

  • Size: 1.5 GB
  • Speed: ~120 tokens/sec
  • Tool Calling: Good
  • Context: 128K tokens
  • Best for: Stronger conversational ability, complex reasoning

Alibaba Qwen3 Family

Qwen models are open-source LLMs from Alibaba with strong multilingual and tool-calling capabilities.

Qwen3 0.6B

  • Size: 456 MB
  • Speed: ~250 tokens/sec
  • Tool Calling: Basic (limited accuracy)
  • Context: 32K tokens
  • License: Apache 2.0
  • Best for: Ultra-fast inference, smallest footprint

Qwen3.5 0.8B

  • Size: 600 MB
  • Speed: ~220 tokens/sec
  • Tool Calling: Basic
  • Context: 32K tokens
  • Best for: An entry point to the Qwen3.5 generation; slightly better quality than Qwen3 0.6B

Qwen3.5 2B

  • Size: 1.2 GB
  • Speed: ~150 tokens/sec
  • Tool Calling: Good
  • Context: 32K tokens
  • Best for: Solid all-rounder for tool calling + conversations

Qwen3 4B

  • Size: 2.5 GB
  • Speed: ~80 tokens/sec
  • Tool Calling: Good
  • Context: 32K tokens
  • Best for: Smart reasoning, needs more RAM
Qwen3.5 4B 🏆

Best for: Best overall quality. Native tool calling with a 262K context window.
  • Size: 2.7 GB
  • Speed: ~75 tokens/sec
  • Tool Calling: Excellent (native <tool_call> XML format)
  • Context: 262K tokens
  • License: Apache 2.0
  • Download: rcli upgrade-llm
Key Features:
  • Best small model for complex tasks
  • Native tool calling with <tool_call> / </tool_call> tags
  • Massive 262K context window (vs. 32K in Qwen3 4B)
  • High accuracy on RCLI’s 43 macOS actions

Tool Calling Formats

Different model families use different tool-calling formats. RCLI automatically detects and parses each format.

Qwen3 Format (JSON-based)

<tool_call>
{"name": "open_app", "arguments": {"app_name": "Safari"}}
</tool_call>

LFM2 Format (Function-call syntax)

<|tool_call_start|>[open_app(app_name="Safari")]<|tool_call_end|>
Both formats are supported natively by RCLI’s tool-calling engine.
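
To make the two formats concrete, here is a minimal Python sketch of how each could be parsed with regular expressions. The function names and regexes are illustrative assumptions, not RCLI's actual internals:

```python
import json
import re

# Qwen3 emits a JSON object between <tool_call> tags (DOTALL spans newlines).
QWEN_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
# LFM2 emits a bracketed function call between special tokens.
LFM_RE = re.compile(r"<\|tool_call_start\|>\[(\w+)\((.*?)\)\]<\|tool_call_end\|>")

def parse_qwen(output: str):
    """Extract (name, arguments) from Qwen3's JSON-based format."""
    m = QWEN_RE.search(output)
    if not m:
        return None
    call = json.loads(m.group(1))
    return call["name"], call["arguments"]

def parse_lfm2(output: str):
    """Extract (name, arguments) from LFM2's func(arg="val") format."""
    m = LFM_RE.search(output)
    if not m:
        return None
    name, argstr = m.group(1), m.group(2)
    # Collect keyword arguments of the form key="value".
    args = dict(re.findall(r'(\w+)="([^"]*)"', argstr))
    return name, args

qwen_out = '<tool_call>\n{"name": "open_app", "arguments": {"app_name": "Safari"}}\n</tool_call>'
lfm_out = '<|tool_call_start|>[open_app(app_name="Safari")]<|tool_call_end|>'
print(parse_qwen(qwen_out))  # ('open_app', {'app_name': 'Safari'})
print(parse_lfm2(lfm_out))   # ('open_app', {'app_name': 'Safari'})
```

Both parsers normalize to the same (name, arguments) pair, which is why a single tool-calling engine can dispatch either format.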

Quantization

All models use Q4_K_M quantization (4-bit with mixed precision):
  • Balances quality and size (~70% size reduction vs. FP16)
  • Minimal accuracy loss (<3% perplexity increase)
  • Fast inference on Metal GPU
  • Compatible with llama.cpp Metal backend
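
The ~70% figure follows from simple arithmetic, sketched below under the common assumption that Q4_K_M averages roughly 4.5 bits per parameter versus 16 bits for FP16:

```python
# Back-of-envelope size estimate for a 1.2B-parameter model (e.g. LFM2 1.2B).
params = 1.2e9
fp16_gb = params * 2 / 1e9        # FP16: 2 bytes/param  -> ~2.4 GB
q4_gb = params * 4.5 / 8 / 1e9    # Q4_K_M: ~4.5 bits/param -> ~0.68 GB
reduction = 1 - q4_gb / fp16_gb
print(f"{fp16_gb:.2f} GB -> {q4_gb:.2f} GB ({reduction:.0%} smaller)")
```

The estimate lands close to the observed 731 MB download; the small gap comes from per-block scales and a few tensors kept at higher precision.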

Benchmarks

Performance measured on Apple M3 Max (14-core CPU, 30-core GPU, 36 GB RAM):

Speed Benchmarks

```
rcli bench --suite llm            # Benchmark active LLM
rcli bench --all-llm --suite llm  # Compare all installed LLMs
```
| Model | Time to First Token | Generation Speed | Tokens/128 ms |
|---|---|---|---|
| LFM2 350M | 18 ms | 350 t/s | 44 tokens |
| Qwen3 0.6B | 19 ms | 250 t/s | 32 tokens |
| LFM2 1.2B | 22 ms | 180 t/s | 23 tokens |
| Qwen3.5 2B | 24 ms | 150 t/s | 19 tokens |
| LFM2 2.6B | 28 ms | 120 t/s | 15 tokens |
| Qwen3.5 4B | 30 ms | 75 t/s | 9 tokens |
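
The Tokens/128 ms column is simply the generation speed scaled to a 128 ms window and floored to whole tokens, as this small check shows:

```python
# Generation speeds (tokens/sec) from the speed benchmark table.
speeds = {
    "LFM2 350M": 350,
    "Qwen3 0.6B": 250,
    "LFM2 1.2B": 180,
    "Qwen3.5 2B": 150,
    "LFM2 2.6B": 120,
    "Qwen3.5 4B": 75,
}
# Tokens produced in a 128 ms window, floored to whole tokens.
per_128ms = {model: int(tps * 0.128) for model, tps in speeds.items()}
print(per_128ms)  # matches the table: 44, 32, 23, 19, 15, 9
```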

Tool Calling Accuracy

```
rcli bench --suite tools            # Benchmark active model
rcli bench --all-llm --suite tools  # Compare all models
```
| Model | Accuracy (43 actions) | Avg Latency |
|---|---|---|
| Qwen3.5 4B | 98.5% | 180 ms |
| LFM2 1.2B Tool | 97.2% | 140 ms |
| LFM2 2.6B | 95.8% | 190 ms |
| Qwen3.5 2B | 93.1% | 160 ms |
| Qwen3 4B | 92.4% | 200 ms |
| LFM2.5 1.2B | 91.7% | 145 ms |
Accuracy measured on RCLI’s 43 macOS actions test suite. Latency includes LLM inference + tool execution.

Switching Models

See Switching Models for hot-swap instructions and CLI commands.

Next Steps

Upgrade Your LLM

Guided LLM upgrade with size/speed comparisons

Benchmark Models

Run comprehensive benchmarks on all installed models

Switch Models

Hot-swap models without restarting RCLI

Model Browser

Interactive model management in TUI
