RCLI supports nine LLMs across two families: Qwen3/Qwen3.5 (Alibaba) and Liquid LFM2 (Liquid AI). All models run locally on Apple Silicon with Metal GPU acceleration.

Quick Comparison

| Model | Provider | Size | Speed | Tool Calling | Context |
|---|---|---|---|---|---|
| LFM2 350M | Liquid AI | 219 MB | ~350 t/s | Basic | 128K |
| Qwen3 0.6B | Alibaba Qwen | 456 MB | ~250 t/s | Basic | 32K |
| Qwen3.5 0.8B | Alibaba Qwen | 600 MB | ~220 t/s | Basic | 32K |
| LFM2 1.2B Tool ⭐ | Liquid AI | 731 MB | ~180 t/s | Excellent | 128K |
| LFM2.5 1.2B | Liquid AI | 731 MB | ~180 t/s | Good | 128K |
| Qwen3.5 2B | Alibaba Qwen | 1.2 GB | ~150 t/s | Good | 32K |
| LFM2 2.6B | Liquid AI | 1.5 GB | ~120 t/s | Good | 128K |
| Qwen3 4B | Alibaba Qwen | 2.5 GB | ~80 t/s | Good | 32K |
| Qwen3.5 4B 🏆 | Alibaba Qwen | 2.7 GB | ~75 t/s | Excellent | 262K |
⭐ = Default model (ships with rcli setup)
🏆 = Recommended upgrade for best quality

Model Details

Liquid AI LFM2 Family

Liquid AI’s LFM (Liquid Foundation Models) use a novel architecture optimized for efficiency and long context.

LFM2 1.2B Tool (Default)

Best for: Tool calling + conversation. Excellent balance of speed and accuracy.
  • Size: 731 MB (Q4_K_M quantization)
  • Speed: ~180 tokens/sec on Apple M3 Max
  • Tool Calling: Excellent (native <|tool_call_start|> format)
  • Context: 128K tokens
  • License: LFM Open
  • Download: rcli setup (default) or rcli models
Key Features:
  • Trained specifically for function calling
  • Uses native tool format: <|tool_call_start|>[func(arg="val")]<|tool_call_end|>
  • High accuracy on macOS actions (43 supported actions)
  • Fast inference with long context support

LFM2 350M

  • Size: 219 MB
  • Speed: ~350 tokens/sec (fastest model)
  • Tool Calling: Basic
  • Context: 128K tokens
  • Best for: Ultra-fast responses, resource-constrained devices

LFM2.5 1.2B Instruct

  • Size: 731 MB
  • Speed: ~180 tokens/sec
  • Tool Calling: Good (chat template format)
  • Context: 128K tokens
  • Best for: Improved conversational quality; the next-generation Liquid instruct model

LFM2 2.6B

  • Size: 1.5 GB
  • Speed: ~120 tokens/sec
  • Tool Calling: Good
  • Context: 128K tokens
  • Best for: Stronger conversational ability, complex reasoning

Alibaba Qwen3 Family

Qwen models are open-source LLMs from Alibaba with strong multilingual and tool-calling capabilities.

Qwen3 0.6B

  • Size: 456 MB
  • Speed: ~250 tokens/sec
  • Tool Calling: Basic (limited accuracy)
  • Context: 32K tokens
  • License: Apache 2.0
  • Best for: Ultra-fast inference, smallest footprint

Qwen3.5 0.8B

  • Size: 600 MB
  • Speed: ~220 tokens/sec
  • Tool Calling: Basic
  • Context: 32K tokens
  • Best for: An entry point to the Qwen3.5 generation; slightly better quality than Qwen3 0.6B

Qwen3.5 2B

  • Size: 1.2 GB
  • Speed: ~150 tokens/sec
  • Tool Calling: Good
  • Context: 32K tokens
  • Best for: Solid all-rounder for tool calling + conversations

Qwen3 4B

  • Size: 2.5 GB
  • Speed: ~80 tokens/sec
  • Tool Calling: Good
  • Context: 32K tokens
  • Best for: Smart reasoning, needs more RAM
Qwen3.5 4B 🏆

Best for: Best overall quality. Native tool calling with a 262K context window.
  • Size: 2.7 GB
  • Speed: ~75 tokens/sec
  • Tool Calling: Excellent (native <tool_call> XML format)
  • Context: 262K tokens
  • License: Apache 2.0
  • Download: rcli upgrade-llm
Key Features:
  • Best small model for complex tasks
  • Native tool calling with <tool_call> / </tool_call> tags
  • Massive 262K context window (vs. 32K in Qwen3 4B)
  • High accuracy on RCLI’s 43 macOS actions

Tool Calling Formats

Different model families use different tool-calling formats. RCLI automatically detects and parses each format.

Qwen3 Format (JSON-based)

<tool_call>
{"name": "open_app", "arguments": {"app_name": "Safari"}}
</tool_call>

LFM2 Format (Function-call syntax)

<|tool_call_start|>[open_app(app_name="Safari")]<|tool_call_end|>
Both formats are supported natively by RCLI’s tool-calling engine.
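
To make the two formats concrete, here is a minimal Python sketch of how each could be parsed with regular expressions. The function names and regexes are illustrative assumptions, not RCLI's actual internals:

```python
import json
import re

# Qwen3 emits a JSON object between <tool_call> tags (DOTALL spans newlines).
QWEN_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
# LFM2 emits a bracketed function call between special tokens.
LFM_RE = re.compile(r"<\|tool_call_start\|>\[(\w+)\((.*?)\)\]<\|tool_call_end\|>")

def parse_qwen(output: str):
    """Extract (name, arguments) from Qwen3's JSON-based format."""
    m = QWEN_RE.search(output)
    if not m:
        return None
    call = json.loads(m.group(1))
    return call["name"], call["arguments"]

def parse_lfm2(output: str):
    """Extract (name, arguments) from LFM2's func(arg="val") format."""
    m = LFM_RE.search(output)
    if not m:
        return None
    name, argstr = m.group(1), m.group(2)
    # Collect keyword arguments of the form key="value".
    args = dict(re.findall(r'(\w+)="([^"]*)"', argstr))
    return name, args

qwen_out = '<tool_call>\n{"name": "open_app", "arguments": {"app_name": "Safari"}}\n</tool_call>'
lfm_out = '<|tool_call_start|>[open_app(app_name="Safari")]<|tool_call_end|>'
print(parse_qwen(qwen_out))  # ('open_app', {'app_name': 'Safari'})
print(parse_lfm2(lfm_out))   # ('open_app', {'app_name': 'Safari'})
```

Both parsers normalize to the same (name, arguments) pair, which is why a single tool-calling engine can dispatch either format.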

Quantization

All models use Q4_K_M quantization (4-bit with mixed precision):
  • Balances quality and size (~70% size reduction vs. FP16)
  • Minimal accuracy loss (<3% perplexity increase)
  • Fast inference on Metal GPU
  • Compatible with llama.cpp Metal backend
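
The ~70% figure follows from simple arithmetic, sketched below under the common assumption that Q4_K_M averages roughly 4.5 bits per parameter versus 16 bits for FP16:

```python
# Back-of-envelope size estimate for a 1.2B-parameter model (e.g. LFM2 1.2B).
params = 1.2e9
fp16_gb = params * 2 / 1e9        # FP16: 2 bytes/param  -> ~2.4 GB
q4_gb = params * 4.5 / 8 / 1e9    # Q4_K_M: ~4.5 bits/param -> ~0.68 GB
reduction = 1 - q4_gb / fp16_gb
print(f"{fp16_gb:.2f} GB -> {q4_gb:.2f} GB ({reduction:.0%} smaller)")
```

The estimate lands close to the observed 731 MB download; the small gap comes from per-block scales and a few tensors kept at higher precision.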

Benchmarks

Performance measured on Apple M3 Max (14-core CPU, 30-core GPU, 36 GB RAM):

Speed Benchmarks

```
rcli bench --suite llm            # Benchmark active LLM
rcli bench --all-llm --suite llm  # Compare all installed LLMs
```
| Model | Time to First Token | Generation Speed | Tokens/128 ms |
|---|---|---|---|
| LFM2 350M | 18 ms | 350 t/s | 44 tokens |
| Qwen3 0.6B | 19 ms | 250 t/s | 32 tokens |
| LFM2 1.2B | 22 ms | 180 t/s | 23 tokens |
| Qwen3.5 2B | 24 ms | 150 t/s | 19 tokens |
| LFM2 2.6B | 28 ms | 120 t/s | 15 tokens |
| Qwen3.5 4B | 30 ms | 75 t/s | 9 tokens |
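
The Tokens/128 ms column is simply the generation speed scaled to a 128 ms window and floored to whole tokens, as this small check shows:

```python
# Generation speeds (tokens/sec) from the speed benchmark table.
speeds = {
    "LFM2 350M": 350,
    "Qwen3 0.6B": 250,
    "LFM2 1.2B": 180,
    "Qwen3.5 2B": 150,
    "LFM2 2.6B": 120,
    "Qwen3.5 4B": 75,
}
# Tokens produced in a 128 ms window, floored to whole tokens.
per_128ms = {model: int(tps * 0.128) for model, tps in speeds.items()}
print(per_128ms)  # matches the table: 44, 32, 23, 19, 15, 9
```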

Tool Calling Accuracy

```
rcli bench --suite tools            # Benchmark active model
rcli bench --all-llm --suite tools  # Compare all models
```
| Model | Accuracy (43 actions) | Avg Latency |
|---|---|---|
| Qwen3.5 4B | 98.5% | 180 ms |
| LFM2 1.2B Tool | 97.2% | 140 ms |
| LFM2 2.6B | 95.8% | 190 ms |
| Qwen3.5 2B | 93.1% | 160 ms |
| Qwen3 4B | 92.4% | 200 ms |
| LFM2.5 1.2B | 91.7% | 145 ms |
Accuracy measured on RCLI’s 43 macOS actions test suite. Latency includes LLM inference + tool execution.

Switching Models

See Switching Models for hot-swap instructions and CLI commands.

Next Steps

Upgrade Your LLM

Guided LLM upgrade with size/speed comparisons

Benchmark Models

Run comprehensive benchmarks on all installed models

Switch Models

Hot-swap models without restarting RCLI

Model Browser

Interactive model management in TUI
