Quick Comparison
| Model | Provider | Size | Speed | Tool Calling | Context |
|---|---|---|---|---|---|
| LFM2 350M | Liquid AI | 219 MB | ~350 t/s | Basic | 128K |
| Qwen3 0.6B | Alibaba Qwen | 456 MB | ~250 t/s | Basic | 32K |
| Qwen3.5 0.8B | Alibaba Qwen | 600 MB | ~220 t/s | Basic | 32K |
| LFM2 1.2B Tool ⭐ | Liquid AI | 731 MB | ~180 t/s | Excellent | 128K |
| LFM2.5 1.2B | Liquid AI | 731 MB | ~180 t/s | Good | 128K |
| Qwen3.5 2B | Alibaba Qwen | 1.2 GB | ~150 t/s | Good | 32K |
| LFM2 2.6B | Liquid AI | 1.5 GB | ~120 t/s | Good | 128K |
| Qwen3 4B | Alibaba Qwen | 2.5 GB | ~80 t/s | Good | 32K |
| Qwen3.5 4B 🏆 | Alibaba Qwen | 2.7 GB | ~75 t/s | Excellent | 262K |
⭐ = Default model (ships with `rcli setup`)
🏆 = Recommended upgrade for best quality
Model Details
Liquid AI LFM2 Family
Liquid AI’s LFM (Liquid Foundation Models) family uses a novel architecture optimized for efficiency and long context.

LFM2 1.2B Tool (Default)
Best for: Tool calling + conversation. Excellent balance of speed and accuracy.
- Size: 731 MB (Q4_K_M quantization)
- Speed: ~180 tokens/sec on Apple M3 Max
- Tool Calling: Excellent (native `<|tool_call_start|>` format)
- Context: 128K tokens
- License: LFM Open
- Download: `rcli setup` (default) or `rcli models`
- Trained specifically for function calling
- Uses native tool format: `<|tool_call_start|>[func(arg="val")]<|tool_call_end|>`
- High accuracy on macOS actions (43 supported actions)
- Fast inference with long context support
LFM2 350M
- Size: 219 MB
- Speed: ~350 tokens/sec (fastest model)
- Tool Calling: Basic
- Context: 128K tokens
- Best for: Ultra-fast responses, resource-constrained devices
LFM2.5 1.2B Instruct
- Size: 731 MB
- Speed: ~180 tokens/sec
- Tool Calling: Good (chat template format)
- Context: 128K tokens
- Best for: Improved conversational quality from the next-generation Liquid lineup
LFM2 2.6B
- Size: 1.5 GB
- Speed: ~120 tokens/sec
- Tool Calling: Good
- Context: 128K tokens
- Best for: Stronger conversational ability, complex reasoning
Alibaba Qwen3 Family
Qwen models are open-source LLMs from Alibaba with strong multilingual and tool-calling capabilities.

Qwen3 0.6B
- Size: 456 MB
- Speed: ~250 tokens/sec
- Tool Calling: Basic (limited accuracy)
- Context: 32K tokens
- License: Apache 2.0
- Best for: Ultra-fast inference, smallest footprint
Qwen3.5 0.8B
- Size: 600 MB
- Speed: ~220 tokens/sec
- Tool Calling: Basic
- Context: 32K tokens
- Best for: Stepping up to the Qwen3.5 generation; slightly better quality than Qwen3 0.6B
Qwen3.5 2B
- Size: 1.2 GB
- Speed: ~150 tokens/sec
- Tool Calling: Good
- Context: 32K tokens
- Best for: Solid all-rounder for tool calling + conversations
Qwen3 4B
- Size: 2.5 GB
- Speed: ~80 tokens/sec
- Tool Calling: Good
- Context: 32K tokens
- Best for: Stronger reasoning; needs more RAM
Qwen3.5 4B (Recommended)
Best overall quality. Native tool calling with 262K context window.
- Size: 2.7 GB
- Speed: ~75 tokens/sec
- Tool Calling: Excellent (native `<tool_call>` XML format)
- Context: 262K tokens
- License: Apache 2.0
- Download: `rcli upgrade-llm`
- Best small model for complex tasks
- Native tool calling with `<tool_call>` / `</tool_call>` tags
- Massive 262K context window (vs. 32K in Qwen3 4B)
- High accuracy on RCLI’s 43 macOS actions
Tool Calling Formats
Different model families use different tool-calling formats. RCLI automatically detects and parses each format.

Qwen3 Format (JSON-based)
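Qwen3-family models wrap a single JSON object in `<tool_call>` tags. The payload below is illustrative only (the action name `open_app` is hypothetical, not necessarily one of RCLI's 43 actions):

```
<tool_call>
{"name": "open_app", "arguments": {"app": "Safari"}}
</tool_call>
```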
LFM2 Format (Function-call syntax)
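LFM2 wraps a Python-style call in sentinel tokens, e.g. `<|tool_call_start|>[func(arg="val")]<|tool_call_end|>` (see the LFM2 1.2B Tool entry above). As a rough sketch of how detecting and parsing both formats could look (illustrative only; `open_app` and `parse_tool_call` are not RCLI's actual API):

```python
import json
import re

# Hedged sketch; RCLI's real parser may differ.
QWEN_RE = re.compile(r"<tool_call>\s*(\{.*\})\s*</tool_call>", re.DOTALL)
LFM2_RE = re.compile(r"<\|tool_call_start\|>\[(\w+)\((.*?)\)\]<\|tool_call_end\|>")

def parse_tool_call(text):
    """Return (function_name, arguments_dict), or None if no call is found."""
    if m := QWEN_RE.search(text):
        call = json.loads(m.group(1))       # Qwen3: one JSON object
        return call["name"], call["arguments"]
    if m := LFM2_RE.search(text):
        name, raw = m.group(1), m.group(2)  # LFM2: func(arg="val", ...)
        args = {k: v.strip('"')
                for k, v in re.findall(r'(\w+)=("[^"]*"|[^,\s]+)', raw)}
        return name, args
    return None
```

Both branches normalize to the same `(name, arguments)` pair, so the rest of the action dispatcher does not need to know which model produced the call.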
Quantization
All models use Q4_K_M quantization (4-bit with mixed precision):

- Balances quality and size (~70% size reduction vs. FP16)
- Minimal accuracy loss (<3% perplexity increase)
- Fast inference on Metal GPU
- Compatible with llama.cpp Metal backend
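As a sanity check on the ~70% figure: Q4_K_M averages roughly 4.8 bits per weight once scales and the mixed-precision tensors are included (an approximation; the exact overhead varies by model), versus 16 bits for FP16:

```python
FP16_BITS = 16.0
Q4_K_M_BITS = 4.8  # approximate average bits/weight incl. scales (assumption)

reduction = 1 - Q4_K_M_BITS / FP16_BITS
print(f"~{reduction:.0%} size reduction")  # → ~70% size reduction
```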
Benchmarks
Performance measured on Apple M3 Max (14-core CPU, 30-core GPU, 36 GB RAM).

Speed Benchmarks
| Model | Time to First Token | Generation Speed | Tokens/128ms |
|---|---|---|---|
| LFM2 350M | 18 ms | 350 t/s | 44 tokens |
| Qwen3 0.6B | 19 ms | 250 t/s | 32 tokens |
| LFM2 1.2B | 22 ms | 180 t/s | 23 tokens |
| Qwen3.5 2B | 24 ms | 150 t/s | 19 tokens |
| LFM2 2.6B | 28 ms | 120 t/s | 15 tokens |
| Qwen3.5 4B | 30 ms | 75 t/s | 9 tokens |
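The Tokens/128ms column is just the generation speed scaled to a 128 ms window and truncated to whole tokens, e.g. for LFM2 350M: 350 t/s × 0.128 s = 44.8 → 44. A quick check across the table:

```python
# (model, generation speed in tokens/sec, tokens per 128 ms) from the table above
rows = [
    ("LFM2 350M", 350, 44),
    ("Qwen3 0.6B", 250, 32),
    ("LFM2 1.2B", 180, 23),
    ("Qwen3.5 2B", 150, 19),
    ("LFM2 2.6B", 120, 15),
    ("Qwen3.5 4B", 75, 9),
]
for name, tps, expected in rows:
    assert int(tps * 0.128) == expected, name
```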
Tool Calling Accuracy
| Model | Accuracy (43 actions) | Avg Latency |
|---|---|---|
| Qwen3.5 4B | 98.5% | 180 ms |
| LFM2 1.2B Tool | 97.2% | 140 ms |
| LFM2 2.6B | 95.8% | 190 ms |
| Qwen3.5 2B | 93.1% | 160 ms |
| Qwen3 4B | 92.4% | 200 ms |
| LFM2.5 1.2B | 91.7% | 145 ms |
Accuracy measured on RCLI’s 43 macOS actions test suite. Latency includes LLM inference + tool execution.
Switching Models
See Switching Models for hot-swap instructions and CLI commands.

Next Steps
Upgrade Your LLM
Guided LLM upgrade with size/speed comparisons
Benchmark Models
Run comprehensive benchmarks on all installed models
Switch Models
Hot-swap models without restarting RCLI
Model Browser
Interactive model management in TUI