
Basic flow

1

Install LLM Checker

npm install -g llm-checker
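To confirm the global install succeeded, you can ask npm itself to list the package (this relies on standard npm behavior, not an llm-checker feature):

```shell
# Verify the global install; npm exits non-zero if the package is missing
npm ls -g llm-checker
```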
2

Detect your hardware

Run hw-detect to inspect your CPU, GPU, and available memory, and to identify the best inference backend for your system.
llm-checker hw-detect
Summary:
  Apple M4 Pro (24GB Unified Memory)
  Tier: MEDIUM HIGH
  Max model size: 15GB
  Best backend: metal

CPU:
  Apple M4 Pro
  Cores: 12 (12 physical)
  SIMD: NEON

Metal:
  GPU Cores: 16
  Unified Memory: 24GB
  Memory Bandwidth: 273GB/s
On hybrid or integrated-only systems, hw-detect also surfaces GPU topology explicitly:
Dedicated GPUs: NVIDIA GeForce RTX 4060
Integrated GPUs: Intel Iris Xe Graphics
Assist path: Integrated/shared-memory GPU detected, runtime remains CPU
3

Get model recommendations

Run recommend to see the top-ranked models for each category (coding, reasoning, multimodal, and more) based on your hardware profile.
llm-checker recommend
INTELLIGENT RECOMMENDATIONS BY CATEGORY
Hardware Tier: MEDIUM HIGH | Models Analyzed: 205

Coding:
   qwen2.5-coder:14b (14B)
   Score: 78/100
   Fine-tuning: LoRA+QLoRA
   Command: ollama pull qwen2.5-coder:14b

Reasoning:
   deepseek-r1:14b (14B)
   Score: 86/100
   Fine-tuning: QLoRA
   Command: ollama pull deepseek-r1:14b

Multimodal:
   llama3.2-vision:11b (11B)
   Score: 83/100
   Fine-tuning: LoRA+QLoRA
   Command: ollama pull llama3.2-vision:11b
Use the --category flag to focus on a specific use case: coding, reasoning, multimodal, or general. You can also steer ranking by optimization profile with --optimize speed, --optimize quality, --optimize balanced, or --optimize context.
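For example, the following combines both flags to rank reasoning models while favoring inference speed over peak quality (only flags documented above are used):

```shell
# Rank reasoning models, biasing the ranking toward latency
llm-checker recommend --category reasoning --optimize speed
```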
4

Pull a model and run it

Copy the ollama pull command from the recommendation output, then use ai-run to prompt the model directly:
# Pull the recommended model
ollama pull qwen2.5-coder:14b

# Run a prompt with auto model selection
llm-checker ai-run --category coding --prompt "Write a hello world in Python"
If you have already calibrated routing, pass the --calibrated flag to use your policy file:
llm-checker ai-run --calibrated --category coding --prompt "Refactor this function"

Calibration quick start (10 minutes)

Calibration generates routing policy artifacts from a prompt suite so that recommend and ai-run can make deterministic, measured decisions instead of relying solely on hardware heuristics.
1

Copy the sample prompt suite

cp ./docs/fixtures/calibration/sample-suite.jsonl ./sample-suite.jsonl
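The suite is a JSONL file: one JSON object per line, one prompt per object. As a rough sketch of the shape such an entry might take (the field names below are assumptions for illustration; the shipped sample-suite.jsonl is the authoritative reference for the actual schema):

```json
{"id": "code-001", "category": "coding", "prompt": "Write a function that reverses a string."}
```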
2

Generate calibration artifacts

Run calibrate in dry-run mode to produce both a calibration contract and a routing policy without executing live model calls:
mkdir -p ./artifacts
llm-checker calibrate \
  --suite ./sample-suite.jsonl \
  --models qwen2.5-coder:7b llama3.2:3b \
  --runtime ollama \
  --objective balanced \
  --dry-run \
  --output ./artifacts/calibration-result.json \
  --policy-out ./artifacts/calibration-policy.yaml
This creates two files:
  • ./artifacts/calibration-result.json — calibration contract
  • ./artifacts/calibration-policy.yaml — routing policy for use with recommend and ai-run
3

Apply calibrated routing

Pass the generated policy file to recommend or ai-run with the --calibrated flag:
llm-checker recommend --calibrated ./artifacts/calibration-policy.yaml --category coding
llm-checker ai-run --calibrated ./artifacts/calibration-policy.yaml --category coding --prompt "Refactor this function"
Flag precedence: --policy <file> takes precedence over --calibrated [file]. If you omit the path from --calibrated, discovery defaults to ~/.llm-checker/calibration-policy.{yaml,yml,json}.
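If you prefer passing --calibrated without a path, you can stage the generated policy at the default discovery location. This sketch assumes the YAML filename from the discovery list above:

```shell
# Stage the policy where bare --calibrated discovery looks by default
mkdir -p ~/.llm-checker
cp ./artifacts/calibration-policy.yaml ~/.llm-checker/calibration-policy.yaml

# No explicit path needed now
llm-checker ai-run --calibrated --category coding --prompt "Refactor this function"
```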