
Overview

recommend generates hardware-aware model recommendations across multiple categories simultaneously: Coding, Reasoning, Multimodal, Creative, Chat, and more. It uses the deterministic 4D scoring engine and can optionally route through a calibrated policy for measurement-backed routing decisions.
llm-checker recommend

Example Output

INTELLIGENT RECOMMENDATIONS BY CATEGORY
Hardware Tier: HIGH | Models Analyzed: 205

Coding:
   qwen2.5-coder:14b (14B)
   Score: 78/100
   Fine-tuning: LoRA+QLoRA
   Command: ollama pull qwen2.5-coder:14b

Reasoning:
   deepseek-r1:14b (14B)
   Score: 86/100
   Fine-tuning: QLoRA
   Command: ollama pull deepseek-r1:14b

Multimodal:
   llama3.2-vision:11b (11B)
   Score: 83/100
   Fine-tuning: LoRA+QLoRA
   Command: ollama pull llama3.2-vision:11b

Each recommendation includes a fine-tuning suitability label (Full FT, LoRA, QLoRA) to help select the right training path.

Flags

-c, --category
string
Narrow output to a single category. See the category table below for accepted values.
--optimize
string
Apply an optimization profile to steer ranking. Accepted values: balanced, speed, quality, context, coding. Default: balanced
--calibrated
string
Enable calibrated routing. Optionally provide a file path to a calibration policy. If omitted, discovery checks ~/.llm-checker/calibration-policy.{yaml,yml,json}.
--policy
string
Enterprise policy file path. Takes precedence over --calibrated.
--no-verbose
flag
Disable step-by-step progress output.
--simulate
string
Simulate a hardware profile. Use --simulate list to see all profiles.
--gpu
string
Custom GPU model for hardware simulation, e.g. "RTX 5060".
--ram
number
Custom RAM in GB for hardware simulation.
--cpu
string
Custom CPU model for hardware simulation.
--vram
number
Override GPU VRAM in GB. Requires --gpu.
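
The simulation flags can be combined to describe a fully custom machine. A sketch (the GPU, CPU, and capacity values here are illustrative, not tested configurations):

```shell
# Simulate a custom rig: RTX 5060 with a 16 GB VRAM override,
# 64 GB RAM, and a specific CPU model, scoped to coding models.
llm-checker recommend \
  --gpu "RTX 5060" --vram 16 \
  --ram 64 \
  --cpu "Ryzen 9 7950X" \
  --category coding
```

Note that --vram only takes effect alongside --gpu, per the flag description above.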

Category Options

Category      Use Case
coding        Programming, code generation, code review
reasoning     Complex logic, math, multi-step problems
multimodal    Image understanding, vision tasks
creative      Creative writing, storytelling
chat          Conversational AI, general chat
general       General-purpose tasks
embeddings    Semantic search, RAG pipelines

Optimize Profiles

Profile       Description
balanced      Equal emphasis on quality and speed
speed         Maximize tokens/sec, prefer smaller models
quality       Maximize model quality, accept slower inference
context       Prefer models with large context windows
coding        Emphasize coding benchmark scores

Usage Examples

# All categories, balanced optimization
llm-checker recommend

# Best coding model
llm-checker recommend --category coding

# Speed-optimized recommendations
llm-checker recommend --optimize speed

# Quality-first reasoning models
llm-checker recommend --category reasoning --optimize quality

# Calibrated routing (auto-discover policy)
llm-checker recommend --calibrated --category coding

# Calibrated routing with explicit policy file
llm-checker recommend --calibrated ./artifacts/calibration-policy.yaml --category reasoning

# Enterprise policy enforcement
llm-checker recommend --policy ./policy.yaml --category coding

# Simulate different hardware
llm-checker recommend --simulate rtx4090 --category coding

Calibrated Routing

When --calibrated is active, routing decisions are sourced from a calibration-policy.yaml generated by the calibrate command. The output includes a CALIBRATED ROUTING block showing:
  • Policy file path and discovery source
  • Task name (and any task fallback used)
  • Route primary model and fallbacks
  • Selected model
Policy resolution precedence:
  1. --policy <file> (explicit enterprise policy)
  2. --calibrated <file> (explicit calibration policy file)
  3. --calibrated (auto-discovery from ~/.llm-checker/)
  4. Deterministic selector fallback
The fine-tuning label in output (LoRA, QLoRA, Full FT) reflects the suitability of the recommended model for fine-tuning workflows based on parameter count and quantization.
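
To make the precedence above concrete, a calibration policy file might look roughly like the sketch below. This is an illustrative guess at the layout, not the exact schema: the field names (tasks, route, primary, fallbacks) are assumptions inferred from the CALIBRATED ROUTING output fields described above, and the model names are examples.

```yaml
# Hypothetical calibration-policy.yaml sketch — field names are
# assumptions, not the documented schema. Generate the real file
# with the calibrate command.
tasks:
  coding:
    route:
      primary: qwen2.5-coder:14b   # model routed to first
      fallbacks:
        - deepseek-r1:14b          # tried if the primary is unavailable
```

With --calibrated and no explicit path, a file shaped like this would be discovered from ~/.llm-checker/calibration-policy.{yaml,yml,json}; an explicit --policy file would take precedence over it.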
