
Synopsis

llmfit fit [OPTIONS]

Description

Analyzes all models in the database and shows which ones are compatible with your system’s hardware. Models are scored and ranked by how well they fit your available resources. Unlike the default interactive TUI mode, this command prints classic, non-interactive table output.

Options

-p, --perfect
boolean
default:"false"
Show only models that perfectly match recommended specs (fit level = Perfect).
-n, --limit
integer
Limit number of results displayed.
--sort
enum
default:"score"
Sort column for output. Options:
  • score - Composite ranking score (default)
  • tps - Estimated tokens/second (aliases: tokens, toks, throughput)
  • params - Model parameter count
  • mem - Memory utilization percentage (aliases: memory, mem_pct, utilization)
  • ctx - Context window length (alias: context)
  • date - Release date, newest first (aliases: release, released)
  • use - Use-case grouping (aliases: use_case, usecase)
--json
boolean
default:"false"
Output results as JSON instead of table format.
--memory
string
Override GPU VRAM size (e.g., “32G”, “32000M”, “1.5T”).
--max-context
integer
Cap context length used for memory estimation (tokens). Must be >= 1.
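The --memory flag accepts human-readable sizes such as “32G”, “32000M”, or “1.5T”. As an illustration only (the exact units llmfit applies are not documented here; this sketch assumes decimal units, i.e. 1T = 1000G = 1,000,000M), such strings might be normalized to GB like this:

```python
def parse_mem_gb(s: str) -> float:
    """Parse a human-readable memory size ("32G", "32000M", "1.5T") into GB.

    Assumes decimal units (1G = 1000M, 1T = 1000G); llmfit's actual
    interpretation may differ.
    """
    s = s.strip().upper()
    multipliers = {"M": 0.001, "G": 1.0, "T": 1000.0}
    unit = s[-1]
    if unit not in multipliers:
        raise ValueError(f"unknown unit in {s!r}")
    return float(s[:-1]) * multipliers[unit]

# parse_mem_gb("32G")    -> 32.0
# parse_mem_gb("32000M") -> 32.0
# parse_mem_gb("1.5T")   -> 1500.0
```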

Usage Examples

Basic Fit Analysis

# Show all compatible models
llmfit fit

# Show system specs and top 10 models
llmfit fit -n 10

Filter by Fit Level

# Show only perfect fits
llmfit fit --perfect

# Show top 5 perfect fits
llmfit fit --perfect -n 5

Sort Options

# Sort by estimated tokens/second
llmfit fit --sort tps -n 10

# Sort by parameter count (largest first)
llmfit fit --sort params

# Sort by memory utilization (most efficient first)
llmfit fit --sort mem

# Sort by context window size
llmfit fit --sort ctx

# Sort by release date (newest first)
llmfit fit --sort date

Advanced Examples

# Top 5 models sorted by speed
llmfit fit --sort tps -n 5

# Perfect fits with 16K context cap
llmfit fit --perfect --max-context 16384

# Test with specific VRAM size
llmfit fit --memory 24G --sort tps

JSON Output

# Get fit results as JSON
llmfit fit --json -n 5

# Process with jq
llmfit fit --json | jq '.models[] | select(.fit_level == "perfect")'
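If jq is not available, the same filtering can be done in Python. This is a sketch that assumes the schema shown under “JSON Format” below (a top-level "models" array whose entries carry a "fit_level" field):

```python
import json

def perfect_fits(fit_json: str) -> list:
    """Return models whose fit_level is "perfect" from `llmfit fit --json` output."""
    data = json.loads(fit_json)
    return [m for m in data.get("models", []) if m.get("fit_level") == "perfect"]

# Typical usage (requires llmfit on PATH):
#   import subprocess
#   out = subprocess.run(["llmfit", "fit", "--json"],
#                        capture_output=True, text=True).stdout
#   for m in perfect_fits(out):
#       print(m["name"], m["estimated_tps"])
```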

Example Output

Table Format

╭─ System Hardware ──────────────────────────────────────────╮
│  RAM:  64.0 GB total (58.2 GB available)                  │
│  CPU:  16 cores (Apple M2 Max)                            │
│  GPU:  Metal - Apple M2 Max (64.0 GB, unified memory)     │
╰────────────────────────────────────────────────────────────╯

(12 models hidden — incompatible backend)

=== Model Compatibility Analysis ===
Found 37 compatible model(s)

╭─────────────┬──────────────────────┬───────────┬──────┬───────┬──────────────┬─────────┬────────────┬─────────┬────────┬─────────╮
│ Status      │ Model                │ Provider  │ Size │ Score │ tok/s est.   │ Quant   │ Runtime    │ Mode    │ Mem %  │ Context │
├─────────────┼──────────────────────┼───────────┼──────┼───────┼──────────────┼─────────┼────────────┼─────────┼────────┼─────────┤
│ ✓ Perfect   │ llama-3.3-70b        │ Meta      │ 70B  │ 95    │ 42.5         │ 4bit    │ MLX        │ GPU     │ 68.2%  │ 128k    │
│ ✓ Perfect   │ qwen-2.5-72b         │ Alibaba   │ 72B  │ 95    │ 40.1         │ 4bit    │ MLX        │ GPU     │ 71.5%  │ 32k     │
│ ✓ Good      │ deepseek-v3          │ DeepSeek  │ 671B │ 92    │ 28.3         │ Q4_K_M  │ llama.cpp  │ GPU     │ 89.7%  │ 128k    │
│ ✓ Perfect   │ qwen-2.5-coder-32b   │ Alibaba   │ 32B  │ 91    │ 68.2         │ 4bit    │ MLX        │ GPU     │ 52.3%  │ 128k    │
│ ✓ Perfect   │ llama-3.1-70b        │ Meta      │ 70B  │ 91    │ 42.5         │ 4bit    │ MLX        │ GPU     │ 68.2%  │ 128k    │
│ ✓ Perfect   │ codestral-25.01      │ Mistral   │ 22B  │ 89    │ 85.1         │ Q4_K_M  │ llama.cpp  │ GPU     │ 38.7%  │ 256k    │
│ ✓ Perfect   │ phi-4                │ Microsoft │ 14B  │ 87    │ 112.5        │ Q4_K_M  │ llama.cpp  │ GPU     │ 28.4%  │ 16k     │
│ ✓ Perfect   │ llama-3.2-3b         │ Meta      │ 3B   │ 82    │ 245.7        │ 4bit    │ MLX        │ GPU     │ 12.1%  │ 128k    │
╰─────────────┴──────────────────────┴───────────┴──────┴───────┴──────────────┴─────────┴────────────┴─────────┴────────┴─────────╯

  Note: tok/s values are baseline estimates; real runtime depends on engine/runtime.

JSON Format

{
  "system": {
    "total_ram_gb": 64.0,
    "available_ram_gb": 58.24,
    "cpu_cores": 16,
    "cpu_name": "Apple M2 Max",
    "has_gpu": true,
    "gpu_vram_gb": 64.0,
    "unified_memory": true,
    "backend": "Metal"
  },
  "models": [
    {
      "name": "llama-3.3-70b",
      "provider": "Meta",
      "parameter_count": "70B",
      "params_b": 70.0,
      "context_length": 131072,
      "use_case": "general",
      "category": "General",
      "fit_level": "perfect",
      "run_mode": "gpu",
      "score": 95.2,
      "score_components": {
        "quality": 95.0,
        "speed": 42.5,
        "fit": 100.0,
        "context": 100.0
      },
      "estimated_tps": 42.5,
      "runtime": "MLX",
      "best_quant": "4bit",
      "memory_required_gb": 43.68,
      "memory_available_gb": 64.0,
      "utilization_pct": 68.2
    }
  ]
}

Fit Levels

Models are classified into fit levels:
  • Perfect (✓): Model fits comfortably within recommended specs
  • Good (✓): Model fits, but resources are somewhat tight
  • Marginal (⚠): Model fits, but resources are heavily constrained
  • Too Tight (✗): Model exceeds available resources (hidden by default)
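To see how a run distributes across these levels, the JSON output can be tallied. A sketch, assuming the "fit_level" field shown in the JSON Format example above:

```python
import json
from collections import Counter

def fit_level_counts(fit_json: str) -> Counter:
    """Tally models per fit level from `llmfit fit --json` output."""
    data = json.loads(fit_json)
    return Counter(m.get("fit_level", "unknown") for m in data.get("models", []))

# Typical usage: feed the stdout of `llmfit fit --json` to fit_level_counts
# and inspect counts such as result["perfect"] or result["good"].
```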

Run Modes

  • GPU: Full model on GPU (best performance)
  • MoE Offload: MoE model with inactive experts in RAM
  • CPU Offload: Partial layers on CPU
  • CPU Only: Full model on CPU (slowest)

Score Components

The composite score includes:
  • Quality: Model capability and benchmark performance
  • Speed: Estimated tokens/second throughput
  • Fit: How well the model fits available resources
  • Context: Context window size advantage
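llmfit’s actual weighting is not documented here, but conceptually the composite score combines the four components reported in score_components. A purely illustrative sketch (the weights and normalization below are assumptions, not llmfit’s formula; in particular, the speed component would need to be normalized to the same scale as the others):

```python
def composite_score(quality, speed, fit, context, weights=None):
    """Illustrative weighted combination of the score components.

    The default weights are hypothetical, chosen only to demonstrate the idea;
    llmfit's real scoring may differ substantially.
    """
    w = weights or {"quality": 0.4, "speed": 0.2, "fit": 0.3, "context": 0.1}
    total = sum(w.values())
    return (w["quality"] * quality + w["speed"] * speed
            + w["fit"] * fit + w["context"] * context) / total

# With the llama-3.3-70b components from the JSON example
# (quality=95.0, speed=42.5, fit=100.0, context=100.0) and these
# hypothetical weights, the result is approximately 86.5.
```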

See Also

  • llmfit - Launch interactive TUI
  • system - Show system specs
  • recommend - Get filtered recommendations
  • info - Detailed model information
