
Synopsis

llmfit info <MODEL>

Description

Displays comprehensive information about a specific model, including specifications, hardware requirements, fit analysis, score breakdown, and download sources. If multiple models match the query, you'll be prompted to be more specific.

Arguments

model
string
required
Model name or partial name to look up. Case-insensitive.

Options

--json
boolean
default:"false"
Output model information as JSON instead of formatted text.
--memory
string
Override GPU VRAM size for fit analysis (e.g., “32G”, “32000M”, “1.5T”).
--max-context
integer
Cap context length used for memory estimation (tokens). Must be >= 1.
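The `--memory` values use unit suffixes that convert at a factor of 1000 (the option's own examples treat "32G" and "32000M" as equivalent). The sketch below illustrates that convention with a hypothetical `to_gb` helper; it is not part of llmfit, and llmfit's actual parsing rules may differ.

```shell
# Hypothetical helper illustrating the --memory size suffixes
# (G = gigabytes, M = megabytes, T = terabytes); for illustration only.
to_gb() {
  case "$1" in
    *G) echo "${1%G}" ;;
    *M) awk "BEGIN { print ${1%M} / 1000 }" ;;
    *T) awk "BEGIN { print ${1%T} * 1000 }" ;;
    *)  echo "unrecognized suffix" >&2; return 1 ;;
  esac
}

to_gb 32G      # 32 GB
to_gb 32000M   # 32 GB
to_gb 1.5T     # 1500 GB
```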

Usage Examples

Basic Model Info

# Get details about Llama 3.3 70B
llmfit info llama-3.3-70b

# Partial name match
llmfit info "llama 70b"

JSON Output

# Get model info as JSON
llmfit info llama-3.3-70b --json

# Process with jq
llmfit info deepseek-v3 --json | jq '.models[0].score_components'

Test Different Configurations

# Check fit with 24GB VRAM
llmfit info llama-3.1-70b --memory 24G

# Check fit with 8K context limit
llmfit info qwen-2.5-72b --max-context 8192

Multiple Model Info

# Compare multiple models
llmfit info llama-3.3-70b
llmfit info qwen-2.5-72b
llmfit info deepseek-v3

Example Output

Detailed Info

=== llama-3.3-70b ===

Provider: Meta
Parameters: 70B
Quantization: 4bit
Best Quant: 4bit
Context Length: 131072 tokens
Use Case: general
Category: General
Released: 2024-12-06
Runtime: MLX (baseline est. ~42.5 tok/s)

Score Breakdown:
  Overall Score: 95.2 / 100
  Quality: 95  Speed: 43  Fit: 100  Context: 100
  Baseline Est. Speed: 42.5 tok/s

Resource Requirements:
  Min VRAM: 36.4 GB
  Min RAM: 43.7 GB (CPU inference)
  Recommended RAM: 87.5 GB

Fit Analysis:
  Status: ✓ Perfect
  Run Mode: GPU
  Memory Utilization: 68.2% (43.7 / 64.0 GB)

GGUF Downloads:
  bartowski → https://huggingface.co/bartowski/Llama-3.3-70B-Instruct-GGUF
  Tip: llmfit download bartowski/Llama-3.3-70B-Instruct-GGUF --quant 4bit

Notes:
  High-quality general-purpose model from Meta
  Excellent for chat, reasoning, and code generation
  Requires significant VRAM for full GPU inference

MoE Model Info

$ llmfit info deepseek-v3
=== deepseek-v3 ===

Provider: DeepSeek
Parameters: 671B
Quantization: Q4_K_M
Best Quant: Q4_K_M
Context Length: 131072 tokens
Use Case: reasoning
Category: Reasoning
Released: 2024-12-26
Runtime: llama.cpp (baseline est. ~28.3 tok/s)

Score Breakdown:
  Overall Score: 92.1 / 100
  Quality: 98  Speed: 28  Fit: 89  Context: 100
  Baseline Est. Speed: 28.3 tok/s

Resource Requirements:
  Min VRAM: 350.2 GB
  Min RAM: 420.2 GB (CPU inference)
  Recommended RAM: 840.5 GB

MoE Architecture:
  Experts: 8 active / 256 total per token
  Active VRAM: 57.4 GB (vs 350.2 GB full model)
  Offloaded: 292.8 GB inactive experts in RAM

Fit Analysis:
  Status: ✓ Good
  Run Mode: MoE Offload
  Memory Utilization: 89.7% (57.4 / 64.0 GB)

GGUF Downloads:
  unsloth → https://huggingface.co/unsloth/DeepSeek-V3-GGUF
  Tip: llmfit download unsloth/DeepSeek-V3-GGUF --quant Q4_K_M

Notes:
  Advanced reasoning model with MoE architecture
  Sparse activation allows running on modest hardware
  GPU handles active experts, RAM holds inactive experts
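As a sanity check, the MoE numbers in the output above are internally consistent: the offloaded size is the full model footprint minus the active-expert VRAM.

```shell
# Offloaded experts = full model footprint - active-expert VRAM
# (values taken from the deepseek-v3 output above).
awk 'BEGIN { printf "%.1f GB\n", 350.2 - 57.4 }'
# → 292.8 GB
```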

Multiple Matches

$ llmfit info llama
Multiple models found. Please be more specific:
  - llama-3.3-70b
  - llama-3.1-405b
  - llama-3.1-70b
  - llama-3.1-8b
  - llama-3.2-3b
  - llama-3.2-1b
  - llama-3.2-11b-vision
  - llama-3.2-90b-vision

No Match

$ llmfit info nonexistent
No model found matching 'nonexistent'

JSON Format

$ llmfit info llama-3.3-70b --json
{
  "system": {
    "total_ram_gb": 64.0,
    "available_ram_gb": 58.24,
    "cpu_cores": 16,
    "gpu_vram_gb": 64.0,
    "backend": "Metal"
  },
  "models": [
    {
      "name": "llama-3.3-70b",
      "provider": "Meta",
      "parameter_count": "70B",
      "params_b": 70.0,
      "context_length": 131072,
      "use_case": "general",
      "category": "General",
      "release_date": "2024-12-06",
      "is_moe": false,
      "fit_level": "perfect",
      "run_mode": "gpu",
      "score": 95.2,
      "score_components": {
        "quality": 95.0,
        "speed": 42.5,
        "fit": 100.0,
        "context": 100.0
      },
      "estimated_tps": 42.5,
      "runtime": "MLX",
      "runtime_label": "MLX",
      "best_quant": "4bit",
      "memory_required_gb": 43.68,
      "memory_available_gb": 64.0,
      "utilization_pct": 68.2,
      "notes": [
        "High-quality general-purpose model",
        "Excellent for chat, reasoning, and code"
      ],
      "gguf_sources": [
        {
          "provider": "bartowski",
          "repo": "bartowski/Llama-3.3-70B-Instruct-GGUF"
        }
      ]
    }
  ]
}
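The JSON output is convenient to post-process with jq. A minimal sketch, using a saved file that mirrors the structure above in place of a live `llmfit info ... --json` call:

```shell
# Stand-in for: llmfit info llama-3.3-70b --json > info.json
# (field names follow the sample JSON above).
cat > info.json <<'EOF'
{"models": [{"name": "llama-3.3-70b", "score": 95.2,
             "fit_level": "perfect", "run_mode": "gpu"}]}
EOF

# One-line fit summary
jq -r '.models[0] | "\(.name): score \(.score), fit \(.fit_level), mode \(.run_mode)"' info.json
# → llama-3.3-70b: score 95.2, fit perfect, mode gpu
```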

Information Sections

Model Specs

  • Provider, parameter count, quantization
  • Context window size
  • Use case and category
  • Release date

Score Breakdown

  • Overall composite score (0-100)
  • Individual components: quality, speed, fit, context
  • Estimated tokens/second throughput

Resource Requirements

  • Minimum VRAM for GPU inference
  • Minimum RAM for CPU inference
  • Recommended RAM for comfortable operation
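The sample outputs above are consistent with rough rules of thumb: min RAM ≈ 1.2 × min VRAM, and recommended RAM ≈ 2 × min RAM. These ratios are inferred from the examples in this page, not documented formulas, so treat them as an approximation.

```shell
# Inferred from the llama-3.3-70b sample (36.4 GB min VRAM →
# 43.7 GB min RAM → 87.5 GB recommended); not llmfit's actual formula.
awk 'BEGIN {
  vram    = 36.4
  min_ram = vram * 1.2      # ≈ 43.7 GB
  rec_ram = min_ram * 2     # ≈ 87.4 GB (sample shows 87.5)
  printf "min=%.1f rec=%.1f\n", min_ram, rec_ram
}'
```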

MoE Details (if applicable)

  • Expert configuration (active/total)
  • Active VRAM usage
  • Offloaded expert size

Fit Analysis

  • Fit level (Perfect, Good, Marginal, Too Tight)
  • Run mode (GPU, MoE Offload, CPU Offload, CPU Only)
  • Memory utilization percentage

Download Sources

  • HuggingFace GGUF repositories
  • Download command examples

Related Commands

  • search - Find models by name
  • fit - Show compatible models
  • plan - Plan hardware requirements
  • recommend - Get top recommendations
