LLM Checker includes a built-in Model Context Protocol (MCP) server. Once connected, Claude Code can detect your hardware, rank models, manage Ollama, run benchmarks, and execute any allowlisted LLM Checker command without leaving the Claude interface.

What the MCP Server Provides

The MCP server exposes the full LLM Checker surface as structured tools that Claude can call autonomously:
  • Hardware detection and tier analysis
  • Model compatibility scoring and ranked recommendations
  • Ollama model management (list, pull, run, remove)
  • Benchmarking, comparison, and optimization tools
  • Policy validation and audit export
  • Calibration artifact generation
  • Direct CLI execution for any allowlisted command
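
Under the Model Context Protocol, a client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request that names the tool and passes its arguments. The sketch below builds such a request for `hw_detect`. The helper name `build_tool_call`, the request id, and the empty `arguments` object are assumptions for illustration; the actual input schema of each LLM Checker tool is not described on this page, and Claude normally constructs these messages for you.

```python
import json


def build_tool_call(tool_name, arguments=None, request_id=1):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message.

    `build_tool_call` is a hypothetical helper, not part of LLM Checker.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,
            # Assumption: hw_detect needs no arguments; check the tool's
            # schema (via the MCP `tools/list` method) before relying on this.
            "arguments": arguments or {},
        },
    }


request = build_tool_call("hw_detect")
print(json.dumps(request, indent=2))
```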

Setup

Generate the Setup Command

If you want the exact command for your environment printed to stdout (for scripting or manual config file editing), run:
llm-checker mcp-setup
This prints the claude mcp add command and the corresponding JSON config snippet. Useful flags:
Flag      Effect
--apply   Run the setup command automatically
--json    Output config as JSON only
--npx     Use npx transport instead of global binary
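
For scripting, the command can be invoked from another program. The following is a minimal sketch, assuming the `llm-checker` binary is on your PATH. `mcp_setup_argv` is a hypothetical helper, and it passes at most one of the flags listed above because this page does not say whether they can be combined.

```python
import shutil
import subprocess

# Flags documented for `llm-checker mcp-setup` on this page.
KNOWN_FLAGS = {"--apply", "--json", "--npx"}


def mcp_setup_argv(flag=None):
    """Return the argv list for `llm-checker mcp-setup` with an optional flag."""
    if flag is not None and flag not in KNOWN_FLAGS:
        raise ValueError(f"unknown mcp-setup flag: {flag}")
    argv = ["llm-checker", "mcp-setup"]
    if flag is not None:
        argv.append(flag)
    return argv


if shutil.which("llm-checker"):
    # Capture the JSON-only output so a script can store or inspect it.
    result = subprocess.run(
        mcp_setup_argv("--json"), capture_output=True, text=True, check=False
    )
    print(result.stdout)
else:
    print("llm-checker not found on PATH")
```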

Available MCP Tools

Core Analysis

hw_detect

Detect your hardware — CPU, GPU, RAM, and acceleration backend (Metal, CUDA, ROCm, CPU).

check

Full compatibility analysis with all models ranked by score.

recommend

Top model picks by category: coding, reasoning, multimodal, and more.

installed

Rank your already-downloaded Ollama models by compatibility score.

search

Search the Ollama model catalog with filters for family, quantization, size, and use-case.

smart_recommend

Advanced recommendations using the full 4D scoring engine.

ollama_plan

Build a capacity plan for local models with recommended NUM_CTX, NUM_PARALLEL, and memory settings.

ollama_plan_env

Return ready-to-paste export ... env vars from the recommended or fallback plan profile.
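
To illustrate what "ready-to-paste" output could look like, the sketch below renders a mapping of environment variables as shell `export` lines. The variable names (`OLLAMA_NUM_PARALLEL`, `OLLAMA_FLASH_ATTENTION`) are Ollama settings, but the exact names and values that `ollama_plan_env` returns are not shown on this page, so treat both the keys and the numbers here as assumptions.

```python
import shlex


def render_exports(env):
    """Render a mapping of env var names to values as shell `export` lines."""
    return "\n".join(
        f"export {name}={shlex.quote(str(value))}" for name, value in env.items()
    )


# Illustrative values only; a real plan comes from the ollama_plan_env tool.
plan_env = {"OLLAMA_NUM_PARALLEL": 2, "OLLAMA_FLASH_ATTENTION": 1}
print(render_exports(plan_env))
```

Quoting each value with `shlex.quote` keeps the lines safe to paste even if a value contains spaces or shell metacharacters.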

policy_validate

Validate a policy file against the v1 schema and return structured validation output.

audit_export

Run policy compliance export (json/csv/sarif/all) for check or recommend flows.

calibrate

Generate calibration artifacts from a prompt suite with typed MCP inputs.

Ollama Management

ollama_list

List all downloaded models with params, quantization, family, and size.

ollama_pull

Download a model from the Ollama registry.

ollama_run

Run a prompt against a local model and receive tok/s metrics alongside the response.
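
A tok/s figure can be derived from the timing fields that Ollama's generate API returns: `eval_count` (tokens generated) and `eval_duration` (nanoseconds). The sketch below shows that arithmetic. How `ollama_run` computes its own metric is not stated on this page, so this is an assumption about the calculation, and `tokens_per_second` is a hypothetical helper.

```python
NS_PER_SECOND = 1_000_000_000


def tokens_per_second(response):
    """Compute generation speed from an Ollama response's timing fields."""
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        return 0.0  # avoid division by zero on empty or cached responses
    return response["eval_count"] / (eval_duration_ns / NS_PER_SECOND)


# Made-up sample: 120 tokens generated in 3 seconds.
sample = {"eval_count": 120, "eval_duration": 3 * NS_PER_SECOND}
print(f"{tokens_per_second(sample):.1f} tok/s")  # prints "40.0 tok/s"
```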

ollama_remove

Delete a model to free disk space.

Advanced (MCP-exclusive)

These tools are only available through the MCP server and have no direct CLI equivalent.

ollama_optimize

Generate optimal Ollama env vars for your hardware — NUM_GPU, NUM_PARALLEL, FLASH_ATTENTION, and more.

benchmark

Benchmark a model with 3 standardized prompts, measuring tok/s, load time, and prompt eval.
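
The three metrics named above map onto timing fields in Ollama's API responses (`eval_*`, `prompt_eval_*`, and `load_duration`, all in nanoseconds). The sketch below averages them across three runs. The sample numbers are made up, `summarize_runs` is a hypothetical helper, and the way the `benchmark` tool actually aggregates its results is an assumption.

```python
from statistics import mean

NS_PER_S = 1e9


def summarize_runs(runs):
    """Average generation speed, prompt-eval speed, and load time over runs."""
    return {
        "tok_per_s": mean(
            r["eval_count"] / (r["eval_duration"] / NS_PER_S) for r in runs
        ),
        "prompt_tok_per_s": mean(
            r["prompt_eval_count"] / (r["prompt_eval_duration"] / NS_PER_S)
            for r in runs
        ),
        "load_s": mean(r["load_duration"] / NS_PER_S for r in runs),
    }


# Made-up timings standing in for three benchmark prompts.
runs = [
    {"eval_count": 100, "eval_duration": 2_000_000_000,
     "prompt_eval_count": 20, "prompt_eval_duration": 100_000_000,
     "load_duration": 1_500_000_000},
    {"eval_count": 150, "eval_duration": 3_000_000_000,
     "prompt_eval_count": 30, "prompt_eval_duration": 150_000_000,
     "load_duration": 500_000_000},
    {"eval_count": 200, "eval_duration": 4_000_000_000,
     "prompt_eval_count": 40, "prompt_eval_duration": 200_000_000,
     "load_duration": 1_000_000_000},
]
summary = summarize_runs(runs)
print({name: round(value, 2) for name, value in summary.items()})
```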

compare_models

Head-to-head comparison of two models on the same prompt with speed and response side-by-side.

cleanup_models

Analyze installed models — find redundancies, cloud-only models, oversized models, and upgrade candidates.

project_recommend

Scan a project directory (languages, frameworks, size) and recommend the best model for that codebase.

ollama_monitor

Real-time system status: RAM usage, loaded models, and memory headroom analysis.

cli_help

List all allowlisted CLI commands exposed through MCP.

cli_exec

Execute any allowlisted llm-checker CLI command with custom args (policy, audit, calibrate, sync, ai-run, etc.).
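
The allowlisting idea behind `cli_exec` can be sketched as a check that runs before any process is spawned. This is not LLM Checker's implementation: `build_cli_argv` is a hypothetical helper, and the set below contains only the subcommands named above, while the real allowlist (reported by `cli_help`) is longer.

```python
# Assumption: a partial allowlist taken from the examples in this section.
ALLOWED_COMMANDS = {"policy", "audit", "calibrate", "sync", "ai-run"}


def build_cli_argv(command, args=()):
    """Return an argv list for an allowlisted command, or raise."""
    if command not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowlisted: {command}")
    # Building a list (never a shell string) means arguments are passed
    # to the process verbatim and are not interpreted by a shell.
    return ["llm-checker", command, *args]


print(build_cli_argv("sync"))

try:
    build_cli_argv("rm")
except PermissionError as err:
    print(err)
```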

Example Claude Prompts

After setup, ask Claude things like:
  • “What’s the best coding model for my hardware?”
  • “What model should I use for this Rust project?”
  • “Do you see both my iGPU and dGPU?”
  • “Benchmark qwen2.5-coder and show me the tok/s”
  • “Compare llama3.2 vs codellama for coding tasks”
  • “Clean up my Ollama — what should I remove?”
  • “Optimize my Ollama config for maximum performance”
  • “How much RAM is Ollama using right now?”
Claude picks the matching MCP tools for each request, calls them, and returns the results in the conversation.
