What the MCP Server Provides
The MCP server exposes the full LLM Checker surface as structured tools that Claude can call autonomously:
- Hardware detection and tier analysis
- Model compatibility scoring and ranked recommendations
- Ollama model management (list, pull, run, remove)
- Benchmarking, comparison, and optimization tools
- Policy validation and audit export
- Calibration artifact generation
- Direct CLI execution for any allowlisted command
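For orientation, here is a rough sketch of what a tool invocation looks like on the wire. MCP is built on JSON-RPC 2.0, and clients invoke tools with a `tools/call` request. The tool name `hw_detect` is taken from the tool list on this page. The empty `arguments` object and the request id are assumptions, since the input schema of each tool is not documented here. In practice Claude builds these messages for you.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Assumption: hw_detect accepts an empty arguments object.
message = build_tool_call("hw_detect", {})
print(message)
```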
Setup
- Global install (recommended)
- npx (no global install)
Generate the Setup Command
You can also have the exact setup for your environment printed to stdout, which is useful for scripting or manual config file editing. The output contains the claude mcp add command and the corresponding JSON config snippet. Useful flags:
| Flag | Effect |
|---|---|
| --apply | Run the setup command automatically |
| --json | Output config as JSON only |
| --npx | Use npx transport instead of global binary |
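If you edit a config file by hand, it helps to know the general shape of an MCP server entry. The sketch below builds one using the `mcpServers` layout that Claude's JSON config files use. It is not the output of the setup command: the server name `llm-checker`, the `<server-binary>` command, and the `<package-name>` argument are placeholders, and the real values should be taken from the generated snippet.

```python
import json


def mcp_config(server_name: str, command: str, args: list[str]) -> str:
    """Build a JSON config with a single entry under `mcpServers`."""
    entry = {"command": command, "args": args}
    return json.dumps({"mcpServers": {server_name: entry}}, indent=2)


# Placeholder values only; substitute what the setup command prints.
print(mcp_config("llm-checker", "<server-binary>", []))

# The npx transport would wrap the package in an npx invocation instead.
print(mcp_config("llm-checker", "npx", ["-y", "<package-name>"]))
```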
Available MCP Tools
Core Analysis
hw_detect
Detect your hardware — CPU, GPU, RAM, and acceleration backend (Metal, CUDA, ROCm, CPU).
check
Full compatibility analysis with all models ranked by score.
recommend
Top model picks by category: coding, reasoning, multimodal, and more.
installed
Rank your already-downloaded Ollama models by compatibility score.
search
Search the Ollama model catalog with filters for family, quantization, size, and use-case.
smart_recommend
Advanced recommendations using the full 4D scoring engine.
ollama_plan
Build a capacity plan for local models with recommended NUM_CTX, NUM_PARALLEL, and memory settings.
ollama_plan_env
Return ready-to-paste export ... env vars from the recommended or fallback plan profile.
policy_validate
Validate a policy file against the v1 schema and return structured validation output.
audit_export
Run policy compliance export (json/csv/sarif/all) for check or recommend flows.
calibrate
Generate calibration artifacts from a prompt suite with typed MCP inputs.
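To picture the ready-to-paste output that ollama_plan_env describes, here is a minimal sketch that turns a settings dictionary into shell `export` lines. The function name, the variable names, and the values are illustrative assumptions. The real tool derives its plan from your hardware, and its exact output format is not shown on this page.

```python
import shlex


def render_exports(env: dict[str, object]) -> str:
    """Render settings as shell `export` lines, one per variable."""
    lines = []
    for name, value in env.items():
        # shlex.quote keeps values with spaces or special characters safe to paste.
        lines.append(f"export {name}={shlex.quote(str(value))}")
    return "\n".join(lines)


# Hypothetical plan values, for illustration only.
plan = {"OLLAMA_NUM_PARALLEL": 2, "OLLAMA_CONTEXT_LENGTH": 8192}
print(render_exports(plan))
```

Quoting each value matters once a setting contains spaces or shell metacharacters, because the lines are meant to be pasted straight into a terminal or shell profile.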
Ollama Management
ollama_list
List all downloaded models with params, quantization, family, and size.
ollama_pull
Download a model from the Ollama registry.
ollama_run
Run a prompt against a local model and receive tok/s metrics alongside the response.
ollama_remove
Delete a model to free disk space.
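The tok/s figure that ollama_run reports can be understood through Ollama's own timing fields. Ollama's generate API returns `eval_count` (generated tokens) and `eval_duration` (nanoseconds) with the final response, and generation speed is the ratio of the two. Whether the MCP server computes its metric in exactly this way is an assumption, and `tokens_per_second` is a hypothetical helper name.

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from Ollama's eval_count and eval_duration fields."""
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]  # reported in nanoseconds
    if eval_duration_ns <= 0:
        return 0.0
    return eval_count * 1e9 / eval_duration_ns


# Sample values for illustration: 120 tokens generated in 4 seconds.
sample = {"eval_count": 120, "eval_duration": 4_000_000_000}
print(f"{tokens_per_second(sample):.1f} tok/s")  # prints "30.0 tok/s"
```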
Advanced (MCP-exclusive)
These tools are only available through the MCP server and have no direct CLI equivalent.
ollama_optimize
Generate optimal Ollama env vars for your hardware: NUM_GPU, NUM_PARALLEL, FLASH_ATTENTION, and more.
benchmark
Benchmark a model with 3 standardized prompts, measuring tok/s, load time, and prompt eval.
compare_models
Head-to-head comparison of two models on the same prompt with speed and response side-by-side.
cleanup_models
Analyze installed models — find redundancies, cloud-only models, oversized models, and upgrade candidates.
project_recommend
Scan a project directory (languages, frameworks, size) and recommend the best model for that codebase.
ollama_monitor
Real-time system status: RAM usage, loaded models, and memory headroom analysis.
cli_help
List all allowlisted CLI commands exposed through MCP.
cli_exec
Execute any allowlisted llm-checker CLI command with custom args (policy, audit, calibrate, sync, ai-run, etc.).
Example Claude Prompts
After setup, ask Claude things like:
Hardware and model selection
- “What’s the best coding model for my hardware?”
- “What model should I use for this Rust project?”
- “Do you see both my iGPU and dGPU?”
Benchmarking and comparison
- “Benchmark qwen2.5-coder and show me the tok/s”
- “Compare llama3.2 vs codellama for coding tasks”
Ollama management
- “Clean up my Ollama — what should I remove?”
- “Optimize my Ollama config for maximum performance”
- “How much RAM is Ollama using right now?”

