LLM Checker

Why LLM Checker?

Choosing the right LLM for your hardware is complex. With thousands of model variants, quantization levels, and hardware configurations, finding the optimal model requires deep understanding of memory bandwidth, VRAM limits, and performance characteristics. LLM Checker solves this. It analyzes your system, scores every compatible model across four dimensions, and delivers actionable recommendations in seconds — complete with ready-to-run ollama pull commands.

Quick Start

Get running in 2 minutes — detect your hardware and get your first model recommendation

Installation

Install via npm globally or run directly with npx — no build step required

Command Reference

Every command, flag, and option documented with real examples

MCP Integration

Use LLM Checker inside Claude Code via the built-in MCP server

Key Features

200+ Dynamic Models

Full scraped Ollama catalog, with a curated fallback of 35+ models spanning all major families

4D Scoring Engine

Quality, Speed, Fit, Context — deterministic weights calibrated per use case

Multi-GPU Detection

Apple Silicon, NVIDIA CUDA, AMD ROCm, Intel Arc, and CPU backends

Zero Native Dependencies

Pure JavaScript — works on any Node.js 16+ system including Termux (Android)

MCP Server Built-in

Claude Code and other MCP-compatible assistants can analyze your hardware directly

Enterprise Policy

Governance rules, audit export to JSON/CSV/SARIF, CI/CD policy gates

Calibrated Routing

Generate routing policies from prompt suites and apply them to the recommend and ai-run commands

Interactive CLI Panel

Animated banner, command picker, up/down navigation — or direct invocation for scripts

How It Works

LLM Checker uses a deterministic pipeline so the same inputs always produce the same ranked output:
  1. Hardware profiling — Detect CPU, GPU, RAM, and effective backend (Metal, CUDA, ROCm, CPU)
  2. Model pool assembly — Merge the full Ollama catalog (or curated fallback) with your locally installed models
  3. Candidate filtering — Keep only models relevant to your requested use case or category
  4. Fit selection — Pick the best quantization that fits in your available memory budget
  5. Deterministic scoring — Score each candidate across Quality, Speed, Fit, and Context
  6. Policy + ranking — Apply optional governance checks, rank, and return ollama pull commands

Supported Platforms

| Platform | Status |
| --- | --- |
| macOS (Apple Silicon M1–M4) | Full Metal support |
| Linux (NVIDIA CUDA) | Full CUDA support |
| Linux (AMD ROCm) | Full ROCm support |
| Windows (NVIDIA / AMD) | CUDA / ROCm support |
| Android (Termux) | CPU backend |
| Any Node.js 16+ system | CPU fallback |
