  • Added first-class integrated GPU inventory handling:
    • Unified hardware summaries now preserve integrated and dedicated GPU topology separately.
    • Summary metadata now exposes integrated/dedicated GPU counts and model lists.
  • Improved hybrid and integrated-only system reporting:
    • Hybrid systems now keep both dedicated and integrated GPU models visible.
    • Integrated-only systems continue to surface GPU inventory even when the runtime backend remains CPU.
  • Improved downstream model selection heuristics:
    • Recommendation, tiering, and token-speed estimation now prefer canonical integrated-GPU signals over scattered regex-only checks.
  • Improved CLI/system output:
    • Hardware displays now show dedicated vs integrated GPU inventory explicitly.
    • CPU-backend systems with integrated GPU assist paths are labeled more clearly.
  • Added regression coverage:
    • Hybrid dedicated + integrated inventory preservation tests.
    • Integrated-only CPU-backend inventory preservation tests.
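The bullets above describe keeping integrated and dedicated inventories separate in summary metadata. A minimal sketch of that split — the field names and input shape here are assumptions for illustration, not llm-checker's actual API:

```javascript
// Hypothetical sketch: split a raw GPU list into integrated vs dedicated
// inventories so both stay visible in the summary metadata.
function summarizeGpus(gpus) {
  const integrated = gpus.filter((g) => g.integrated);
  const dedicated = gpus.filter((g) => !g.integrated);
  return {
    integratedCount: integrated.length,
    dedicatedCount: dedicated.length,
    integratedModels: integrated.map((g) => g.model),
    dedicatedModels: dedicated.map((g) => g.model),
  };
}
```

On a hybrid system both model lists are non-empty; on an integrated-only system the dedicated side is simply empty rather than dropped, which matches the preservation behavior described above.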
  • Added Termux / Android package support:
    • npm package metadata now accepts the android platform so global installs work in Termux.
  • Improved Linux-compatible runtime handling for Termux:
    • Normalized Android platform detection to Linux-style hardware analysis where appropriate.
    • Added Termux-specific Ollama install hints (pkg install ollama, ollama serve).
  • Added regression coverage:
    • Android platform normalization and Termux runtime install command tests.
  • Fixed Linux hybrid GPU detection fallback:
    • Added lspci-based discovery when primary hardware libraries miss discrete GPUs.
    • Improved fallback enrichment so dedicated GPUs are surfaced even when the primary backend resolves to CPU.
  • Fixed AMD ROCm VRAM normalization:
    • Corrected rocm-smi unit parsing (B, KiB, MiB, GiB) to prevent overreported memory values.
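The fix normalizes rocm-smi memory figures across the listed unit suffixes. A sketch of the idea — the function name and bytes return unit are illustrative, not the tool's actual code:

```javascript
// Convert a rocm-smi memory value like "8192 MiB" or "8 GiB" into bytes,
// so a mixed or misread unit suffix cannot inflate reported VRAM.
const UNIT_BYTES = { B: 1, KiB: 1024, MiB: 1024 ** 2, GiB: 1024 ** 3 };

function vramToBytes(raw) {
  const match = /^([\d.]+)\s*(B|KiB|MiB|GiB)$/.exec(raw.trim());
  if (!match) return null; // unrecognized format: report nothing rather than a wrong number
  return Math.round(parseFloat(match[1]) * UNIT_BYTES[match[2]]);
}
```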
  • Added fine-tuning suitability output in model selection workflows:
    • check, recommend, and ai-check now include a Fine-tuning indicator.
    • Labels include Full+LoRA+QLoRA, LoRA+QLoRA, QLoRA, and no-support states.
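The labels above could plausibly be derived from available VRAM versus model size. The thresholds in this sketch are assumptions for illustration, not llm-checker's actual heuristics:

```javascript
// Illustrative classifier: pick a fine-tuning label from available VRAM (GB)
// relative to model size (billions of parameters). The multipliers below are
// assumed for the sketch, not the shipped logic.
function fineTuningLabel(vramGb, paramsB) {
  if (vramGb >= paramsB * 4) return 'Full+LoRA+QLoRA'; // headroom for full fine-tuning
  if (vramGb >= paramsB * 2) return 'LoRA+QLoRA';      // adapters on a frozen base
  if (vramGb >= paramsB * 0.75) return 'QLoRA';        // 4-bit base + adapters
  return 'Not supported';
}
```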
  • Added regression coverage:
    • ROCm VRAM parsing tests.
    • Fine-tuning support classification tests.
    • Linux hybrid GPU parsing and detector enrichment regression tests.
  • Added interactive panel mode when running llm-checker with no arguments on TTY terminals:
    • Startup animated banner.
    • Main command list with descriptions.
    • Pressing / opens the full command list.
    • Keyboard navigation with up/down + Enter to execute.
    • Command filtering while typing in slash mode.
  • Added argument capture flow from interactive panel:
    • Required prompt for search <query>.
    • Optional free-form extra parameters for any selected command (for example --json --limit 5).
  • Replaced large per-command ASCII banners with a minimal, consistent command header style.
  • Kept direct non-interactive command invocation unchanged (llm-checker <command> ...).
  • Added helper regression coverage for interactive panel internals (tests/cli-interactive-panel.test.js).
  • Fixed Jetson/CUDA driver display fallback:
    • hw-detect now reports Driver: unknown instead of Driver: null when driver metadata is unavailable.
  • Hardened Jetson driver version detection:
    • Probes additional driver sources and parsing patterns (/proc/driver/nvidia/version, /sys/module/nvidia/version).
  • Fixed CUDA hardware fingerprint normalization:
    • Prevents malformed fingerprints containing duplicate hyphens (for example cuda--jetson-orin-nano-6gb).
  • Added Jetson regression coverage:
    • Driver fallback assertion and fingerprint sanitization checks in tests/cuda-jetson-detection.test.js.
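The fingerprint fix boils down to dropping empty segments and collapsing repeated separators. A sketch, with an assumed helper name:

```javascript
// Collapse duplicate separators so a missing field (e.g. an empty segment
// from unavailable metadata) cannot produce fingerprints like
// "cuda--jetson-orin-nano-6gb".
function sanitizeFingerprint(parts) {
  return parts
    .filter((p) => p && String(p).trim() !== '')
    .join('-')
    .toLowerCase()
    .replace(/-{2,}/g, '-');
}
```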
  • Updated install channel docs:
    • npm unscoped package (llm-checker) explicitly marked as the recommended latest channel.
    • Scoped GitHub Packages channel marked legacy/may-lag with recovery steps for stale installs.
  • Added new ollama-plan command to generate safe Ollama runtime settings from local models + detected hardware.
  • Planner output includes:
    • Recommended OLLAMA_NUM_CTX
    • Recommended OLLAMA_NUM_PARALLEL
    • Recommended OLLAMA_MAX_LOADED_MODELS
    • Queue/keep-alive/flash-attention environment variables
    • Fallback profile and memory risk scoring
  • Added model selection handling by exact tag/family/partial match for planning input.
  • Added planner unit coverage (tests/ollama-capacity-planner.test.js).
  • Extended CLI smoke coverage to include ollama-plan --help.
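The planner's outputs listed above could be derived from free memory versus the largest local model. All heuristics in this sketch are assumptions for illustration, not the real planner:

```javascript
// Illustrative planner: derive conservative Ollama runtime settings from
// free RAM/VRAM (GB) and the largest local model's memory need (GB).
// Thresholds and caps are assumed for the sketch.
function planOllamaEnv(freeGb, largestModelGb) {
  const slots = Math.max(1, Math.floor(freeGb / largestModelGb));
  return {
    OLLAMA_NUM_CTX: freeGb - largestModelGb >= 8 ? 8192 : 4096,
    OLLAMA_NUM_PARALLEL: Math.min(4, slots),
    OLLAMA_MAX_LOADED_MODELS: Math.min(2, slots),
    memoryRisk: largestModelGb / freeGb > 0.8 ? 'high' : 'low',
  };
}
```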
Calibrated routing is now first-class in recommend and ai-run.
Highlights:
  • --calibrated [file] support with default discovery path.
  • Clear precedence: --policy > --calibrated > deterministic fallback.
  • Routing provenance output (source, route, selected model).
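The stated precedence can be sketched as a small resolver. Flag names follow the CLI options above; the function itself is illustrative:

```javascript
// Sketch of the documented precedence: an explicit --policy wins, then
// --calibrated, then the deterministic selector fallback.
function resolveRoutingSource(opts) {
  if (opts.policy) return { source: 'policy', file: opts.policy };
  if (opts.calibrated) return { source: 'calibrated', file: opts.calibrated };
  return { source: 'deterministic', file: null };
}
```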
Changes:
  • Added a calibration quick-start flow in README.md designed for first-time setup in under 10 minutes.
  • Added docs fixtures for calibration onboarding:
    • docs/fixtures/calibration/sample-suite.jsonl
    • docs/fixtures/calibration/sample-generated-policy.yaml
    • docs/fixtures/calibration/README.md
  • Added deterministic end-to-end test coverage for the calibrate --policy-out ... → recommend --calibrated ... flow.
  • Hardened Jetson CUDA detection to prevent CPU-only fallback on valid Jetson/L4T systems.
  • Documentation reorganized under docs/ with clearer onboarding paths.
Known limitations:
  • calibrate --mode full currently supports --runtime ollama only.
  • Routing selection in recommend/ai-run still falls back to deterministic selection when calibrated policy is missing/invalid or when route models are unavailable.
  • Calibration suite quality checks are optional in dry-run and contract-only modes and do not execute runtime validation.
  • Added calibrated routing integration to recommend and ai-run:
    • New --calibrated [file] option with default discovery at ~/.llm-checker/calibration-policy.{yaml,yml,json}.
    • --policy takes precedence over --calibrated for route resolution.
    • Deterministic selector fallback when calibrated routing is unavailable.
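Default discovery probes the documented candidates under ~/.llm-checker in order. A sketch — the exists predicate is injected to keep it testable; the real CLI presumably checks the filesystem directly:

```javascript
// Sketch of default calibration-policy discovery: return the first of the
// documented candidate files that exists, or null to trigger the
// deterministic fallback.
function discoverCalibrationPolicy(homeDir, exists) {
  const names = ['calibration-policy.yaml', 'calibration-policy.yml', 'calibration-policy.json'];
  for (const name of names) {
    const file = `${homeDir}/.llm-checker/${name}`;
    if (exists(file)) return file;
  }
  return null;
}
```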
  • recommend now supports dual policy behavior:
    • Enterprise governance policy (policy.yaml) remains supported.
    • Calibration routing policy can be provided via --policy or --calibrated.
  • ai-run now accepts calibrated routing options and can select an installed model directly from calibrated primary/fallback routes before AI selector fallback.
  • Added calibrated routing provenance output (policy source + resolved task/route/selected model).
  • Added calibration routing integration tests and fixtures.
  • Fixed false multimodal recommendations caused by noisy input_types metadata (coding models incorrectly marked as image-capable by upstream scraping noise).
  • Hardened modality inference: input_types=image alone is no longer sufficient; recommendation logic now also requires explicit multimodal metadata or strong vision naming/context hints.
  • Added deterministic regression coverage to ensure coding-only models are excluded from multimodal picks when metadata is ambiguous.
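The hardened check described above can be sketched as follows — field names mirror the description (input_types, an explicit multimodal flag, name hints), not necessarily the actual schema:

```javascript
// Sketch of hardened modality inference: input_types containing "image"
// alone is no longer sufficient; also require an explicit multimodal flag
// or a strong vision hint in the model name.
const VISION_HINT = /\b(vision|vl|llava|multimodal)\b/i;

function isMultimodal(meta) {
  const claimsImage = (meta.input_types || []).includes('image');
  if (!claimsImage) return false;
  return Boolean(meta.multimodal) || VISION_HINT.test(meta.name || '');
}
```

A coding model with noisy input_types=image metadata now fails the check unless corroborated, which is the false-positive case the fix targets.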
  • Replaced MIT license with NPDL-1.0 (No Paid Distribution License).
  • New license terms allow free use/modification/redistribution but prohibit paid distribution or paid hosted/API delivery without a separate commercial license.
  • Updated package metadata and README license badges/section.
  • Enforced feasible 30B-class coverage for capable discrete multi-GPU profiles (non-speed objectives).
  • Added deterministic regression for dual-GPU 36GB aggregate VRAM scenarios.
  • Preserve heterogeneous multi-GPU inventory summaries (for example, mixed V100/P40/M40).
  • Hardware mapping/fallbacks:
    • Added AMD Radeon AI PRO R9700 (PCI ID 7551) support path.
    • Added NVIDIA GTX 1070 Ti (1b82) fallback mapping.
    • Re-verified Linux RX 7900 XTX non-ROCm fallback detection path.
  • Fixed active-parameter memory path for MoE models in deterministic model selection.
  • Added deterministic regression coverage for MoE active/fallback parameter handling.
  • Improved deterministic recommendation stability for memory-fit edge cases.
  • Fixed TPS overestimation (previous estimates ran 2–10x too high across all hardware).
  • Updated speed coefficients to match real Ollama benchmarks:
    • H100: 120 TPS (was 400)
    • RTX 4090: 70 TPS (was 260)
    • M4 Pro: 45 TPS (was 270)
    • CPU: 5 TPS (was 50)
  • Changed quantization baseline from FP16 to Q4_K_M (the most common format).
  • Added diminishing returns for small models (1–3B do not scale linearly).
  • Added comprehensive hardware simulation test suite (17 test cases).
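Using the recalibrated coefficients above (TPS for a ~7B-class model at the Q4_K_M baseline), the estimator might look like this sketch. The inverse-scaling and small-model adjustment curve are assumptions for illustration, not the shipped formula:

```javascript
// Sketch of TPS estimation from the recalibrated per-hardware baselines.
// Baselines are the values stated in the changelog; scaling is assumed.
const BASE_TPS = { h100: 120, rtx4090: 70, m4pro: 45, cpu: 5 };

function estimateTps(hardware, paramsB) {
  const base = BASE_TPS[hardware] ?? BASE_TPS.cpu;
  let speedup = 7 / paramsB;            // naive inverse scaling vs a 7B baseline
  if (paramsB < 3) {
    speedup = Math.sqrt(speedup) * 1.5; // diminishing returns: 1-3B models don't scale linearly
  }
  return Math.round(base * speedup);
}
```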
  • Security: Removed insecure curl | sh install instructions from CLI messages and setup script. Now references official docs/package managers.
  • Network hardening: Added request timeouts and a 5 MB response size limit in the Ollama native scraper to prevent hanging connections and excessive memory use.
  • Safer caching: Moved Ollama cache to ~/.llm-checker/cache/ollama with backward-compatible reads from the legacy src/ollama/.cache folder.
  • CLI updates: Adjusted CLI to read the new cache location with fallback to legacy path.
  • No breaking changes: Functionality remains the same; legacy cache is still read. On write, new cache path is used.
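The read-side of the migration reduces to a first-match lookup: prefer the new ~/.llm-checker/cache/ollama location, fall back to the legacy src/ollama/.cache folder. A sketch with an injected exists predicate so it stays testable:

```javascript
// Sketch of backward-compatible cache resolution: reads prefer the new
// location but fall back to the legacy folder; writes (not shown) always
// target the new path.
function resolveCacheRead(newPath, legacyPath, exists) {
  if (exists(newPath)) return newPath;
  if (exists(legacyPath)) return legacyPath; // legacy cache still readable
  return null;
}
```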
