Overview

calibrate runs a JSONL prompt suite against one or more models and produces two artifacts:
  • calibration-result.json — the calibration contract with benchmark data, model metadata, and scoring
  • calibration-policy.yaml — a routing policy for use with --calibrated in recommend and ai-run
llm-checker calibrate \
  --suite ./sample-suite.jsonl \
  --models qwen2.5-coder:7b llama3.2:3b \
  --runtime ollama \
  --objective balanced \
  --dry-run \
  --output ./artifacts/calibration-result.json \
  --policy-out ./artifacts/calibration-policy.yaml

Flags

--suite
string
Required. Path to the JSONL prompt suite file. Each line is a JSON object describing a prompt and its task metadata.
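Each suite line is a standalone JSON object. A minimal illustrative entry is shown below; the field names (id, category, prompt) are assumptions for illustration — check docs/fixtures/calibration/sample-suite.jsonl for the actual schema:

```json
{"id": "fix-off-by-one", "category": "coding", "prompt": "Fix the off-by-one bug in this loop."}
```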
--models
string[]
Required. One or more model identifiers to calibrate. Pass multiple values space-separated or comma-separated:
--models qwen2.5-coder:7b llama3.2:3b
--models qwen2.5-coder:7b,llama3.2:3b
--output
string
Required. Output path for the calibration result. Accepts .json, .yaml, or .yml.
--runtime
string
Inference runtime for execution. Accepted values: ollama, vllm, mlx, llama.cpp.
Default: ollama
--mode
string
Execution mode. Accepted values:
  • dry-run — produce draft artifacts without any benchmark execution
  • contract-only — build calibration contract without running prompts (default)
  • full — run all prompts against each model (requires --runtime ollama)
Default: contract-only
--objective
string
Calibration objective. Accepted values: balanced, speed, quality, coding, reasoning.
Default: balanced
--dry-run
flag
Shorthand for --mode dry-run. Produces draft artifacts without running any prompts.
--policy-out
string
Optional output path for the routing policy artifact. Accepts .json, .yaml, or .yml. Required to use calibrated routing in recommend and ai-run.
--warmup
number
Number of warmup runs per prompt in full mode.
Default: 1
--iterations
number
Number of measured iterations per prompt in full mode.
Default: 2
--timeout-ms
number
Per-prompt timeout in milliseconds for full mode.
Default: 120000
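The --warmup and --iterations flags multiply quickly in full mode. Assuming each (prompt, model) pair runs the warmup passes plus the measured iterations — an assumption about the execution model, not confirmed by the CLI source — a back-of-envelope estimate:

```python
def estimate_runs(prompts: int, models: int, warmup: int = 1, iterations: int = 2) -> int:
    """Estimate total prompt executions in full mode.

    Assumes each (prompt, model) pair runs `warmup` unmeasured passes
    plus `iterations` measured passes -- an illustrative model only.
    """
    return prompts * models * (warmup + iterations)

# A 10-prompt suite against 2 models with the defaults (--warmup 1, --iterations 2):
print(estimate_runs(10, 2))  # 60 executions
```

With the default 120-second per-prompt timeout, that worst-cases at two hours for this small suite, which is why dry-run and contract-only modes exist.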

Calibration Quick Start

1. Copy the sample prompt suite

cp ./docs/fixtures/calibration/sample-suite.jsonl ./sample-suite.jsonl
2. Generate calibration artifacts (dry run)

mkdir -p ./artifacts
llm-checker calibrate \
  --suite ./sample-suite.jsonl \
  --models qwen2.5-coder:7b llama3.2:3b \
  --runtime ollama \
  --objective balanced \
  --dry-run \
  --output ./artifacts/calibration-result.json \
  --policy-out ./artifacts/calibration-policy.yaml
3. Apply calibrated routing

llm-checker recommend \
  --calibrated ./artifacts/calibration-policy.yaml \
  --category coding

llm-checker ai-run \
  --calibrated ./artifacts/calibration-policy.yaml \
  --category coding \
  --prompt "Refactor this function"

Example Output

CALIBRATION ARTIFACTS GENERATED
────────────────────────────────────────────────────────────────────────────────
Suite: ./sample-suite.jsonl
Runtime: ollama | Objective: balanced
Models: 2
Execution mode: dry-run
Result: ./artifacts/calibration-result.json
Policy: ./artifacts/calibration-policy.yaml

Artifacts

calibration-result.json

The calibration contract includes:
  • Model identifiers and runtime metadata
  • Suite metadata (prompt count, task distribution)
  • Per-model scoring and benchmark results (in full mode)
  • Objective and execution mode used
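Sketching that contract as JSON — the field names and nesting below are illustrative assumptions, not the actual schema; inspect a generated calibration-result.json for the real shape:

```json
{
  "objective": "balanced",
  "mode": "dry-run",
  "runtime": "ollama",
  "suite": {"path": "./sample-suite.jsonl", "prompts": 12},
  "models": [
    {"id": "qwen2.5-coder:7b", "score": null},
    {"id": "llama3.2:3b", "score": null}
  ]
}
```

In dry-run mode the per-model scores would be absent or null, since no benchmark execution occurs.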

calibration-policy.yaml

The routing policy maps task categories to model selections. Example structure (see docs/fixtures/calibration/sample-generated-policy.yaml for the full schema):
version: 1
calibration_result: ./artifacts/calibration-result.json
routes:
  coding:
    primary: qwen2.5-coder:7b
    fallbacks:
      - llama3.2:3b
  general:
    primary: llama3.2:3b
    fallbacks: []
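To make the routes structure concrete, here is a minimal sketch of how a consumer could resolve an ordered candidate list from a policy of this shape. This mirrors the example schema above; it is not llm-checker's actual resolution code, and the fall-back-to-general behavior for unknown categories is an assumption:

```python
# Policy dict matching the example YAML structure shown above.
policy = {
    "version": 1,
    "routes": {
        "coding": {"primary": "qwen2.5-coder:7b", "fallbacks": ["llama3.2:3b"]},
        "general": {"primary": "llama3.2:3b", "fallbacks": []},
    },
}

def candidates(policy: dict, category: str) -> list[str]:
    """Return the primary model followed by its fallbacks for a category.

    Unknown categories fall back to the 'general' route -- an assumed
    behavior, not confirmed against the llm-checker implementation.
    """
    route = policy["routes"].get(category) or policy["routes"]["general"]
    return [route["primary"], *route["fallbacks"]]

print(candidates(policy, "coding"))  # ['qwen2.5-coder:7b', 'llama3.2:3b']
print(candidates(policy, "chat"))    # ['llama3.2:3b']
```

The ordered list is what makes calibrated routing useful at run time: if the primary model is unavailable, the caller can try each fallback in turn.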

Policy Resolution Notes

  • --policy <file> always takes precedence over --calibrated [file] in recommend and ai-run.
  • When --calibrated has no path, auto-discovery checks ~/.llm-checker/calibration-policy.{yaml,yml,json}.
  • --mode full currently requires --runtime ollama.

A full-mode run therefore looks like:

llm-checker calibrate \
  --suite ./prompts.jsonl \
  --models qwen2.5-coder:7b \
  --mode full \
  --iterations 3 \
  --output ./calibration.json \
  --policy-out ./routing.yaml
