PassRate reports the proportion of evaluated examples that score at or above a defined quality threshold. Use it to answer “what percentage of responses were good enough?”
from context_bench.metrics import PassRate

Constructor parameters

threshold
float
default: 0.7
Minimum score required for an example to count as a pass. An example passes when score >= threshold.
score_field
string
default:"score"
Name of the score key to evaluate. Must match a key emitted by an evaluator — for example "f1", "exact_match", or "mc_accuracy".

Return value

compute() returns a dict[str, float] with the following key:
pass_rate
float
Fraction of examples where score_field >= threshold. Range [0.0, 1.0]. Returns 0.0 when the row list is empty.
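The computation can be sketched in plain Python (a hypothetical reimplementation, assuming each evaluated row is a dict keyed by score name; the real class wraps this in its `compute()` method):

```python
def compute_pass_rate(rows, threshold=0.7, score_field="score"):
    """Fraction of rows whose score meets the threshold.

    Mirrors PassRate semantics: a row passes when
    row[score_field] >= threshold, and an empty row list
    yields a pass rate of 0.0.
    """
    if not rows:
        return {"pass_rate": 0.0}
    passed = sum(1 for row in rows if row[score_field] >= threshold)
    return {"pass_rate": passed / len(rows)}
```

Note that a score exactly equal to the threshold counts as a pass, matching the `score >= threshold` rule above.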

Usage

from context_bench import evaluate
from context_bench.evaluators import AnswerQuality
from context_bench.metrics import PassRate

result = evaluate(
    systems=[my_system],
    dataset=my_dataset,
    evaluators=[AnswerQuality()],
    metrics=[
        PassRate(threshold=0.7, score_field="f1"),
    ],
)
print(result.summary["my-system"]["pass_rate"])  # e.g. 0.61
To enforce a stricter bar, raise the threshold:

PassRate(threshold=0.9, score_field="f1")

When it is enabled

PassRate is included in every CLI run. The --threshold flag controls the threshold value (default: 0.7) and --score-field controls which field is read (default: f1).
Combine PassRate with CostOfPass to understand both how often your system succeeds and how many tokens it spends per success.
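The intuition behind that pairing can be sketched in plain Python (hypothetical helper and field names, not the library's API; `cost_field` as a per-example token count is an assumption):

```python
def pass_rate_and_cost_of_pass(rows, threshold=0.7,
                               score_field="score", cost_field="tokens"):
    """Compute pass rate alongside cost per passing example.

    Sketch only: divides total token spend across all examples by the
    number of passing examples, so a system that succeeds rarely pays
    a high cost per success even if each attempt is cheap.
    """
    if not rows:
        return {"pass_rate": 0.0, "cost_of_pass": float("inf")}
    passes = [row for row in rows if row[score_field] >= threshold]
    pass_rate = len(passes) / len(rows)
    total_cost = sum(row[cost_field] for row in rows)
    cost_of_pass = total_cost / len(passes) if passes else float("inf")
    return {"pass_rate": pass_rate, "cost_of_pass": cost_of_pass}
```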
