Latency reports statistical summaries of per-example latency recorded during a benchmark run. Latency is measured as wall-clock time from when the system receives an example to when it returns a result.
from context_bench.metrics import Latency

Constructor parameters

Latency takes no constructor parameters.
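Since there is nothing to configure, instantiation is a bare call:

from context_bench.metrics import Latency

metric = Latency()  # no arguments; all four summary statistics are always reported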

Return value

compute() returns a dict[str, float] with the following keys. All values are in seconds.
latency_mean (float): Arithmetic mean of per-example latency in seconds.
latency_median (float): Median (50th percentile) latency in seconds.
latency_p95 (float): 95th percentile latency in seconds. Useful for characterizing tail latency.
latency_p99 (float): 99th percentile latency in seconds. Reflects worst-case behavior under load.
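
For reference, these keys correspond to standard summary statistics over the recorded per-example latencies. The sketch below reproduces them with numpy; the exact percentile interpolation the library uses is an assumption.

import numpy as np

# Per-example latencies in seconds (illustrative values, not real output).
latencies = [0.8, 1.0, 1.1, 1.3, 2.9, 4.2]

summary = {
    "latency_mean": float(np.mean(latencies)),
    "latency_median": float(np.median(latencies)),       # 50th percentile
    "latency_p95": float(np.percentile(latencies, 95)),  # tail latency
    "latency_p99": float(np.percentile(latencies, 99)),  # near worst-case
}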

Usage

from context_bench import evaluate
from context_bench.evaluators import AnswerQuality
from context_bench.metrics import Latency

result = evaluate(
    systems=[my_system],
    dataset=my_dataset,
    evaluators=[AnswerQuality()],
    metrics=[Latency()],
)
summary = result.summary["my-system"]
print(summary["latency_mean"])    # e.g. 1.23
print(summary["latency_median"])  # e.g. 1.05
print(summary["latency_p95"])     # e.g. 2.87
print(summary["latency_p99"])     # e.g. 4.11

When it is enabled

Latency is included in every CLI run by default.
When using max_workers for concurrent evaluation, per-example latency reflects the execution time of the system call itself, not time spent waiting in the worker queue.
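
A sketch of what this means in practice, assuming timing is wrapped around each call inside the worker (the internals shown here are an assumption, not the library's actual code; my_system and my_dataset are the placeholders from Usage above):

import time
from concurrent.futures import ThreadPoolExecutor

def timed_call(system, example):
    # The clock starts inside the worker, after any queue wait,
    # so only the system call itself contributes to the measurement.
    start = time.perf_counter()
    output = system(example)
    return output, time.perf_counter() - start

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(timed_call, my_system, ex) for ex in my_dataset]
    latencies = [f.result()[1] for f in futures]  # queue wait never enters these numbers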
