MeanScore computes the arithmetic mean of a named score field over every EvalRow in a run. It is the primary quality signal for most benchmarks.
Constructor parameters
Name of the score key to average. Must match a key emitted by an evaluator’s
score() method — for example "f1", "exact_match", or "mc_accuracy".Return value
compute() returns a dict[str, float] with the following key:
Mean value of
score_field across all examples. Range [0.0, 1.0] for
standard evaluators. Returns 0.0 when the row list is empty.Usage
When it is enabled
MeanScore is included in every CLI run by default. The CLI flag --score-field
controls which field it reads (default: f1).
score_field must match a key produced by an evaluator that is also registered
for the run. If the field is absent on a row, that row contributes 0.0 to the
mean.