TokenEfficiencyMetric computes accuracy per token as a primary optimization target for the autoresearch loop. It produces a single number to maximize when trading off quality versus cost.
Constructor parameters
score_field: Name of the score key to use as the quality signal. Must match a key emitted by an evaluator.
Formula
The raw ratio mean_score / (mean_input_tokens / 1000) is also returned as token_efficiency_raw.
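As an illustration, the raw ratio can be computed directly from per-row scores and input token counts. The helper below is a hypothetical sketch of the formula, not part of the library API:

```python
from statistics import mean

def raw_token_efficiency(scores, input_tokens):
    # Quality per 1,000 input tokens: mean_score / (mean_input_tokens / 1000).
    # Hypothetical helper illustrating the formula above, not the real metric class.
    return mean(scores) / (mean(input_tokens) / 1000)

# For example, a run averaging a 0.8 score over 4,000 input tokens per row
# yields 0.2 quality points per 1K tokens.
```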
Return value
The metric returns the following fields:
token_efficiency: Adjusted score per token (F1-dominant). The primary optimization target; higher is better. Maximize this when tuning a retrieval pipeline.
token_efficiency_raw: mean_score / (mean_input_tokens / 1000). Kept for analysis; use token_efficiency for optimization.
mean_score: Mean value of score_field across all rows.
mean_input_tokens: Mean input token count per row.
Mean ingest latency in seconds (from row.metadata["ingest_latency"]). Relevant for memory system evaluations.
Mean query latency in seconds (from row.metadata["query_latency"]). Relevant for memory system evaluations.
Usage
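The latency fields above reduce to a plain average over row metadata. A minimal sketch, using dict-shaped rows as a stand-in for the evaluator's row objects:

```python
from statistics import mean

# Stand-in rows; real evaluator rows carry a metadata dict with these keys.
rows = [
    {"metadata": {"ingest_latency": 0.5, "query_latency": 0.1}},
    {"metadata": {"ingest_latency": 0.7, "query_latency": 0.3}},
]

# Mean latencies in seconds, averaged across all rows.
mean_ingest_latency = mean(r["metadata"]["ingest_latency"] for r in rows)
mean_query_latency = mean(r["metadata"]["query_latency"] for r in rows)
```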
When it is enabled
TokenEfficiencyMetric is automatically used by the context-bench memory CLI subcommand. It is also used as the optimization target in the autoresearch loop.
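As a sketch of how an optimization loop might consume the primary field, assume each candidate configuration reports a token_efficiency value (the candidate names and dict layout here are illustrative, not the real autoresearch API):

```python
# Hypothetical candidate results, each carrying the metric's primary field.
candidates = [
    {"config": "short-context", "token_efficiency": 0.31},
    {"config": "long-context", "token_efficiency": 0.18},
]

# Higher is better: pick the configuration with the best quality per token.
best = max(candidates, key=lambda c: c["token_efficiency"])
```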
