Latency reports statistical summaries of per-example latency recorded during a
benchmark run. Latency is measured as wall-clock time from when the system
receives an example to when it returns a result.
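As an illustration of the measurement described above, here is a minimal sketch of timing a single example with a monotonic clock. The helper name timed_call and the callable system are hypothetical and are not part of the library's API; the library's actual timing code may differ.

```python
import time


def timed_call(system, example):
    """Run `system` on one example and return (result, elapsed_seconds).

    `timed_call` and `system` are illustrative names, not library API.
    """
    # perf_counter() is a monotonic, high-resolution clock suited to
    # measuring short wall-clock intervals.
    start = time.perf_counter()
    result = system(example)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

The clock starts immediately before the system is called and stops as soon as it returns, so the interval covers only the call itself.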
Constructor parameters
Latency takes no constructor parameters.
Return value
compute() returns a dict[str, float] with the following keys. All values are
in seconds.
Arithmetic mean of per-example latency in seconds.
Median (50th percentile) latency in seconds.
95th percentile latency in seconds. Useful for characterizing tail latency.
99th percentile latency in seconds. Reflects near-worst-case behavior under load: only 1% of examples are slower.
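The four statistics above could be computed from a list of recorded latencies roughly as follows. This is a sketch under stated assumptions: the key names "mean", "p50", "p95", and "p99" are placeholders chosen for illustration, and the linear-interpolation percentile is one common definition that may not match the method Latency actually uses.

```python
import statistics


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("no latencies recorded")
    # Fractional rank into the sorted list, then interpolate between
    # the two neighbouring samples.
    rank = (len(sorted_values) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    low_value = sorted_values[lower]
    return low_value + (sorted_values[upper] - low_value) * fraction


def summarize_latencies(latencies):
    """Return summary statistics in seconds. Key names are hypothetical."""
    ordered = sorted(latencies)
    return {
        "mean": statistics.fmean(ordered),
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
    }
```

With few examples, the 95th and 99th percentiles are dominated by the one or two slowest calls, so they are most informative on larger runs.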
Usage
When it is enabled
Latency is included in every CLI run by default.
When using max_workers for concurrent evaluation, the wall-clock time per example
reflects the execution time of the system call itself, not queue wait time.
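One way to obtain this behavior is to start the clock inside the worker rather than when the task is submitted. The sketch below assumes a thread pool from the standard library; run_timed, evaluate_concurrently, and system are hypothetical names, and the library's own executor and scheduling may differ.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_timed(system, example):
    """Time only the system call. Runs inside the worker thread."""
    # The clock starts once a worker has picked up the task, so time
    # spent waiting in the pool's queue is not counted.
    start = time.perf_counter()
    result = system(example)
    return result, time.perf_counter() - start


def evaluate_concurrently(system, examples, max_workers=2):
    """Return a list of (result, latency_seconds) in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda ex: run_timed(system, ex), examples))
```

If the timer were started at submission time instead, examples queued behind a small worker pool would report inflated latencies that grow with their position in the queue.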