Overview
The benchmark module provides utilities for running repeated benchmarks with statistical confidence intervals and for analyzing tabular metrics.

Data Classes
BenchmarkResult
Contains the results of a repeated benchmark run with confidence intervals. Its fields record:

- Mean value of the measured metric across all runs
- Standard deviation of the metric
- Confidence interval margin for the metric
- Mean latency in milliseconds across all runs
- Standard deviation of latency in milliseconds
- Confidence interval margin for latency in milliseconds
- Number of benchmark runs executed
- Confidence level used for interval calculation (e.g., 0.95 for 95%)
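The source lists only field descriptions, not field names, so the names below are illustrative assumptions. A minimal sketch of what the result dataclass might look like:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    # Field names are hypothetical; the actual class may differ.
    metric_mean: float           # mean value of the measured metric across all runs
    metric_std: float            # standard deviation of the metric
    metric_ci_margin: float      # confidence interval margin for the metric
    latency_mean_ms: float       # mean latency in milliseconds across all runs
    latency_std_ms: float        # standard deviation of latency in milliseconds
    latency_ci_margin_ms: float  # confidence interval margin for latency
    num_runs: int                # number of benchmark runs executed
    confidence: float            # confidence level used, e.g. 0.95 for 95%
```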
Functions
run_repeated_benchmark
Executes a function multiple times and returns benchmark statistics with confidence intervals.

Parameters:
- Function to benchmark. Must return a dictionary containing the metric specified by metric_key
- metric_key: Key to extract from the function's return dictionary for metric tracking
- Number of times to run the function. A minimum of 2 runs is enforced
- Confidence level for interval calculation (0-1)

Returns: A BenchmarkResult object containing the mean, standard deviation, and confidence interval margins for both the metric and latency measurements.
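A minimal sketch of how such a function could be implemented. This is an assumption-laden illustration, not the module's actual code: it uses a normal-approximation confidence interval margin (z * s / sqrt(n)) where the real implementation might use a t-distribution, and it returns a plain dictionary in place of BenchmarkResult to stay self-contained:

```python
import time
from statistics import NormalDist, mean, stdev

def run_repeated_benchmark(fn, metric_key, num_runs=5, confidence=0.95):
    # Enforce the documented minimum of 2 runs (stdev needs >= 2 samples).
    num_runs = max(num_runs, 2)
    metrics, latencies = [], []
    for _ in range(num_runs):
        start = time.perf_counter()
        result = fn()
        latencies.append((time.perf_counter() - start) * 1000.0)  # wall time in ms
        metrics.append(result[metric_key])  # extract the tracked metric

    # Normal-approximation CI margin: z * s / sqrt(n).
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)

    def ci_margin(samples):
        return z * stdev(samples) / len(samples) ** 0.5

    return {
        "metric_mean": mean(metrics),
        "metric_std": stdev(metrics),
        "metric_ci_margin": ci_margin(metrics),
        "latency_mean_ms": mean(latencies),
        "latency_std_ms": stdev(latencies),
        "latency_ci_margin_ms": ci_margin(latencies),
        "num_runs": num_runs,
        "confidence": confidence,
    }
```

For example, benchmarking a function that returns `{"score": 1.0}` yields a metric mean of 1.0 with zero standard deviation, and a request for a single run is bumped up to 2.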
benchmark_table_metrics
Computes confidence intervals for multiple metric columns in a DataFrame.

Parameters:
- DataFrame containing the metric columns to analyze
- List of column names to compute statistics for
- Confidence level for interval calculation (0-1)

Returns: Dictionary mapping each metric column name to a dictionary containing:
- mean: Mean value
- std: Standard deviation
- ci_margin: Confidence interval margin
- confidence: Confidence level used
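A possible shape for this function, sketched under the assumption that the normal-approximation margin (z * s / sqrt(n)) from above is used per column; the actual implementation may differ:

```python
from statistics import NormalDist

import pandas as pd

def benchmark_table_metrics(df, metric_columns, confidence=0.95):
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    out = {}
    for col in metric_columns:
        values = df[col].dropna()
        n = len(values)
        # Sample standard deviation; 0.0 when there is only one value.
        std = float(values.std(ddof=1)) if n > 1 else 0.0
        out[col] = {
            "mean": float(values.mean()),
            "std": std,
            "ci_margin": z * std / n ** 0.5 if n > 0 else 0.0,
            "confidence": confidence,
        }
    return out
```

Usage: `benchmark_table_metrics(pd.DataFrame({"accuracy": [0.91, 0.89, 0.90]}), ["accuracy"])` returns a dict with per-column mean, std, ci_margin, and confidence entries.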