Overview
The Hospital Data Analysis Platform provides robust benchmarking utilities with statistical confidence intervals. These tools enable reliable performance measurements that account for variance and provide confidence bounds.
BenchmarkResult Dataclass
Benchmark results are returned as a BenchmarkResult dataclass (defined in evaluation/benchmark.py:10):
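The exact fields live in evaluation/benchmark.py; the sketch below infers plausible field names from the result descriptions elsewhere on this page (treat the names as assumptions, not the actual definition):

```python
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    """Aggregate statistics from repeated benchmark runs (field names assumed)."""
    n_runs: int                   # number of repetitions performed
    metric_mean: float            # mean of the measured metric
    metric_std: float             # standard deviation of the metric
    metric_ci_margin: float       # half-width of the metric confidence interval
    latency_mean_ms: float        # mean wall-clock latency per run, in ms
    latency_std_ms: float         # standard deviation of latency, in ms
    latency_ci_margin_ms: float   # half-width of the latency confidence interval
```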
Interpreting Results
The confidence interval is expressed as mean ± margin:
- Metric: metric_mean ± metric_ci_margin
- Latency: latency_mean_ms ± latency_ci_margin_ms
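For example, formatting a 95% interval in the mean ± margin form (the values are hypothetical):

```python
metric_mean, metric_ci_margin = 0.842, 0.013  # hypothetical benchmark output

print(f"Metric: {metric_mean:.3f} ± {metric_ci_margin:.3f}")
# → Metric: 0.842 ± 0.013
```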
Repeated Benchmarks
Basic Usage
The run_repeated_benchmark function (defined in evaluation/benchmark.py:21) runs a function multiple times and computes statistics:
Function Signature
Implementation Details
- Uses time.perf_counter() for high-resolution timing
- Ensures a minimum of 2 runs for statistical validity
- Computes confidence intervals for both metrics and latency
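Putting the details above together, here is a self-contained sketch of what such a function could look like. It is not the actual implementation (that lives in evaluation/benchmark.py), and the BenchmarkResult field names are assumptions:

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable

# Z-scores for the supported confidence levels.
Z_SCORES = {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}

@dataclass
class BenchmarkResult:  # field names are assumptions
    n_runs: int
    metric_mean: float
    metric_std: float
    metric_ci_margin: float
    latency_mean_ms: float
    latency_std_ms: float
    latency_ci_margin_ms: float

def run_repeated_benchmark(fn: Callable[[], float], n_runs: int = 10,
                           confidence: float = 0.95) -> BenchmarkResult:
    n_runs = max(n_runs, 2)  # at least 2 runs so the std deviation is defined
    metrics, latencies_ms = [], []
    for _ in range(n_runs):
        start = time.perf_counter()  # high-resolution timer
        metrics.append(fn())
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    z = Z_SCORES[confidence]
    m_std = statistics.stdev(metrics)
    l_std = statistics.stdev(latencies_ms)
    return BenchmarkResult(
        n_runs=n_runs,
        metric_mean=statistics.mean(metrics),
        metric_std=m_std,
        metric_ci_margin=z * m_std / n_runs ** 0.5,
        latency_mean_ms=statistics.mean(latencies_ms),
        latency_std_ms=l_std,
        latency_ci_margin_ms=z * l_std / n_runs ** 0.5,
    )
```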
Table Metrics Benchmarking
Multi-Column Analysis
The benchmark_table_metrics function (defined in evaluation/benchmark.py:37) computes statistics across multiple DataFrame columns:
Function Signature
Return Format
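The return value maps each column name to its statistics. A hypothetical example, with made-up column names and values, and the (mean, std, ci_margin) tuple order assumed:

```python
# Hypothetical result for two columns of a hospital admissions table.
stats = {
    "length_of_stay": (4.2, 1.1, 0.31),    # (mean, std, ci_margin)
    "cost": (8450.0, 1200.5, 340.2),
}
```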
Statistical Confidence Intervals
Implementation
The confidence_interval function (defined in evaluation/statistics.py:7) computes mean, standard deviation, and margin:
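A sketch consistent with that description and with the z-score table below; the actual implementation is in evaluation/statistics.py, and the tuple return shape is an assumption:

```python
import statistics

# Z-scores for the supported confidence levels.
Z_SCORES = {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}

def confidence_interval(values, confidence=0.95):
    """Return (mean, std, margin), where margin = z * std / sqrt(n)."""
    z = Z_SCORES[confidence]
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    margin = z * std / len(values) ** 0.5
    return mean, std, margin
```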
Supported Confidence Levels
| Confidence Level | Z-Score |
|---|---|
| 0.90 (90%) | 1.64 |
| 0.95 (95%) | 1.96 |
| 0.99 (99%) | 2.58 |
Direct Usage
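You can call confidence_interval directly on any list of values. The snippet below uses a stand-in implementation so it runs on its own (the real function lives in evaluation/statistics.py, and the (mean, std, margin) return shape is an assumption):

```python
import statistics

def confidence_interval(values, confidence=0.95):
    # Stand-in for evaluation/statistics.py:7.
    z = {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}[confidence]
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return mean, std, z * std / len(values) ** 0.5

latencies_ms = [12.1, 11.8, 12.4, 12.0, 11.9]
mean, std, margin = confidence_interval(latencies_ms)
print(f"{mean:.2f} ± {margin:.2f} ms")
# → 12.04 ± 0.20 ms
```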
Complete Benchmarking Example
End-to-End Performance Evaluation
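Here is a self-contained end-to-end sketch that benchmarks a toy evaluation function and reports both metric and latency intervals. The evaluation function and the inline confidence_interval are stand-ins; the platform's real implementations live in evaluation/benchmark.py and evaluation/statistics.py:

```python
import random
import statistics
import time

def confidence_interval(values, confidence=0.95):
    # Stand-in for evaluation/statistics.py.
    z = {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}[confidence]
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return mean, std, z * std / len(values) ** 0.5

def evaluate_readmission_model() -> float:
    # Toy stand-in: pretend to score a model and return an accuracy-like metric.
    time.sleep(0.001)
    return 0.90 + random.uniform(-0.02, 0.02)

metrics, latencies_ms = [], []
for _ in range(20):
    start = time.perf_counter()
    metrics.append(evaluate_readmission_model())
    latencies_ms.append((time.perf_counter() - start) * 1000.0)

m_mean, _, m_margin = confidence_interval(metrics)
l_mean, _, l_margin = confidence_interval(latencies_ms)
print(f"Metric:  {m_mean:.3f} ± {m_margin:.3f}")
print(f"Latency: {l_mean:.2f} ± {l_margin:.2f} ms")
```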
Multi-Experiment Analysis
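To compare several experiments, one common pattern is to summarize each configuration's runs with the same mean ± margin form. The configuration names and per-run metrics below are hypothetical:

```python
import statistics

def summarize(name, values, z=1.96):
    # 95% interval: margin = z * std / sqrt(n)
    mean = statistics.mean(values)
    margin = z * statistics.stdev(values) / len(values) ** 0.5
    return f"{name}: {mean:.3f} ± {margin:.3f}"

# Hypothetical per-run metrics for three experiment configurations.
experiments = {
    "baseline": [0.81, 0.83, 0.80, 0.82, 0.81],
    "tuned":    [0.86, 0.85, 0.87, 0.86, 0.88],
    "ablated":  [0.74, 0.76, 0.73, 0.75, 0.74],
}
for name, runs in experiments.items():
    print(summarize(name, runs))
```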
Best Practices
1. Use Sufficient Iterations
More benchmark runs provide tighter confidence intervals.
2. Control for Variance
Reduce variance by:
- Setting a consistent random seed
- Running benchmarks on idle systems
- Disabling background processes
- Using the same hardware and environment
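For the seeding step, a common pattern is to fix one seed value for every randomness source in use (extend with numpy or torch seeding if those libraries are involved):

```python
import random

SEED = 42  # any fixed value; keep it with the benchmark configuration

random.seed(SEED)
# If numpy is used:  np.random.seed(SEED)
# If torch is used:  torch.manual_seed(SEED)
```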
3. Report Complete Statistics
Always report the mean, standard deviation, and confidence interval.
4. Save Benchmark Results
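One simple way to persist results is JSON alongside run metadata, so later comparisons have the full statistics available. The field names here follow the assumed BenchmarkResult shape and the values are hypothetical:

```python
import json
import time

result = {  # hypothetical benchmark output
    "n_runs": 10,
    "metric_mean": 0.842,
    "metric_std": 0.021,
    "metric_ci_margin": 0.013,
    "latency_mean_ms": 12.4,
    "latency_ci_margin_ms": 1.1,
    "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
}
with open("benchmark_results.json", "w") as f:
    json.dump(result, f, indent=2)
```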
Interpreting Benchmarks
Comparing Results
Two benchmarks are statistically different when their confidence intervals don't overlap. This is a conservative check: non-overlap implies a real difference, but overlapping intervals don't prove the absence of one.
Performance Regression Detection
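A sketch of that overlap check applied to latency regression detection, with interval bounds computed as mean ± margin (the function names are illustrative, not part of the platform API):

```python
def intervals_overlap(mean_a, margin_a, mean_b, margin_b) -> bool:
    """True if [mean_a ± margin_a] and [mean_b ± margin_b] overlap."""
    return (mean_a - margin_a) <= (mean_b + margin_b) and \
           (mean_b - margin_b) <= (mean_a + margin_a)

def is_regression(baseline_ms, baseline_margin, current_ms, current_margin) -> bool:
    # Flag a latency regression only when the current run is slower AND the
    # confidence intervals are disjoint (a conservative criterion).
    slower = current_ms > baseline_ms
    return slower and not intervals_overlap(
        baseline_ms, baseline_margin, current_ms, current_margin)

print(is_regression(12.0, 0.5, 14.0, 0.5))  # → True (disjoint and slower)
```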
See Also
- Configuration - Benchmark configuration parameters
- Reproducibility - Ensuring consistent benchmark results
- Troubleshooting - Debugging benchmark issues