Guide to running, comparing, and writing benchmarks for Apache Arrow to measure and track performance
Apache Arrow includes comprehensive benchmarking infrastructure to measure and track performance across releases. This guide covers how to run benchmarks, compare results, and write effective benchmarks.
```
# Build with specific compilers and SIMD disabled
export CC=clang-8 CXX=clang++-8
archery benchmark run --cmake-extras="-DARROW_SIMD_LEVEL=NONE"
```
4. Use existing build directory
```
archery benchmark run $HOME/arrow/cpp/release-build
```
For meaningful benchmark results, always build in Release mode to enable compiler optimizations. Debug builds will produce misleading performance numbers.
You can also run C++ benchmark executables directly.
Each benchmark executable supports command-line options from Google Benchmark:
```
# Run specific benchmarks matching a regex
./build/release/arrow-compute-benchmark --benchmark_filter=FloatParsing

# Run with repetitions for statistical significance
./build/release/arrow-compute-benchmark --benchmark_repetitions=10

# Save results to JSON
./build/release/arrow-compute-benchmark --benchmark_out=results.json
```
Reuse build directories to avoid rebuilding from scratch:
```
# First invocation: clone and build in a temporary directory
archery benchmark diff --preserve

# Modify C++ sources...

# Re-run benchmarks using the existing build directories
archery benchmark diff /tmp/arrow-bench*/{WORKSPACE,master}/build
```
Save baseline results to avoid re-running benchmarks:
```
# Run benchmarks once and save the results
archery benchmark run --output=run-head-1.json HEAD~1

# Compare against the cached results
archery benchmark diff HEAD run-head-1.json
```
```
# Filter by suite and benchmark name (supports regex)
archery benchmark diff \
    --suite-filter=compute-aggregate \
    --benchmark-filter=Kernel \
    /tmp/arrow-bench*/{WORKSPACE,master}/build
```
Combine all three techniques for fastest iteration during benchmark development.
The benchmark command already runs each executable with `--benchmark_repetitions=K` for statistical significance, so don't override repetitions in your benchmark definition:
```cpp
// Good
BENCHMARK(RegressionStringParsing);

// Bad - don't override repetitions
BENCHMARK(RegressionStringParsing)->Repetitions(10);
```
3. Keep benchmarks fast
Benchmarks should run quickly, but input size matters for what you measure: if the input doesn't fit in the L2/L3 cache, the benchmark becomes memory-bound instead of CPU-bound. Consider downsizing inputs so they stay cache-resident.
4. Use appropriate time metric
Google Benchmark defaults to CPU time (the sum of time spent on-CPU across all threads). For single-threaded benchmarks this is preferable, as it is less affected by context switching. For multi-threaded benchmarks, measure real (wall-clock) time instead, e.g. with Google Benchmark's `UseRealTime()`.
Arrow uses Conbench for continuous benchmarking across releases. This tracks performance over time and automatically detects regressions.

Benchmark results from CI are automatically uploaded to Conbench, where you can: