Guide to running, comparing, and writing benchmarks for Apache Arrow to measure and track performance
Apache Arrow includes comprehensive benchmarking infrastructure to measure and track performance across releases. This guide covers how to run benchmarks, compare results, and write effective benchmarks.
```
# Build with specific compilers and SIMD disabled
export CC=clang-8 CXX=clang++-8
archery benchmark run --cmake-extras="-DARROW_SIMD_LEVEL=NONE"
```
4. Use existing build directory
```
archery benchmark run $HOME/arrow/cpp/release-build
```
For meaningful benchmark results, always build in Release mode to enable compiler optimizations. Debug builds will produce misleading performance numbers.
You can also run C++ benchmark executables directly.
Each benchmark executable supports command-line options from Google Benchmark:
```
# Run specific benchmarks matching a regex
./build/release/arrow-compute-benchmark --benchmark_filter=FloatParsing

# Run with repetitions for statistical significance
./build/release/arrow-compute-benchmark --benchmark_repetitions=10

# Save results to JSON
./build/release/arrow-compute-benchmark --benchmark_out=results.json
```
Reuse build directories to avoid rebuilding from scratch:
```
# First invocation: clone and build in a temporary directory
archery benchmark diff --preserve

# Modify C++ sources...

# Re-run benchmarks using the existing build directories
archery benchmark diff /tmp/arrow-bench*/{WORKSPACE,master}/build
```
Save baseline results to avoid re-running benchmarks:
```
# Run benchmarks once and save the results
archery benchmark run --output=run-head-1.json HEAD~1

# Compare against the cached results
archery benchmark diff HEAD run-head-1.json
```
```
# Filter by suite and benchmark name (supports regex)
archery benchmark diff \
    --suite-filter=compute-aggregate \
    --benchmark-filter=Kernel \
    /tmp/arrow-bench*/{WORKSPACE,master}/build
```
Combine all three techniques for fastest iteration during benchmark development.
The benchmark command already runs each executable with `--benchmark_repetitions=K` for statistical significance, so don't override repetitions in your benchmark definition:
```cpp
// Good
BENCHMARK(RegressionStringParsing);

// Bad - don't override repetitions
BENCHMARK(RegressionStringParsing)->Repetitions(10);
```
3. Keep benchmarks fast
Benchmarks should run quickly, but input size matters for what you measure: if the input doesn't fit in the L2/L3 cache, the benchmark becomes memory-bound instead of CPU-bound. Consider downsizing inputs so they stay cache-resident.
4. Use appropriate time metric
Google Benchmark defaults to CPU time (the sum of time spent on-CPU across all threads). For single-threaded benchmarks this is preferable, as it is less affected by context switching. For multi-threaded benchmarks, measure real (wall-clock) time instead, e.g. with Google Benchmark's `UseRealTime()`.
Arrow uses Conbench for continuous benchmarking across releases. This tracks performance over time and automatically detects regressions.

Benchmark results from CI are automatically uploaded to Conbench, where you can: