Apache Arrow includes comprehensive benchmarking infrastructure to measure and track performance across releases. This guide covers how to run benchmarks, compare results, and write effective benchmarks.

Setup

Before running benchmarks, install the Archery utility from the root of your Arrow checkout:
pip install -e "dev/archery[benchmark]"
Archery is Arrow’s command-line tool for development tasks including running benchmarks.

Running Benchmarks

The recommended way to run benchmarks is through Archery’s benchmark run command.

Basic Usage

1. Run benchmarks in the current workspace:

   archery benchmark run

2. Save results to a file for later comparison:

   archery benchmark run --output=run.json

3. Run with custom compilers and CMake flags:

   export CC=clang-8 CXX=clang++-8
   archery benchmark run --cmake-extras="-DARROW_SIMD_LEVEL=NONE"

4. Use an existing build directory:

   archery benchmark run $HOME/arrow/cpp/release-build
For meaningful benchmark results, always build in Release mode to enable compiler optimizations. Debug builds will produce misleading performance numbers.
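If you manage the build directory yourself rather than letting Archery create one, configure it in Release mode with benchmarks enabled. A minimal sketch using Arrow's standard CMake options (the paths are illustrative):

```shell
# Configure and build a Release-mode benchmark build (illustrative paths).
mkdir -p $HOME/arrow/cpp/release-build
cd $HOME/arrow/cpp/release-build
cmake .. -DCMAKE_BUILD_TYPE=Release -DARROW_BUILD_BENCHMARKS=ON
cmake --build .
```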

Comparing Performance

One of the primary goals of benchmarking is detecting performance regressions. Archery provides a benchmark diff command for comparing results.

Basic Comparison

1. Compare the workspace against the main branch. By default, diff compares your current changes with the local main branch:

   archery --quiet benchmark diff --benchmark-filter=FloatParsing
Example output:
-----------------------------------------------------------------------------------
Non-regressions: (1)
-----------------------------------------------------------------------------------
               benchmark            baseline           contender  change % counters
 FloatParsing<FloatType>  105.983M items/sec  105.983M items/sec       0.0       {}

------------------------------------------------------------------------------------
Regressions: (1)
------------------------------------------------------------------------------------
                benchmark            baseline           contender  change % counters
 FloatParsing<DoubleType>  209.941M items/sec  109.941M items/sec   -47.632       {}
2. Compare specific commits or branches:

   archery benchmark diff main feature-branch

3. Compare using saved results:

   archery benchmark run --output=baseline.json $HOME/arrow/cpp/release-build
   git checkout some-feature
   archery benchmark run --output=contender.json $HOME/arrow/cpp/release-build
   archery benchmark diff contender.json baseline.json
For more options, run:
archery benchmark diff --help
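The change % column in the example output above is simply the relative difference between contender and baseline throughput. A quick sketch of the arithmetic (not Archery's actual implementation):

```python
# Relative throughput change, as reported in the "change %" column.
def change_percent(baseline: float, contender: float) -> float:
    """Percentage change from baseline to contender throughput."""
    return (contender - baseline) / baseline * 100.0

# Values from the FloatParsing<DoubleType> row (items/sec, in millions):
print(round(change_percent(209.941, 109.941), 3))  # -47.632
```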

Iterating Efficiently

Benchmark development can be tedious due to long build and run times. Use these techniques to reduce overhead:

1. Preserve Build Directories

Reuse build directories to avoid rebuilding from scratch:
# First invocation: clone and build in temporary directory
archery benchmark diff --preserve

# Modify C++ sources...

# Re-run benchmark using existing build directories
archery benchmark diff /tmp/arrow-bench*/{WORKSPACE,master}/build

2. Cache Benchmark Results

Save baseline results to avoid re-running benchmarks:
# Run benchmarks once and save results
archery benchmark run --output=run-head-1.json HEAD~1

# Compare against cached results
archery benchmark diff HEAD run-head-1.json

3. Filter Benchmarks

Run only relevant benchmarks using filters:
# Filter by suite and benchmark name (supports regex)
archery benchmark diff \
  --suite-filter=compute-aggregate \
  --benchmark-filter=Kernel \
  /tmp/arrow-bench*/{WORKSPACE,master}/build
Combine all three techniques for fastest iteration during benchmark development.

Writing Benchmarks

Regression Benchmarks

Benchmarks intended for automated regression detection should follow these guidelines:
Prefix benchmark names with Regression; by default, the benchmark command selects only benchmarks matching ^Regression:
static void RegressionStringParsing(benchmark::State& state) {
  // benchmark implementation
}
BENCHMARK(RegressionStringParsing);
The benchmark command runs with --benchmark_repetitions=K for statistical significance. Don’t override this in your benchmark definition:
// Good
BENCHMARK(RegressionStringParsing);

// Bad - don't override repetitions
BENCHMARK(RegressionStringParsing)->Repetitions(10);
Benchmarks should run sufficiently fast. If input doesn’t fit in L2/L3 cache, the benchmark becomes memory-bound instead of CPU-bound. Consider downsizing inputs.
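A quick back-of-the-envelope check helps when sizing inputs. The cache sizes below are illustrative assumptions (query your actual CPU, e.g. with lscpu):

```python
# Rough working-set check: will the benchmark input fit in cache?
def working_set_bytes(num_items: int, item_size: int) -> int:
    """Total bytes the benchmark input occupies."""
    return num_items * item_size

L2_BYTES = 1 << 20   # assumed 1 MiB L2 (illustrative)
L3_BYTES = 32 << 20  # assumed 32 MiB shared L3 (illustrative)

# One million float64 values is 8 MB: it spills out of L2 but still
# fits in L3, so the benchmark may already be partially memory-bound.
ws = working_set_bytes(1_000_000, 8)
print(ws, ws <= L2_BYTES, ws <= L3_BYTES)  # 8000000 False True
```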
Google Benchmark measures CPU time by default (the sum of time spent on-CPU across all threads). For single-threaded benchmarks this is preferable, since it is less affected by context switching. For multi-threaded benchmarks, use real (wall-clock) time instead:
static void MultiThreadedBenchmark(benchmark::State& state) {
  // benchmark implementation
}
BENCHMARK(MultiThreadedBenchmark)->ThreadRange(1, 16)->UseRealTime();

Benchmark Structure

Here’s a template for writing Arrow C++ benchmarks:
#include <benchmark/benchmark.h>
#include <arrow/api.h>
#include <arrow/testing/gtest_util.h>  // for arrow::ArrayFromJSON

static void BenchmarkExample(benchmark::State& state) {
  // Setup (runs once)
  auto array = arrow::ArrayFromJSON(arrow::int64(), "[1, 2, 3, 4, 5]");

  // Benchmark loop
  for (auto _ : state) {
    // Code to benchmark
    auto result = SomeOperation(array);
    benchmark::DoNotOptimize(result);
  }

  // Optional: Set custom counters
  state.SetItemsProcessed(state.iterations() * array->length());
}

BENCHMARK(BenchmarkExample);

// Benchmark with parameters
static void BenchmarkWithArgs(benchmark::State& state) {
  int64_t size = state.range(0);
  auto array = MakeArray(size);

  for (auto _ : state) {
    auto result = SomeOperation(array);
    benchmark::DoNotOptimize(result);
  }
}

BENCHMARK(BenchmarkWithArgs)->Range(1024, 1<<20);
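Note that Range(1024, 1<<20) does not run every size in between: Google Benchmark steps through powers of a multiplier (8 by default, adjustable with RangeMultiplier) and always includes both endpoints. A sketch of the argument generation, assuming the behavior described in the Google Benchmark user guide:

```python
# Approximate the arguments Google Benchmark's Range(lo, hi) generates:
# lo, the powers of `mult` strictly between lo and hi, and hi itself.
def range_args(lo: int, hi: int, mult: int = 8) -> list[int]:
    args = [lo]
    i = 1
    while i < hi:
        if lo < i < hi:
            args.append(i)
        i *= mult
    if hi != lo:
        args.append(hi)
    return args

print(range_args(1024, 1 << 20))  # [1024, 4096, 32768, 262144, 1048576]
```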

Scripting and Automation

Archery is available as a Python library for automation:
from archery import benchmark

# Your custom benchmark automation
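The internal Python API is not covered here. One safe pattern (an illustrative sketch, not part of Archery's documented API) is to drive the CLI from Python and keep the JSON result files around for diffing; the helper name below is hypothetical:

```python
import subprocess

def archery_benchmark_run(output_path, build_dir=None, extra_args=()):
    """Assemble an `archery benchmark run` command line (hypothetical helper)."""
    cmd = ["archery", "benchmark", "run", f"--output={output_path}"]
    cmd.extend(extra_args)
    if build_dir:
        cmd.append(build_dir)
    return cmd

cmd = archery_benchmark_run("baseline.json")
print(cmd)  # ['archery', 'benchmark', 'run', '--output=baseline.json']
# To actually execute it: subprocess.run(cmd, check=True)
```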

Controlling Output

Use --quiet to suppress build output or --output=<file> to redirect:
archery benchmark diff \
  --benchmark-filter=Kernel \
  --output=compare.json \
  ...

Continuous Benchmarking

Arrow uses Conbench for continuous benchmarking across releases. This tracks performance over time and automatically detects regressions. Benchmark results from CI are automatically uploaded to Conbench, where you can:
  • View historical performance trends
  • Compare performance across platforms
  • Identify when regressions were introduced
  • Track improvements from optimizations

Performance Best Practices

Profile Before Optimizing

Use profiling tools to identify actual bottlenecks before optimizing. Don’t guess.

Benchmark Realistic Workloads

Use data and access patterns representative of real-world usage.

Consider Memory Hierarchy

Cache effects significantly impact performance. Test with data sizes that reflect actual use cases.

Test Multiple Platforms

Performance characteristics vary across CPU architectures. Test on target platforms.

Resources

• Google Benchmark Guide: complete guide to the Google Benchmark framework
• Archery Documentation: full documentation for Arrow’s development utility
• Conbench: view Arrow’s continuous benchmark results
• Performance Tips: C++ coding conventions and performance considerations
