This guide covers testing practices for Apache Arrow, including how to run tests, write new tests, and follow best practices for each language implementation.

Running Tests

Apache Arrow’s Python implementation uses pytest for unit testing.

Test Structure

Tests in PyArrow follow the pytest convention for “Tests as part of application code”:
pyarrow/
    __init__.py
    csv.py
    dataset.py
    ...
    tests/
        __init__.py
        test_csv.py
        test_dataset.py
        ...
Tests for Parquet are located in a separate folder: pyarrow/tests/parquet/.

Running PyArrow Tests

1. Run a specific test

From the arrow/python directory:
pytest pyarrow/tests/test_file.py -k test_your_unit_test

2. Run all tests in a file

pytest pyarrow/tests/test_file.py

3. Run all tests

pytest pyarrow
You can also run tests with python -m pytest [...], which adds the current directory to sys.path. This can help if running pytest [...] directly results in an ImportError.

Test Groups

Many tests are grouped using pytest marks, and some groups are disabled by default. Use the following command-line flags to control which groups run:
  • Enable a group: --$GROUP_NAME (e.g., --parquet)
  • Disable a group: --disable-$GROUP_NAME (e.g., --disable-parquet)
  • Run only a group: --only-$GROUP_NAME (e.g., --only-parquet)
Available test groups:
Group         Description
dataset       Apache Arrow Dataset tests
flight        Flight RPC tests
gandiva       Gandiva expression compiler tests (uses LLVM)
hdfs          Tests using libhdfs for Hadoop filesystem
hypothesis    Tests using hypothesis for random test cases (use --enable-hypothesis)
large_memory  Tests requiring large amounts of system RAM
orc           Apache ORC tests
parquet       Apache Parquet tests
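How the three flag forms interact can be sketched as a small decision function. This is a simplified model for illustration, not PyArrow's actual conftest.py: the name should_run and the example set of enabled groups are assumptions.

```python
# Simplified model of group selection; not PyArrow's actual conftest.py.
# Group names come from the table above; the enabled set below is assumed.

def should_run(test_groups, enabled, only=frozenset()):
    """Return True if a test carrying `test_groups` marks should run.

    enabled: groups switched on (defaults plus --GROUP, minus --disable-GROUP)
    only:    groups passed as --only-GROUP
    """
    test_groups = set(test_groups)
    if only:
        # --only-GROUP: run just the tests that carry every requested group.
        return set(only) <= test_groups
    # Otherwise every group the test belongs to must be enabled.
    return test_groups <= set(enabled)


enabled = {"dataset", "parquet"}  # hypothetical state after parsing flags

print(should_run({"parquet"}, enabled))                    # True
print(should_run({"hdfs"}, enabled))                       # False
print(should_run(set(), enabled))                          # True
print(should_run({"dataset"}, enabled, only={"parquet"}))  # False
```

In this model an unmarked test runs unless an --only flag is given, which is why a plain pytest run still exercises most of the suite.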

Troubleshooting

If tests start failing, try recompiling PyArrow or Arrow C++:
# Rebuild from source
cd arrow/python
python setup.py build_ext --inplace

Test Fixtures

PyArrow test files contain helper functions and fixtures. Common examples:
  • _alltypes_example in test_pandas: Supplies a dataframe with 100 rows for all data types
  • _check_pandas_roundtrip in test_pandas: Asserts roundtrip conversion from Pandas through Arrow structures
  • large_buffer fixture: Supplies a PyArrow buffer of fixed size
Look through test files before adding tests to see if existing fixtures can help.
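A roundtrip helper such as _check_pandas_roundtrip generally converts a value in one direction, converts it back, and asserts that nothing changed. The sketch below shows only that shape. It uses the standard-library json module as a stand-in for the Pandas and Arrow conversion so it runs without PyArrow installed, and check_roundtrip is a hypothetical name, not PyArrow's helper.

```python
import json


def check_roundtrip(value, expected=None):
    """Encode `value`, decode it again, and assert nothing changed.

    json stands in for the Pandas -> Arrow -> Pandas conversion here.
    """
    if expected is None:
        expected = value
    result = json.loads(json.dumps(value))
    assert result == expected, f"roundtrip changed {value!r} to {result!r}"
    return result


def test_nested_record_roundtrips():
    check_roundtrip({"id": 1, "tags": ["a", "b"], "score": None})


def test_tuple_becomes_list():
    # JSON has no tuple type, so the expected result is stated explicitly.
    check_roundtrip((1, 2), expected=[1, 2])
```

Keeping the comparison logic in one helper means each test only states its input and, where the conversion is lossy, the expected result.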

Best Practices

When to Add Tests

In general, any change to source code needs accompanying unit tests:
  • Add functionality → Add unit tests
  • Modify functionality → Update unit tests
  • Solve a bug → Add unit test before fixing (helps prove the bug and its fix)
  • Performance improvements → Reflect in benchmarks (which are also tests)
  • Refactoring → May not need test changes if fully covered by existing tests
Rule of thumb: If the new functionality is a user-facing or API change, you will almost certainly need to change tests. If no tests need changing, it might mean the tests aren’t right!
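The "add a unit test before fixing" step might look like the following sketch. The function mean_ignoring_nulls and the bug it describes are invented for illustration and are not part of PyArrow.

```python
def mean_ignoring_nulls(values):
    """Hypothetical helper: mean of the non-None entries, or None if there are none."""
    present = [v for v in values if v is not None]
    if not present:
        # The fix: the (imagined) original divided by len(present)
        # unconditionally and raised ZeroDivisionError here.
        return None
    return sum(present) / len(present)


def test_mean_of_all_null_input_is_none():
    # Written before the fix to reproduce the bug; passes once the guard exists.
    assert mean_ignoring_nulls([None, None]) is None


def test_mean_skips_nulls():
    assert mean_ignoring_nulls([1, None, 3]) == 2
```

Running the first test against the unfixed code should fail, which demonstrates the bug; the same test passing afterwards demonstrates the fix.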

Writing Quality Tests

Each test should verify a single behavior or feature. Avoid overloading tests with multiple assertions for unrelated functionality.
Test names should clearly describe what they’re testing:
# Good
def test_timestamp_with_timezone_prints_correctly():
    ...

# Bad
def test_timestamp():
    ...
Tests should have as few external dependencies as possible. If testing file reading, provide the smallest possible example file or code to create one.
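As a sketch of "code to create one", the test below writes a two-line CSV file into pytest's built-in tmp_path directory and reads it back. The standard-library csv module is used as a stand-in reader so the example runs without PyArrow; a real PyArrow test would call the reader under test instead.

```python
import csv


def test_reader_keeps_empty_field(tmp_path):
    # tmp_path is pytest's built-in per-test temporary directory (a pathlib.Path).
    path = tmp_path / "tiny.csv"
    path.write_text("a,b\n1,\n")  # two columns, one row, one empty field

    with path.open(newline="") as f:
        rows = list(csv.reader(f))

    assert rows == [["a", "b"], ["1", ""]]
```

The input is visible in the test body, so a reader does not need to open a separate data file to understand what is being checked.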
Tests should produce consistent results across different environments and runs. Avoid depending on timing, network conditions, or external state.
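One common way to keep randomized input reproducible is to pass an explicit seed to a local generator. The following is a minimal sketch; make_sample is a hypothetical helper name.

```python
import random


def make_sample(seed, n=5):
    """Build pseudo-random test input from an explicit seed."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    return [rng.randint(0, 100) for _ in range(n)]


def test_sample_is_reproducible():
    assert make_sample(seed=42) == make_sample(seed=42)


def test_sample_has_requested_length():
    assert len(make_sample(seed=0, n=3)) == 3
```

Because the generator is local to the helper, the result does not depend on which other tests ran first or on any global seeding.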

Continuous Integration

All tests run automatically in CI pipelines when you submit a pull request. The CI pipelines cover:
  • Multiple platforms (Linux, macOS, Windows)
  • Different compiler versions
  • Various build configurations
  • Builds instrumented with Address Sanitizer (ASan) and Undefined Behavior Sanitizer (UBSan)
Your PR must pass all CI checks before it can be merged.

Running CI Checks Locally

Before submitting a PR, you can run some CI checks locally:
# Run C++ linting and style checks
pre-commit run --show-diff-on-failure --color=always --all-files cpp

# Run Python linting and style checks
pre-commit run --show-diff-on-failure --color=always --all-files python

Resources

  • pytest Documentation: Complete guide to the pytest framework
  • testthat Documentation: R package testing with testthat
  • Google Test Primer: Introduction to the Google Test framework
  • Arrow CI Overview: Learn about Arrow’s CI infrastructure
