This guide covers building PyArrow (the Python bindings for Apache Arrow) from source. PyArrow requires Arrow C++ to be built first.

System Requirements

  • Python 3.9 or later
  • C++20 compiler (gcc 9+ on Linux, Xcode on macOS)
  • CMake 3.25 or higher
  • Arrow C++ libraries (same version as PyArrow)
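
The Python and CMake minimums above can be checked before starting. The following is a minimal sketch, not part of the Arrow tooling: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils and recent macOS).

```shell
# Hypothetical helper: succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed tools against the minimums listed above.
if command -v python3 >/dev/null 2>&1; then
    py_version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    version_ge "$py_version" 3.9 || echo "Python $py_version is older than 3.9" >&2
fi
if command -v cmake >/dev/null 2>&1; then
    cmake_version=$(cmake --version | head -n 1 | awk '{print $3}')
    version_ge "$cmake_version" 3.25 || echo "CMake $cmake_version is older than 3.25" >&2
fi
```

Version sort is used rather than a plain string comparison because, as strings, 3.9 would sort after 3.25.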

Environment Setup

There are two supported approaches: using conda for dependency management or using pip with manual dependencies.

Method 1: Using conda

1. Clone Arrow repository
git clone https://github.com/apache/arrow.git
cd arrow
2. Initialize submodules and set test data paths
git submodule update --init
export PARQUET_TEST_DATA="${PWD}/cpp/submodules/parquet-testing/data"
export ARROW_TEST_DATA="${PWD}/testing/data"
cd ..  # return to the parent directory; later commands use paths such as arrow/ci/...
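If the submodules were not initialized, the test data directories will be missing even though the variables are set. A small sanity check might look like the sketch below; `check_data_dir` is a hypothetical helper name, not something Arrow provides.

```shell
# Hypothetical helper: report whether a test-data directory is usable.
# $1 is a label used in messages, $2 is the directory path.
check_data_dir() {
    if [ -z "$2" ]; then
        echo "$1 is not set" >&2
        return 1
    fi
    if [ ! -d "$2" ]; then
        echo "$1 points to a missing directory: $2" >&2
        return 1
    fi
    echo "$1 OK: $2"
}

check_data_dir PARQUET_TEST_DATA "${PARQUET_TEST_DATA:-}" || true
check_data_dir ARROW_TEST_DATA "${ARROW_TEST_DATA:-}" || true
```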
3. Create conda environment
Linux/macOS
conda create -y -n pyarrow-dev -c conda-forge \
    --file arrow/ci/conda_env_unix.txt \
    --file arrow/ci/conda_env_cpp.txt \
    --file arrow/ci/conda_env_python.txt \
    --file arrow/ci/conda_env_gandiva.txt \
    compilers \
    python=3.13 \
    pandas

conda activate pyarrow-dev
Windows
conda create -y -n pyarrow-dev -c conda-forge ^
    --file arrow\ci\conda_env_cpp.txt ^
    --file arrow\ci\conda_env_python.txt ^
    --file arrow\ci\conda_env_gandiva.txt ^
    python=3.13

conda activate pyarrow-dev
4. Set environment variables
Linux/macOS
export ARROW_HOME=$CONDA_PREFIX
Windows
set ARROW_HOME=%CONDA_PREFIX%\Library

Method 2: Using pip

If you installed Python via Anaconda or Miniconda, you must use the conda-based approach instead. pip-based virtual environments don’t work correctly with conda Python installations.
1. Clone and initialize repository
git clone https://github.com/apache/arrow.git
cd arrow
git submodule update --init
export PARQUET_TEST_DATA="${PWD}/cpp/submodules/parquet-testing/data"
export ARROW_TEST_DATA="${PWD}/testing/data"
cd ..  # return to the parent directory; later commands use paths such as arrow/python/...
2. Install system dependencies
Ubuntu/Debian
sudo apt-get install build-essential ninja-build cmake python3-dev
macOS
brew update && brew bundle --file=arrow/cpp/Brewfile
3. Create Python virtual environment
python3 -m venv pyarrow-dev
source ./pyarrow-dev/bin/activate
pip install -r arrow/python/requirements-build.txt

# Create installation directory
mkdir dist
4. Set environment variables
export ARROW_HOME=$(pwd)/dist
export LD_LIBRARY_PATH=$(pwd)/dist/lib:$LD_LIBRARY_PATH
export CMAKE_PREFIX_PATH=$ARROW_HOME:$CMAKE_PREFIX_PATH
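
One detail about the exports above: when LD_LIBRARY_PATH or CMAKE_PREFIX_PATH starts out empty, the result ends with a trailing colon, and the Linux dynamic loader treats an empty LD_LIBRARY_PATH entry as the current directory. A possible variant that avoids this is sketched below; `prepend_path` is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: print "$1" prepended to the path list "$2",
# adding the ":" separator only when "$2" is non-empty.
prepend_path() {
    printf '%s%s\n' "$1" "${2:+:$2}"
}

export ARROW_HOME="$(pwd)/dist"
export LD_LIBRARY_PATH="$(prepend_path "$ARROW_HOME/lib" "${LD_LIBRARY_PATH:-}")"
export CMAKE_PREFIX_PATH="$(prepend_path "$ARROW_HOME" "${CMAKE_PREFIX_PATH:-}")"
```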

Building Arrow C++

PyArrow requires Arrow C++ libraries to be built first.
1. Configure Arrow C++ build
Using Presets (Recommended)
cmake -S arrow/cpp -B arrow/cpp/build \
    -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
    --preset ninja-release-python
Manual Configuration
cmake -S arrow/cpp -B arrow/cpp/build \
    -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
    -DCMAKE_BUILD_TYPE=Release \
    -DARROW_BUILD_TESTS=OFF \
    -DARROW_COMPUTE=ON \
    -DARROW_CSV=ON \
    -DARROW_DATASET=ON \
    -DARROW_FILESYSTEM=ON \
    -DARROW_HDFS=ON \
    -DARROW_JSON=ON \
    -DARROW_PARQUET=ON \
    -DARROW_WITH_BROTLI=ON \
    -DARROW_WITH_BZ2=ON \
    -DARROW_WITH_LZ4=ON \
    -DARROW_WITH_SNAPPY=ON \
    -DARROW_WITH_ZLIB=ON \
    -DARROW_WITH_ZSTD=ON \
    -DPARQUET_REQUIRE_ENCRYPTION=ON
Windows
mkdir arrow\cpp\build
pushd arrow\cpp\build

cmake -G "Ninja" ^
    -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
    -DCMAKE_UNITY_BUILD=ON ^
    -DARROW_COMPUTE=ON ^
    -DARROW_CSV=ON ^
    -DARROW_CXXFLAGS="/WX /MP" ^
    -DARROW_DATASET=ON ^
    -DARROW_FILESYSTEM=ON ^
    -DARROW_HDFS=ON ^
    -DARROW_JSON=ON ^
    -DARROW_PARQUET=ON ^
    -DARROW_WITH_LZ4=ON ^
    -DARROW_WITH_SNAPPY=ON ^
    -DARROW_WITH_ZLIB=ON ^
    -DARROW_WITH_ZSTD=ON ^
    ..

popd
Available presets:
  • ninja-release-python - Standard release build for PyArrow
  • ninja-release-python-maximal - All features including CUDA, Flight, Gandiva
  • ninja-release-python-minimal - Minimal features (no ORC, dataset)
  • ninja-debug-python - Debug build with symbols
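The preset names can also be read from the checkout itself, which helps when the list above is out of date for your Arrow version. This sketch assumes the presets live in arrow/cpp/CMakePresets.json and a CMake that supports `--list-presets`; `list_arrow_presets` is a hypothetical helper name.

```shell
# Hypothetical helper: list configure presets for an Arrow checkout.
# $1 is the path to the checkout (defaults to ./arrow).
list_arrow_presets() {
    cpp_dir="${1:-arrow}/cpp"
    if [ ! -f "$cpp_dir/CMakePresets.json" ]; then
        echo "No CMakePresets.json in $cpp_dir" >&2
        return 1
    fi
    # --list-presets reads the presets file from the current source directory.
    (cd "$cpp_dir" && cmake --list-presets)
}
```

For example, `list_arrow_presets arrow | grep python` would narrow the output to the Python-oriented presets.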
2. Build and install Arrow C++
Linux/macOS
cmake --build arrow/cpp/build --target install
Windows
cmake --build arrow\cpp\build --target install --config Release
If you encounter CMake errors, ensure:
  • Arrow C++ version matches the PyArrow version you’re building
  • ARROW_HOME is set correctly
  • For conda environments, pass -DARROW_DEPENDENCY_SOURCE=CONDA to cmake
  • For pip, you may need -DARROW_DEPENDENCY_SOURCE=BUNDLED or -DARROW_DEPENDENCY_SOURCE=SYSTEM
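
One way to confirm that ARROW_HOME points at a usable install is to look for Arrow's CMake package file under it. The sketch below assumes the file is installed as cmake/Arrow/ArrowConfig.cmake under lib or lib64; that layout and the `find_arrow_config` name are assumptions, so adjust them to what your install actually contains.

```shell
# Hypothetical helper: print the path of ArrowConfig.cmake under prefix $1,
# checking both lib and lib64; fails when neither location has it.
find_arrow_config() {
    for libdir in lib lib64; do
        candidate="$1/$libdir/cmake/Arrow/ArrowConfig.cmake"
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "ArrowConfig.cmake not found under $1" >&2
    return 1
}

find_arrow_config "${ARROW_HOME:-/nonexistent}" || true
```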

Building PyArrow

1. Change to the Python directory
cd arrow/python
2. Build PyArrow in-place
Development Build
python setup.py build_ext --inplace
Debug Build
export PYARROW_BUILD_TYPE=debug
python setup.py build_ext --inplace
Parallel Build (Faster)
export PYARROW_PARALLEL=8  # Number of parallel jobs; 8 is an example, match your CPU core count
python setup.py build_ext --inplace
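Rather than hard-coding the job count, it can be derived from the machine. This is a sketch under the assumption that either `nproc` or `getconf _NPROCESSORS_ONLN` is available; `cpu_count` is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of online CPU cores.
# Tries nproc (GNU coreutils), then getconf, then falls back to 1.
cpu_count() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

export PYARROW_PARALLEL="$(cpu_count)"
echo "PYARROW_PARALLEL=$PYARROW_PARALLEL"
```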
3. Install as editable package (optional)
pip install -e . --no-build-isolation

Build Configuration

PyArrow automatically detects which Arrow C++ components were built. Override with environment variables:
export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_DATASET=1
export PYARROW_WITH_FLIGHT=1
export PYARROW_WITH_GANDIVA=1
export PYARROW_WITH_S3=1
export PYARROW_WITH_GCS=1
export PYARROW_WITH_HDFS=1
export PYARROW_WITH_CUDA=1

python setup.py build_ext --inplace
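
When a build picks up unexpected components, it can help to see which overrides are exported in the current shell. The sketch below only inspects the environment; `show_pyarrow_overrides` is a hypothetical helper name, and using 0 to switch a component off is an assumption about how the build reads these variables.

```shell
# Hypothetical helper: print the PYARROW_WITH_* overrides currently exported,
# or a note when none are set and auto-detection applies to every component.
show_pyarrow_overrides() {
    overrides=$(env | grep '^PYARROW_WITH_' | sort)
    if [ -n "$overrides" ]; then
        echo "$overrides"
    else
        echo "no PYARROW_WITH_* overrides set"
    fi
}

export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_CUDA=0   # assumption: 0 disables the component
show_pyarrow_overrides
```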

Platform-Specific Notes

Windows: Bundling Arrow C++ DLLs

On Windows without conda, Arrow C++ DLLs must be bundled or in PATH:
rem Option 1: Bundle DLLs with PyArrow
set PYARROW_BUNDLE_ARROW_CPP=1
python setup.py build_ext --inplace

rem Option 2: Add to PATH before importing
set PATH=%ARROW_HOME%\bin;%PATH%
python -c "import pyarrow"
Bundled Arrow C++ libraries won’t auto-update when rebuilding Arrow C++. You must rebuild PyArrow after C++ changes.

Linux: Library Path Issues

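On some Linux distributions, CMake installs libraries to lib64 instead of lib. To keep them in $ARROW_HOME/lib, pass CMAKE_INSTALL_LIBDIR when configuring Arrow C++: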
cmake -DCMAKE_INSTALL_LIBDIR=lib ...

Python Version Selection

If several Python versions are installed, tell CMake which interpreter to use:
cmake -DPython3_EXECUTABLE=/path/to/bin/python ...

Building Distribution Wheels

To create a redistributable wheel with bundled libraries:
pip install wheel
python setup.py build_ext --build-type=release \
    --bundle-arrow-cpp bdist_wheel
The wheel is written to the dist/ directory next to setup.py, that is, arrow/python/dist/pyarrow-*.whl.
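Because a wheel is a zip archive, its contents can be listed to confirm that the Arrow C++ libraries were actually bundled. The sketch below uses the standard library's `zipfile` command-line interface; the `wheel_bundles_arrow` name is hypothetical, and the file name patterns it looks for (libarrow, arrow.dll) are assumptions about how the bundled libraries are named.

```shell
# Hypothetical helper: succeed when wheel $1 appears to contain bundled
# Arrow C++ shared libraries (file name patterns are assumptions).
wheel_bundles_arrow() {
    python3 -m zipfile -l "$1" | grep -Eq 'libarrow|arrow\.dll'
}

# Check every wheel in dist/, if any exist.
for whl in dist/pyarrow-*.whl; do
    if [ -f "$whl" ]; then
        if wheel_bundles_arrow "$whl"; then
            echo "$whl: Arrow C++ libraries found"
        else
            echo "$whl: no bundled Arrow C++ libraries found" >&2
        fi
    fi
done
```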

Cleaning Stale Build Artifacts

Clean when you see errors like:
  • Unknown CMake command "arrow_keep_backward_compatibility"
  • Linking errors after Arrow C++ updates
  • Unexpected import failures
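# Remove stale Arrow C++ and PyArrow build artifacts
rm -rf arrow/cpp/build
git -C arrow clean -Xfd python
# conda only: remove the environment, then recreate it from scratch
conda deactivate
conda remove -n pyarrow-dev --all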

Environment Variables Reference

Variable                 | Description                                  | Default
PYARROW_BUILD_TYPE       | Build type (release, debug, relwithdebinfo)  | release
PYARROW_PARALLEL         | Number of parallel compilation jobs          | auto
PYARROW_BUNDLE_ARROW_CPP | Bundle Arrow C++ libraries                   | 0
PYARROW_CMAKE_GENERATOR  | CMake generator (e.g., Ninja, Visual Studio) | system default
PYARROW_CMAKE_OPTIONS    | Additional CMake options                     | ''
PYARROW_CXXFLAGS         | Extra C++ compiler flags                     | ''
PYARROW_WITH_PARQUET     | Enable Parquet support                       | auto-detect
PYARROW_WITH_DATASET     | Enable Dataset API                           | auto-detect
PYARROW_WITH_FLIGHT      | Enable Flight RPC                            | auto-detect
PYARROW_WITH_GANDIVA     | Enable Gandiva                               | auto-detect
PYARROW_WITH_S3          | Enable S3 filesystem                         | auto-detect
PYARROW_WITH_GCS         | Enable Google Cloud Storage                  | auto-detect
PYARROW_WITH_HDFS        | Enable HDFS                                  | auto-detect
PYARROW_WITH_CUDA        | Enable CUDA integration                      | auto-detect

Running Tests

After building, run the following from the directory that contains the arrow checkout (not from arrow/python):
# Install test dependencies
pip install -r arrow/python/requirements-test.txt

# Run tests
pytest arrow/python/pyarrow

# Run specific test
pytest arrow/python/pyarrow/tests/test_array.py

Common Build Errors

Error: Could not find Arrow
Solution: Ensure ARROW_HOME is set and Arrow C++ is installed:
export ARROW_HOME=/path/to/arrow/install
export CMAKE_PREFIX_PATH=$ARROW_HOME:$CMAKE_PREFIX_PATH
Error: Arrow C++ version X.Y.Z does not match PyArrow version A.B.C
Solution: Ensure Arrow C++ and PyArrow versions match. Rebuild Arrow C++ if needed.
Error: DLL load failed
Solution: Either bundle DLLs or add to PATH:
set PYARROW_BUNDLE_ARROW_CPP=1
rem OR
set PATH=%ARROW_HOME%\bin;%PATH%
Error: Build failures with conda installed
Solution: Explicitly set dependency source:
cmake -DARROW_DEPENDENCY_SOURCE=CONDA ...

Next Steps

  • C++ Build Guide - Learn more about Arrow C++ build options
  • Development Workflow - Contributing to PyArrow
  • Testing Guide - Running and writing tests
  • PyArrow Documentation - Python API reference
