System Requirements
- Linux/macOS
- Windows
- Python 3.9 or later
- C++20 compiler (gcc 9+ on Linux, Xcode on macOS)
- CMake 3.25 or higher
- Arrow C++ libraries (same version as PyArrow)
Environment Setup
There are two supported approaches: using conda for dependency management, or using pip with manually installed dependencies.
Method 1: Using Conda (Recommended)
git submodule update --init
export PARQUET_TEST_DATA="${PWD}/cpp/submodules/parquet-testing/data"
export ARROW_TEST_DATA="${PWD}/testing/data"
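The conda route also needs an environment that provides the build toolchain. A minimal sketch, assuming the commands run from the root of the Arrow checkout and that the repository's ci/conda_env_*.txt requirement files are present; the helper name and the pyarrow-dev environment name are illustrative, not taken from this guide:

```shell
# Hypothetical helper: create a conda environment for PyArrow development.
# Assumes it is called from the root of the Arrow checkout.
create_pyarrow_conda_env() {
  conda create -y -n "${1:-pyarrow-dev}" -c conda-forge \
    --file ci/conda_env_unix.txt \
    --file ci/conda_env_cpp.txt \
    --file ci/conda_env_python.txt \
    compilers \
    python=3.9
}

# Usage (conda activate needs conda's shell integration):
#   create_pyarrow_conda_env
#   conda activate pyarrow-dev
#   export ARROW_HOME="${CONDA_PREFIX}"
```

Installing into the environment prefix keeps the Arrow C++ libraries and the Python interpreter in one place, so no separate library path setup should be needed.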
Method 2: Using pip
git clone https://github.com/apache/arrow.git
cd arrow
git submodule update --init
export PARQUET_TEST_DATA="${PWD}/cpp/submodules/parquet-testing/data"
export ARROW_TEST_DATA="${PWD}/testing/data"
# Return to the parent directory; the arrow/ paths below are relative to it
cd ..
python3 -m venv pyarrow-dev
source ./pyarrow-dev/bin/activate
pip install -r arrow/python/requirements-build.txt
# Create installation directory
mkdir dist
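With the installation directory in place, the build needs to know where it is. A minimal sketch for Linux, assuming the shell is still in the directory where dist was created and that ARROW_HOME (the variable named under Common Build Errors) is what the later build steps read:

```shell
# Point the build at the local installation prefix (assumes ./dist exists).
export ARROW_HOME="${PWD}/dist"
# Let the dynamic loader find the Arrow C++ shared libraries at import time.
export LD_LIBRARY_PATH="${ARROW_HOME}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
# Let CMake find the installed Arrow packages when PyArrow is configured.
export CMAKE_PREFIX_PATH="${ARROW_HOME}${CMAKE_PREFIX_PATH:+:${CMAKE_PREFIX_PATH}}"
```

The `${VAR:+:${VAR}}` form appends the previous value only when one exists, which avoids leaving a stray trailing colon in the path.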
Building Arrow C++
PyArrow requires the Arrow C++ libraries to be built first.
Available presets:
- ninja-release-python: Standard release build for PyArrow
- ninja-release-python-maximal: All features including CUDA, Flight, Gandiva
- ninja-release-python-minimal: Minimal features (no ORC, dataset)
- ninja-debug-python: Debug build with symbols
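A preset is selected when CMake configures the Arrow C++ build. A minimal sketch, assuming the Arrow sources are checked out in ./arrow, that ARROW_HOME is already set, and that arrow/cpp/build is used as the build directory; the helper name is illustrative:

```shell
# Hypothetical helper: configure, build, and install Arrow C++ from a preset.
# Assumes ARROW_HOME is set and the sources live in ./arrow.
build_arrow_cpp() {
  cmake -S arrow/cpp -B arrow/cpp/build \
    --preset "${1:-ninja-release-python}" \
    -DCMAKE_INSTALL_PREFIX="${ARROW_HOME}" &&
  cmake --build arrow/cpp/build --target install
}

# Usage: build_arrow_cpp ninja-debug-python
```

Passing the preset name as an argument makes it easy to switch between the release, minimal, maximal, and debug variants without editing the command.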
Building PyArrow
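Once Arrow C++ is installed, the Python extension can be compiled against it. A minimal sketch, assuming the checkout uses the setup.py-based build, that ARROW_HOME points at the Arrow C++ install, and that the sources live in ./arrow; the helper name and the default of 4 parallel jobs are illustrative:

```shell
# Hypothetical helper: build the PyArrow extension modules in place.
# The subshell keeps the caller's working directory unchanged.
build_pyarrow() {
  (
    cd "${1:-arrow}/python" &&
    PYARROW_PARALLEL="${PYARROW_PARALLEL:-4}" python setup.py build_ext --inplace
  )
}

# Usage: build_pyarrow            (sources in ./arrow)
#        build_pyarrow /path/to/arrow
```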
Build Configuration
PyArrow automatically detects which Arrow C++ components were built. To override the detection, set the PYARROW_WITH_* environment variables listed in the Environment Variables Reference.
Platform-Specific Notes
Windows: Bundling Arrow C++ DLLs
On Windows without conda, the Arrow C++ DLLs must either be bundled with PyArrow (PYARROW_BUNDLE_ARROW_CPP=1) or be available on PATH.
Linux: Library Path Issues
On some systems, the Arrow C++ libraries are installed to lib64 rather than lib, so tools that only look under lib may not find them.
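One way to cope with this at runtime is to add whichever library directory actually exists to the loader path. A minimal sketch with a hypothetical helper; at configure time, passing -DCMAKE_INSTALL_LIBDIR=lib to the Arrow C++ CMake invocation is an alternative that keeps everything under lib:

```shell
# Hypothetical helper: prepend the existing library directories of an
# install prefix (lib and/or lib64) to LD_LIBRARY_PATH.
add_arrow_libdirs() {
  for libdir in "$1/lib" "$1/lib64"; do
    if [ -d "${libdir}" ]; then
      LD_LIBRARY_PATH="${libdir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    fi
  done
  export LD_LIBRARY_PATH
}

# Usage: add_arrow_libdirs "${ARROW_HOME}"
```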
Python Version Selection
If multiple Python versions are installed, make sure the build uses the intended interpreter.
Building Distribution Wheels
To create a redistributable wheel with bundled libraries, build a wheel with Arrow C++ bundling enabled. The resulting wheel is written to dist/pyarrow-*.whl.
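A wheel build along these lines might look as follows. This is a sketch that assumes the setup.py-based build, the wheel package installed in the active environment, and sources in ./arrow; the helper name is illustrative, and the dist/ directory is relative to the python/ source directory:

```shell
# Hypothetical helper: build a wheel that bundles the Arrow C++ libraries.
# Assumes `pip install wheel` has been run in the active environment.
build_pyarrow_wheel() {
  (
    cd "${1:-arrow}/python" &&
    python setup.py build_ext --build-type=release --bundle-arrow-cpp bdist_wheel
  )
}

# Usage: build_pyarrow_wheel && ls arrow/python/dist/pyarrow-*.whl
```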
Cleaning Stale Build Artifacts
When to clean build artifacts
Clean when you see errors like:
- Unknown CMake command "arrow_keep_backward_compatibility"
- Linking errors after Arrow C++ updates
- Unexpected import failures
Clean Arrow C++ build
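Removing the CMake build directory forces a full reconfigure on the next build. A minimal sketch, assuming the build directory is cpp/build inside the checkout (adjust if a different directory was used); the helper name is illustrative:

```shell
# Hypothetical helper: delete the Arrow C++ build directory.
# Takes the path of the Arrow checkout (defaults to ./arrow).
clean_arrow_cpp_build() {
  rm -rf "${1:-arrow}/cpp/build"
}

# Usage: clean_arrow_cpp_build /path/to/arrow
```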
Clean PyArrow build artifacts
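Git can remove the generated files for you, since build outputs are normally covered by the repository's ignore rules. A minimal sketch, assuming the checkout is a Git working tree; the helper name is illustrative:

```shell
# Hypothetical helper: remove Git-ignored files and directories under python/.
# -X deletes only ignored files, so untracked source files are left alone.
clean_pyarrow_build() {
  git -C "${1:-arrow}" clean -Xfd python
}

# Usage: clean_pyarrow_build /path/to/arrow
```

Running the same command with `-n` instead of `-f` first prints what would be deleted without removing anything.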
Recreate conda environment
Environment Variables Reference
| Variable | Description | Default |
|---|---|---|
| PYARROW_BUILD_TYPE | Build type (release, debug, relwithdebinfo) | release |
| PYARROW_PARALLEL | Number of parallel compilation jobs | auto |
| PYARROW_BUNDLE_ARROW_CPP | Bundle Arrow C++ libraries | 0 |
| PYARROW_CMAKE_GENERATOR | CMake generator (e.g., Ninja, Visual Studio) | system default |
| PYARROW_CMAKE_OPTIONS | Additional CMake options | '' |
| PYARROW_CXXFLAGS | Extra C++ compiler flags | '' |
| PYARROW_WITH_PARQUET | Enable Parquet support | auto-detect |
| PYARROW_WITH_DATASET | Enable Dataset API | auto-detect |
| PYARROW_WITH_FLIGHT | Enable Flight RPC | auto-detect |
| PYARROW_WITH_GANDIVA | Enable Gandiva | auto-detect |
| PYARROW_WITH_S3 | Enable S3 filesystem | auto-detect |
| PYARROW_WITH_GCS | Enable Google Cloud Storage | auto-detect |
| PYARROW_WITH_HDFS | Enable HDFS | auto-detect |
| PYARROW_WITH_CUDA | Enable CUDA integration | auto-detect |
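As an illustration of how these variables combine, the sketch below sets a few of them before a PyArrow build. The values are examples only, and the use of 1 and 0 to force a component on or off is an assumption based on the default shown for PYARROW_BUNDLE_ARROW_CPP:

```shell
# Illustrative overrides; export these before building PyArrow.
export PYARROW_BUILD_TYPE=debug   # one of: release, debug, relwithdebinfo
export PYARROW_PARALLEL=4         # number of parallel compilation jobs
export PYARROW_WITH_PARQUET=1     # force Parquet support on (assumed 1 = on)
export PYARROW_WITH_FLIGHT=0      # force Flight RPC off (assumed 0 = off)
```

A component forced on here still has to be present in the Arrow C++ build that PyArrow is compiled against.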
Running Tests
After building, run the PyArrow test suite with pytest.
Common Build Errors
CMake cannot find Arrow
Error:
Could not find Arrow
Solution: Ensure ARROW_HOME is set and Arrow C++ is installed.
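A quick way to verify both conditions is to look for an installed Arrow header under the prefix. A minimal sketch with a hypothetical helper; it only checks for include/arrow/api.h, which is a heuristic rather than a full validation of the install:

```shell
# Hypothetical diagnostic: succeed only if the prefix looks like an
# Arrow C++ install (checks for the public arrow/api.h header).
check_arrow_home() {
  prefix="${1:-${ARROW_HOME:-}}"
  if [ -z "${prefix}" ]; then
    echo "ARROW_HOME is not set" >&2
    return 1
  fi
  if [ ! -f "${prefix}/include/arrow/api.h" ]; then
    echo "No Arrow C++ headers found under ${prefix}" >&2
    return 1
  fi
  echo "Arrow C++ found under ${prefix}"
}

# Usage: check_arrow_home            (uses ARROW_HOME)
#        check_arrow_home /some/prefix
```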
Version mismatch
Error:
Arrow C++ version X.Y.Z does not match PyArrow version A.B.C
Solution: Ensure Arrow C++ and PyArrow versions match. Rebuild Arrow C++ if needed.
Import errors on Windows
Error:
DLL load failed
Solution: Either bundle the Arrow C++ DLLs (PYARROW_BUNDLE_ARROW_CPP=1) or add their directory to PATH.
Conda conflicts
Error: Build failures with conda installed
Solution: Explicitly set the dependency source.
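For Arrow C++, the dependency source is controlled by the ARROW_DEPENDENCY_SOURCE CMake option. A minimal sketch, assuming sources in ./arrow, a build directory of arrow/cpp/build, and ARROW_HOME already set; the helper name is illustrative:

```shell
# Hypothetical helper: configure Arrow C++ with an explicit dependency source
# so CMake does not silently pick up packages from a conda installation.
configure_arrow_cpp_without_conda() {
  cmake -S arrow/cpp -B arrow/cpp/build \
    -DCMAKE_INSTALL_PREFIX="${ARROW_HOME}" \
    -DARROW_DEPENDENCY_SOURCE="${1:-AUTO}"
}

# Usage: configure_arrow_cpp_without_conda          (AUTO)
#        configure_arrow_cpp_without_conda BUNDLED  (build dependencies from source)
```

AUTO tries system packages first and falls back to building from source, while BUNDLED and SYSTEM force one behavior or the other.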
Next Steps
- C++ Build Guide: Learn more about Arrow C++ build options
- Development Workflow: Contributing to PyArrow
- Testing Guide: Running and writing tests
- PyArrow Documentation: Python API reference