Overview
Arrow’s CI infrastructure ensures code quality across:
- Languages: C++, Python, R, Java, JavaScript, Go, Rust, C#, Ruby, MATLAB
- Operating Systems: Linux (multiple distros), macOS, Windows
- Compilers: GCC, Clang, MSVC (multiple versions of each)
- Package Managers: conda, pip, Homebrew, apt, yum, vcpkg
- Architectures: x86_64, ARM64, s390x (big-endian)
Architecture Components
The CI system consists of three main components:
1. GitHub Actions (Action-Triggered Builds)
Workflows in .github/workflows/ run automatically on pull requests and merges.
2. Docker (Portable Builds)
Docker provides reproducible build environments:
- compose.yaml: Defines services for different build configurations
- .env: Default environment variable values
- ci/docker/: Dockerfiles for each build environment
  - ubuntu-cpp: Ubuntu-based C++ development
  - conda-python: Python builds with conda
  - fedora-cpp: Fedora-based builds
3. Crossbow (Extended/Nightly Builds)
Crossbow coordinates complex, long-running builds:
- dev/tasks/: Build task definitions
- dev/tasks/tasks.yml: Task configuration and groups
- Triggered manually or on schedule
Key Directories
| Directory | Purpose |
|---|---|
| .github/workflows/ | GitHub Actions workflow definitions |
| ci/docker/ | Dockerfiles for build environments |
| ci/scripts/ | Build and test scripts |
| dev/tasks/ | Crossbow task templates |
| dev/archery/ | Archery CLI tool source |
GitHub Actions Workflows
Language-Specific Workflows
Each major language has dedicated workflows.
Bot Workflows
comment_bot.yml
Responds to special comments on pull requests.
dev_pr.yml
Automatically:
- Validates PR title format (GH-12345: [Component] Description)
- Links PR to GitHub issue
- Adds relevant labels
- Assigns issue to PR author
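The title check can be illustrated with a small regular expression. This is a minimal sketch that assumes the format is exactly `GH-<number>: [Component] Description`, with one or more bracketed components; the pattern and the `parse_pr_title` helper are hypothetical and are not taken from the workflow itself, which may accept other forms.

```python
import re

# Assumed pattern for "GH-12345: [Component] Description".
# The real workflow may accept additional forms not covered here.
TITLE_RE = re.compile(
    r"^GH-(?P<issue>\d+): (?P<components>(?:\[[^\]]+\])+) (?P<description>\S.*)$"
)


def parse_pr_title(title):
    """Return (issue number, component list, description), or None if invalid."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    components = re.findall(r"\[([^\]]+)\]", match.group("components"))
    return int(match.group("issue")), components, match.group("description")


print(parse_pr_title("GH-12345: [Python] Fix table slicing"))
print(parse_pr_title("Fix table slicing"))
```

A title that does not match returns `None`, which is the case where a bot would ask the author to rename the pull request.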
Using Docker for Local Development
Building with Docker Compose
Build Arrow in a reproducible environment.
Using Archery (Recommended)
Archery simplifies Docker interactions.
Crossbow System
Crossbow manages extended builds that are too resource-intensive for PR checks.
Task Structure
Submitting Crossbow Tasks
Common Task Groups
| Group | Description |
|---|---|
| cpp | All C++ builds across platforms |
| python | Python wheel builds |
| r | R package builds |
| wheel-linux | Linux Python wheels |
| wheel-macos | macOS Python wheels |
| wheel-windows | Windows Python wheels |
| conda | Conda package builds |
| integration | Cross-language integration tests |
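One way to picture a task group is as a named list of patterns that expands to concrete task names. The sketch below assumes groups are glob patterns over task names; the `CONFIG` dictionary is a hypothetical, heavily simplified stand-in for dev/tasks/tasks.yml, and the task names in it are illustrative rather than copied from the real file.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for dev/tasks/tasks.yml (names are illustrative).
CONFIG = {
    "groups": {
        "wheel-linux": ["wheel-manylinux-*"],
        "conda": ["conda-*"],
    },
    "tasks": {
        "wheel-manylinux-cp312-amd64": {},
        "wheel-macos-cp312-arm64": {},
        "conda-linux-x64-py312": {},
    },
}


def tasks_in_group(config, group):
    """Expand a group's glob patterns into a sorted list of matching task names."""
    patterns = config["groups"][group]
    return sorted(
        name
        for name in config["tasks"]
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    )


print(tasks_in_group(CONFIG, "wheel-linux"))
```

Submitting a group is then equivalent to submitting every task the group expands to.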
Environment Configuration
compose.yaml and .env
Docker services are configured through environment variables:
compose.yaml
.env
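The relationship between the two files can be sketched as a lookup with defaults: .env supplies default values, and a variable set in the calling environment takes precedence over the default. The example below is a simplified model of that precedence, not Docker Compose's own parser; the variable names and values in `ENV_TEXT` are illustrative.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve(defaults, overrides):
    """Return defaults with any matching override applied on top."""
    merged = dict(defaults)
    merged.update({k: v for k, v in overrides.items() if k in defaults})
    return merged


# Hypothetical .env content; names and values are illustrative only.
ENV_TEXT = """
# defaults used when nothing is set in the shell
UBUNTU=22.04
PYTHON=3.10
"""

defaults = parse_env_file(ENV_TEXT)
print(resolve(defaults, {"PYTHON": "3.12"}))
```

In this model, setting `PYTHON` in the shell changes only that variable, and every other setting keeps its .env default.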
CI Scripts
Reusable scripts in ci/scripts/ handle common tasks:
| Script | Purpose |
|---|---|
| cpp_build.sh | Build Arrow C++ |
| python_build.sh | Build PyArrow |
| r_build.sh | Build R package |
| java_build.sh | Build Java components |
| install_conda.sh | Set up conda environment |
| util_*.sh | Utility functions |
Nightly Builds
Nightly builds run automatically via Crossbow:
- Schedule: Daily at 00:00 UTC
- Tasks: Comprehensive builds for all languages/platforms
- Artifacts: Published to Apache artifactory
- Purpose: Catch integration issues early
Common CI Tasks
Running Specific Tests
- C++
- Python
- R
Building Release Artifacts
Debugging CI Failures
# View GitHub Actions logs
# Click on failed job → View details
# For Crossbow jobs
archery crossbow status <run-id>
CI Best Practices
Minimize CI runtime
- Only trigger relevant workflows (via path filters)
- Use matrix builds to parallelize
- Cache dependencies when possible
- Use Crossbow for expensive builds, not PR checks
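Path filters and matrix builds can both be modeled in a few lines. The sketch below uses hypothetical filter patterns in `PATH_FILTERS` and Python's `fnmatch` globbing, whose `*` also matches `/`; GitHub Actions uses its own glob syntax, so treat this as an illustration of the idea rather than of the exact matching rules.

```python
from fnmatch import fnmatchcase
from itertools import product

# Hypothetical path filters per workflow; the real filters live in each
# workflow file under .github/workflows/.
PATH_FILTERS = {
    "cpp": ["cpp/*", "ci/scripts/cpp_*"],
    "python": ["python/*", "cpp/*"],
    "r": ["r/*"],
}


def triggered_workflows(changed_files):
    """Return workflows with at least one filter matching a changed file."""
    return sorted(
        name
        for name, patterns in PATH_FILTERS.items()
        if any(fnmatchcase(path, p) for path in changed_files for p in patterns)
    )


def expand_matrix(**axes):
    """Expand named axes into one job configuration per combination."""
    keys = list(axes)
    return [dict(zip(keys, combo)) for combo in product(*axes.values())]


print(triggered_workflows(["r/R/table.R"]))
print(len(expand_matrix(os=["ubuntu", "macos"], python=["3.11", "3.12"])))
```

A change confined to `r/` triggers only the R workflow here, while the two-axis matrix expands to four jobs that can run in parallel.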
Handle flaky tests
- Mark flaky tests with @pytest.mark.flaky
- Use ARROW_ENABLE_TIMING_TESTS=OFF to disable timing-dependent tests
- Report persistent flakes on GitHub
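The idea behind marking a test as flaky is to rerun it a bounded number of times before reporting a failure. The `flaky` marker comes from a pytest plugin rather than pytest itself; the `retry` decorator below is a hypothetical plain-Python sketch of the rerun idea, not that plugin's implementation.

```python
import functools


def retry(times):
    """Run a function up to `times` attempts, re-raising the last failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except AssertionError:
                    if attempt == times:
                        raise
        return wrapper
    return decorator


calls = {"count": 0}


@retry(times=3)
def sometimes_fails():
    # Hypothetical flaky check: fails on the first two attempts only.
    calls["count"] += 1
    assert calls["count"] >= 3
    return "passed"


result = sometimes_fails()
print(result, "after", calls["count"], "attempts")
```

Reruns hide the symptom rather than the cause, which is why persistent flakes should still be reported.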
Keep Docker images up-to-date
- Regularly update base images
- Pin dependency versions for reproducibility
- Test locally before pushing changes
Document new CI changes
- Update this guide for workflow changes
- Add comments in workflow files
- Announce major changes on mailing list
Troubleshooting
Docker build fails with network errors
Error: Failed to download package
Solution: Docker may have network issues.
Crossbow task stuck in pending
Issue: Task never starts
Solution:
- Check GitHub Actions quota
- Verify task name is correct
- Check Crossbow repo for errors
Tests pass locally but fail in CI
Common causes:
- Environment differences (locale, timezone)
- Missing test dependencies
- Race conditions (use pytest-xdist carefully)
- Disk space issues on CI runners
Can't push Docker image
Error: Permission denied
Solution: Only committers can push to Apache registries. For PRs, images are built locally or in CI.
Resources
- Archery Documentation: Learn more about the Archery CLI tool
- Crossbow Guide: Deep dive into the Crossbow task system
- Docker Guide: Detailed Docker usage for development
- GitHub Actions Docs: GitHub Actions reference
Next Steps
- Read the contributing guide for PR workflow
- Check building C++ for local development setup
- Join the dev mailing list for CI discussions