Apache Arrow uses a comprehensive CI/CD system to test and build across multiple languages, platforms, compilers, and dependency combinations. This guide explains how the system works and how to interact with it.

Overview

Arrow’s CI infrastructure ensures code quality across:
  • Languages: C++, Python, R, Java, JavaScript, Go, Rust, C#, Ruby, MATLAB
  • Operating Systems: Linux (multiple distros), macOS, Windows
  • Compilers: GCC, Clang, MSVC, various versions
  • Package Managers: conda, pip, Homebrew, apt, yum, vcpkg
  • Architectures: x86_64, ARM64, s390x (big-endian)

Architecture Components

The CI system consists of three main components:

1. GitHub Actions (Action-Triggered Builds)

Workflows in .github/workflows/ run automatically on pull requests and merges:
.github/workflows/
├── cpp.yml              # C++ builds and tests
├── python.yml           # Python builds and tests  
├── r.yml                # R package checks
├── dev.yml              # Linting, PR merge checks
├── dev_pr.yml           # PR title formatting, labels
├── comment_bot.yml      # Bot commands (@github-actions)
└── ...

2. Docker (Portable Builds)

Docker provides reproducible build environments:
  • compose.yaml: Defines services for different build configurations
  • .env: Default environment variable values
  • ci/docker/: Dockerfiles for each build environment
Example services:
  • ubuntu-cpp: Ubuntu-based C++ development
  • conda-python: Python builds with conda
  • fedora-cpp: Fedora-based builds

3. Crossbow (Extended/Nightly Builds)

Crossbow coordinates complex, long-running builds:
  • dev/tasks/: Build task definitions
  • dev/tasks/tasks.yml: Task configuration and groups
  • Triggered manually or on schedule

Key Directories

Directory            Purpose
.github/workflows/   GitHub Actions workflow definitions
ci/docker/           Dockerfiles for build environments
ci/scripts/          Build and test scripts
dev/tasks/           Crossbow task templates
dev/archery/         Archery CLI tool source

GitHub Actions Workflows

Language-Specific Workflows

Each major language has dedicated workflows:
name: C++
on:
  pull_request:
    paths:
      - 'cpp/**'
      - '.github/workflows/cpp.yml'
      # ...
jobs:
  build:
    # Matrix of OS, compiler, build type
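
The path filter above decides whether the workflow runs for a given pull request. As a rough illustration, the following Python sketch checks a list of changed files against those filters. It uses `fnmatch` as an approximation of GitHub's glob matching, which is not identical, and the function name `workflow_is_triggered` is hypothetical.

```python
from fnmatch import fnmatchcase

# Path filters copied from the workflow excerpt above (elided entries are omitted).
CPP_WORKFLOW_PATHS = ["cpp/**", ".github/workflows/cpp.yml"]


def workflow_is_triggered(changed_files, path_filters):
    """Return True if at least one changed file matches at least one filter.

    fnmatch treats '*' as matching any characters, including '/', so it only
    approximates the glob rules GitHub Actions applies.
    """
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in path_filters
    )
```

Under these assumptions, a PR touching only `docs/index.rst` would skip the C++ workflow, while one touching any file under `cpp/` would run it.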

Bot Workflows

comment_bot.yml

Responds to special comments on pull requests:
# Trigger Crossbow builds
@github-actions crossbow submit <task-name>

# Example: Run all C++ tests
@github-actions crossbow submit test-cpp-linux

# Example: Rebase PR
@github-actions rebase

dev_pr.yml

Automatically:
  • Validates PR title format (GH-12345: [Component] Description)
  • Links PR to GitHub issue
  • Adds relevant labels
  • Assigns issue to PR author

Using Docker for Local Development

Building with Docker Compose

Build Arrow in a reproducible environment:
# Build Ubuntu C++ environment
docker compose build ubuntu-cpp

# Run build
docker compose run ubuntu-cpp

# Build specific components
docker compose run ubuntu-cpp bash -c "cmake ... && make"

Archery simplifies Docker interactions:
# Install Archery
pip install -e dev/archery

# Run Docker service (automatically builds dependencies)
archery docker run ubuntu-cpp

# Execute commands in container
archery docker run ubuntu-cpp bash

# Run tests
archery docker run ubuntu-cpp make unittest

Archery automatically handles dependency resolution between Docker services, building prerequisite images as needed.
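
The idea of building prerequisite images first can be sketched as a depth-first walk over a service hierarchy. The Python below is only an illustration of that idea, not Archery's implementation; the `SERVICE_PARENTS` mapping and the function name `build_order` are hypothetical.

```python
# Hypothetical parent relationships between Docker services.
# The real hierarchy is defined in the repository's compose.yaml.
SERVICE_PARENTS = {
    "conda": [],
    "conda-cpp": ["conda"],
    "conda-python": ["conda-cpp"],
    "ubuntu-cpp": [],
}


def build_order(service, parents):
    """Return the services to build, prerequisites first, ending with `service`."""
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return  # already scheduled
        if name in visiting:
            raise ValueError(f"dependency cycle at {name!r}")
        visiting.add(name)
        for parent in parents.get(name, []):
            visit(parent)
        visiting.discard(name)
        order.append(name)

    visit(service)
    return order
```

With this mapping, requesting `conda-python` would schedule `conda`, then `conda-cpp`, then `conda-python`.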

Crossbow System

Crossbow manages extended builds that are too resource-intensive for PR checks.

Task Structure

dev/tasks/
├── tasks.yml           # Task definitions and groups
├── linux-packages/     # Linux package builds
├── conda-recipes/      # Conda package recipes
├── python-wheels/      # PyPI wheel builds
├── r/                  # CRAN package builds
└── ...

Submitting Crossbow Tasks

1. Via PR comment

Comment on a PR:
@github-actions crossbow submit --group cpp

2. Via Archery CLI

From your local machine:
# Submit single task
archery crossbow submit test-cpp-linux

# Submit task group
archery crossbow submit --group python

# List available tasks
archery crossbow task list

# Check task status
archery crossbow status <run-id>

Common Task Groups

Group           Description
cpp             All C++ builds across platforms
python          Python wheel builds
r               R package builds
wheel-linux     Linux Python wheels
wheel-macos     macOS Python wheels
wheel-windows   Windows Python wheels
conda           Conda package builds
integration     Cross-language integration tests
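
A group is a shorthand for a set of tasks. The Python sketch below illustrates one way such an expansion could work, assuming each group is a list of glob patterns matched against task names. The task names and patterns are hypothetical (apart from `test-cpp-linux`, which appears in this guide); the real definitions live in dev/tasks/tasks.yml.

```python
from fnmatch import fnmatchcase

# Hypothetical task names and group patterns, for illustration only.
TASKS = ["test-cpp-linux", "wheel-linux-cp312", "wheel-macos-cp312", "conda-linux"]
GROUPS = {
    "cpp": ["test-cpp-*"],
    "wheel-linux": ["wheel-linux-*"],
    "conda": ["conda-*"],
}


def expand_group(group, groups=GROUPS, tasks=TASKS):
    """Return the task names selected by a group, in their original order."""
    patterns = groups[group]
    return [task for task in tasks if any(fnmatchcase(task, p) for p in patterns)]
```

Submitting `--group wheel-linux` would then be equivalent to submitting every task whose name matches `wheel-linux-*`.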

Environment Configuration

compose.yaml and .env

Docker services are configured through environment variables:
compose.yaml
services:
  ubuntu-cpp:
    build:
      context: .
      dockerfile: ci/docker/ubuntu-${UBUNTU}.dockerfile
      args:
        arch: ${ARCH:-amd64}
        llvm: ${LLVM:-15}

.env
UBUNTU=22.04
ARCH=amd64
LLVM=15
GO=1.21.0
JDK=11

Override defaults:
# Use different Ubuntu version
UBUNTU=24.04 docker compose run ubuntu-cpp

# Use ARM architecture
ARCH=arm64v8 docker compose run ubuntu-cpp
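
The override behavior follows from how Docker Compose interpolates variables: a value set in the shell takes precedence over the one in .env, and `${VAR:-default}` falls back to the inline default when the variable is unset or empty. The Python sketch below imitates just those two forms; it is a simplified model rather than Compose's own implementation, and the function name `interpolate` is hypothetical.

```python
import re

# Matches ${VAR} and ${VAR:-default}; other interpolation forms are ignored.
_VAR_RE = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def interpolate(text, dotenv, shell_env):
    """Substitute variables in `text`, preferring shell_env over dotenv."""
    merged = {**dotenv, **shell_env}

    def replace(match):
        value = merged.get(match.group("name"), "")
        if value:
            return value
        # Unset or empty: use the inline default if present, else an empty string.
        return match.group("default") or ""

    return _VAR_RE.sub(replace, text)


# Values taken from the .env excerpt above.
DOTENV = {"UBUNTU": "22.04", "ARCH": "amd64", "LLVM": "15"}
```

For example, `interpolate("ci/docker/ubuntu-${UBUNTU}.dockerfile", DOTENV, {"UBUNTU": "24.04"})` selects the 24.04 Dockerfile, mirroring the `UBUNTU=24.04 docker compose run ubuntu-cpp` command.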

CI Scripts

Reusable scripts in ci/scripts/ handle common tasks:

Script             Purpose
cpp_build.sh       Build Arrow C++
python_build.sh    Build PyArrow
r_build.sh         Build R package
java_build.sh      Build Java components
install_conda.sh   Set up conda environment
util_*.sh          Utility functions

Scripts accept environment variables for configuration:
# C++ build with specific options
export ARROW_BUILD_TESTS=ON
export ARROW_COMPUTE=ON
export ARROW_FLIGHT=ON
# Arguments: source directory, then build directory (quoted in case the path contains spaces)
ci/scripts/cpp_build.sh "$(pwd)" "$(pwd)/build"
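
The same invocation can be driven from Python when a script needs to be run with several option sets. This is a minimal sketch that assumes the two positional arguments shown above (source directory, then build directory); the helper names `cpp_build_invocation` and `run_cpp_build` are hypothetical.

```python
import os
import subprocess


def cpp_build_invocation(source_dir, build_dir, options, base_env=None):
    """Return (command, env) for running ci/scripts/cpp_build.sh.

    `options` maps variable names to values, e.g. {"ARROW_BUILD_TESTS": "ON"},
    and is layered on top of the base environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(options)
    command = [f"{source_dir}/ci/scripts/cpp_build.sh", source_dir, build_dir]
    return command, env


def run_cpp_build(source_dir, build_dir, options):
    """Run the build script; raises CalledProcessError if it exits non-zero."""
    command, env = cpp_build_invocation(source_dir, build_dir, options)
    subprocess.run(command, env=env, check=True)
```

Separating the construction of the command from its execution makes it possible to inspect the environment before anything is run.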

Nightly Builds

Nightly builds run automatically via Crossbow:
  • Schedule: Daily at 00:00 UTC
  • Tasks: Comprehensive builds for all languages/platforms
  • Artifacts: Published to Apache artifactory
  • Purpose: Catch integration issues early
View results: https://github.com/ursacomputing/crossbow/actions

Common CI Tasks

Running Specific Tests

# Run C++ tests locally with Docker
archery docker run ubuntu-cpp

# Inside container
cmake --build build --target unittest

# Or trigger on PR
@github-actions crossbow submit test-cpp-linux

Building Release Artifacts

# Python wheels for all platforms
@github-actions crossbow submit --group wheel

# Linux packages
@github-actions crossbow submit --group linux-packages

# Conda packages
@github-actions crossbow submit --group conda

# R CRAN binaries
@github-actions crossbow submit --group r

Debugging CI Failures

1. Identify the failing workflow

Check the PR’s “Checks” tab for failures.

2. Reproduce locally with Docker

# Use the same Docker service
archery docker run <service-name>

3. Check logs

# View GitHub Actions logs
# Click on failed job → View details

# For Crossbow jobs
archery crossbow status <run-id>

4. Run the exact failing command

Copy the command from the logs and run it inside the Docker container.

CI Best Practices

  • Only trigger relevant workflows (via path filters)
  • Use matrix builds to parallelize
  • Cache dependencies when possible
  • Use Crossbow for expensive builds, not PR checks
  • Mark flaky tests with @pytest.mark.flaky
  • Use ARROW_ENABLE_TIMING_TESTS=OFF to disable timing-dependent tests
  • Report persistent flakes on GitHub
  • Regularly update base images
  • Pin dependency versions for reproducibility
  • Test locally before pushing changes
  • Update this guide for workflow changes
  • Add comments in workflow files
  • Announce major changes on mailing list

Troubleshooting

Error: Failed to download package
Solution: Docker may have network issues. Try rebuilding without the cache:
docker compose build --no-cache <service>

Issue: Task never starts
Solution:
  • Check GitHub Actions quota
  • Verify task name is correct
  • Check Crossbow repo for errors

Common causes of failures that appear only in CI:
  • Environment differences (locale, timezone)
  • Missing test dependencies
  • Race conditions (use pytest-xdist carefully)
  • Disk space issues on CI runners

Error: Permission denied
Solution: Only committers can push to Apache registries. For PRs, images build locally or in CI.

Resources

  • Archery Documentation: Learn more about the Archery CLI tool
  • Crossbow Guide: Deep dive into the Crossbow task system
  • Docker Guide: Detailed Docker usage for development
  • GitHub Actions Docs: GitHub Actions reference
