Red Arrow provides the official Ruby bindings for Apache Arrow, enabling Ruby applications to work with Arrow’s high-performance columnar data structures and integrate with the broader Arrow ecosystem.

Quick Install

Install Red Arrow using RubyGems:
gem install red-arrow
For Parquet file support, also install:
gem install red-parquet

Installation Options

Basic Installation

Install the core Red Arrow library:
1. Install Red Arrow

gem install red-arrow
This installs the base Apache Arrow bindings for Ruby.
2. Verify installation

require 'arrow'
puts Arrow::VERSION

Installing Additional Modules

Red Arrow is split into several gems, each covering a different area of functionality. Install only the ones you need:
# Install core Arrow and Parquet support
gem install red-arrow
gem install red-parquet

Using Bundler

For Ruby projects using Bundler, add to your Gemfile:
Gemfile
source 'https://rubygems.org'

# Core Arrow support
gem 'red-arrow'

# Optional: Parquet support
gem 'red-parquet'

# Optional: Dataset API for S3 and multi-file support
gem 'red-arrow-dataset'
Then install:
bundle install

System Requirements

Red Arrow requires:
  • Ruby: 2.5 or later (3.0+ recommended)
  • Arrow C++ library: Automatically installed via binary gems on:
    • Linux (x86_64, aarch64)
    • macOS (Intel and Apple Silicon)
    • Windows (x86_64)
Binary gems include pre-compiled Arrow C++ libraries, so you typically don’t need to install Arrow C++ separately.
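
The requirements above can be checked from Ruby itself before installing. The following is a minimal sketch using only the standard library; the helper name `check_environment` and the constant `MINIMUM_RUBY` are illustrative, and the 2.5 threshold is taken from the list above.

```ruby
require 'rbconfig'

# Minimum Ruby version, taken from the requirements list above.
MINIMUM_RUBY = Gem::Version.new('2.5')

# Hypothetical helper: collects the facts relevant to installing Red Arrow.
def check_environment
  {
    ruby_version: RUBY_VERSION,
    ruby_ok: Gem::Version.new(RUBY_VERSION) >= MINIMUM_RUBY,
    os: RbConfig::CONFIG['host_os'],
    cpu: RbConfig::CONFIG['host_cpu']
  }
end

check_environment.each { |key, value| puts "#{key}: #{value}" }
```

If `ruby_ok` is false, or the reported OS and CPU are not among the platforms listed above, the install may need extra steps.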

Verifying Installation

Test your Red Arrow installation:
1. Create a simple test file

Create a file named test_arrow.rb:
test_arrow.rb
require 'arrow'

# Create a simple table
table = Arrow::Table.new(
  'name' => ['Alice', 'Bob', 'Charlie'],
  'age' => [25, 30, 35]
)

puts "Arrow version: #{Arrow::VERSION}"
puts "Table shape: #{table.n_rows} rows, #{table.n_columns} columns"
puts table.to_s
2. Run the test

ruby test_arrow.rb
You should see output showing the Arrow version and table contents.
3. Test Parquet support (if installed)

test_parquet.rb
require 'arrow'
require 'parquet'

# Create and save a table
table = Arrow::Table.new(
  'name' => ['Alice', 'Bob'],
  'age' => [25, 30]
)

table.save('test.parquet', format: :parquet)
puts "Saved to test.parquet"

# Read it back
table2 = Arrow::Table.load('test.parquet', format: :parquet)
puts "Loaded #{table2.n_rows} rows"

Common Use Cases

Creating Tables from Ruby Data

require 'arrow'

# From a hash (types detected automatically)
table = Arrow::Table.new(
  'name' => ['Tom', 'Max', 'Kate'],
  'age' => [22, 23, 19],
  'salary' => [50000.0, 60000.0, 55000.0]
)

puts table.to_s

Reading and Writing Files

require 'arrow'

# Write to Arrow IPC format
table = Arrow::Table.new('x' => [1, 2, 3], 'y' => [4, 5, 6])
table.save('data.arrow')

# Read from Arrow IPC format
table2 = Arrow::Table.load('data.arrow')

Loading from HTTP/Remote Sources

require 'arrow'
require 'net/http'

# Example: Loading Arrow data from a remote source
params = {
  query: "SELECT id, name FROM users LIMIT 10 FORMAT Arrow",
  user: "demo",
  password: "",
  database: "default"
}

uri = URI('https://example.com/query')
uri.query = URI.encode_www_form(params)
resp = Net::HTTP.get(uri)

# Load from the response bytes
table = Arrow::Table.load(Arrow::Buffer.new(resp))
puts "Loaded #{table.n_rows} rows"
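
Net::HTTP.get returns the response body even when the server answers with an error status, so an error page could be handed to Arrow as if it were data. The following is a minimal sketch of a more defensive fetch, using only the standard library. The helper names `build_query_uri` and `fetch_body` are hypothetical, and the endpoint and parameters are the placeholders from the example above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: attach query parameters to a base URL.
def build_query_uri(base_url, params)
  uri = URI(base_url)
  uri.query = URI.encode_www_form(params)
  uri
end

# Hypothetical helper: return the response body, raising on a non-2xx status.
def fetch_body(uri)
  response = Net::HTTP.get_response(uri)
  unless response.is_a?(Net::HTTPSuccess)
    raise "Request failed: #{response.code} #{response.message}"
  end
  response.body
end

# Building the URI needs no network access.
uri = build_query_uri(
  'https://example.com/query',
  query: 'SELECT id, name FROM users LIMIT 10 FORMAT Arrow',
  database: 'default'
)
puts uri
```

The string returned by `fetch_body(uri)` can then be wrapped in Arrow::Buffer.new and passed to Arrow::Table.load, as in the example above.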

Reading from S3

require 'arrow-dataset'

# Public S3 file
s3_uri = URI('s3://bucket/public.csv')
table = Arrow::Table.load(s3_uri)

# Private S3 file (with credentials)
require 'cgi/util'

access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'

s3_uri = URI("s3://#{CGI.escape(access_key)}:#{CGI.escape(secret_key)}@bucket/private.parquet")
table = Arrow::Table.load(s3_uri)
S3 support requires the red-arrow-dataset gem to be installed.
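
Credentials embedded in a URI have to be percent-encoded because secret keys may contain characters such as / and +, which would otherwise be read as URI syntax. The following is a minimal sketch of a helper that builds such a URI, using only the standard library. The name `s3_uri_with_credentials` is hypothetical, `URI.encode_www_form_component` stands in for the CGI.escape call shown above, and the keys are made-up placeholders.

```ruby
require 'uri'

# Hypothetical helper: build an s3:// URI with percent-encoded credentials.
def s3_uri_with_credentials(bucket, key, access_key:, secret_key:)
  user = URI.encode_www_form_component(access_key)
  password = URI.encode_www_form_component(secret_key)
  URI("s3://#{user}:#{password}@#{bucket}/#{key}")
end

uri = s3_uri_with_credentials(
  'bucket', 'private.parquet',
  access_key: 'EXAMPLEKEY',
  secret_key: 'abc/def+ghi' # placeholder containing characters that need escaping
)
puts uri.host     # bucket
puts uri.path     # /private.parquet
puts uri.password # abc%2Fdef%2Bghi
```

The resulting URI object can be passed to Arrow::Table.load in the same way as the hand-built one above.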

Loading from Multiple Files

require 'arrow-dataset'

# Load all Parquet files from a directory
table = Arrow::Table.load(
  URI("file:///path/to/parquet/folder/"),
  format: :parquet
)

puts "Loaded #{table.n_rows} total rows from multiple files"

Filtering Data

require 'arrow'

table = Arrow::Table.new(
  'name' => ['Tom', 'Max', 'Kate'],
  'age' => [22, 23, 19]
)

# Filter using slicers
filtered = table.slice { |slicer| slicer['age'] > 19 }
puts filtered.to_s
# Output:
#   name  age
# 0 Tom   22
# 1 Max   23

# Filter with range
filtered2 = table.slice { |slicer| slicer['age'].in?(19..22) }
puts filtered2.to_s

# Combine conditions with logical operators
filtered3 = table.slice do |slicer|
  (slicer['age'] > 19) & (slicer['age'] < 23)
end
puts filtered3.to_s
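
To make the expected rows for the three filters explicit, here is a plain-Ruby model of the same predicates applied to ordinary hashes. It does not use Red Arrow and is not how slicers are implemented; it only mirrors the conditions above, assuming that in?(19..22) tests membership in the inclusive range.

```ruby
rows = [
  { 'name' => 'Tom',  'age' => 22 },
  { 'name' => 'Max',  'age' => 23 },
  { 'name' => 'Kate', 'age' => 19 }
]

# Models: slicer['age'] > 19
older = rows.select { |row| row['age'] > 19 }

# Models: slicer['age'].in?(19..22), as inclusive range membership
in_range = rows.select { |row| (19..22).cover?(row['age']) }

# Models: (slicer['age'] > 19) & (slicer['age'] < 23)
between = rows.select { |row| row['age'] > 19 && row['age'] < 23 }

puts older.map { |row| row['name'] }.inspect    # ["Tom", "Max"]
puts in_range.map { |row| row['name'] }.inspect # ["Tom", "Kate"]
puts between.map { |row| row['name'] }.inspect  # ["Tom"]
```

Note that the Red Arrow version combines conditions with & rather than &&. Slicer conditions are ordinary Ruby objects, so && would simply return its right-hand operand instead of building a combined condition.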

Grouping and Aggregation

require 'arrow'

table = Arrow::Table.new(
  'name' => ['Tom', 'Max', 'Kate', 'Tom'],
  'amount' => [10, 2, 3, 5]
)

# Group by name and sum amounts
result = table.group('name').sum('amount')
puts result.to_s
# Output:
#   name  amount
# 0 Kate  3
# 1 Max   2
# 2 Tom   15
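
The per-group totals can be cross-checked with a plain-Ruby model of the same aggregation. This sketch uses only core Ruby and is not Red Arrow's implementation; the row order of the Red Arrow result may differ from the order printed here.

```ruby
names   = ['Tom', 'Max', 'Kate', 'Tom']
amounts = [10, 2, 3, 5]

# Pair each name with its amount, group the pairs by name, then sum each group.
totals = names.zip(amounts)
              .group_by { |name, _amount| name }
              .transform_values { |pairs| pairs.sum { |_name, amount| amount } }

totals.each { |name, total| puts "#{name}: #{total}" }
# Tom: 15
# Max: 2
# Kate: 3
```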

Joining Tables

require 'arrow'

amounts = Arrow::Table.new(
  'name' => ['Tom', 'Max', 'Kate'],
  'amount' => [10, 2, 3]
)

levels = Arrow::Table.new(
  'name' => ['Max', 'Kate', 'Tom'],
  'level' => [1, 9, 5]
)

# Join on 'name' column
joined = amounts.join(levels, [:name])
puts joined.to_s
# Output:
#   name  amount  name  level
# 0 Tom   10      Tom   5
# 1 Max   2       Max   1
# 2 Kate  3       Kate  9

Using Arrow Compute Functions

require 'arrow'

table = Arrow::Table.new('values' => [1, 2, 3, 4, 5])

# Use Arrow compute functions
add = Arrow::Function.find('add')
result = add.execute([
  table['values'].data,
  table['values'].data
]).value

puts result.to_s
# Output: [2, 4, 6, 8, 10]

Troubleshooting

If the gem installation fails, try:
# Update RubyGems
gem update --system

# Install with verbose output
gem install red-arrow --verbose

# If behind a proxy
gem install red-arrow --http-proxy http://proxy.example.com:8080
If you see shared library errors:
  1. Ensure you’re using a supported platform (Linux x86_64/aarch64, macOS, Windows x86_64)
  2. Try reinstalling:
    gem uninstall red-arrow
    gem install red-arrow
    
  3. Check Ruby version: ruby --version (should be 2.5+)
If Parquet files cannot be loaded or saved, ensure red-parquet is installed:
gem list | grep parquet

# If not installed
gem install red-parquet
Then require it in your code:
require 'arrow'
require 'parquet'  # Must require explicitly
For S3 support, you need red-arrow-dataset:
gem install red-arrow-dataset
Then:
require 'arrow-dataset'

Development and Building from Source

For development or custom builds:
# Clone the repository
git clone https://github.com/apache/arrow.git
cd arrow/ruby

# Install dependencies
bundle install

# Build the gem
rake build

# Install locally
gem install pkg/red-arrow-*.gem
Building from source requires Arrow C++ to be installed on your system. Pre-built gems are recommended for most users.

Ruby Modules Overview

Gem                 Description
red-arrow           Core Arrow bindings, table operations, IPC format
red-parquet         Parquet file format support
red-arrow-dataset   Multi-file datasets, S3 support, partitioned data
red-arrow-cuda      GPU acceleration via CUDA
red-gandiva         JIT compilation for expression evaluation

Next Steps

Now that you have Red Arrow installed:
