Red Arrow provides the official Ruby bindings for Apache Arrow, enabling Ruby applications to work with Arrow’s high-performance columnar data structures and integrate with the broader Arrow ecosystem.
Quick Install
Install Red Arrow using RubyGems:
gem install red-arrow
For Parquet file support, also install:
gem install red-parquet
Installation Options
Basic Installation
Install the core Red Arrow library:
Install Red Arrow
gem install red-arrow
This installs the base Apache Arrow bindings for Ruby.
Verify installation
require 'arrow'
puts Arrow::VERSION
Installing Additional Modules
Red Arrow is split into several gems for different functionality:
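Core + Parquet
# Install core Arrow and Parquet support
gem install red-arrow
gem install red-parquet
The Dataset API, CUDA support, and Gandiva are packaged as the red-arrow-dataset, red-arrow-cuda, and red-gandiva gems and are installed the same way with gem install.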
Using Bundler
For Ruby projects using Bundler, add to your Gemfile:
source 'https://rubygems.org'
# Core Arrow support
gem 'red-arrow'
# Optional: Parquet support
gem 'red-parquet'
# Optional: Dataset API for S3 and multi-file support
gem 'red-arrow-dataset'
Then install:
bundle install
System Requirements
Red Arrow requires:
Ruby: 2.5 or later (3.0+ recommended)
Arrow C++ library: automatically installed via binary gems on:
Linux (x86_64, aarch64)
macOS (Intel and Apple Silicon)
Windows (x86_64)
Binary gems include pre-compiled Arrow C++ libraries, so you typically don’t need to install Arrow C++ separately.
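The requirements above can be checked from Ruby itself. The following is a minimal sketch, assuming only the version thresholds stated in this section (2.5 minimum, 3.0 recommended); the helper name ruby_support_level is hypothetical and not part of Red Arrow.

```ruby
require 'rubygems'

# Thresholds taken from the requirements list above.
MINIMUM_RUBY = Gem::Version.new('2.5')
RECOMMENDED_RUBY = Gem::Version.new('3.0')

# Hypothetical helper: classify a Ruby version string as
# :unsupported, :supported, or :recommended.
def ruby_support_level(version_string)
  version = Gem::Version.new(version_string)
  return :unsupported if version < MINIMUM_RUBY
  return :supported if version < RECOMMENDED_RUBY

  :recommended
end

# Report the running interpreter and platform.
puts "Ruby #{RUBY_VERSION} on #{RUBY_PLATFORM}: #{ruby_support_level(RUBY_VERSION)}"
```

Comparing with Gem::Version rather than plain strings avoids mistakes such as treating "2.10" as older than "2.5".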
Verifying Installation
Test your Red Arrow installation:
Create a simple test file
Create a file named test_arrow.rb:
require 'arrow'
# Create a simple table
table = Arrow::Table.new(
  'name' => ['Alice', 'Bob', 'Charlie'],
  'age' => [25, 30, 35]
)
puts "Arrow version: #{Arrow::VERSION}"
puts "Table shape: #{table.n_rows} rows, #{table.n_columns} columns"
puts table.to_s
Run the test
ruby test_arrow.rb
You should see output showing the Arrow version and table contents.
Test Parquet support (if installed)
require 'arrow'
require 'parquet'
# Create and save a table
table = Arrow::Table.new(
  'name' => ['Alice', 'Bob'],
  'age' => [25, 30]
)
table.save('test.parquet', format: :parquet)
puts "Saved to test.parquet"
# Read it back
table2 = Arrow::Table.load('test.parquet', format: :parquet)
puts "Loaded #{table2.n_rows} rows"
Common Use Cases
Creating Tables from Ruby Data
require 'arrow'
# From a hash (types detected automatically)
table = Arrow::Table.new(
  'name' => ['Tom', 'Max', 'Kate'],
  'age' => [22, 23, 19],
  'salary' => [50000.0, 60000.0, 55000.0]
)
puts table.to_s
Reading and Writing Files
Red Arrow reads and writes the Arrow IPC, Parquet, and CSV formats. The example here uses the Arrow IPC format:
require 'arrow'
# Write to Arrow IPC format
table = Arrow::Table.new('x' => [1, 2, 3], 'y' => [4, 5, 6])
table.save('data.arrow')
# Read from Arrow IPC format
table2 = Arrow::Table.load('data.arrow')
Loading from HTTP/Remote Sources
require 'arrow'
require 'net/http'
# Example: Loading Arrow data from a remote source
params = {
  query: "SELECT id, name FROM users LIMIT 10 FORMAT Arrow",
  user: "demo",
  password: "",
  database: "default"
}
uri = URI('https://example.com/query')
uri.query = URI.encode_www_form(params)
resp = Net::HTTP.get(uri)
# Load from the response bytes
table = Arrow::Table.load(Arrow::Buffer.new(resp))
puts "Loaded #{table.n_rows} rows"
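Net::HTTP.get returns the response body whatever the HTTP status, so an error page could end up being passed to Arrow::Table.load. The sketch below is one way to guard against that using only the standard library; the helper names build_query_uri and fetch_arrow_bytes are hypothetical, and the endpoint is the same placeholder used above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build the request URI from a base URL and a hash
# of query parameters.
def build_query_uri(base_url, params)
  uri = URI(base_url)
  uri.query = URI.encode_www_form(params)
  uri
end

# Hypothetical helper: return the response body, raising on any non-2xx
# status so that an error page is never treated as Arrow data.
def fetch_arrow_bytes(uri)
  response = Net::HTTP.get_response(uri)
  unless response.is_a?(Net::HTTPSuccess)
    raise "Request failed: #{response.code} #{response.message}"
  end

  response.body
end

uri = build_query_uri('https://example.com/query',
                      query: 'SELECT 1 FORMAT Arrow', user: 'demo')
puts uri

# With a real endpoint, the bytes can then be loaded as shown above:
#   bytes = fetch_arrow_bytes(uri)
#   table = Arrow::Table.load(Arrow::Buffer.new(bytes))
```

The network call is left commented out because example.com does not serve Arrow data.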
Reading from S3
require 'arrow-dataset'
# Public S3 file
s3_uri = URI('s3://bucket/public.csv')
table = Arrow::Table.load(s3_uri)
# Private S3 file (with credentials)
require 'cgi/util'
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'
s3_uri = URI("s3://#{CGI.escape(access_key)}:#{CGI.escape(secret_key)}@bucket/private.parquet")
table = Arrow::Table.load(s3_uri)
S3 support requires the red-arrow-dataset gem to be installed.
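The credentials are escaped because secret keys can contain characters such as "/" and "+", which would otherwise be misread as URI delimiters. The sketch below illustrates the effect with a made-up key. It is an illustration only: the helper s3_uri_with_credentials is hypothetical, and it swaps CGI.escape for URI.encode_www_form_component so that it depends on nothing but the uri library.

```ruby
require 'uri'

# Hypothetical helper: build an s3:// URI whose credentials are
# percent-encoded before being placed in the user-info part.
def s3_uri_with_credentials(access_key, secret_key, bucket, key)
  user = URI.encode_www_form_component(access_key)
  password = URI.encode_www_form_component(secret_key)
  URI("s3://#{user}:#{password}@#{bucket}/#{key}")
end

# Made-up credentials containing characters that need escaping.
uri = s3_uri_with_credentials('AKIAEXAMPLE', 'abc/def+ghi',
                              'bucket', 'private.parquet')
puts uri.host # bucket
puts uri.path # /private.parquet
puts uri      # s3://AKIAEXAMPLE:abc%2Fdef%2Bghi@bucket/private.parquet
```

Without the escaping step, the "/" in the secret would end the authority part of the URI early and the bucket name would be parsed incorrectly.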
Loading from Multiple Files
require 'arrow-dataset'
# Load all Parquet files from a directory
table = Arrow::Table.load(
  URI("file:///path/to/parquet/folder/"),
  format: :parquet
)
puts "Loaded #{table.n_rows} total rows from multiple files"
Filtering Data
require 'arrow'
table = Arrow::Table.new(
  'name' => ['Tom', 'Max', 'Kate'],
  'age' => [22, 23, 19]
)
# Filter using slicers
filtered = table.slice { |slicer| slicer['age'] > 19 }
puts filtered.to_s
# Output:
#   name  age
# 0 Tom    22
# 1 Max    23
# Filter with range
filtered2 = table.slice { |slicer| slicer['age'].in?(19..22) }
puts filtered2.to_s
# Combine conditions with logical operators
filtered3 = table.slice do |slicer|
  (slicer['age'] > 19) & (slicer['age'] < 23)
end
puts filtered3.to_s
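Only the first filter above lists its output. As a cross-check, the same three predicates can be applied to the same data in plain Ruby, without Red Arrow. This sketch assumes that in? with a range matches the inclusive range 19..22; the helper matching_names is hypothetical.

```ruby
# Same rows as the table above, as plain Ruby hashes.
ROWS = [
  { 'name' => 'Tom',  'age' => 22 },
  { 'name' => 'Max',  'age' => 23 },
  { 'name' => 'Kate', 'age' => 19 }
].freeze

# Hypothetical helper: names of the rows for which the block is true.
def matching_names(rows)
  rows.select { |row| yield(row) }.map { |row| row['name'] }
end

# age > 19
p matching_names(ROWS) { |row| row['age'] > 19 }
# => ["Tom", "Max"]

# age within 19..22 (inclusive)
p matching_names(ROWS) { |row| (19..22).cover?(row['age']) }
# => ["Tom", "Kate"]

# age > 19 and age < 23
p matching_names(ROWS) { |row| row['age'] > 19 && row['age'] < 23 }
# => ["Tom"]
```

Note that the slicer version combines conditions with &, not &&, because each comparison there builds a condition object instead of returning a Ruby boolean.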
Grouping and Aggregation
require 'arrow'
table = Arrow::Table.new(
  'name' => ['Tom', 'Max', 'Kate', 'Tom'],
  'amount' => [10, 2, 3, 5]
)
# Group by name and sum amounts
result = table.group('name').sum('amount')
puts result.to_s
# Output:
#   name  amount
# 0 Kate       3
# 1 Max        2
# 2 Tom       15
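The totals in the output above can be reproduced in plain Ruby, which makes the group-then-sum step explicit: both Tom rows (10 and 5) collapse into a single total of 15. This is a cross-check only, not how Red Arrow computes the result; the helper sum_by_name is hypothetical.

```ruby
# Same (name, amount) pairs as the table above.
AMOUNT_ROWS = [['Tom', 10], ['Max', 2], ['Kate', 3], ['Tom', 5]].freeze

# Hypothetical helper: accumulate the amounts per name.
def sum_by_name(rows)
  rows.each_with_object(Hash.new(0)) do |(name, amount), totals|
    totals[name] += amount
  end
end

# Print the totals sorted by name.
sum_by_name(AMOUNT_ROWS).sort.each do |name, total|
  puts "#{name}\t#{total}"
end
# Kate  3
# Max   2
# Tom   15
```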
Joining Tables
require 'arrow'
amounts = Arrow::Table.new(
  'name' => ['Tom', 'Max', 'Kate'],
  'amount' => [10, 2, 3]
)
levels = Arrow::Table.new(
  'name' => ['Max', 'Kate', 'Tom'],
  'level' => [1, 9, 5]
)
# Join on 'name' column
joined = amounts.join(levels, [:name])
puts joined.to_s
# Output:
#   name  amount  name  level
# 0 Tom       10  Tom       5
# 1 Max        2  Max       1
# 2 Kate       3  Kate      9
Using Arrow Compute Functions
require 'arrow'
table = Arrow::Table.new('values' => [1, 2, 3, 4, 5])
# Use Arrow compute functions
add = Arrow::Function.find('add')
result = add.execute([
  table['values'].data,
  table['values'].data
]).value
puts result.to_s
# Output: [2, 4, 6, 8, 10]
Troubleshooting
If the gem installation fails, try:
# Update RubyGems
gem update --system
# Install with verbose output
gem install red-arrow --verbose
# If behind a proxy
gem install red-arrow --http-proxy http://proxy.example.com:8080
Cannot load shared library
If you see shared library errors:
Ensure you’re using a supported platform (Linux x86_64/aarch64, macOS, Windows x86_64)
Try reinstalling:
gem uninstall red-arrow
gem install red-arrow
Check Ruby version: ruby --version (should be 2.5+)
Parquet support not working
Ensure red-parquet is installed:
gem list | grep parquet
# If not installed
gem install red-parquet
Then require it in your code:
require 'arrow'
require 'parquet' # Must require explicitly
For S3 support, you need red-arrow-dataset:
gem install red-arrow-dataset
Then require it in your code:
require 'arrow-dataset'
Development and Building from Source
For development or custom builds:
# Clone the repository
git clone https://github.com/apache/arrow.git
cd arrow/ruby
# Install dependencies
bundle install
# Build the gem
rake build
# Install locally
gem install pkg/red-arrow-*.gem
Building from source requires Arrow C++ to be installed on your system. Pre-built gems are recommended for most users.
Ruby Modules Overview
Gem                Description
red-arrow          Core Arrow bindings, table operations, IPC format
red-parquet        Parquet file format support
red-arrow-dataset  Multi-file datasets, S3 support, partitioned data
red-arrow-cuda     GPU acceleration via CUDA
red-gandiva        JIT compilation for expression evaluation
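To see which of these gems are usable in the current environment, each library can be required inside a rescue block. The sketch below is an assumption-laden illustration: the require names arrow, parquet, and arrow-dataset appear elsewhere on this page, while arrow-cuda and gandiva are assumed to follow the same pattern. The helper loadable? is hypothetical.

```ruby
# Gem name => library name passed to `require`.
# 'arrow-cuda' and 'gandiva' are assumptions; the others are used above.
RED_ARROW_MODULES = {
  'red-arrow' => 'arrow',
  'red-parquet' => 'parquet',
  'red-arrow-dataset' => 'arrow-dataset',
  'red-arrow-cuda' => 'arrow-cuda',
  'red-gandiva' => 'gandiva'
}.freeze

# Hypothetical helper: true if the library can be required.
# LoadError covers a missing gem or shared library; StandardError is
# rescued as well in case loading fails for another reason.
def loadable?(library)
  require library
  true
rescue LoadError, StandardError
  false
end

# Print one status line per gem.
RED_ARROW_MODULES.each do |gem_name, library|
  status = loadable?(library) ? 'available' : "missing (gem install #{gem_name})"
  puts "#{gem_name.ljust(18)} #{status}"
end
```

Running this before filing a bug report shows at a glance whether a failure comes from a missing optional gem or from the core library itself.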