Basic Usage
This guide covers fundamental operations with Red Arrow, from creating tables to performing data transformations.
Getting Started
First, require the Arrow library (require "arrow").
Creating Tables
From Ruby Hash
The simplest way to create a table is from a Ruby hash. Data types are detected automatically.
From Arrays
Create tables using Arrow array types.
With Explicit Schema
Define the schema explicitly for precise control.
From Raw Records
Create tables from arrays of records.
Loading and Saving Data
Loading from Files
Loading from S3
With the red-arrow-dataset gem, tables can be loaded directly from S3.
Loading from HTTP
Saving Tables
Accessing Data
Column Access
Row Access
Filtering Data
Using Slicer
Red Arrow provides a block-based slicer syntax for filtering rows.
Combining Conditions
Use logical operators to combine filters.
Hash-based Filtering
Array-based Filtering
Grouping and Aggregation
Perform group-by operations.
Joining Tables
Join tables using common keys.
Join Types
Different Key Names
Transforming Data
Adding Columns
Removing Columns
Slicing by Range
Working with Compute Functions
Access Arrow’s compute functions directly. Commonly used functions include:
- Arithmetic: add, subtract, multiply, divide
- Comparison: equal, greater, less, greater_equal, less_equal
- String: utf8_length, starts_with, ends_with
- Statistical: sum, mean, min, max, stddev
Reading and Writing Streams
Writing Streams
Reading Streams
Memory Management
Packing Tables
Packing combines the chunks of each column into a single contiguous array, which optimizes the memory layout.
Memory-Mapped Files
Use memory mapping for efficient file access.
Type System
Red Arrow supports all Arrow data types.
Numeric Types
String and Binary
Temporal Types
Other Types
Best Practices
- Use explicit schemas for production code to ensure data consistency
- Pack tables when memory is constrained or before serialization
- Use memory-mapped I/O for large files
- Leverage columnar operations instead of row-by-row processing
- Batch operations when possible for better performance
- Close resources explicitly or use blocks for automatic cleanup
Next Steps
- Explore the Ruby source code for advanced usage
- Check out the Apache Arrow documentation for deeper understanding
- Join the Apache Arrow community for support