Function Categories
Arrow compute functions are organized into several categories:
- Scalar functions: Element-wise operations that produce output of the same size as input
- Vector functions: Operations that may produce different-sized output
- Aggregate functions: Functions that compute summary statistics
- Hash aggregate functions: Grouped aggregations using hash tables
Using Compute Functions
Arithmetic Operations
Comparison and Filtering
Aggregate Functions
String Operations
Function Registry
All compute functions are registered in a global function registry.
Custom Execution Context
You can customize function execution with an ExecContext.
Performance Tips
- Use vectorized operations: Compute functions are optimized for vectorized execution
- Batch processing: Process data in large batches to amortize overhead
- Avoid repeated allocations: Reuse buffers when possible
- Choose appropriate options: Configure options such as skip_nulls and check_overflow based on your data
Next Steps
- Learn about Expressions and Filters for building complex operations
- Explore Acero Query Engine for query execution
- See Working with Datasets for large-scale data processing