Compute Functions
The compute module provides functions for performing data transformations and computations on Arrow data structures.
Function Registry
All compute functions are registered in a global function registry.
GetFunctionRegistry
Returns the global function registry.
FunctionRegistry* GetFunctionRegistry();
Calling Functions
Functions can be called using the CallFunction convenience API:
Result<Datum> CallFunction(
    const std::string& func_name,
    const std::vector<Datum>& args,
    const FunctionOptions* options = nullptr,
    ExecContext* ctx = nullptr);
func_name
const std::string&
required
Name of the function to call
args
const std::vector<Datum>&
required
Input arguments (Arrays, Scalars, or ChunkedArrays)
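options
const FunctionOptions*
optional
Function-specific options (default: nullptr)
ctx
ExecContext*
optional
Execution context (memory pool, function registry, etc.; default: nullptr)
Returns: a Result<Datum> holding the output data on success, or an error status on failure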
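To make the lookup-by-name flow concrete, here is a minimal standalone sketch of a name-keyed registry with a CallFunction-style entry point. It does not use Arrow: ToyDatum, ToyFunction, ToyRegistry, GetToyRegistry, and CallToyFunction are hypothetical stand-ins, and std::optional is used in place of Result<Datum>. It only illustrates the pattern described above, under those assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not Arrow types: a "datum" is a plain vector of ints.
using ToyDatum = std::vector<int>;
using ToyFunction = std::function<ToyDatum(const std::vector<ToyDatum>&)>;

// Name-keyed table of functions, playing the role of the function registry.
class ToyRegistry {
 public:
  void Register(const std::string& name, ToyFunction fn) {
    functions_[name] = std::move(fn);
  }

  // Returns nullptr when nothing is registered under `name`.
  const ToyFunction* Find(const std::string& name) const {
    auto it = functions_.find(name);
    return it == functions_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, ToyFunction> functions_;
};

// Process-wide registry, created on first use with one built-in function.
ToyRegistry* GetToyRegistry() {
  static ToyRegistry registry = [] {
    ToyRegistry r;
    r.Register("add", [](const std::vector<ToyDatum>& args) {
      assert(args.size() == 2 && args[0].size() == args[1].size());
      ToyDatum out(args[0].size());
      for (std::size_t i = 0; i < out.size(); ++i) out[i] = args[0][i] + args[1][i];
      return out;
    });
    return r;
  }();
  return &registry;
}

// Looks the function up by name and invokes it. An empty optional stands in
// for the error case; a null `registry` means "use the global one".
std::optional<ToyDatum> CallToyFunction(const std::string& name,
                                        const std::vector<ToyDatum>& args,
                                        ToyRegistry* registry = nullptr) {
  if (registry == nullptr) registry = GetToyRegistry();
  const ToyFunction* fn = registry->Find(name);
  if (fn == nullptr) return std::nullopt;
  return (*fn)(args);
}
```

In this sketch, calling CallToyFunction("add", ...) with two equal-length inputs yields their element-wise sum, while an unknown name yields an empty optional rather than a crash.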
Function Classes
Function
Base class for all compute functions.
name(): Returns the function name
kind(): Returns the function kind (SCALAR, VECTOR, SCALAR_AGGREGATE, HASH_AGGREGATE, or META)
arity(): Returns the function arity (number of required arguments)
doc(): Returns the function documentation
Execute(): Executes the function, with kernel dispatch, batch iteration, and memory allocation handled automatically
args
const std::vector<Datum>&
required
Input arguments
Function Kinds
- SCALAR: Operates element-wise on arrays and scalars; the output has the same length as the input
- VECTOR: Array-to-array operations whose result depends on the values of the entire array (for example, sorting)
- SCALAR_AGGREGATE: Computes scalar summary statistics from array input
- HASH_AGGREGATE: Computes grouped summary statistics from array input and group identifiers
- META: Dispatches to other functions, contains no kernels
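The difference between the kinds is easiest to see on plain containers. The following sketch uses std::vector instead of Arrow arrays, and every function name in it (ScalarAdd, VectorSortIndices, AggregateSum, GroupedSum) is hypothetical. It shows one representative operation per kind, except META, which only forwards to other functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <vector>

// SCALAR: element-wise, so the output has the same length as the inputs.
std::vector<int> ScalarAdd(const std::vector<int>& a, const std::vector<int>& b) {
  assert(a.size() == b.size());
  std::vector<int> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
  return out;
}

// VECTOR: each output slot depends on the whole input (here, its sort order).
std::vector<std::size_t> VectorSortIndices(const std::vector<int>& values) {
  std::vector<std::size_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  std::stable_sort(indices.begin(), indices.end(),
                   [&values](std::size_t l, std::size_t r) { return values[l] < values[r]; });
  return indices;
}

// SCALAR_AGGREGATE: the whole array collapses to a single value.
long long AggregateSum(const std::vector<int>& values) {
  return std::accumulate(values.begin(), values.end(), 0LL);
}

// HASH_AGGREGATE: one summary value per group identifier.
std::map<int, long long> GroupedSum(const std::vector<int>& values,
                                    const std::vector<int>& group_ids) {
  assert(values.size() == group_ids.size());
  std::map<int, long long> sums;
  for (std::size_t i = 0; i < values.size(); ++i) sums[group_ids[i]] += values[i];
  return sums;
}
```

Note that ScalarAdd could process its input in independent chunks, whereas VectorSortIndices could not; that is the practical consequence of the SCALAR versus VECTOR distinction.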
Arity
Describes how many arguments a function accepts. For a varargs function, num_args is the minimum number of required arguments.
struct Arity {
  int num_args;
  bool is_varargs;
};
Nullary(): Function taking no arguments
Unary(): Function taking 1 argument
Binary(): Function taking 2 arguments
Ternary(): Function taking 3 arguments
VarArgs(min_args): Function taking a variable number of arguments
min_args: Minimum number of required arguments (default: 0)
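As a rough illustration of how the two fields combine, the sketch below defines a standalone ToyArity with factory helpers named after the cases listed above, plus a hypothetical AcceptsArgCount check. The validation rule (exact match for fixed arity, at-least-minimum for varargs) is an assumption made for this example, not a statement of how the library validates calls.

```cpp
#include <cassert>

// Standalone copy of the struct above with factory helpers for each case.
struct ToyArity {
  int num_args;
  bool is_varargs;

  static ToyArity Nullary() { return {0, false}; }
  static ToyArity Unary() { return {1, false}; }
  static ToyArity Binary() { return {2, false}; }
  static ToyArity Ternary() { return {3, false}; }
  // For varargs, num_args holds the minimum number of required arguments.
  static ToyArity VarArgs(int min_args = 0) { return {min_args, true}; }
};

// Assumed rule: fixed arity needs an exact match, varargs needs at least
// the minimum.
bool AcceptsArgCount(const ToyArity& arity, int passed) {
  assert(passed >= 0);
  return arity.is_varargs ? passed >= arity.num_args : passed == arity.num_args;
}
```

Under this rule a binary function rejects three arguments, while VarArgs(1) accepts any count of one or more.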
Common Compute Functions
Arithmetic Functions
Element-wise arithmetic operations.
// Binary operations
Result<Datum> Add(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Subtract(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Multiply(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Divide(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
// Unary operations
Result<Datum> Negate(const Datum& value, ExecContext* ctx = nullptr);
Result<Datum> Abs(const Datum& value, ExecContext* ctx = nullptr);
Comparison Functions
Element-wise comparison operations.
Result<Datum> Equal(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> NotEqual(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Less(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> LessEqual(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Greater(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> GreaterEqual(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Aggregate Functions
Compute summary statistics.
Result<Datum> Sum(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Mean(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Min(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Max(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Count(const Datum& array, const CountOptions& options, ExecContext* ctx = nullptr);
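Aggregate options typically control how nulls and short inputs are treated. The sketch below is a standalone model, not Arrow code: ToyAggregateOptions, its skip_nulls and min_count fields, and ToySum are assumptions introduced for illustration, with std::optional standing in for nullable values and for a null result.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical options for this sketch.
struct ToyAggregateOptions {
  bool skip_nulls = true;       // ignore null slots instead of propagating them
  std::uint32_t min_count = 1;  // fewer valid values than this yields a null result
};

using NullableInt64s = std::vector<std::optional<std::int64_t>>;

// Sums the valid values. Returns an empty optional (a "null" result) when a
// null is seen with skip_nulls off, or when too few valid values were found.
std::optional<std::int64_t> ToySum(const NullableInt64s& values,
                                   const ToyAggregateOptions& options = {}) {
  std::int64_t sum = 0;
  std::uint32_t valid = 0;
  for (const auto& value : values) {
    if (!value.has_value()) {
      if (!options.skip_nulls) return std::nullopt;
      continue;
    }
    sum += *value;
    ++valid;
  }
  if (valid < options.min_count) return std::nullopt;
  return sum;
}
```

With the defaults, nulls are skipped and an input with no valid values produces a null result; setting min_count to 0 makes the sum of an empty input 0 instead.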
Type Casting
Cast data to different types.
Result<Datum> Cast(const Datum& value, const CastOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Cast(const Datum& value, std::shared_ptr<DataType> to_type, ExecContext* ctx = nullptr);
options
const CastOptions&
required
Cast options including target type
to_type
std::shared_ptr<DataType>
required
Target data type
CastOptions
struct CastOptions {
  std::shared_ptr<DataType> to_type;
  bool allow_int_overflow = false;
  bool allow_time_truncate = false;
  bool allow_time_overflow = false;
  bool allow_decimal_truncate = false;
  bool allow_float_truncate = false;
  bool allow_invalid_utf8 = false;
};
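All of the allow_* flags default to false, so a cast fails rather than silently losing information unless the caller opts in. The standalone sketch below models two of the flags on single values. ToyCastOptions, CastToInt32, and CastToInt64 are hypothetical names, and the exact failure conditions are assumptions for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical subset of the flags above; each check is active unless allowed.
struct ToyCastOptions {
  bool allow_int_overflow = false;
  bool allow_float_truncate = false;
};

// int64 -> int32: fails on out-of-range input unless overflow is allowed.
std::optional<std::int32_t> CastToInt32(std::int64_t value,
                                        const ToyCastOptions& options = {}) {
  const bool fits = value >= std::numeric_limits<std::int32_t>::min() &&
                    value <= std::numeric_limits<std::int32_t>::max();
  if (!fits && !options.allow_int_overflow) return std::nullopt;
  // When overflow is allowed, the low 32 bits are kept (two's-complement targets).
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(value));
}

// double -> int64: fails when a fractional part would be dropped unless
// truncation is allowed. Non-finite input and magnitudes above 9e18 (a
// conservative bound inside the int64 range) always fail in this sketch.
std::optional<std::int64_t> CastToInt64(double value,
                                        const ToyCastOptions& options = {}) {
  if (!std::isfinite(value) || value < -9.0e18 || value > 9.0e18) return std::nullopt;
  if (std::trunc(value) != value && !options.allow_float_truncate) return std::nullopt;
  return static_cast<std::int64_t>(value);
}
```

Here 3.7 fails to cast under the defaults but becomes 3 once allow_float_truncate is set, while 4.0 casts cleanly either way.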
Filter and Selection
Filter
Filter an array by a boolean selection mask.
Result<Datum> Filter(const Datum& values, const Datum& filter, const FilterOptions& options, ExecContext* ctx = nullptr);
values
const Datum&
required
Array or ChunkedArray to filter
filter
const Datum&
required
Boolean array indicating which values to keep
options
const FilterOptions&
required
Filter options
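One thing a filter has to decide is what to do when a mask slot is itself null. The standalone sketch below models that choice with std::optional. ToyFilter and ToyNullSelection are hypothetical names, and the two policies (drop the slot, or emit a null) are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// What to do when the mask slot is null rather than true or false.
enum class ToyNullSelection { kDrop, kEmitNull };

using NullableInts = std::vector<std::optional<int>>;
using NullableMask = std::vector<std::optional<bool>>;

// Keeps values[i] where mask[i] is true, skips it where mask[i] is false,
// and applies `null_selection` where mask[i] is null.
NullableInts ToyFilter(const NullableInts& values, const NullableMask& mask,
                       ToyNullSelection null_selection = ToyNullSelection::kDrop) {
  assert(values.size() == mask.size());
  NullableInts out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (!mask[i].has_value()) {
      if (null_selection == ToyNullSelection::kEmitNull) out.push_back(std::nullopt);
    } else if (*mask[i]) {
      out.push_back(values[i]);
    }
  }
  return out;
}
```

With the drop policy the output contains only positively selected values; with the emit-null policy the output keeps a null placeholder for every undecided slot.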
Take
Select values by indices.
Result<Datum> Take(const Datum& values, const Datum& indices, const TakeOptions& options, ExecContext* ctx = nullptr);
values
const Datum&
required
Array or ChunkedArray to select from
indices
const Datum&
required
Integer array of indices to select
options
const TakeOptions&
required
Take options (handling of out-of-bounds indices)
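Take is a gather: output slot i holds values[indices[i]]. The standalone sketch below shows that on std::vector, with a hypothetical boundscheck flag modeling the out-of-bounds handling mentioned above. ToyTake and the flag name are assumptions for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Gathers values[indices[i]] for each i. With bounds checking on, any index
// outside [0, values.size()) makes the whole call fail (empty optional).
// With it off, the caller promises valid indices.
std::optional<std::vector<int>> ToyTake(const std::vector<int>& values,
                                        const std::vector<std::int64_t>& indices,
                                        bool boundscheck = true) {
  std::vector<int> out;
  out.reserve(indices.size());
  for (std::int64_t index : indices) {
    const bool in_bounds =
        index >= 0 && static_cast<std::size_t>(index) < values.size();
    if (boundscheck && !in_bounds) return std::nullopt;
    assert(in_bounds && "boundscheck=false requires valid indices");
    out.push_back(values[static_cast<std::size_t>(index)]);
  }
  return out;
}
```

Indices may repeat and appear in any order, so the output length equals the number of indices, not the number of values.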
String Functions
String Operations
// String predicates
Result<Datum> IsAscii(const Datum& strings, ExecContext* ctx = nullptr);
Result<Datum> IsUtf8(const Datum& strings, ExecContext* ctx = nullptr);
// String transformations
Result<Datum> Utf8Upper(const Datum& strings, ExecContext* ctx = nullptr);
Result<Datum> Utf8Lower(const Datum& strings, ExecContext* ctx = nullptr);
Result<Datum> Utf8Reverse(const Datum& strings, ExecContext* ctx = nullptr);
// String trimming
Result<Datum> Utf8Trim(const Datum& strings, const TrimOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Utf8LTrim(const Datum& strings, const TrimOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Utf8RTrim(const Datum& strings, const TrimOptions& options, ExecContext* ctx = nullptr);
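The three trimming variants differ only in which end of the string they strip. The standalone sketch below shows the behavior on std::string. ToyTrimOptions with a characters field, and the ToyLTrim, ToyRTrim, and ToyTrim helpers, are hypothetical. The sketch works byte by byte, so it is only a faithful model when the characters to strip are ASCII.

```cpp
#include <cstddef>
#include <string>

// Hypothetical counterpart of TrimOptions: the set of characters to strip.
struct ToyTrimOptions {
  std::string characters;
};

// Strips leading characters that appear in options.characters.
std::string ToyLTrim(const std::string& s, const ToyTrimOptions& options) {
  const std::size_t first = s.find_first_not_of(options.characters);
  return first == std::string::npos ? std::string() : s.substr(first);
}

// Strips trailing characters that appear in options.characters.
std::string ToyRTrim(const std::string& s, const ToyTrimOptions& options) {
  const std::size_t last = s.find_last_not_of(options.characters);
  return last == std::string::npos ? std::string() : s.substr(0, last + 1);
}

// Strips from both ends.
std::string ToyTrim(const std::string& s, const ToyTrimOptions& options) {
  return ToyRTrim(ToyLTrim(s, options), options);
}
```

A string made up entirely of strippable characters trims to the empty string, and characters in the middle of the string are never touched.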
Execution Context
ExecContext
Provides execution context including memory pool and function registry.
class ExecContext {
 public:
  explicit ExecContext(MemoryPool* pool = default_memory_pool(),
                       FunctionRegistry* func_registry = nullptr);
  MemoryPool* memory_pool() const;
  FunctionRegistry* func_registry() const;
};
memory_pool(): Returns the memory pool used for allocations
func_registry(): Returns the function registry used to look up functions