
Compute Functions

The compute module provides functions for performing data transformations and computations on Arrow data structures.

Function Registry

All compute functions are registered in a global function registry.

GetFunctionRegistry

Returns the global function registry.
FunctionRegistry* GetFunctionRegistry();

Calling Functions

Functions can be called using the CallFunction convenience API:
Result<Datum> CallFunction(
    const std::string& func_name,
    const std::vector<Datum>& args,
    const FunctionOptions* options = nullptr,
    ExecContext* ctx = nullptr);
  • func_name (const std::string&, required): Name of the function to call
  • args (const std::vector<Datum>&, required): Input arguments (Arrays, Scalars, or ChunkedArrays)
  • options (const FunctionOptions*): Function-specific options
  • ctx (ExecContext*): Execution context (memory pool, function registry, etc.)
Returns: Result containing the output data
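To make the name-based dispatch concrete, here is a minimal sketch of a registry lookup followed by a call. It is a toy model built only on the C++17 standard library, not Arrow's implementation: ToyDatum, ToyResult, ToyRegistry, and ToyCallFunction are hypothetical names introduced for illustration, and an empty std::optional stands in for an error Result.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy stand-ins, NOT Arrow types: a "datum" is just a vector of integers,
// and an empty optional models a Result that carries an error.
using ToyDatum = std::vector<long long>;
using ToyResult = std::optional<ToyDatum>;
using ToyFunction = std::function<ToyResult(const std::vector<ToyDatum>&)>;

// Hypothetical global registry mapping function names to callables.
std::map<std::string, ToyFunction>& ToyRegistry() {
  static std::map<std::string, ToyFunction> registry = {
      {"add",
       [](const std::vector<ToyDatum>& args) -> ToyResult {
         // Reject a wrong argument count or mismatched lengths.
         if (args.size() != 2 || args[0].size() != args[1].size()) {
           return std::nullopt;
         }
         ToyDatum out(args[0].size());
         for (std::size_t i = 0; i < out.size(); ++i) {
           out[i] = args[0][i] + args[1][i];
         }
         return out;
       }},
  };
  return registry;
}

// Look up `func_name` in the registry and invoke it with `args`.
ToyResult ToyCallFunction(const std::string& func_name,
                          const std::vector<ToyDatum>& args) {
  auto it = ToyRegistry().find(func_name);
  if (it == ToyRegistry().end()) return std::nullopt;  // unknown function name
  return it->second(args);
}
```

In this sketch, ToyCallFunction("add", ...) with two equal-length inputs yields their element-wise sum, while an unregistered name yields an empty result instead of output data.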

Function Classes

Function

Base class for all compute functions.
class Function
  • name (const std::string&): Returns the function name
  • kind (Function::Kind): Returns the function kind (SCALAR, VECTOR, SCALAR_AGGREGATE, HASH_AGGREGATE, or META)
  • arity (const Arity&): Returns the function arity (number of required arguments)
  • doc (const FunctionDoc&): Returns the function documentation
  • Execute (Result<Datum>): Executes the function with kernel dispatch, batch iteration, and memory allocation handled automatically. Parameters:
      • args (const std::vector<Datum>&, required): Input arguments
      • options (const FunctionOptions*): Function options
      • ctx (ExecContext*): Execution context

Function Kinds

  • SCALAR: Operates element-wise on arrays/scalars. Output size matches input size
  • VECTOR: Array-to-array operations whose behavior depends on the values of the entire array
  • SCALAR_AGGREGATE: Computes scalar summary statistics from array input
  • HASH_AGGREGATE: Computes grouped summary statistics from array input and group identifiers
  • META: Dispatches to other functions, contains no kernels
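The difference between the kinds listed above is easiest to see on plain data. The following sketch uses std::vector in place of Arrow arrays; the function names (AddElementwise, CumulativeSum, SumAll, SumByGroup) are hypothetical and only mimic the shape of each kind's output.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// SCALAR-like: element-wise; output length equals input length.
std::vector<int> AddElementwise(const std::vector<int>& a,
                                const std::vector<int>& b) {
  std::vector<int> out;
  for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
    out.push_back(a[i] + b[i]);
  }
  return out;
}

// VECTOR-like: each output depends on other values in the array
// (here, a running total over everything seen so far).
std::vector<int> CumulativeSum(const std::vector<int>& values) {
  std::vector<int> out;
  int running = 0;
  for (int v : values) {
    running += v;
    out.push_back(running);
  }
  return out;
}

// SCALAR_AGGREGATE-like: the whole array reduces to a single value.
int SumAll(const std::vector<int>& values) {
  int total = 0;
  for (int v : values) total += v;
  return total;
}

// HASH_AGGREGATE-like: one summary value per group identifier.
std::map<int, int> SumByGroup(const std::vector<int>& values,
                              const std::vector<int>& group_ids) {
  std::map<int, int> totals;
  for (std::size_t i = 0; i < values.size() && i < group_ids.size(); ++i) {
    totals[group_ids[i]] += values[i];
  }
  return totals;
}
```

A META function has no counterpart here because it holds no computation of its own; it only forwards to one of the other kinds.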

Arity

Describes the number of required arguments for a function.
struct Arity {
  int num_args;
  bool is_varargs;
};
  • Nullary (static Arity): Function taking no arguments
  • Unary (static Arity): Function taking 1 argument
  • Binary (static Arity): Function taking 2 arguments
  • Ternary (static Arity): Function taking 3 arguments
  • VarArgs (static Arity): Function taking a variable number of arguments
      • min_args (int): Minimum number of required arguments (default: 0)
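As a rough illustration of how the two fields and the factory functions fit together, the sketch below defines a standalone ToyArity with the same shape. ToyArity and the AcceptsArgCount helper are hypothetical; the assumption that VarArgs stores min_args in num_args is labeled in the code.

```cpp
// Toy model of the Arity struct described above (not the Arrow type).
struct ToyArity {
  int num_args;
  bool is_varargs;

  static ToyArity Nullary() { return {0, false}; }
  static ToyArity Unary() { return {1, false}; }
  static ToyArity Binary() { return {2, false}; }
  static ToyArity Ternary() { return {3, false}; }
  // Assumption: for varargs, num_args holds the minimum argument count.
  static ToyArity VarArgs(int min_args = 0) { return {min_args, true}; }
};

// Hypothetical helper: would a call passing `passed` arguments be accepted?
bool AcceptsArgCount(const ToyArity& arity, int passed) {
  return arity.is_varargs ? passed >= arity.num_args
                          : passed == arity.num_args;
}
```

Under this model a Binary function accepts exactly two arguments, while VarArgs(1) accepts one or more.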

Common Compute Functions

Arithmetic Functions

Element-wise arithmetic operations.
// Binary operations
Result<Datum> Add(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Subtract(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Multiply(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Divide(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);

// Unary operations
Result<Datum> Negate(const Datum& value, ExecContext* ctx = nullptr);
Result<Datum> Abs(const Datum& value, ExecContext* ctx = nullptr);

Comparison Functions

Element-wise comparison operations.
Result<Datum> Equal(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> NotEqual(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Less(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> LessEqual(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Greater(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> GreaterEqual(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);

Aggregate Functions

Compute summary statistics.
Result<Datum> Sum(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Mean(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Min(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Max(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Count(const Datum& array, const CountOptions& options, ExecContext* ctx = nullptr);
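The options argument mostly governs how nulls and near-empty inputs are treated. The sketch below models that on std::optional values (C++17). ToyAggregateOptions and ToySum are hypothetical, and the two fields shown (skip_nulls, min_count) are an assumption about what the aggregate options contain, since this page does not list them.

```cpp
#include <optional>
#include <vector>

// Assumed option fields; this page does not document them.
struct ToyAggregateOptions {
  bool skip_nulls = true;   // ignore null inputs instead of propagating them
  unsigned min_count = 1;   // minimum number of non-null values required
};

// Sum over nullable values; an empty optional models a null result.
std::optional<long long> ToySum(
    const std::vector<std::optional<long long>>& values,
    ToyAggregateOptions options = ToyAggregateOptions()) {
  long long total = 0;
  unsigned valid = 0;
  for (const auto& v : values) {
    if (!v.has_value()) {
      if (!options.skip_nulls) return std::nullopt;  // a null poisons the sum
      continue;
    }
    total += *v;
    ++valid;
  }
  if (valid < options.min_count) return std::nullopt;  // too few valid values
  return total;
}
```

With the defaults in this sketch, nulls are skipped and an empty input produces a null result rather than zero.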

Type Casting

Cast data to different types.
Result<Datum> Cast(const Datum& value, const CastOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Cast(const Datum& value, std::shared_ptr<DataType> to_type, ExecContext* ctx = nullptr);
  • value (const Datum&, required): Value to cast
  • options (const CastOptions&, required): Cast options including target type
  • to_type (std::shared_ptr<DataType>, required): Target data type

CastOptions

struct CastOptions {
  std::shared_ptr<DataType> to_type;
  bool allow_int_overflow = false;
  bool allow_time_truncate = false;
  bool allow_time_overflow = false;
  bool allow_decimal_truncate = false;
  bool allow_float_truncate = false;
  bool allow_invalid_utf8 = false;
};
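Each allow_* flag relaxes one safety check that is otherwise enforced. The sketch below shows the idea for allow_int_overflow on a single 64-bit to 32-bit conversion. ToyCastOptions and ToyCastInt64ToInt32 are hypothetical names, an empty std::optional models a failed cast, and the exact value produced when overflow is allowed is deliberately left unspecified.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Toy subset of the options struct above (not the Arrow type).
struct ToyCastOptions {
  bool allow_int_overflow = false;
};

// Cast one 64-bit integer to 32 bits; empty optional models a cast error.
std::optional<std::int32_t> ToyCastInt64ToInt32(std::int64_t value,
                                                const ToyCastOptions& options) {
  const bool fits = value >= std::numeric_limits<std::int32_t>::min() &&
                    value <= std::numeric_limits<std::int32_t>::max();
  if (!fits && !options.allow_int_overflow) {
    return std::nullopt;  // safe mode: refuse to lose information
  }
  // Unsafe mode: the value is truncated to 32 bits (wraps on mainstream
  // compilers; guaranteed modular only from C++20 onward).
  return static_cast<std::int32_t>(value);
}
```

In this model, a value such as 5000000000 is rejected by default and only converted, with loss of information, when allow_int_overflow is set.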

Filter and Selection

Filter

Filter an array by a boolean selection mask.
Result<Datum> Filter(const Datum& values, const Datum& filter, const FilterOptions& options, ExecContext* ctx = nullptr);
  • values (const Datum&, required): Array or ChunkedArray to filter
  • filter (const Datum&, required): Boolean array indicating which values to keep
  • options (const FilterOptions&, required): Filter options
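The selection rule itself is simple: keep position i when the mask is true at position i. A minimal sketch on std::vector follows; ToyFilter is a hypothetical name, and null handling (which the filter options would control) is left out.

```cpp
#include <cstddef>
#include <vector>

// Keep values[i] wherever mask[i] is true. The toy stops at the shorter
// input; a real implementation would be expected to require equal lengths.
std::vector<int> ToyFilter(const std::vector<int>& values,
                           const std::vector<bool>& mask) {
  std::vector<int> out;
  for (std::size_t i = 0; i < values.size() && i < mask.size(); ++i) {
    if (mask[i]) out.push_back(values[i]);
  }
  return out;
}
```

The output preserves the relative order of the kept values and is never longer than the input.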

Take

Select values by indices.
Result<Datum> Take(const Datum& values, const Datum& indices, const TakeOptions& options, ExecContext* ctx = nullptr);
  • values (const Datum&, required): Array or ChunkedArray to select from
  • indices (const Datum&, required): Integer array of indices to select
  • options (const TakeOptions&, required): Take options (handling of out-of-bounds indices)
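Selection by index can be sketched the same way. In the toy version below, ToyTake is a hypothetical name, bounds are always checked, and an empty std::optional models the error reported for an out-of-bounds index.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Gather values[indices[k]] for each k; indices may repeat or reorder.
std::optional<std::vector<int>> ToyTake(const std::vector<int>& values,
                                        const std::vector<long long>& indices) {
  std::vector<int> out;
  out.reserve(indices.size());
  for (long long index : indices) {
    if (index < 0 || static_cast<std::size_t>(index) >= values.size()) {
      return std::nullopt;  // out-of-bounds index: report failure
    }
    out.push_back(values[static_cast<std::size_t>(index)]);
  }
  return out;
}
```

Unlike a filter, the output length equals the number of indices, so the same input value can appear several times.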

String Functions

String Operations

// String predicates
Result<Datum> IsAscii(const Datum& strings, ExecContext* ctx = nullptr);
Result<Datum> IsUtf8(const Datum& strings, ExecContext* ctx = nullptr);

// String transformations
Result<Datum> Utf8Upper(const Datum& strings, ExecContext* ctx = nullptr);
Result<Datum> Utf8Lower(const Datum& strings, ExecContext* ctx = nullptr);
Result<Datum> Utf8Reverse(const Datum& strings, ExecContext* ctx = nullptr);

// String trimming
Result<Datum> Utf8Trim(const Datum& strings, const TrimOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Utf8LTrim(const Datum& strings, const TrimOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Utf8RTrim(const Datum& strings, const TrimOptions& options, ExecContext* ctx = nullptr);
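The three trimming variants differ only in which end of the string they strip. The sketch below shows this for single-byte (ASCII) characters on one std::string. ToyTrim, ToyLTrim, and ToyRTrim are hypothetical names, and passing the characters to strip as a plain string is an assumption about what the trim options carry.

```cpp
#include <string>

// Strip any of `characters` from the left end only.
std::string ToyLTrim(const std::string& s, const std::string& characters) {
  const auto first = s.find_first_not_of(characters);
  if (first == std::string::npos) return "";  // everything was trimmed
  return s.substr(first);
}

// Strip any of `characters` from the right end only.
std::string ToyRTrim(const std::string& s, const std::string& characters) {
  const auto last = s.find_last_not_of(characters);
  if (last == std::string::npos) return "";
  return s.substr(0, last + 1);
}

// Strip from both ends.
std::string ToyTrim(const std::string& s, const std::string& characters) {
  return ToyRTrim(ToyLTrim(s, characters), characters);
}
```

A UTF-8 aware version would have to compare whole code points rather than bytes, which this sketch does not attempt.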

Execution Context

ExecContext

Provides execution context including memory pool and function registry.
class ExecContext {
 public:
  explicit ExecContext(MemoryPool* pool = default_memory_pool(),
                       FunctionRegistry* func_registry = nullptr);
  
  MemoryPool* memory_pool() const;
  FunctionRegistry* func_registry() const;
};
  • memory_pool (MemoryPool*): Returns the memory pool for allocations
  • func_registry (FunctionRegistry*): Returns the function registry for looking up functions
