Overview
The compute module provides functions for performing computations on Arrow data structures. These functions support both scalar operations and aggregations.
Arithmetic Functions
add()
Add two arrays or scalars element-wise.
First argument.
Second argument.
Memory pool for allocation.
Element-wise sum of x and y.
subtract()
Subtract two arrays or scalars element-wise.
multiply()
Multiply two arrays or scalars element-wise.
divide()
Divide two arrays or scalars element-wise.
power()
Raise to power element-wise.
Base value.
Exponent value.
Element-wise power computation.
negate()
Negate values element-wise.
abs()
Compute absolute value element-wise.
Aggregation Functions
sum()
Compute sum of array values.
Array to aggregate.
Whether to skip null values.
Minimum number of non-null values required.
Sum of array values.
mean()
Compute arithmetic mean.
min()
Find minimum value.
max()
Find maximum value.
count()
Count values. By default only non-null values are counted.
Array to count.
Count mode: ‘only_valid’, ‘only_null’, or ‘all’.
Count of values.
variance()
Compute variance.
Array to compute variance for.
Delta degrees of freedom (0 for population variance, 1 for sample variance).
Variance of array values.
stddev()
Compute standard deviation.
String Functions
string_length()
Compute string length.
upper()
Convert strings to uppercase.
lower()
Convert strings to lowercase.
match_substring()
Match substring in strings.
String array to search in.
Substring pattern to match.
Whether to ignore case.
Boolean array indicating matches.
replace_substring()
Replace substring occurrences.
String array.
Substring to replace.
Replacement string.
Maximum number of replacements per string.
Array with replacements made.
Filter and Selection
filter()
Filter values based on a boolean mask.
Data to filter.
Boolean mask for filtering.
How to handle nulls in mask: ‘drop’ or ‘emit_null’.
Filtered data.
take()
Select values by indices.
Data to select from.
Indices to select. Must be of integer type.
Whether to check indices are in bounds.
Selected values.
index()
Find index of first occurrence of a value.
Array to search in.
Value to search for.
Start index for search.
End index for search.
Index of first occurrence, or -1 if not found.
Sorting and Ranking
sort_indices()
Return indices that would sort an array.
Array to get sort indices for.
Sort order: ‘ascending’ or ‘descending’.
Where to place nulls: ‘at_start’ or ‘at_end’.
Indices that would sort the array.
top_k_unstable()
Select indices of top k elements.
Data to sort and get top indices from.
Number of top elements to select.
Column key names to order by (for table-like data).
Indices of top k elements.
bottom_k_unstable()
Select indices of bottom k elements.
Type Conversions
cast()
Cast array values to another data type.
Array to cast.
Type to cast to.
Whether to check for overflows or other unsafe conversions.
Additional casting options.
Array with values cast to target type.
Null Handling
is_null()
Check which values are null.
Boolean array indicating null values.
is_valid()
Check which values are not null.
fill_null()
Replace null values.
Values to fill nulls in.
Value to replace nulls with.
Array with nulls replaced.
Comparison Functions
equal()
Element-wise equality.
Boolean array of equality results.
not_equal()
Element-wise inequality.
greater()
Element-wise greater than.
greater_equal()
Element-wise greater than or equal.
less()
Element-wise less than.
less_equal()
Element-wise less than or equal.
Logical Functions
and_()
Element-wise logical AND.
or_()
Element-wise logical OR.
invert()
Element-wise logical NOT.
Expression API
The Expression API provides a way to build complex compute expressions.
field()
Reference a field in a dataset.
Field name, nested field tuple, or column index.
Field reference expression.
scalar()
Create a scalar expression.
Python value to convert to expression.
Scalar value expression.
Example: Complex Expressions
Function Registry
list_functions()
List all available compute functions.
Names of all available functions.
get_function()
Get a function by name.
Function name.
The compute function object.
call_function()
Call a function by name.
Function name to call.
Function arguments.
Function-specific options.
Function result.