Built-in functions cover aggregation, conditionals, structured data, user-defined functions, and sorting. All of them are available directly on the fc namespace (fc.*).

Aggregation Functions

sum

Aggregate function: returns the sum of all values in the specified column.
fc.sum(column: ColumnOrName) -> Column

sum_distinct

Aggregate function: returns the sum of distinct numeric values in the specified column.
fc.sum_distinct(column: ColumnOrName) -> Column

Example

df.group_by(fc.col("k")).agg(
    fc.sum_distinct(fc.col("v")).alias("sum_distinct_v")
)
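
The difference between sum and sum_distinct is easiest to see on a column with repeated values. The snippet below is a plain-Python model of the two definitions above, not fenic code, and the sample values are made up for illustration.

```python
# Plain-Python model of sum vs. sum_distinct (illustrative values, not fenic code).
values = [10, 10, 20]

total = sum(values)                # every value contributes: 10 + 10 + 20
total_distinct = sum(set(values))  # each distinct value contributes once: 10 + 20

print(total, total_distinct)  # prints "40 30"
```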

avg / mean

Aggregate function: returns the average (mean) of all values in the specified column.
fc.avg(column: ColumnOrName) -> Column
fc.mean(column: ColumnOrName) -> Column  # Alias for avg

min

Aggregate function: returns the minimum value in the specified column.
fc.min(column: ColumnOrName) -> Column

max

Aggregate function: returns the maximum value in the specified column.
fc.max(column: ColumnOrName) -> Column

count

Aggregate function: returns the count of non-null values in the specified column, or the count of all rows when passed "*".
fc.count(column: ColumnOrName) -> Column

Example

fc.count("*")  # Count all rows
fc.count("column_name")  # Count non-null values
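
To make the two call forms concrete, here is a plain-Python model of what each one counts. It does not call fenic, and the rows and variable names are hypothetical.

```python
# Plain-Python model of count("*") vs. count("v") (not fenic code).
rows = [{"v": 1}, {"v": None}, {"v": 3}]

count_all_rows = len(rows)                                       # like fc.count("*")
count_non_null = sum(1 for row in rows if row["v"] is not None)  # like fc.count("v")

print(count_all_rows, count_non_null)  # prints "3 2"
```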

count_distinct

Aggregate function: returns the number of distinct non-null rows across one or more columns.
fc.count_distinct(*cols: ColumnOrName) -> Column
*cols
ColumnOrName
required
One or more columns or column names to include in the distinct count.
Any row where one or more inputs is null is ignored.

Example

df.group_by(fc.col("k")).agg(
    fc.count_distinct(fc.col("v")).alias("num_unique_v")
)
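
When several columns are passed, the null rule applies to the whole row. The following plain-Python sketch models that behavior as described above; it is not fenic code and the sample rows are invented.

```python
# Plain-Python model of count_distinct over two columns (not fenic code).
rows = [
    (1, "a"),
    (1, "a"),   # duplicate of the first row: counted once
    (1, None),  # contains a null: ignored
    (2, "a"),
]

# Keep only rows with no nulls, then deduplicate with a set.
distinct_rows = {row for row in rows if all(value is not None for value in row)}
num_distinct = len(distinct_rows)

print(num_distinct)  # prints "2"
```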

approx_count_distinct

Aggregate function: returns an approximate count of distinct non-null values, estimated with the HyperLogLog++ algorithm.
fc.approx_count_distinct(column: ColumnOrName) -> Column
column
ColumnOrName
required
Column or column name to approximately count distinct values in. Cannot be a StructType column.

collect_list / array_agg

Aggregate function: collects all values from the specified column into a list.
fc.collect_list(column: ColumnOrName) -> Column
fc.array_agg(column: ColumnOrName) -> Column  # Alias

first

Aggregate function: returns the first non-null value in the specified column.
fc.first(column: ColumnOrName) -> Column

stddev

Aggregate function: returns the sample standard deviation of the specified column.
fc.stddev(column: ColumnOrName) -> Column
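
"Sample" standard deviation means the sum of squared deviations is divided by n - 1 rather than n. The standard-library comparison below illustrates the distinction on made-up values; it does not call fenic.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32

sample_sd = statistics.stdev(values)       # divides by n - 1 = 7
population_sd = statistics.pstdev(values)  # divides by n = 8

# The sample estimate is always the larger of the two for n > 1.
assert math.isclose(sample_sd, math.sqrt(32 / 7))
assert math.isclose(population_sd, 2.0)
```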

Conditional Functions

when

Evaluates a conditional expression (like if-then).
fc.when(condition: Column, value: Column) -> Column
condition
Column
required
Boolean expression to test.
value
Column
required
Value to return when condition is True.
return
Column
A when expression that can be chained with more conditions using .when() or finished with .otherwise().

Examples

df.select(fc.when(fc.col("age") >= 18, fc.lit("adult")))
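
The return description mentions chaining with .when() and finishing with .otherwise(). As a rough mental model, assuming branches are tested in order and the first match wins (the usual convention for such chains, not something stated on this page), the per-row logic resembles the plain-Python sketch below. The function name and the age bands are hypothetical.

```python
def age_band(age):
    """Per-row model of when(...).when(...).otherwise(...); not fenic code."""
    if age >= 65:   # first .when() branch
        return "senior"
    if age >= 18:   # chained .when() branch, reached only if the first failed
        return "adult"
    return "minor"  # .otherwise() fallback

bands = [age_band(age) for age in [70, 30, 12]]
print(bands)  # prints "['senior', 'adult', 'minor']"
```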

coalesce

Returns the first non-null value from the given columns for each row.
fc.coalesce(*cols: ColumnOrName) -> Column
*cols
ColumnOrName
required
Column expressions or column names to evaluate.
return
Column
A Column expression containing the first non-null value from the input columns.

Example

df.select(fc.coalesce("col1", "col2", "col3"))
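
Row by row, coalesce scans its inputs from left to right and stops at the first non-null value. A plain-Python model of that rule follows; it is not fenic code, and returning None when every input is null is an assumption rather than documented behavior.

```python
def coalesce(*values):
    """Return the first value that is not None (assumed: None if all are None)."""
    return next((v for v in values if v is not None), None)

# Each tuple stands for one row of (col1, col2, col3).
rows = [(None, "b", "c"), (None, None, "c"), ("a", None, None), (None, None, None)]
result = [coalesce(*row) for row in rows]

print(result)  # prints "['b', 'c', 'a', None]"
```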

greatest

Returns the greatest value from the given columns for each row.
fc.greatest(*cols: ColumnOrName) -> Column
*cols
ColumnOrName
required
Column expressions or column names to evaluate (minimum 2).
All arguments must be of the same primitive type.

Example

df.select(fc.greatest("col1", "col2", "col3"))

least

Returns the least value from the given columns for each row.
fc.least(*cols: ColumnOrName) -> Column
*cols
ColumnOrName
required
Column expressions or column names to evaluate (minimum 2).
All arguments must be of the same primitive type.

Example

df.select(fc.least("col1", "col2", "col3"))
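
Unlike the max and min aggregates, which reduce a column to one value, greatest and least compare values across columns within each row. The plain-Python sketch below models that row-wise behavior on invented data; it is not fenic code, and it avoids nulls because their handling is not specified here.

```python
# Each tuple stands for one row of (col1, col2, col3).
rows = [(3, 7, 5), (10, 2, 8)]

greatest = [max(row) for row in rows]  # one result per row
least = [min(row) for row in rows]

print(greatest, least)  # prints "[7, 10] [3, 2]"
```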

Data Structure Functions

struct

Creates a new struct column from multiple input columns.
fc.struct(*args: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]) -> Column
*args
Union[ColumnOrName, List, Tuple]
required
Columns or column names to combine into a struct. Can be individual arguments, lists, or tuples.
return
Column
A Column expression representing a struct containing the input columns.

array

Creates a new array column from multiple input columns.
fc.array(*args: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]) -> Column
*args
Union[ColumnOrName, List, Tuple]
required
Columns or column names to combine into an array. Can be individual arguments, lists, or tuples.
return
Column
A Column expression representing an array containing values from the input columns.

flatten

Flattens an array of arrays into a single array (one level deep).
fc.flatten(column: ColumnOrName) -> Column
column
ColumnOrName
required
Column or column name containing arrays of arrays.
return
Column
A Column with flattened arrays (one level deep).

Example

df.select(fc.flatten("nested"))
# Input: [[1, 2], [3, 4]]
# Output: [1, 2, 3, 4]
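
"One level deep" means only the outermost layer of nesting is removed. The plain-Python sketch below models this with itertools; the helper name is hypothetical and this is not how fenic implements it.

```python
from itertools import chain

def flatten_one_level(nested):
    """Concatenate the inner lists; deeper nesting is left untouched."""
    return list(chain.from_iterable(nested))

flat = flatten_one_level([[1, 2], [3, 4]])
still_nested = flatten_one_level([[[1], [2]], [[3]]])

print(flat)          # prints "[1, 2, 3, 4]"
print(still_nested)  # prints "[[1], [2], [3]]"
```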

User-Defined Functions

udf

A decorator or function for creating user-defined functions (UDFs) that can be applied to DataFrame rows.
fc.udf(f: Optional[Callable] = None, *, return_type: DataType)
f
Optional[Callable]
Python function to convert to UDF.
return_type
DataType
required
Expected return type of the UDF.
UDFs cannot be serialized and are not supported in cloud execution. For cloud compatibility, use built-in fenic functions instead.

Examples

@fc.udf(return_type=IntegerType)
def add_one(x: int):
    return x + 1

# Or wrap an existing function or lambda directly
add_one = fc.udf(lambda x: x + 1, return_type=IntegerType)

async_udf

A decorator for creating async user-defined functions (UDFs) with configurable concurrency and retries.
fc.async_udf(
    f: Optional[Callable[..., Awaitable[Any]]] = None,
    *,
    return_type: DataType,
    max_concurrency: int = 10,
    timeout_seconds: float = 30,
    num_retries: int = 0,
)
f
Optional[Callable]
Async function to convert to UDF.
return_type
DataType
required
Expected return type of the UDF.
max_concurrency
int
default: 10
Maximum number of concurrent executions.
timeout_seconds
float
default: 30
Per-item timeout in seconds.
num_retries
int
default: 0
Number of retries for failed items.

Examples

import asyncio

@fc.async_udf(return_type=IntegerType)
async def slow_add(x: int, y: int) -> int:
    await asyncio.sleep(1)
    return x + y

df = df.select(slow_add(fc.col("x"), fc.col("y")).alias("slow_sum"))
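
To show how max_concurrency, timeout_seconds, and num_retries might interact, here is a minimal asyncio sketch. It is not fenic's implementation: run_with_limits and flaky_double are hypothetical names, and returning None once the retries are exhausted is an assumption made for the illustration.

```python
import asyncio

async def run_with_limits(func, items, max_concurrency=10, timeout_seconds=30, num_retries=0):
    """Apply an async function to each item with a concurrency cap, timeout, and retries."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item):
        async with semaphore:  # at most max_concurrency items in flight
            for _attempt in range(num_retries + 1):  # first try plus retries
                try:
                    return await asyncio.wait_for(func(item), timeout_seconds)
                except Exception:
                    continue  # failed or timed out: try again if attempts remain
            return None  # assumption: give up with a null result

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(item) for item in items))

attempts = {}

async def flaky_double(x):
    """Hypothetical function that fails on its first call for each input."""
    attempts[x] = attempts.get(x, 0) + 1
    if attempts[x] == 1:
        raise RuntimeError("transient failure")
    return x * 2

results = asyncio.run(
    run_with_limits(flaky_double, [1, 2, 3], max_concurrency=2, num_retries=1)
)
print(results)  # prints "[2, 4, 6]"
```

With num_retries=0 the same inputs would all yield None in this model, because each first attempt raises.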

Sorting Functions

asc

Marks the column for ascending sort order, with nulls placed first.
fc.asc(column: ColumnOrName) -> Column

asc_nulls_first

Alias for asc().
fc.asc_nulls_first(column: ColumnOrName) -> Column

asc_nulls_last

Marks the column for ascending sort order, with nulls placed last.
fc.asc_nulls_last(column: ColumnOrName) -> Column

desc

Marks the column for descending sort order, with nulls placed first.
fc.desc(column: ColumnOrName) -> Column

desc_nulls_first

Alias for desc().
fc.desc_nulls_first(column: ColumnOrName) -> Column

desc_nulls_last

Marks the column for descending sort order, with nulls placed last.
fc.desc_nulls_last(column: ColumnOrName) -> Column
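
The six functions differ only in direction and in where nulls land. The plain-Python model below reproduces the four distinct orderings described above on a made-up column; sort_column is a hypothetical helper, not part of fenic.

```python
def sort_column(values, descending=False, nulls_first=True):
    """Sort non-null values, then place the nulls at the requested end."""
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [None] * (len(values) - len(non_null))
    return nulls + non_null if nulls_first else non_null + nulls

values = [3, None, 1, 2]

asc = sort_column(values)                                                  # like asc / asc_nulls_first
asc_nulls_last = sort_column(values, nulls_first=False)                    # like asc_nulls_last
desc = sort_column(values, descending=True)                                # like desc / desc_nulls_first
desc_nulls_last = sort_column(values, descending=True, nulls_first=False)  # like desc_nulls_last

print(asc)              # prints "[None, 1, 2, 3]"
print(desc_nulls_last)  # prints "[3, 2, 1, None]"
```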
