Built-in Functions

Built-in functions cover aggregation, conditional logic, structured data, user-defined functions, and sort ordering. All of them are available directly as fc.*.

Aggregation Functions

sum

Aggregate function: returns the sum of all values in the specified column.

fc.sum(column: ColumnOrName) -> Column

sum_distinct

Aggregate function: returns the sum of distinct numeric values in the specified column.

fc.sum_distinct(column: ColumnOrName) -> Column

Example

df.group_by(fc.col("k")).agg(
    fc.sum_distinct(fc.col("v")).alias("sum_distinct_v")
)
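The difference between sum and sum_distinct is easiest to see on a small list. The following is a plain-Python sketch of the semantics only; it does not call fenic, and the sample values are made up for illustration.

```python
# Plain-Python illustration of the semantics; not fenic's implementation.
values = [10, 10, 20, 30, 30]

total = sum(values)                # every value contributes
total_distinct = sum(set(values))  # each distinct value contributes once

print(total)           # 100
print(total_distinct)  # 60
```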

avg / mean

Aggregate function: returns the average (mean) of all values in the specified column.

fc.avg(column: ColumnOrName) -> Column
fc.mean(column: ColumnOrName) -> Column  # Alias for avg

min

Aggregate function: returns the minimum value in the specified column.

fc.min(column: ColumnOrName) -> Column

max

Aggregate function: returns the maximum value in the specified column.

fc.max(column: ColumnOrName) -> Column

count

Aggregate function: returns the number of non-null values in the specified column. Pass "*" to count all rows.

fc.count(column: ColumnOrName) -> Column

Example

fc.count("*")  # Count all rows
fc.count("column_name")  # Count non-null values

count_distinct

Aggregate function: returns the number of distinct non-null rows across one or more columns.

fc.count_distinct(*cols: ColumnOrName) -> Column

*cols

ColumnOrName

required

One or more columns or column names to include in the distinct count.

Any row where one or more inputs is null is ignored.

Example

df.group_by(fc.col("k")).agg(
    fc.count_distinct(fc.col("v")).alias("num_unique_v")
)
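The null-handling rules for count and count_distinct described above can be sketched in plain Python. The helper names and sample rows below are hypothetical and exist only to illustrate the documented behavior; this is not fenic's implementation.

```python
# Hypothetical sample rows; None stands in for a null value.
rows = [
    {"a": 1, "b": "x"},
    {"a": 1, "b": "x"},     # duplicate of the first row
    {"a": 1, "b": None},    # skipped by count_distinct("a", "b"): one input is null
    {"a": 2, "b": "y"},
    {"a": None, "b": "y"},  # skipped by count_distinct("a", "b"): one input is null
]

def count(rows, col):
    """Number of non-null values in one column."""
    return sum(1 for r in rows if r[col] is not None)

def count_distinct(rows, *cols):
    """Number of distinct value combinations, ignoring rows with any null input."""
    seen = set()
    for r in rows:
        key = tuple(r[c] for c in cols)
        if any(v is None for v in key):
            continue
        seen.add(key)
    return len(seen)

print(len(rows))                       # 5 rows in total
print(count(rows, "a"))                # 4
print(count_distinct(rows, "a"))       # 2
print(count_distinct(rows, "a", "b"))  # 2
```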

approx_count_distinct

Aggregate function: returns an approximate count (HyperLogLog++) of distinct non-null values.

fc.approx_count_distinct(column: ColumnOrName) -> Column

column

ColumnOrName

required

Column or column name to approximately count distinct values in. Cannot be a StructType column.

collect_list / array_agg

Aggregate function: collects all values from the specified column into a list.

fc.collect_list(column: ColumnOrName) -> Column
fc.array_agg(column: ColumnOrName) -> Column  # Alias

first

Aggregate function: returns the first non-null value in the specified column.

fc.first(column: ColumnOrName) -> Column

stddev

Aggregate function: returns the sample standard deviation of the specified column.

fc.stddev(column: ColumnOrName) -> Column
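"Sample" standard deviation divides the sum of squared deviations by n - 1 rather than n. The sketch below shows the difference in plain Python with made-up values, using the standard library's statistics.stdev as a cross-check; it illustrates the formula and is not fenic's implementation.

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(values) / len(values)
squared_deviations = sum((v - mean) ** 2 for v in values)

# Sample standard deviation: divide by n - 1.
sample_std = math.sqrt(squared_deviations / (len(values) - 1))

# Population standard deviation, shown for contrast: divide by n.
population_std = math.sqrt(squared_deviations / len(values))

# statistics.stdev also computes the sample standard deviation.
print(math.isclose(sample_std, statistics.stdev(values)))  # True
print(population_std)                                      # 2.0
```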

Conditional Functions

when

Evaluates a conditional expression (like if-then).

fc.when(condition: Column, value: Column) -> Column

condition

Column

required

Boolean expression to test.

value

Column

required

Value to return when condition is True.

return

Column

A when expression that can be chained with more conditions using .when() or finished with .otherwise().

Examples

df.select(fc.when(fc.col("age") >= 18, fc.lit("adult")))
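A chain of .when() branches finished with .otherwise() behaves like an if / elif / else ladder. The plain-Python sketch below models that idea. It assumes that the first matching branch wins and that a row matching no branch yields null when no fallback is given, as in SQL CASE WHEN; neither detail is confirmed by this page, and case_when is a hypothetical helper, not a fenic function.

```python
def case_when(row, branches, otherwise=None):
    """Return the value of the first branch whose condition holds for the row.

    Assumption: first match wins, and None is returned when nothing matches
    and no fallback is supplied.
    """
    for condition, value in branches:
        if condition(row):
            return value
    return otherwise

branches = [
    (lambda r: r["age"] >= 65, "senior"),
    (lambda r: r["age"] >= 18, "adult"),
]

labels = [case_when({"age": age}, branches, otherwise="minor") for age in (70, 30, 10)]
print(labels)  # ['senior', 'adult', 'minor']

# Without a fallback, an unmatched row yields None in this sketch.
print(case_when({"age": 10}, branches))  # None
```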

coalesce

Returns the first non-null value from the given columns for each row.

fc.coalesce(*cols: ColumnOrName) -> Column

*cols

ColumnOrName

required

Column expressions or column names to evaluate.

return

Column

A Column expression containing the first non-null value from the input columns.

Example

df.select(fc.coalesce("col1", "col2", "col3"))
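The per-row behavior of coalesce can be sketched in plain Python. The helper below is illustrative only (it is not fenic's implementation) and uses None to stand in for null. Note that falsy values such as 0 or an empty string are not null and are returned as-is.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all are None."""
    return next((v for v in values if v is not None), None)

rows = [
    (None, "b", "c"),
    (None, None, "c"),
    (None, None, None),
    (0, 5, 9),  # 0 is a real value, not null
]

print([coalesce(*row) for row in rows])  # ['b', 'c', None, 0]
```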

greatest

Returns the greatest value from the given columns for each row.

fc.greatest(*cols: ColumnOrName) -> Column

*cols

ColumnOrName

required

Column expressions or column names to evaluate (minimum 2).

All arguments must be of the same primitive type.

Example

df.select(fc.greatest("col1", "col2", "col3"))

least

Returns the least value from the given columns for each row.

fc.least(*cols: ColumnOrName) -> Column

*cols

ColumnOrName

required

Column expressions or column names to evaluate (minimum 2).

All arguments must be of the same primitive type.

Example

df.select(fc.least("col1", "col2", "col3"))
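greatest and least compare values across columns within each row, whereas the max and min aggregates compare values down a single column. The plain-Python sketch below contrasts the two on made-up data; it illustrates the semantics and does not call fenic.

```python
# Each tuple is one row with three same-typed columns.
rows = [(3, 7, 5), (10, 2, 8)]

# Row-wise: one result per row (greatest / least).
greatest = [max(row) for row in rows]
least = [min(row) for row in rows]

# Column-wise: one result for the whole first column (max / min aggregates).
first_column = [row[0] for row in rows]

print(greatest)           # [7, 10]
print(least)              # [3, 2]
print(max(first_column))  # 10
print(min(first_column))  # 3
```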

Data Structure Functions

struct

Creates a new struct column from multiple input columns.

fc.struct(*args: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]) -> Column

*args

Union[ColumnOrName, List, Tuple]

required

Columns or column names to combine into a struct. Can be individual arguments, lists, or tuples.

return

Column

A Column expression representing a struct containing the input columns.

array

Creates a new array column from multiple input columns.

fc.array(*args: Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]) -> Column

*args

Union[ColumnOrName, List, Tuple]

required

Columns or column names to combine into an array. Can be individual arguments, lists, or tuples.

return

Column

A Column expression representing an array containing values from the input columns.
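As a rough mental model, a struct keeps each input value under a field name while an array keeps only the values in order. The plain-Python sketch below illustrates that distinction for a single row. It assumes struct fields take the names of the input columns, which this page does not state explicitly, and it is not fenic's implementation.

```python
# One hypothetical row of a DataFrame.
row = {"name": "Ada", "x": 1, "y": 2}
selected = ("x", "y")

# struct-like: values stay addressable by field name (assumed to be the column names).
as_struct = {col: row[col] for col in selected}

# array-like: values are positional and lose their names.
as_array = [row[col] for col in selected]

print(as_struct)  # {'x': 1, 'y': 2}
print(as_array)   # [1, 2]
```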

flatten

Flattens an array of arrays into a single array (one level deep).

fc.flatten(column: ColumnOrName) -> Column

column

ColumnOrName

required

Column or column name containing arrays of arrays.

return

Column

A Column with flattened arrays (one level deep).

Example

df.select(fc.flatten("nested"))
# Input: [[1, 2], [3, 4]]
# Output: [1, 2, 3, 4]
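"One level deep" means only the outermost layer of nesting is removed. The plain-Python sketch below shows the idea with itertools.chain; flatten_one_level is a hypothetical helper for illustration, not fenic's implementation.

```python
from itertools import chain

def flatten_one_level(nested):
    """Concatenate the inner lists, leaving any deeper nesting untouched."""
    return list(chain.from_iterable(nested))

print(flatten_one_level([[1, 2], [3, 4]]))      # [1, 2, 3, 4]
print(flatten_one_level([[[1], [2]], [[3]]]))   # [[1], [2], [3]]
```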

User-Defined Functions

udf

A decorator or function for creating user-defined functions (UDFs) that can be applied to DataFrame rows.

fc.udf(f: Optional[Callable] = None, *, return_type: DataType)

f

Optional[Callable]

Python function to convert to UDF.

return_type

DataType

required

Expected return type of the UDF.

UDFs cannot be serialized and are not supported in cloud execution. For cloud compatibility, use built-in fenic functions instead.

Examples

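# Assumes IntegerType is already in scope (for example, imported from fenic).
@fc.udf(return_type=IntegerType)
def add_one(x: int):
    return x + 1

# Or, as a plain function call:
add_one = fc.udf(lambda x: x + 1, return_type=IntegerType)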

async_udf

A decorator for creating async user-defined functions (UDFs) with configurable concurrency and retries.

fc.async_udf(
    f: Optional[Callable[..., Awaitable[Any]]] = None,
    *,
    return_type: DataType,
    max_concurrency: int = 10,
    timeout_seconds: float = 30,
    num_retries: int = 0,
)

f

Optional[Callable]

Async function to convert to UDF.

return_type

DataType

required

Expected return type of the UDF.

max_concurrency

int

default:"10"

Maximum number of concurrent executions.

timeout_seconds

float

default:"30"

Per-item timeout in seconds.

num_retries

int

default:"0"

Number of retries for failed items.

Examples

import asyncio

@fc.async_udf(return_type=IntegerType)
async def slow_add(x: int, y: int) -> int:
    await asyncio.sleep(1)  # stand-in for slow I/O such as a network call
    return x + y

df = df.select(slow_add(fc.col("x"), fc.col("y")).alias("slow_sum"))
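To make the roles of max_concurrency, timeout_seconds, and num_retries concrete, here is a minimal asyncio sketch of how such controls are commonly combined: a semaphore caps concurrency, asyncio.wait_for enforces the per-item timeout, and a loop retries failures. The run_all helper and flaky_add function are hypothetical, and returning None for an item that exhausts its retries is an assumption. This is not fenic's implementation.

```python
import asyncio

async def run_all(func, items, *, max_concurrency=10, timeout_seconds=30.0, num_retries=0):
    """Apply an async function to each argument tuple with bounded concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(args):
        async with semaphore:  # at most max_concurrency items run at once
            for attempt in range(num_retries + 1):
                try:
                    # Per-item timeout: cancel the call if it takes too long.
                    return await asyncio.wait_for(func(*args), timeout_seconds)
                except Exception:
                    if attempt == num_retries:
                        return None  # assumption: a failed item becomes null

    return await asyncio.gather(*(run_one(args) for args in items))

attempts = {}

async def flaky_add(x, y):
    """Fails on the first call for each input, then succeeds."""
    attempts[(x, y)] = attempts.get((x, y), 0) + 1
    if attempts[(x, y)] == 1:
        raise RuntimeError("transient failure")
    await asyncio.sleep(0.01)
    return x + y

results = asyncio.run(
    run_all(flaky_add, [(1, 2), (3, 4)], max_concurrency=2, num_retries=1)
)
print(results)  # [3, 7]
```

With num_retries=0, the same first-call failure would leave the item as None in this sketch, which is why retries matter for calls that fail transiently.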

Sorting Functions

asc

Marks the column for ascending sort order, with nulls first.

fc.asc(column: ColumnOrName) -> Column

asc_nulls_first

Alias for asc().

fc.asc_nulls_first(column: ColumnOrName) -> Column

asc_nulls_last

Marks the column for ascending sort order, with nulls last.

fc.asc_nulls_last(column: ColumnOrName) -> Column

desc

Marks the column for descending sort order, with nulls first.

fc.desc(column: ColumnOrName) -> Column

desc_nulls_first

Alias for desc().

fc.desc_nulls_first(column: ColumnOrName) -> Column

desc_nulls_last

Marks the column for descending sort order, with nulls last.

fc.desc_nulls_last(column: ColumnOrName) -> Column
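The four orderings described in this section differ only in sort direction and in where nulls land. The plain-Python sketch below reproduces each one on a small list, with None standing in for null; sort_with_nulls is a hypothetical helper for illustration, not fenic's implementation.

```python
def sort_with_nulls(values, *, descending=False, nulls_first=True):
    """Sort non-null values, then place the nulls before or after them."""
    nulls = [v for v in values if v is None]
    non_nulls = sorted((v for v in values if v is not None), reverse=descending)
    return nulls + non_nulls if nulls_first else non_nulls + nulls

values = [3, None, 1, 2]

print(sort_with_nulls(values))                                      # asc:             [None, 1, 2, 3]
print(sort_with_nulls(values, nulls_first=False))                   # asc_nulls_last:  [1, 2, 3, None]
print(sort_with_nulls(values, descending=True))                     # desc:            [None, 3, 2, 1]
print(sort_with_nulls(values, descending=True, nulls_first=False))  # desc_nulls_last: [3, 2, 1, None]
```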
