fc.*
Aggregation Functions
sum
Aggregate function: returns the sum of all values in the specified column.
sum_distinct
Aggregate function: returns the sum of distinct numeric values in the specified column.
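The difference between sum and sum_distinct can be illustrated with a plain-Python sketch of a grouped aggregation. This is not fenic code: the rows, the column names `store` and `amount`, and the helper `group_sums` are hypothetical, and null handling is not modeled because the descriptions above do not specify it.

```python
from collections import defaultdict

# Hypothetical input rows: "store" is the grouping key, "amount" the summed column.
rows = [
    {"store": "A", "amount": 10},
    {"store": "A", "amount": 10},
    {"store": "A", "amount": 5},
    {"store": "B", "amount": 7},
]


def group_sums(rows, key, value):
    """Return {group: (sum of all values, sum of distinct values)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # sum() adds every value; sum(set(...)) adds each distinct value only once.
    return {k: (sum(vals), sum(set(vals))) for k, vals in groups.items()}


print(group_sums(rows, "store", "amount"))  # {'A': (25, 15), 'B': (7, 7)}
```

Store A contains the value 10 twice, so the plain sum counts it twice while the distinct sum counts it once.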
avg / mean
Aggregate function: returns the average (mean) of all values in the specified column.
min
Aggregate function: returns the minimum value in the specified column.
max
Aggregate function: returns the maximum value in the specified column.
count
Aggregate function: returns the count of non-null values in the specified column.
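Because count only counts non-null values, it can be smaller than the number of rows. A minimal plain-Python sketch of that rule, using `None` to stand in for null; the list `ratings` and the helper `count_non_null` are hypothetical, not part of fenic.

```python
# Hypothetical column values; None stands in for a null.
ratings = [4, None, 8, None, 6]


def count_non_null(values):
    """Count the values that are not None, mirroring count's non-null rule."""
    return sum(1 for v in values if v is not None)


print(len(ratings))             # 5: every row, nulls included
print(count_non_null(ratings))  # 3: only the non-null values
```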
count_distinct
Aggregate function: returns the number of distinct non-null rows across one or more columns.
Parameters:
One or more columns or column names to include in the distinct count.
Any row where one or more inputs is null is ignored.
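The null rule matters most with several columns: a row is skipped entirely if any of its input columns is null. A plain-Python sketch of that behavior, assuming rows are dicts and `None` represents null; `count_distinct` here is a hypothetical helper, not fenic's implementation.

```python
def count_distinct(rows, *cols):
    """Count distinct combinations of the given columns, skipping rows with any null input."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if any(v is None for v in key):
            continue  # a null in any input column excludes the whole row
        seen.add(key)
    return len(seen)


rows = [
    {"city": "NY", "color": "red"},
    {"city": "NY", "color": "red"},
    {"city": "NY", "color": "blue"},
    {"city": "SF", "color": None},
    {"city": None, "color": "red"},
    {"city": "SF", "color": "red"},
]

print(count_distinct(rows, "city"))           # 2: NY and SF
print(count_distinct(rows, "city", "color"))  # 3: (NY, red), (NY, blue), (SF, red)
```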
approx_count_distinct
Aggregate function: returns an approximate count (HyperLogLog++) of distinct non-null values.
Parameters:
Column or column name to approximately count distinct values in. Cannot be a StructType column.
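To show why the result is approximate, the sketch below implements the simpler original HyperLogLog estimator with the usual linear-counting correction for small cardinalities. It omits the HLL++ refinements (such as empirical bias correction and the sparse representation), and the class name, the hash (BLAKE2b truncated to 64 bits), and the precision `p=12` are assumptions for illustration, not fenic internals.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog cardinality estimator (not HLL++)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def add(self, value):
        if value is None:
            return  # nulls are not counted
        digest = hashlib.blake2b(str(value).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")            # 64-bit hash
        index = h >> (64 - self.p)                   # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            return self.m * math.log(self.m / zeros)  # linear counting for small sets
        return raw


hll = HyperLogLog(p=12)
for i in range(200_000):
    hll.add(f"user-{i % 100_000}")  # each of 100,000 distinct ids appears twice
hll.add(None)                       # ignored
print(round(hll.estimate()))        # close to 100000, typically within a few percent
```

With 4,096 registers the standard error is roughly 1.04 / sqrt(4096), about 1.6%, while memory stays fixed regardless of how many distinct values are seen.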
collect_list / array_agg
Aggregate function: collects all values from the specified column into a list.
first
Aggregate function: returns the first non-null value in the specified column.
stddev
Aggregate function: returns the sample standard deviation of the specified column.

Conditional Functions
when
Evaluates a conditional expression (like if-then).
Parameters:
Boolean expression to test.
Value to return when the condition is True.
Returns:
A when expression that can be chained with more conditions using .when() or finished with .otherwise().
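The chaining semantics can be modeled in plain Python: conditions are tested in order, the first true one supplies the value, and the otherwise value is the fallback. In fenic the chain starts with fc.when(...); the `WhenChain` class below is a hypothetical row-at-a-time model, and returning `None` when no branch matches and no otherwise value was given is an assumption.

```python
class WhenChain:
    """Hypothetical row-level model of a chained when(...).when(...).otherwise(...) expression."""

    def __init__(self):
        self._branches = []   # (condition, value) pairs in the order they were added
        self._default = None  # assumption: no .otherwise() means None (null)

    def when(self, condition, value):
        self._branches.append((condition, value))
        return self

    def otherwise(self, value):
        self._default = value
        return self

    def evaluate(self, row):
        for condition, value in self._branches:
            if condition(row):  # first true condition wins
                return value
        return self._default


grade = (
    WhenChain()
    .when(lambda r: r["score"] >= 90, "A")
    .when(lambda r: r["score"] >= 80, "B")
    .otherwise("C")
)
rows = [{"score": 95}, {"score": 85}, {"score": 50}]
print([grade.evaluate(r) for r in rows])  # ['A', 'B', 'C']
```

A score of 95 also satisfies the second condition, but the first matching branch is the one that applies.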
coalesce
Returns the first non-null value from the given columns for each row.
Parameters:
Column expressions or column names to evaluate.
Returns:
A Column expression containing the first non-null value from the input columns.
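A plain-Python sketch of the per-row behavior, with `None` standing in for null. The helper `coalesce_row` and the column names are hypothetical; returning `None` when every input is null is an assumption consistent with "first non-null value".

```python
def coalesce_row(*values):
    """Return the first value that is not None, or None if all inputs are None."""
    for v in values:
        if v is not None:
            return v
    return None


rows = [
    {"nickname": None, "first_name": "Ada", "fallback": "unknown"},
    {"nickname": "Bob", "first_name": "Robert", "fallback": "unknown"},
    {"nickname": None, "first_name": None, "fallback": "unknown"},
]
display = [coalesce_row(r["nickname"], r["first_name"], r["fallback"]) for r in rows]
print(display)  # ['Ada', 'Bob', 'unknown']
```

Only null is skipped: falsy but non-null values such as 0 or an empty string are returned as-is.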
greatest
Returns the greatest value from the given columns for each row.
Parameters:
Column expressions or column names to evaluate (minimum 2).
All arguments must be of the same primitive type.
least
Returns the least value from the given columns for each row.
Parameters:
Column expressions or column names to evaluate (minimum 2).
All arguments must be of the same primitive type.
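greatest and least compare across columns within a row, unlike min and max, which aggregate down a column. The plain-Python sketch below models that, including the documented rules (at least 2 inputs, same primitive type). The helpers are hypothetical; the type check runs per value here, whereas a DataFrame engine would presumably validate column types when the plan is built, and null handling is not modeled.

```python
def _check_args(values):
    """Enforce the documented rules: at least 2 inputs, all of the same primitive type."""
    if len(values) < 2:
        raise ValueError("at least 2 columns are required")
    first_type = type(values[0])
    if any(type(v) is not first_type for v in values):
        raise TypeError("all arguments must be of the same primitive type")


def greatest_row(*values):
    _check_args(values)
    return max(values)  # largest value across this row's columns


def least_row(*values):
    _check_args(values)
    return min(values)  # smallest value across this row's columns


rows = [(3, 9, 5), (12, 4, 7)]
print([greatest_row(*r) for r in rows])  # [9, 12]
print([least_row(*r) for r in rows])     # [3, 4]
```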
Data Structure Functions
struct
Creates a new struct column from multiple input columns.
Parameters:
Columns or column names to combine into a struct. Can be individual arguments, lists, or tuples.
Returns:
A Column expression representing a struct containing the input columns.
array
Creates a new array column from multiple input columns.
Parameters:
Columns or column names to combine into an array. Can be individual arguments, lists, or tuples.
Returns:
A Column expression representing an array containing values from the input columns.
flatten
Flattens an array of arrays into a single array (one level deep).
Parameters:
Column or column name containing arrays of arrays.
Returns:
A Column with flattened arrays (one level deep).
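"One level deep" means only the outermost nesting is removed. A plain-Python sketch of the per-row behavior on lists; `flatten_one_level` is a hypothetical helper and null inner arrays are not modeled.

```python
def flatten_one_level(arrays):
    """Concatenate the inner lists of a list of lists, removing exactly one level of nesting."""
    result = []
    for inner in arrays:
        result.extend(inner)
    return result


print(flatten_one_level([[1, 2], [3], []]))    # [1, 2, 3]
print(flatten_one_level([[[1], [2]], [[3]]]))  # [[1], [2], [3]]: deeper nesting is kept
```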
User-Defined Functions
udf
A decorator or function for creating user-defined functions (UDFs) that can be applied to DataFrame rows.
Parameters:
Python function to convert to a UDF.
Expected return type of the UDF.
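"A decorator or function" means the same factory can wrap a function at definition time or be called on an existing function. The plain-Python sketch below models both call styles and checks results against the declared return type. `toy_udf` and its keyword `return_type` are hypothetical stand-ins; fenic's actual signature and type objects may differ.

```python
import functools


def toy_udf(f=None, *, return_type=None):
    """Hypothetical UDF factory usable as @toy_udf(return_type=...) or toy_udf(func, return_type=...)."""

    def wrap(func):
        @functools.wraps(func)
        def apply(*args):
            result = func(*args)
            # Check the declared return type, allowing None to stand in for null.
            if return_type is not None and result is not None and not isinstance(result, return_type):
                raise TypeError(
                    f"{func.__name__} returned {type(result).__name__}, expected {return_type.__name__}"
                )
            return result

        apply.return_type = return_type  # keep the declared type for inspection
        return apply

    # Called with a function: wrap it now. Used as a decorator factory: return wrap.
    return wrap(f) if f is not None else wrap


@toy_udf(return_type=int)
def word_count(text):
    return len(text.split())


shout = toy_udf(lambda s: s.upper() + "!", return_type=str)

rows = ["hello world", "one two three"]
print([word_count(r) for r in rows])  # [2, 3]
print(shout("hi"))                    # HI!
```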
async_udf
A decorator for creating async user-defined functions (UDFs) with configurable concurrency and retries.
Parameters:
Async function to convert to a UDF.
Expected return type of the UDF.
Maximum number of concurrent executions.
Per-item timeout in seconds.
Number of retries for failed items.
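The three execution knobs (maximum concurrency, per-item timeout, retries) map naturally onto standard asyncio primitives. The sketch below uses `asyncio.Semaphore`, `asyncio.wait_for`, and `asyncio.gather` to show how they interact. It is not fenic's implementation: `run_async_udf` and its parameter names are hypothetical, and yielding `None` for an item that still fails after all retries is an assumption.

```python
import asyncio


async def run_async_udf(func, items, *, max_concurrency=4, timeout_seconds=1.0, num_retries=0):
    """Apply an async function to every item with bounded concurrency, a per-item
    timeout, and retries. Results come back in input order; an item that still
    fails after all retries yields None (an assumption, not documented behavior)."""
    semaphore = asyncio.Semaphore(max_concurrency)  # caps concurrent executions

    async def run_one(item):
        async with semaphore:
            for attempt in range(num_retries + 1):
                try:
                    # wait_for cancels the call if it exceeds the per-item timeout
                    return await asyncio.wait_for(func(item), timeout=timeout_seconds)
                except Exception:  # includes timeouts; task cancellation is not caught
                    if attempt == num_retries:
                        return None

    return await asyncio.gather(*(run_one(item) for item in items))


calls = {}


async def flaky_double(x):
    """Hypothetical UDF body: fails once for x == 3, then succeeds."""
    calls[x] = calls.get(x, 0) + 1
    if x == 3 and calls[x] == 1:
        raise RuntimeError("transient failure")
    await asyncio.sleep(0.01)
    return 2 * x


results = asyncio.run(
    run_async_udf(flaky_double, [1, 2, 3], max_concurrency=2, timeout_seconds=0.5, num_retries=1)
)
print(results)  # [2, 4, 6]
```

The first call for 3 raises, and the single allowed retry succeeds, so every item produces a result.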
Sorting Functions
asc
Mark this column for ascending sort order with nulls first.
asc_nulls_first
Alias for asc().
asc_nulls_last
Mark this column for ascending sort order with nulls last.
desc
Mark this column for descending sort order with nulls first.
desc_nulls_first
Alias for desc().
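The sort markers differ only in direction and in where nulls land. A plain-Python sketch of the orderings described above, with `None` as null; `sort_with_nulls` is a hypothetical helper, and its defaults follow the documented defaults (nulls first for both asc and desc).

```python
def sort_with_nulls(values, *, descending=False, nulls_first=True):
    """Sort values with None treated as null and placed first or last."""
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    return nulls + non_null if nulls_first else non_null + nulls


data = [3, None, 1, 2]
print(sort_with_nulls(data))                     # asc / asc_nulls_first: [None, 1, 2, 3]
print(sort_with_nulls(data, nulls_first=False))  # asc_nulls_last: [1, 2, 3, None]
print(sort_with_nulls(data, descending=True))    # desc / desc_nulls_first: [None, 3, 2, 1]
```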
