The Column class represents a column expression that can be used in DataFrame operations. It provides methods for accessing, transforming, and combining column data.
Direct construction using Column() is not allowed. Create Column references using the col() function or DataFrame column access methods.

Creating Columns

Use the col() function to create column references:
from fenic.api.functions import col

# Reference a column by name
age_col = col("age")

# Use in DataFrame operations
df.select(col("name"), col("age"))

Column Access

get_item / []

col.get_item(key: Union[str, int, Column]) -> Column
col[key]
Access an item in a struct or array column.
Parameters:
  • key (Union[str, int, Column], required): for arrays, an integer index or a Column expression that evaluates to an integer; for structs, a literal field name (string).
Returns:
  • Column: A Column representing the accessed item.
# Array access
df.select(col("array_column")[0])
df.select(col("array_column").get_item(0))

# Struct access
df.select(col("struct_column")["field_name"])
df.select(col("struct_column").get_item("field_name"))

Dot notation for structs

col.field_name -> Column
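Access struct fields using dot notation. Dot notation only works for field names that are valid Python identifiers. For other names (for example, names containing spaces or hyphens), use get_item() or [] instead. The same applies to field names that match an existing Column method such as alias, because dot notation would likely resolve to the method.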
# Access struct field
df.select(col("struct_column").field_name)

Column Transformation

alias

col.alias(name: str) -> Column
Create an alias for this column.
Parameters:
  • name (str, required): The alias name to assign.
Returns:
  • Column: The column with the specified alias.
# Rename column
df.select(col("original_name").alias("new_name"))

# Name calculated column
df.select((col("price") * col("quantity")).alias("total_value"))

cast

col.cast(data_type: DataType) -> Column
Cast the column to a new data type.
Parameters:
  • data_type (DataType, required): The target DataType to cast to.
Returns:
  • Column: A Column representing the cast expression.
from fenic.core.types import StringType, IntegerType

# Cast integer to string
df.select(col("int_col").cast(StringType))

# Cast string to integer
df.select(col("str_col").cast(IntegerType))
Type Casting Rules:
Primitive types:
  • Numeric types can be cast between each other and to/from StringType
  • Date/Timestamp can be cast between each other and to/from numeric/string types
  • BooleanType can be cast to/from numeric and string types
Complex types:
  • ArrayType can only be cast to another ArrayType (with castable element types)
  • StructType can only be cast to another StructType (with matching/castable fields)
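The casting rules above can be written down as a small predicate. The sketch below is a plain-Python model, not fenic's implementation. Type names are plain strings chosen for illustration (IntegerType, FloatType, DoubleType, and so on), ArrayT and StructT are hypothetical stand-ins for ArrayType and StructType, and it assumes that "matching" struct fields means the same field names in the same order.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Illustrative primitive type names (plain strings, not fenic DataType objects)
STRING = "StringType"
BOOLEAN = "BooleanType"
NUMERIC = frozenset({"IntegerType", "FloatType", "DoubleType"})
TEMPORAL = frozenset({"DateType", "TimestampType"})

DType = Union[str, "ArrayT", "StructT"]


@dataclass(frozen=True)
class ArrayT:
    """Hypothetical stand-in for ArrayType."""
    element: DType


@dataclass(frozen=True)
class StructT:
    """Hypothetical stand-in for StructType: ordered (name, type) pairs."""
    fields: Tuple[Tuple[str, DType], ...]


def _primitive_castable(src: str, dst: str) -> bool:
    if src == dst:
        return True
    if src in TEMPORAL or dst in TEMPORAL:
        # Date/Timestamp: between each other and to/from numeric and string
        return {src, dst} <= TEMPORAL | NUMERIC | {STRING}
    if BOOLEAN in (src, dst):
        # Boolean: to/from numeric and string
        return {src, dst} <= NUMERIC | {STRING, BOOLEAN}
    # Numeric: between each other and to/from string
    return {src, dst} <= NUMERIC | {STRING}


def can_cast(src: DType, dst: DType) -> bool:
    if isinstance(src, ArrayT) or isinstance(dst, ArrayT):
        # Arrays cast only to arrays, and only if the element types are castable
        return (isinstance(src, ArrayT) and isinstance(dst, ArrayT)
                and can_cast(src.element, dst.element))
    if isinstance(src, StructT) or isinstance(dst, StructT):
        if not (isinstance(src, StructT) and isinstance(dst, StructT)):
            return False
        # Assumption: same field names in the same order, each field castable
        same_names = [n for n, _ in src.fields] == [n for n, _ in dst.fields]
        return same_names and all(
            can_cast(s, d) for (_, s), (_, d) in zip(src.fields, dst.fields))
    return _primitive_castable(src, dst)


print(can_cast(ArrayT("DateType"), ArrayT("StringType")))  # True
```

Recursing through can_cast for array elements and struct fields mirrors the rule that complex types are castable only when their parts are.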

Sorting

asc / desc

col.asc() -> Column
col.desc() -> Column
Mark this column for ascending or descending sort order.
Returns:
  • Column: A sort expression with the specified order.
# Sort ascending
df.sort(col("age").asc())

# Sort descending
df.sort(col("age").desc())

asc_nulls_last / desc_nulls_last

col.asc_nulls_last() -> Column
col.desc_nulls_last() -> Column
Mark this column for ascending or descending sort order, with nulls placed after all non-null values.
Returns:
  • Column: A sort expression with nulls positioned last.
df.sort(col("age").asc_nulls_last())
df.sort(col("age").desc_nulls_last())
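For intuition about where nulls end up, the following plain-Python function models asc_nulls_last() and desc_nulls_last() on a single list of values, with None standing in for NULL. It is an illustrative sketch, not how fenic sorts data.

```python
from typing import Any, List, Optional


def sort_nulls_last(values: List[Optional[Any]], descending: bool = False) -> List[Optional[Any]]:
    """Model of asc_nulls_last() / desc_nulls_last(): sort non-null values, then append nulls."""
    non_null = sorted((v for v in values if v is not None), reverse=descending)
    nulls = [v for v in values if v is None]
    return non_null + nulls


ages = [30, None, 25, None, 40]
print(sort_nulls_last(ages))                   # [25, 30, 40, None, None]
print(sort_nulls_last(ages, descending=True))  # [40, 30, 25, None, None]
```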

String Operations

contains

col.contains(other: Union[str, Column]) -> Column
Check if the column contains a substring.
Parameters:
  • other (Union[str, Column], required): The substring to search for, as a string or a column expression.
Returns:
  • Column: A boolean column indicating whether each value contains the substring.
# Find rows with substring
df.filter(col("name").contains("john"))

# Dynamic pattern from another column
df.filter(col("text").contains(col("pattern")))

contains_any

col.contains_any(others: List[str], case_insensitive: bool = True) -> Column
Check if the column contains any of the specified substrings.
Parameters:
  • others (List[str], required): List of substrings to search for.
  • case_insensitive (bool, default: True): Whether to perform case-insensitive matching.
Returns:
  • Column: A boolean column indicating whether each value contains any of the substrings.
# Case-insensitive (default)
df.filter(col("name").contains_any(["john", "jane"]))

# Case-sensitive
df.filter(col("name").contains_any(["John", "Jane"], case_insensitive=False))
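The matching rule behind contains_any can be modeled per value in plain Python. This sketch assumes that a NULL input yields NULL and that case-insensitive matching amounts to lowercasing both sides; both are assumptions rather than documented fenic behavior.

```python
from typing import List, Optional


def contains_any(value: Optional[str], others: List[str], case_insensitive: bool = True) -> Optional[bool]:
    """Model of contains_any for one value: True if any substring occurs in it."""
    if value is None:
        return None  # assumption: NULL in, NULL out
    if case_insensitive:
        # assumption: case-insensitive matching means comparing lowercased text
        value = value.lower()
        others = [o.lower() for o in others]
    return any(o in value for o in others)


print(contains_any("Johnny Appleseed", ["john", "jane"]))                          # True
print(contains_any("Johnny Appleseed", ["john", "jane"], case_insensitive=False))  # False
```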

starts_with / ends_with

col.starts_with(other: Union[str, Column]) -> Column
col.ends_with(other: Union[str, Column]) -> Column
Check if the column starts or ends with a substring.
Parameters:
  • other (Union[str, Column], required): The substring to check for.
Returns:
  • Column: A boolean column indicating whether each value starts or ends with the substring.
# Check prefix
df.filter(col("name").starts_with("Mr"))

# Check suffix
df.filter(col("email").ends_with("@gmail.com"))

like / ilike

col.like(other: Union[str, Column]) -> Column
col.ilike(other: Union[str, Column]) -> Column
Check if the column matches a SQL LIKE pattern. like() is case-sensitive; ilike() is case-insensitive.
Parameters:
  • other (Union[str, Column], required): The SQL LIKE pattern, where % matches any sequence of characters and _ matches a single character.
Returns:
  • Column: A boolean column indicating whether each value matches the pattern.
# Case-sensitive LIKE
df.filter(col("name").like("J%n"))

# Case-insensitive LIKE
df.filter(col("name").ilike("j%n"))
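To make the wildcard semantics concrete, this sketch translates a LIKE pattern into a Python regular expression and matches it against the whole value. The helper names like_to_regex and like are hypothetical, and the sketch ignores escape characters, which SQL engines usually support.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern: % -> any sequence, _ -> one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return "".join(parts)


def like(value: str, pattern: str, case_insensitive: bool = False) -> bool:
    """Model of like() (case-sensitive) and ilike() (case_insensitive=True)."""
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    # LIKE matches the whole value, so use fullmatch rather than search
    return re.fullmatch(like_to_regex(pattern), value, flags) is not None


print(like("John", "J%n"))                         # True
print(like("john", "J%n"))                         # False
print(like("john", "J%n", case_insensitive=True))  # True
```

Escaping every literal character matters: without it, a "." in a LIKE pattern would behave as a regex wildcard.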

rlike

col.rlike(other: Union[str, Column]) -> Column
Check if the column matches a regular expression pattern.
Parameters:
  • other (Union[str, Column], required): The regular expression pattern to match against.
Returns:
  • Column: A boolean column indicating whether each value matches the pattern.
# Match phone number pattern
df.filter(col("phone").rlike(r"^\d{3}-\d{3}-\d{4}$"))

# Match word boundaries
df.filter(col("text").rlike(r"\bhello\b"))

Null Checking

is_null / is_not_null

col.is_null() -> Column
col.is_not_null() -> Column
Check whether each value in the column is NULL (is_null) or not NULL (is_not_null).
Returns:
  • Column: A boolean column indicating NULL status.
# Filter NULL values
df.filter(col("some_column").is_null())

# Filter non-NULL values
df.filter(col("some_column").is_not_null())

# Complex condition
df.filter(col("col1").is_null() | (col("col2") > 100))

Membership

is_in

col.is_in(other: Union[List[Any], Column]) -> Column
Check if the column is in a list of values or a column expression.
Parameters:
  • other (Union[List[Any], Column], required): A list of values or a Column expression.
Returns:
  • Column: A boolean column indicating membership.
# Check against list
df.filter(col("name").is_in(["Alice", "Bob"]))

# Check against another column
df.filter(col("name").is_in(col("other_column")))

Conditional Expressions

when / otherwise

col.when(condition: Column, value: Column) -> Column
col.otherwise(value: Column) -> Column
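Evaluate conditions in order and return the value for the first condition that holds, similar to SQL CASE WHEN. Start the chain with the when() function, then append further .when() calls and an optional .otherwise().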
Parameters:
  • condition (Column, required): Boolean expression to test.
  • value (Column, required): Value to return when the condition is True.
Returns:
  • Column: A conditional expression.
from fenic.api.functions import when, lit

# Multi-condition expression
result = (
    when(col("age") < 18, lit("minor"))
    .when(col("age") < 65, lit("adult"))
    .when(col("age") >= 65, lit("senior"))
    .otherwise(lit("unknown"))
)

df.select(result.alias("age_category"))
Notes:
  • The .when() method can only be chained onto an expression created by when(); it cannot be called on a regular column.
  • All branches must return the same type.
  • Conditions are evaluated in order; the first condition that evaluates to True wins.
  • If otherwise() is not called, unmatched rows return null.
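The evaluation rules above (ordered conditions, first True wins, null when nothing matches and otherwise() is absent) can be modeled row by row in plain Python. Conditions here are Python callables returning True, False, or None for NULL. Treating a NULL condition as no match follows SQL CASE semantics and is an assumption about fenic; case_when and age_below are hypothetical helpers.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Condition = Callable[[Row], Optional[bool]]  # returns None to model a NULL result


def case_when(row: Row, branches: List[Tuple[Condition, Any]], otherwise: Any = None) -> Any:
    """Return the value of the first branch whose condition is True."""
    for condition, value in branches:
        if condition(row) is True:  # False and NULL (None) both fall through
            return value
    return otherwise  # None models "no otherwise(): unmatched rows are null"


def age_below(limit: int) -> Condition:
    # A NULL age makes the comparison NULL, as in SQL
    return lambda r: None if r["age"] is None else r["age"] < limit


age_branches = [
    (age_below(18), "minor"),
    (age_below(65), "adult"),
    (lambda r: None if r["age"] is None else r["age"] >= 65, "senior"),
]

print(case_when({"age": 10}, age_branches, "unknown"))    # minor (first True wins)
print(case_when({"age": None}, age_branches, "unknown"))  # unknown
print(case_when({"age": None}, age_branches))             # None
```

An age of 10 satisfies both the "minor" and "adult" conditions, so the result depends on branch order, which is why the order of .when() calls matters.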

Operators

Comparison Operators

col > other   # Greater than
col >= other  # Greater than or equal
col < other   # Less than
col <= other  # Less than or equal
col == other  # Equal
col != other  # Not equal
# Numeric comparison
df.filter(col("age") > 25)
df.filter(col("salary") >= 50000)

# String comparison
df.filter(col("name") == "Alice")

Logical Operators

col1 & col2   # Logical AND
col1 | col2   # Logical OR
~col          # Logical NOT
# Multiple conditions
df.filter((col("age") > 25) & (col("age") < 65))
df.filter((col("city") == "NYC") | (col("city") == "LA"))
df.filter(~col("is_active"))
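Use &, | and ~ for logical operations, not Python's and, or and not keywords, which cannot be overloaded. Because & and | bind more tightly than comparison operators, wrap each comparison in parentheses, as in the examples above.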
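The following toy Expr class (hypothetical, not fenic's Column) shows why these operator rules matter. Overloading __and__, __or__ and __invert__ lets &, | and ~ build expressions, while the and keyword only asks for the truth value of its left operand and returns one of the operands unchanged, silently dropping a condition. The last part shows the precedence pitfall of leaving out the parentheses.

```python
class Expr:
    """Toy expression node standing in for a column expression (not fenic's Column)."""

    def __init__(self, text: str):
        self.text = text

    # Comparisons build new expressions instead of returning bool
    def __gt__(self, other):
        return Expr(f"({self.text} > {other!r})")

    def __lt__(self, other):
        return Expr(f"({self.text} < {other!r})")

    # &, | and ~ map to overloadable dunder methods
    def __and__(self, other):
        return Expr(f"({self.text} AND {other.text})")

    def __or__(self, other):
        return Expr(f"({self.text} OR {other.text})")

    def __invert__(self):
        return Expr(f"(NOT {self.text})")


age = Expr("age")

combined = (age > 25) & (age < 65)
print(combined.text)  # ((age > 25) AND (age < 65))

# `and` calls bool() on the left operand (objects are truthy by default)
# and returns the right operand, so the first condition is silently lost.
dropped = (age > 25) and (age < 65)
print(dropped.text)  # (age < 65)

# Without parentheses, & binds first: age > (25 & age) < 65 -> TypeError
try:
    age > 25 & age < 65
    precedence_error = False
except TypeError:
    precedence_error = True
print(precedence_error)  # True
```

Because of the silent-drop behavior, column libraries often make __bool__ raise an error so that misuse of and, or and not fails loudly instead.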

Arithmetic Operators

col + other   # Addition
col - other   # Subtraction
col * other   # Multiplication
col / other   # Division
# Arithmetic operations
df.select(col("price") * col("quantity"))
df.select(col("total") / col("count"))
df.select(col("age") + 1)

Type Alias

ColumnOrName = Union[Column, str]
Many DataFrame methods accept either a Column object or a string column name.
# Both are valid
df.select(col("name"))  # Column object
df.select("name")       # String column name
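A method that accepts ColumnOrName typically normalizes its arguments before doing anything else. The sketch below uses hypothetical stand-ins for Column, col() and select(); it only illustrates the idea that a bare string is treated as shorthand for col(name), not how fenic implements it.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column reference (not fenic's Column)."""
    name: str


def col(name: str) -> Column:
    return Column(name)


ColumnOrName = Union[Column, str]


def to_column(c: ColumnOrName) -> Column:
    # A bare string is shorthand for col(name); Column objects pass through
    return col(c) if isinstance(c, str) else c


def select(*cols: ColumnOrName) -> List[Column]:
    """Hypothetical select(): normalizes every argument to a Column."""
    return [to_column(c) for c in cols]


print(select("name") == select(col("name")))  # True
```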

Aliases

  • getItem: alias for get_item
