Schemas define the structure of DataFrames by specifying column names and their data types.

Schema

Represents the complete schema of a DataFrame.
Attributes

column_fields (List[ColumnField], required): An ordered list of ColumnField objects that define the structure of the DataFrame.

Methods

column_names()

Get a list of all column names in the schema.

Returns: List[str], the column names in order.
schema = fc.Schema([
    fc.ColumnField("id", fc.IntegerType),
    fc.ColumnField("name", fc.StringType)
])

columns = schema.column_names()
# Returns: ['id', 'name']

Examples

import fenic as fc
from fenic import Schema, ColumnField, IntegerType, StringType

schema = Schema([
    ColumnField("id", IntegerType),
    ColumnField("name", StringType)
])

ColumnField

Represents a typed column in a DataFrame schema.
Attributes

name (str, required): The name of the column.

data_type (DataType, required): The data type of the column.

Examples

from fenic import ColumnField, StringType

field = ColumnField("name", StringType)

DatasetMetadata

Contains metadata about a dataset (table or view), including its schema and description.
Attributes

schema (Schema, required): The schema of the dataset.

description (str | None, required): A natural-language description of the dataset's contents.

Example

import fenic as fc

session = fc.Session.get_or_create()

# Create a table
schema = fc.Schema([
    fc.ColumnField("id", fc.IntegerType),
    fc.ColumnField("name", fc.StringType)
])
session.catalog.create_table(
    "my_table",
    schema,
    description="A table containing user information"
)

# Get metadata
metadata = session.catalog.describe_table("my_table")
print(metadata.schema)        # Schema with id and name columns
print(metadata.description)   # "A table containing user information"

Schema Inference

Fenic automatically infers schemas when reading data:

CSV Files

# Schema is automatically inferred from CSV headers and data
df = session.read.csv("data.csv")

# View inferred schema
df.print_schema()

Parquet Files

# Schema is read from Parquet metadata
df = session.read.parquet("data.parquet")

df.print_schema()

Python Data

# Schema is inferred from Python types
df = session.create_dataframe({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "scores": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
})

df.print_schema()
# Schema(
#     ColumnField(name='id', data_type=IntegerType),
#     ColumnField(name='name', data_type=StringType),
#     ColumnField(name='scores', data_type=ArrayType(element_type=IntegerType))
# )
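As a rough illustration of how this kind of inference can work, here is a plain-Python sketch that maps sample values to type names. This is not fenic's actual implementation; the type names simply mirror fenic's for readability, and only the first value of each column is inspected.

```python
# Illustrative schema-inference sketch (NOT fenic's real implementation).
def infer_type(value):
    # bool must be checked before int, since bool is a subclass of int.
    if isinstance(value, bool):
        return "BooleanType"
    if isinstance(value, int):
        return "IntegerType"
    if isinstance(value, float):
        return "FloatType"
    if isinstance(value, str):
        return "StringType"
    if isinstance(value, list):
        inner = infer_type(value[0]) if value else "StringType"
        return f"ArrayType({inner})"
    raise TypeError(f"Cannot infer type for {value!r}")

def infer_schema(data):
    # data: dict mapping column name -> list of values.
    # Only the first value per column is examined in this sketch.
    return {name: infer_type(values[0]) for name, values in data.items()}

schema = infer_schema({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "scores": [[1, 2, 3], [4, 5, 6]],
})
# {'id': 'IntegerType', 'name': 'StringType', 'scores': 'ArrayType(IntegerType)'}
```

A real engine would inspect every value and reconcile conflicts (e.g. widening IntegerType to FloatType); this sketch shows only the basic mapping.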

Explicit Schema Specification

You can provide explicit schemas when reading CSV files:
import fenic as fc
from fenic import Schema, ColumnField, IntegerType, FloatType, StringType

# Define schema with specific types
schema = Schema([
    ColumnField("id", IntegerType),
    ColumnField("amount", FloatType),
    ColumnField("description", StringType)
])

# Read CSV with explicit schema
df = session.read.csv("data.csv", schema=schema)

Note: Explicit schemas for CSV files only support primitive types: IntegerType, FloatType, DoubleType, BooleanType, and StringType.
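As a rough illustration of this restriction, the check below is a plain-Python sketch (not part of fenic's API) that rejects any column whose type name falls outside the supported primitive set.

```python
# Illustrative helper mirroring the CSV type restriction (NOT a fenic API).
CSV_PRIMITIVES = {"IntegerType", "FloatType", "DoubleType", "BooleanType", "StringType"}

def validate_csv_schema(columns):
    # columns: list of (name, type_name) pairs.
    bad = [name for name, type_name in columns if type_name not in CSV_PRIMITIVES]
    if bad:
        raise ValueError(f"Unsupported CSV column types for: {bad}")
    return True

validate_csv_schema([("id", "IntegerType"), ("amount", "FloatType")])  # passes
# validate_csv_schema([("tags", "ArrayType")])  # would raise ValueError
```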

Schema Merging

When reading multiple files with different schemas:
# Merge schemas across all CSV files
# Missing columns are filled with nulls
df = session.read.csv("data/*.csv", merge_schemas=True)
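The merge semantics described above (union of columns, nulls for the gaps) can be sketched in plain Python. This is an illustration of the behavior, not fenic's implementation; each "file" is modeled as a list of row dicts, and only the first row of each file is used to discover its columns.

```python
# Illustrative sketch of merge_schemas semantics (NOT fenic's implementation).
def merge_rows(files):
    # files: list of files, each a non-empty list of row dicts.
    # Merged columns: union of each file's columns, in first-seen order.
    all_columns = []
    for rows in files:
        for col in rows[0]:
            if col not in all_columns:
                all_columns.append(col)
    # Rows from files missing a column get None for it.
    merged = []
    for rows in files:
        for row in rows:
            merged.append({col: row.get(col) for col in all_columns})
    return all_columns, merged

cols, rows = merge_rows([
    [{"id": 1, "name": "Alice"}],
    [{"id": 2, "email": "bob@example.com"}],
])
# cols == ['id', 'name', 'email']
# rows[1] == {'id': 2, 'name': None, 'email': 'bob@example.com'}
```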

Working with Schemas

Get DataFrame Schema

df = session.create_dataframe({"id": [1, 2, 3]})

# Print schema in readable format
df.print_schema()

# Get schema object
schema = df.schema
column_names = schema.column_names()

Compare Schemas

schema1 = fc.Schema([
    fc.ColumnField("id", fc.IntegerType)
])

schema2 = fc.Schema([
    fc.ColumnField("id", fc.IntegerType)
])

are_equal = (schema1 == schema2)  # True

Access Column Types

schema = fc.Schema([
    fc.ColumnField("id", fc.IntegerType),
    fc.ColumnField("name", fc.StringType)
])

# Iterate through columns
for field in schema.column_fields:
    print(f"Column: {field.name}, Type: {field.data_type}")
# Output:
# Column: id, Type: IntegerType
# Column: name, Type: StringType

Best Practices

Use Explicit Schemas When

  • Reading CSV files with specific type requirements
  • Creating tables with precise type constraints
  • Ensuring type consistency across multiple data sources

Use Schema Inference When

  • Exploring new datasets
  • Working with Parquet files (schema is preserved)
  • Prototyping and development

Schema Evolution

# Add columns to existing data
df = session.table("my_table")

df_with_new_column = df.select(
    "*",
    fc.lit("default_value").alias("new_column")
)

# Save with overwrite to update schema
df_with_new_column.write.save_as_table("my_table", mode="overwrite")
