Schemas define the structure of DataFrames by specifying column names and their data types.
Schema
Represents the complete schema of a DataFrame.
column_fields: List[ColumnField] (required)
An ordered list of ColumnField objects that define the structure of the DataFrame.
Methods
column_names()
Get a list of all column names in the schema.
Returns: List[str] - List of column names
schema = fc.Schema([
    fc.ColumnField("id", fc.IntegerType),
    fc.ColumnField("name", fc.StringType)
])
columns = schema.column_names()
# Returns: ['id', 'name']
Examples
Basic schema
import fenic as fc
from fenic import Schema, ColumnField, IntegerType, StringType
schema = Schema([
    ColumnField("id", IntegerType),
    ColumnField("name", StringType)
])
ColumnField
Represents a typed column in a DataFrame schema.
name: str
The name of the column.
data_type: DataType
The data type of the column.
Examples
String column
from fenic import ColumnField, StringType
field = ColumnField("name", StringType)
Dataset Metadata
Contains metadata about a dataset (table or view), including its schema and description.
schema: Schema
The schema of the dataset.
description: str
The natural language description of the dataset's contents.
Example
import fenic as fc
session = fc.Session.get_or_create()
# Create a table
schema = fc.Schema([
    fc.ColumnField("id", fc.IntegerType),
    fc.ColumnField("name", fc.StringType)
])
session.catalog.create_table(
    "my_table",
    schema,
    description="A table containing user information"
)
# Get metadata
metadata = session.catalog.describe_table("my_table")
print(metadata.schema)       # Schema with id and name columns
print(metadata.description)  # "A table containing user information"
Schema Inference
Fenic automatically infers schemas when reading data:
CSV Files
# Schema is automatically inferred from CSV headers and data
df = session.read.csv("data.csv")
# View inferred schema
df.print_schema()
Parquet Files
# Schema is read from Parquet metadata
df = session.read.parquet("data.parquet")
df.print_schema()
Python Data
# Schema is inferred from Python types
df = session.create_dataframe({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "scores": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
})
df.print_schema()
# Schema(
# ColumnField(name='id', data_type=IntegerType),
# ColumnField(name='name', data_type=StringType),
# ColumnField(name='scores', data_type=ArrayType(element_type=IntegerType))
# )
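The value-to-type mapping shown above can be sketched in plain Python: scalars map to primitive types and lists map to ArrayType of the element type. This is only an illustration of the general rule, not Fenic's actual inference code, and the helper name is hypothetical:

```python
# Hypothetical sketch of value-to-type inference; not Fenic's implementation.
def infer_type(value):
    """Map a sample Python value to a schema type name."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "BooleanType"
    if isinstance(value, int):
        return "IntegerType"
    if isinstance(value, str):
        return "StringType"
    if isinstance(value, list) and value:
        # Array element type is inferred from the first element.
        return f"ArrayType(element_type={infer_type(value[0])})"
    raise TypeError(f"cannot infer type for {value!r}")

print(infer_type(1))          # IntegerType
print(infer_type("Alice"))    # StringType
print(infer_type([1, 2, 3]))  # ArrayType(element_type=IntegerType)
```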
Explicit Schema Specification
You can provide explicit schemas when reading CSV files:
import fenic as fc
from fenic import Schema, ColumnField, IntegerType, FloatType, StringType
# Define schema with specific types
schema = Schema([
    ColumnField("id", IntegerType),
    ColumnField("amount", FloatType),
    ColumnField("description", StringType)
])
# Read CSV with explicit schema
df = session.read.csv("data.csv", schema=schema)
Explicit schemas for CSV files only support primitive types: IntegerType, FloatType, DoubleType, BooleanType, and StringType.
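Because non-primitive columns cannot appear in an explicit CSV schema, it can be useful to check a schema up front. The helper below is a hypothetical illustration that works on type names as strings; it is not part of the Fenic API:

```python
# Hypothetical pre-check: flag columns whose type an explicit CSV schema cannot express.
CSV_PRIMITIVES = {"IntegerType", "FloatType", "DoubleType", "BooleanType", "StringType"}

def unsupported_csv_columns(fields):
    """Return (column, type_name) pairs that are not CSV-compatible primitives."""
    return [(name, t) for name, t in fields if t not in CSV_PRIMITIVES]

fields = [("id", "IntegerType"), ("tags", "ArrayType"), ("amount", "FloatType")]
print(unsupported_csv_columns(fields))  # [('tags', 'ArrayType')]
```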
Schema Merging
When reading multiple files with different schemas:
# Merge schemas across all CSV files
# Missing columns are filled with nulls
df = session.read.csv("data/*.csv", merge_schemas=True)
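Conceptually, merging takes the union of column names across all files and fills the columns a given file lacks with nulls. A plain-Python sketch of that union-and-fill behavior (illustrative only, not Fenic internals):

```python
import csv
import io

# Two CSV "files" with overlapping but different columns.
file_a = "id,name\n1,Alice\n"
file_b = "id,score\n2,9.5\n"

readers = [csv.DictReader(io.StringIO(text)) for text in (file_a, file_b)]
rows = [row for reader in readers for row in reader]

# Merged schema: union of all column names, in first-seen order.
merged_columns = []
for row in rows:
    for col in row:
        if col not in merged_columns:
            merged_columns.append(col)

# Columns missing from a file are filled with None (null).
merged_rows = [{col: row.get(col) for col in merged_columns} for row in rows]
print(merged_columns)  # ['id', 'name', 'score']
print(merged_rows[1])  # {'id': '2', 'name': None, 'score': '9.5'}
```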
Working with Schemas
Get DataFrame Schema
df = session.create_dataframe({"id": [1, 2, 3]})
# Print schema in readable format
df.print_schema()
# Get schema object
schema = df.schema
column_names = schema.column_names()
Compare Schemas
schema1 = fc.Schema([
    fc.ColumnField("id", fc.IntegerType)
])
schema2 = fc.Schema([
    fc.ColumnField("id", fc.IntegerType)
])
are_equal = (schema1 == schema2)  # True
Access Column Types
schema = fc.Schema([
    fc.ColumnField("id", fc.IntegerType),
    fc.ColumnField("name", fc.StringType)
])
# Iterate through columns
for field in schema.column_fields:
    print(f"Column: {field.name}, Type: {field.data_type}")
# Output:
# Column: id, Type: IntegerType
# Column: name, Type: StringType
Best Practices
Use Explicit Schemas When
Reading CSV files with specific type requirements
Creating tables with precise type constraints
Ensuring type consistency across multiple data sources
Use Schema Inference When
Exploring new datasets
Working with Parquet files (schema is preserved)
Prototyping and development
Schema Evolution
# Add columns to existing data
df = session.table("my_table")
df_with_new_column = df.select(
    "*",
    fc.lit("default_value").alias("new_column")
)
# Save with overwrite to update schema
df_with_new_column.write.save_as_table("my_table", mode="overwrite")
See Also