Schemas define the structure of DataFrames by specifying column names and their data types.
Schema
Represents the complete schema of a DataFrame.
column_fields: List[ColumnField] (required)
An ordered list of ColumnField objects that define the structure of the DataFrame.
Methods
column_names()
Get a list of all column names in the schema.
Returns: List[str] - List of column names
schema = fc.Schema([
    fc.ColumnField("id", fc.IntegerType),
    fc.ColumnField("name", fc.StringType)
])
columns = schema.column_names()
# Returns: ['id', 'name']
Examples
Basic schema
import fenic as fc
from fenic import Schema, ColumnField, IntegerType, StringType
schema = Schema([
    ColumnField("id", IntegerType),
    ColumnField("name", StringType)
])
ColumnField
Represents a typed column in a DataFrame schema.
name: str
The name of the column.
data_type: DataType
The data type of the column.
Examples
String column
from fenic import ColumnField, StringType
field = ColumnField("name", StringType)
Dataset Metadata
Contains metadata about a dataset (table or view), including its schema and description.
schema: Schema
The schema of the dataset.
description: str
The natural language description of the dataset's contents.
Example
import fenic as fc
session = fc.Session.get_or_create()
# Create a table
schema = fc.Schema([
    fc.ColumnField("id", fc.IntegerType),
    fc.ColumnField("name", fc.StringType)
])
session.catalog.create_table(
    "my_table",
    schema,
    description="A table containing user information"
)
# Get metadata
metadata = session.catalog.describe_table("my_table")
print(metadata.schema)       # Schema with id and name columns
print(metadata.description)  # "A table containing user information"
Schema Inference
Fenic automatically infers schemas when reading data:
CSV Files
# Schema is automatically inferred from CSV headers and data
df = session.read.csv("data.csv")
# View inferred schema
df.print_schema()
Parquet Files
# Schema is read from Parquet metadata
df = session.read.parquet("data.parquet")
df.print_schema()
Python Data
# Schema is inferred from Python types
df = session.create_dataframe({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "scores": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
})
df.print_schema()
# Schema(
# ColumnField(name='id', data_type=IntegerType),
# ColumnField(name='name', data_type=StringType),
# ColumnField(name='scores', data_type=ArrayType(element_type=IntegerType))
# )
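The value-to-type mapping shown above can be sketched in plain Python: scalars map to primitive types and lists map to ArrayType of the element type. This is only an illustration of the general rule, not Fenic's actual inference code, and the helper name is hypothetical:

```python
# Hypothetical sketch of value-to-type inference; not Fenic's implementation.
def infer_type(value):
    """Map a sample Python value to a schema type name."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "BooleanType"
    if isinstance(value, int):
        return "IntegerType"
    if isinstance(value, str):
        return "StringType"
    if isinstance(value, list) and value:
        # Array element type is inferred from the first element.
        return f"ArrayType(element_type={infer_type(value[0])})"
    raise TypeError(f"cannot infer type for {value!r}")

print(infer_type(1))          # IntegerType
print(infer_type("Alice"))    # StringType
print(infer_type([1, 2, 3]))  # ArrayType(element_type=IntegerType)
```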
Explicit Schema Specification
You can provide explicit schemas when reading CSV files:
import fenic as fc
from fenic import Schema, ColumnField, IntegerType, FloatType, StringType
# Define schema with specific types
schema = Schema([
    ColumnField("id", IntegerType),
    ColumnField("amount", FloatType),
    ColumnField("description", StringType)
])
# Read CSV with explicit schema
df = session.read.csv("data.csv", schema=schema)
Explicit schemas for CSV files only support primitive types: IntegerType, FloatType, DoubleType, BooleanType, and StringType.
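Because non-primitive columns cannot appear in an explicit CSV schema, it can be useful to check a schema up front. The helper below is a hypothetical illustration that works on type names as strings; it is not part of the Fenic API:

```python
# Hypothetical pre-check: flag columns whose type an explicit CSV schema cannot express.
CSV_PRIMITIVES = {"IntegerType", "FloatType", "DoubleType", "BooleanType", "StringType"}

def unsupported_csv_columns(fields):
    """Return (column, type_name) pairs that are not CSV-compatible primitives."""
    return [(name, t) for name, t in fields if t not in CSV_PRIMITIVES]

fields = [("id", "IntegerType"), ("tags", "ArrayType"), ("amount", "FloatType")]
print(unsupported_csv_columns(fields))  # [('tags', 'ArrayType')]
```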
Schema Merging
When reading multiple files with different schemas:
# Merge schemas across all CSV files
# Missing columns are filled with nulls
df = session.read.csv("data/*.csv", merge_schemas=True)
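Conceptually, merging takes the union of column names across all files and fills the columns a given file lacks with nulls. A plain-Python sketch of that union-and-fill behavior (illustrative only, not Fenic internals):

```python
import csv
import io

# Two CSV "files" with overlapping but different columns.
file_a = "id,name\n1,Alice\n"
file_b = "id,score\n2,9.5\n"

readers = [csv.DictReader(io.StringIO(text)) for text in (file_a, file_b)]
rows = [row for reader in readers for row in reader]

# Merged schema: union of all column names, in first-seen order.
merged_columns = []
for row in rows:
    for col in row:
        if col not in merged_columns:
            merged_columns.append(col)

# Columns missing from a file are filled with None (null).
merged_rows = [{col: row.get(col) for col in merged_columns} for row in rows]
print(merged_columns)  # ['id', 'name', 'score']
print(merged_rows[1])  # {'id': '2', 'name': None, 'score': '9.5'}
```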
Working with Schemas
Get DataFrame Schema
df = session.create_dataframe({"id": [1, 2, 3]})
# Print schema in readable format
df.print_schema()
# Get schema object
schema = df.schema
column_names = schema.column_names()
Compare Schemas
schema1 = fc.Schema([
    fc.ColumnField("id", fc.IntegerType)
])
schema2 = fc.Schema([
    fc.ColumnField("id", fc.IntegerType)
])
are_equal = (schema1 == schema2)  # True
Access Column Types
schema = fc.Schema([
    fc.ColumnField("id", fc.IntegerType),
    fc.ColumnField("name", fc.StringType)
])
# Iterate through columns
for field in schema.column_fields:
    print(f"Column: {field.name}, Type: {field.data_type}")
# Output:
# Column: id, Type: IntegerType
# Column: name, Type: StringType
Best Practices
Use Explicit Schemas When
Reading CSV files with specific type requirements
Creating tables with precise type constraints
Ensuring type consistency across multiple data sources
Use Schema Inference When
Exploring new datasets
Working with Parquet files (schema is preserved)
Prototyping and development
Schema Evolution
# Add columns to existing data
df = session.table("my_table")
df_with_new_column = df.select(
    "*",
    fc.lit("default_value").alias("new_column")
)
# Save with overwrite to update schema
df_with_new_column.write.save_as_table("my_table", mode="overwrite")
See Also