Fenic provides a comprehensive type system for working with structured and unstructured data. All data types inherit from the base DataType class.
Primitive Types
Primitive types represent basic scalar values and are the building blocks for more complex types.
StringType
Represents a UTF-8 encoded string value.
import fenic as fc
from fenic import StringType

# Use in a schema
schema = fc.Schema([
    fc.ColumnField("name", StringType)
])

# Cast a column to string
df = df.select(fc.col("value").cast(StringType))
IntegerType
Represents a signed integer value.
import fenic as fc
from fenic import IntegerType

schema = fc.Schema([
    fc.ColumnField("id", IntegerType)
])
FloatType
Represents a 32-bit floating-point number.
import fenic as fc
from fenic import FloatType

schema = fc.Schema([
    fc.ColumnField("score", FloatType)
])
DoubleType
Represents a 64-bit floating-point number.
import fenic as fc
from fenic import DoubleType

schema = fc.Schema([
    fc.ColumnField("precise_value", DoubleType)
])
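The practical difference between FloatType and DoubleType is precision. The sketch below uses only Python's standard struct module (not fenic) to round-trip values through a 32-bit representation. It assumes the 32-bit and 64-bit types follow the usual IEEE 754 single and double formats, which this page does not state explicitly; to_float32 is a hypothetical helper written for this illustration.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 0.1 is not exactly representable in either width, but the 32-bit
# approximation is coarser than the 64-bit one.
as_f32 = to_float32(0.1)
print(as_f32 == 0.1)      # False: precision was lost in the round trip
print(abs(as_f32 - 0.1))  # small but non-zero error

# A 32-bit float has a 24-bit significand, so not every integer above
# 2**24 can be represented exactly.
print(to_float32(16_777_217.0))  # 16777216.0
```

If a column holds large identifiers, monetary amounts, or accumulated sums, the 64-bit type is usually the safer choice.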
BooleanType
Represents a boolean value (True/False).
import fenic as fc
from fenic import BooleanType

schema = fc.Schema([
    fc.ColumnField("is_active", BooleanType)
])
DateType
Represents a date value.
import fenic as fc
from fenic import DateType

schema = fc.Schema([
    fc.ColumnField("birth_date", DateType)
])
TimestampType
Represents a timestamp value.
import fenic as fc
from fenic import TimestampType

schema = fc.Schema([
    fc.ColumnField("created_at", TimestampType)
])
Composite Types
Composite types represent structured collections of values.
ArrayType
Represents a homogeneous variable-length array (list) of elements.
Takes one argument: the data type of each element in the array.
Example (array of strings):
import fenic as fc
from fenic import ArrayType, StringType

schema = fc.Schema([
    fc.ColumnField("tags", ArrayType(StringType))
])
StructType
Represents a struct (record) with named fields. Each field can have a different data type.
struct_fields (List[StructField], required): List of field definitions. Each field specifies a name and data type.
StructField
Defines a single field in a struct.
Takes the field name and the data type of the field.
Example (basic struct):
import fenic as fc
from fenic import StructType, StructField, StringType, IntegerType

schema = fc.Schema([
    fc.ColumnField("person", StructType([
        StructField("name", StringType),
        StructField("age", IntegerType),
    ]))
])
Specialized Types
Specialized types provide semantic meaning and enable type-specific operations.
MarkdownType
Represents a string containing Markdown-formatted text. Enables markdown-specific functions.
import fenic as fc
from fenic import MarkdownType

# Cast to MarkdownType to use markdown functions
df = df.select(fc.col("content").cast(MarkdownType).alias("markdown"))

# Use markdown-specific functions
toc_df = df.select(
    fc.markdown.generate_toc(fc.col("markdown")).alias("toc")
)

sections_df = df.select(
    fc.markdown.extract_header_chunks(
        fc.col("markdown"),
        header_level=2
    ).alias("sections")
).explode("sections")
JsonType
Represents a string containing JSON data. Enables JSON-specific functions.
import fenic as fc
from fenic import JsonType

# Cast to JsonType to use JSON functions
df = df.select(fc.col("json_string").cast(JsonType).alias("json_data"))

# Use jq to extract data
df = df.select(
    fc.json.jq(
        fc.col("json_data"),
        '.users[] | {name: .name, email: .email}'
    ).alias("user")
).explode("user")
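To make the effect of the jq filter concrete, here is a plain-Python equivalent run against a hypothetical payload. It uses only the standard json module and does not call fenic or jq; the sample document and the extract_users helper are illustrative assumptions, not part of the library.

```python
import json

# Hypothetical JSON payload, standing in for one value of "json_string".
raw = '''
{"users": [
  {"name": "Ada", "email": "ada@example.com", "role": "admin"},
  {"name": "Lin", "email": "lin@example.com", "role": "viewer"}
]}
'''

def extract_users(json_text: str) -> list[dict]:
    """Mimic the jq filter `.users[] | {name: .name, email: .email}`."""
    doc = json.loads(json_text)
    # `.users[]` iterates the array; the object constructor keeps two keys.
    # dict.get returns None for a missing key, much as jq yields null.
    return [{"name": u.get("name"), "email": u.get("email")} for u in doc["users"]]

users = extract_users(raw)
for user in users:
    print(user)
```

The filter yields one object per user, which is why the snippet above follows the jq call with explode() to give each user its own row.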
HtmlType
Represents a string containing raw HTML markup.
import fenic as fc
from fenic import HtmlType

df = df.select(fc.col("content").cast(HtmlType).alias("html"))
TranscriptType
Represents a string containing a transcript in a specific format.
format ('generic' | 'srt' | 'webvtt', required): The transcript format.
import fenic as fc
from fenic import TranscriptType

# Generic transcript format
df = df.select(
    fc.col("content").cast(TranscriptType(format="generic")).alias("transcript")
)

# SRT subtitle format
df = df.select(
    fc.col("content").cast(TranscriptType(format="srt")).alias("transcript")
)

# WebVTT format
df = df.select(
    fc.col("content").cast(TranscriptType(format="webvtt")).alias("transcript")
)
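When the transcript format is not known in advance, the format argument has to be chosen somehow. The sketch below is one possible heuristic in plain Python, not a fenic feature: guess_transcript_format is a hypothetical helper that relies on two well-known surface cues (a WebVTT file begins with the string WEBVTT, and SRT cue timings put a comma before the milliseconds). Falling back to 'generic' for everything else is an assumption made for illustration; this page does not describe the generic layout.

```python
import re

# SRT cue timing, e.g. "00:00:01,000 --> 00:00:04,000" (comma before millis).
SRT_TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}", re.MULTILINE
)

def guess_transcript_format(text: str) -> str:
    """Return 'webvtt', 'srt', or 'generic' based on simple surface cues."""
    stripped = text.lstrip("\ufeff \t\r\n")  # ignore a BOM and leading blanks
    if stripped.startswith("WEBVTT"):
        return "webvtt"
    if SRT_TIMING.search(stripped):
        return "srt"
    return "generic"

srt_sample = "1\n00:00:01,000 --> 00:00:04,000\nHello and welcome.\n"
vtt_sample = "WEBVTT\n\n00:00:01.000 --> 00:00:04.000\nHello and welcome.\n"

print(guess_transcript_format(srt_sample))                   # srt
print(guess_transcript_format(vtt_sample))                   # webvtt
print(guess_transcript_format("Alice: Hello and welcome."))  # generic
```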
DocumentPathType
Represents a string containing a document’s local (file system) or remote (URL) path.
format: The document format. Currently, only PDF is supported.
import fenic as fc
from fenic import DocumentPathType

df = df.select(
    fc.col("file_path").cast(DocumentPathType(format="pdf")).alias("pdf_path")
)
EmbeddingType
Represents a fixed-length embedding vector generated by a specific model.
dimensions: The number of dimensions in the embedding vector.
embedding_model: Name of the model used to generate the embedding.
Example (OpenAI embeddings):
from fenic import EmbeddingType

# text-embedding-3-small (1536 dimensions)
embedding_type = EmbeddingType(
    dimensions=1536,
    embedding_model="text-embedding-3-small"
)
Type Casting
Use the cast() method to convert columns between types:
import fenic as fc

# Cast to string
df = df.select(fc.col("id").cast(fc.StringType))

# Cast to integer
df = df.select(fc.col("amount").cast(fc.IntegerType))

# Cast to specialized types
df = df.select(
    fc.col("content").cast(fc.MarkdownType),
    fc.col("data").cast(fc.JsonType),
)
Type Validation
Fenic validates types at plan construction time, so type mismatches surface when the query is defined rather than when it runs:
# This will raise a validation error
df = df.select(
    fc.col("string_col") + fc.col("array_col")  # Can't add string and array
)
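The idea of checking types while the plan is built can be illustrated with a toy model. The following is a minimal sketch in plain Python, not fenic's implementation: the Col class, the add function, the type names, and the numeric-only addition rule are all assumptions made for this example, and fenic raises its own validation error rather than TypeError.

```python
from dataclasses import dataclass

# Numeric type names, ordered from narrowest to widest (toy model only).
NUMERIC = ("integer", "float", "double")

@dataclass(frozen=True)
class Col:
    """A column expression that carries its declared type name."""
    name: str
    dtype: str  # e.g. "string", "integer", "array<string>"

def add(left: Col, right: Col) -> Col:
    """Build an addition node, checking operand types immediately."""
    if left.dtype not in NUMERIC or right.dtype not in NUMERIC:
        raise TypeError(
            f"cannot add {left.name} ({left.dtype}) and {right.name} ({right.dtype})"
        )
    # The result takes the wider of the two operand types.
    widest = NUMERIC[max(NUMERIC.index(left.dtype), NUMERIC.index(right.dtype))]
    return Col(f"({left.name} + {right.name})", widest)

total = add(Col("price", "integer"), Col("tax", "double"))
print(total.dtype)  # double

try:
    add(Col("string_col", "string"), Col("array_col", "array<string>"))
except TypeError as err:
    # Raised while the expression is being built; no rows were read.
    print(f"rejected at construction: {err}")
```

Because the check runs inside add, the invalid expression never becomes part of a plan, so the mistake is reported without processing any data.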
Working with Complex Types
Accessing Struct Fields
Use get_item() or unnest() to access struct fields:
Example using get_item():
df = df.select(
    fc.col("person").get_item("name").alias("name"),
    fc.col("person").get_item("age").alias("age")
)
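The difference between the two access styles can be modeled on plain Python dictionaries. This sketch does not use fenic: the get_item and unnest functions below are stand-ins written for illustration, and it assumes that unnest() promotes every field of the struct to a top-level column, which the name suggests but this page does not spell out.

```python
# Hypothetical rows with a struct-like "person" column.
rows = [
    {"id": 1, "person": {"name": "Ada", "age": 36}},
    {"id": 2, "person": {"name": "Lin", "age": 29}},
]

def get_item(rows, column, key, alias):
    """Project a single field out of a struct column under a new name."""
    return [{alias: row[column][key]} for row in rows]

def unnest(rows, column):
    """Replace a struct column with one top-level column per field."""
    out = []
    for row in rows:
        flat = {k: v for k, v in row.items() if k != column}
        flat.update(row[column])
        out.append(flat)
    return out

names = get_item(rows, "person", "name", "name")
flat_rows = unnest(rows, "person")
print(names)
print(flat_rows)
```

Field-by-field access suits cases where only one or two fields are needed; flattening is more convenient when every field should become a column.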
Accessing Array Elements
Use get_item() with an index or explode() to expand arrays:
Example (get the first element):
df = df.select(
    fc.col("tags").get_item(0).alias("first_tag")
)
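Indexing keeps one row per input row, while exploding changes the row count. The sketch below models both on plain Python lists without using fenic; first_element and explode are illustrative stand-ins, and how fenic treats empty or null arrays is not covered here.

```python
# Hypothetical rows with an array-like "tags" column.
rows = [
    {"id": 1, "tags": ["python", "data"]},
    {"id": 2, "tags": ["llm"]},
]

def first_element(rows, column, alias):
    """Take the element at index 0 of an array column, one value per row."""
    return [{alias: row[column][0]} for row in rows]

def explode(rows, column):
    """Emit one output row per array element, repeating the other columns."""
    return [{**row, column: item} for row in rows for item in row[column]]

first_tags = first_element(rows, "tags", "first_tag")
exploded = explode(rows, "tags")
print(first_tags)
print(exploded)
```

Two input rows yield two rows from indexing but three from exploding, since the first row carries two tags.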
See Also