Quickstart
This guide will walk you through creating your first Fenic application: an error log analyzer that extracts structured information from unstructured logs using semantic operations.
Prerequisites
Set your OpenAI API key
export OPENAI_API_KEY="sk-..."
This quickstart uses OpenAI’s GPT-4o-mini model. You can use other providers by following the Installation guide.
Your First Pipeline
Create a file called error_analyzer.py:
from pydantic import BaseModel, Field
import fenic as fc
# Define the structure we want to extract
class ErrorAnalysis(BaseModel):
    root_cause: str = Field(description="The root cause of this error")
    fix_recommendation: str = Field(description="How to fix this error")

# 1. Create session with semantic capabilities
session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="error_analyzer",
        semantic=fc.SemanticConfig(
            language_models={
                "mini": fc.OpenAILanguageModel(
                    model_name="gpt-4o-mini",
                    rpm=500,
                    tpm=200_000,
                )
            }
        )
    )
)
# 2. Load sample error logs
error_logs = [
    {
        "timestamp": "2024-01-20 14:23:45",
        "service": "api-gateway",
        "error_log": """
ERROR: NullPointerException in UserService.getProfile()
    at com.app.UserService.getProfile(UserService.java:45)
User ID: 12345 was not found in cache, attempted DB lookup returned null
""",
    },
    {
        "timestamp": "2024-01-20 14:24:12",
        "service": "auth-service",
        "error_log": """
Error: connect ECONNREFUSED 127.0.0.1:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
Redis connection failed during session validation
""",
    },
]
df = session.create_dataframe(error_logs)
# 3. Analyze errors using semantic operations
df_analyzed = df.select(
    "timestamp",
    "service",
    fc.semantic.classify(
        "error_log",
        ["low", "medium", "high", "critical"],
    ).alias("severity"),
    fc.semantic.extract(
        "error_log",
        ErrorAnalysis,
    ).alias("analysis"),
)

# 4. Display results
result = df_analyzed.select(
    "timestamp",
    "service",
    "severity",
    df_analyzed.analysis.root_cause.alias("root_cause"),
    df_analyzed.analysis.fix_recommendation.alias("fix"),
)
result.show()

# Clean up
session.stop()
Run Your Pipeline
python error_analyzer.py
You should see output like:
┌────────────────────┬──────────────┬──────────┬─────────────────────────────┬─────────────────────────────┐
│ timestamp │ service │ severity │ root_cause │ fix │
├────────────────────┼──────────────┼──────────┼─────────────────────────────┼─────────────────────────────┤
│ 2024-01-20 14:23:45│ api-gateway │ high │ User not found in cache/DB │ Add null checks and logging │
│ 2024-01-20 14:24:12│ auth-service │ critical │ Redis connection refused │ Check Redis service status │
└────────────────────┴──────────────┴──────────┴─────────────────────────────┴─────────────────────────────┘
Understanding the Code
Let’s break down what’s happening:
Define Schema
Use Pydantic models to define the structure you want to extract:
class ErrorAnalysis(BaseModel):
    root_cause: str = Field(description="The root cause of this error")
    fix_recommendation: str = Field(description="How to fix this error")
Field descriptions guide the LLM on what to extract.
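Pydantic carries those descriptions into the model's JSON schema, which is what a structured-output LLM call ultimately sees, and it validates whatever comes back. A quick way to inspect this with plain Pydantic v2 (no Fenic required):

```python
from pydantic import BaseModel, Field

class ErrorAnalysis(BaseModel):
    root_cause: str = Field(description="The root cause of this error")
    fix_recommendation: str = Field(description="How to fix this error")

# The field descriptions become part of the generated JSON schema
schema = ErrorAnalysis.model_json_schema()
print(schema["properties"]["root_cause"]["description"])
# → The root cause of this error

# The model also validates data on the way back in
parsed = ErrorAnalysis(root_cause="Cache miss", fix_recommendation="Add null checks")
print(parsed.root_cause)
# → Cache miss
```

The more specific the descriptions, the more targeted the extraction.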
Create Session
The session manages configuration and execution:
session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="error_analyzer",
        semantic=fc.SemanticConfig(
            language_models={"mini": fc.OpenAILanguageModel(...)}
        )
    )
)
Load Data
Create a DataFrame from your data: df = session.create_dataframe(error_logs)
Fenic supports CSV, JSON, Parquet, Pandas/Polars DataFrames, and more.
Apply Semantic Operations
Use semantic operators to process unstructured data:
fc.semantic.classify("error_log", ["low", "medium", "high", "critical"])
fc.semantic.extract("error_log", ErrorAnalysis)
These operations happen outside your agent’s context window.
Access Nested Fields
Extract fields from structured data: df_analyzed.analysis.root_cause.alias("root_cause")
Use dot notation to access Pydantic model fields.
Key Semantic Operations
Fenic provides several semantic operators:
Extract
Extract structured data from text using Pydantic schemas:
fc.semantic.extract(
    fc.col("text"),
    MyPydanticSchema,
).alias("structured_data")
Classify
Classify text into predefined categories:
fc.semantic.classify(
    fc.col("text"),
    ["positive", "negative", "neutral"],
).alias("sentiment")
Embed
Generate embeddings for semantic search:
fc.semantic.embed(
    fc.col("text")
).alias("embedding")
Map
Transform text using LLM templates:
fc.semantic.map(
    "Summarize this in one sentence: {{ text }}",
    text=fc.col("content"),
).alias("summary")
Combining with Deterministic Operations
Fenic shines when you combine semantic operations with traditional DataFrame operations:
result = (
    df.select(
        fc.col("timestamp"),
        fc.semantic.extract(fc.col("log"), ErrorSchema).alias("error"),
    )
    .filter(fc.col("error").severity == "critical")  # Deterministic filter
    .sort(fc.col("timestamp").desc())                # Deterministic sort
    .limit(10)                                       # Deterministic limit
)
Semantic operations (extract, classify, embed) call LLMs and cost tokens. Deterministic operations (filter, sort, limit) run locally, so they are fast and free.
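The deterministic steps are ordinary data manipulation. Conceptually, the filter → sort → limit chain above does the same work as this plain-Python sketch over a list of already-extracted rows (no Fenic or LLM involved):

```python
rows = [
    {"timestamp": "2024-01-20 14:23:45", "severity": "high"},
    {"timestamp": "2024-01-20 14:24:12", "severity": "critical"},
    {"timestamp": "2024-01-20 14:25:01", "severity": "critical"},
]

# filter: keep only critical errors
critical = [r for r in rows if r["severity"] == "critical"]
# sort desc: newest first (these timestamps sort correctly as strings)
critical.sort(key=lambda r: r["timestamp"], reverse=True)
# limit: take at most 10
top = critical[:10]
print([r["timestamp"] for r in top])
# → ['2024-01-20 14:25:01', '2024-01-20 14:24:12']
```

The difference is that Fenic plans these steps lazily alongside the semantic ones, so you pay for LLM calls only on rows that survive the pipeline plan.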
Working with PDFs
Fenic can parse PDFs directly:
from pydantic import BaseModel, Field

class QAPair(BaseModel):
    question: str = Field(description="A question from the document")
    answer: str = Field(description="The answer to the question")

# Parse PDF and extract Q&A pairs
qa_df = (
    session.read.pdf_metadata("policies/*.pdf")
    .select(
        fc.col("file_path").alias("source"),
        fc.semantic.parse_pdf(fc.col("file_path")).alias("content"),
    )
    .select(
        fc.col("source"),
        fc.semantic.extract(
            fc.col("content").cast(fc.StringType),
            QAPair,
        ).alias("qa"),
    )
    .unnest("qa")  # Flatten the nested structure
)
qa_df.show()
Semantic Search
Create a semantic search pipeline:
# 1. Embed your documents
docs = session.create_dataframe([
    {"id": 1, "text": "Python is a programming language"},
    {"id": 2, "text": "Machine learning uses algorithms"},
    {"id": 3, "text": "Databases store structured data"},
])
docs_with_embeddings = docs.select(
    "id",
    "text",
    fc.semantic.embed(fc.col("text")).alias("embedding"),
)

# 2. Save as table
docs_with_embeddings.write.save_as_table("documents", mode="overwrite")

# 3. Query with semantic similarity
query = session.create_dataframe([{"query": "What is coding?"}])
results = query.semantic.sim_join(
    session.table("documents"),
    left_on=fc.semantic.embed(fc.col("query")),
    right_on=fc.col("embedding"),
    k=2,  # Top 2 results
    similarity_score_column="score",
).select("text", "score")
results.show()
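Conceptually, a similarity join ranks candidate rows by vector similarity between the query embedding and each stored embedding, then keeps the top k. A minimal pure-Python sketch of the idea using cosine similarity (the toy 3-dimensional vectors are stand-ins for real embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

docs = {
    "Python is a programming language": [0.9, 0.1, 0.0],
    "Machine learning uses algorithms": [0.2, 0.8, 0.1],
    "Databases store structured data": [0.1, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding for "What is coding?"

# Rank documents by similarity and keep the top k=2, like sim_join
ranked = sorted(docs, key=lambda text: cosine(query_vec, docs[text]), reverse=True)
print(ranked[:2])
# → ['Python is a programming language', 'Machine learning uses algorithms']
```

Real embeddings have hundreds or thousands of dimensions, and Fenic handles the embedding calls and indexing for you; the ranking logic is the same.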
Saving Results
Fenic supports multiple output formats: save as a table, export to Parquet or CSV, or collect results into Python objects.
Save as Table
# Save to Fenic's native storage
df.write.save_as_table("my_table", mode="overwrite")

# Load later
loaded = session.table("my_table")
Working with Multiple Models
You can configure multiple models and choose which to use:
session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="multi_model",
        semantic=fc.SemanticConfig(
            language_models={
                "fast": fc.OpenAILanguageModel(
                    model_name="gpt-4o-mini",
                    rpm=500,
                    tpm=200_000,
                ),
                "powerful": fc.OpenAILanguageModel(
                    model_name="gpt-4o",
                    rpm=100,
                    tpm=100_000,
                ),
            },
            default_language_model="fast",  # Use fast by default
        )
    )
)
# Use the default model (fast)
result1 = df.select(
    fc.semantic.classify("text", ["a", "b", "c"]).alias("label")
)

# Use a specific model (powerful)
result2 = df.select(
    fc.semantic.extract(
        "text",
        ComplexSchema,
        model="powerful",  # Override the default
    ).alias("extracted")
)
Token Tracking
Fenic tracks token usage and costs:
metrics = df_analyzed.write.save_as_table("results", mode="overwrite")
print(f"Total cost: ${metrics.total_lm_metrics.cost:.4f}")
print(f"Input tokens: {metrics.total_lm_metrics.num_uncached_input_tokens}")
print(f"Output tokens: {metrics.total_lm_metrics.num_output_tokens}")
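If you want rough estimates before a run, cost is just token counts times the provider's per-token rate. A sketch with illustrative rates (the numbers below are placeholders, not current pricing; check your provider):

```python
# Illustrative per-1M-token rates -- substitute your provider's current pricing
INPUT_RATE_PER_M = 0.15
OUTPUT_RATE_PER_M = 0.60

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough dollar cost for one pipeline run at the rates above."""
    return (
        input_tokens / 1_000_000 * INPUT_RATE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    )

print(f"${estimate_cost(200_000, 10_000):.4f}")
# → $0.0360
```

Fenic's metrics give you actuals after execution; an estimator like this is only useful for budgeting before you run.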
Next Steps
Core Concepts Learn about Sessions, DataFrames, and Semantic Operations
API Reference Explore all available functions and operators
Examples Browse real-world examples and use cases
MCP Integration Expose Fenic pipelines as MCP tools for agents
Common Patterns
Pattern 1: Extract → Filter → Aggregate
results = (
    df.select(
        "category",
        fc.semantic.extract("text", MySchema).alias("extracted"),
    )
    .filter(fc.col("extracted").field == "value")
    .group_by("category")
    .agg(fc.count("*").alias("count"))
)
Pattern 2: Parse → Embed → Search
# Index phase
indexed = session.read.pdf_metadata("docs/*.pdf").select(
    fc.col("file_path"),
    fc.semantic.embed(
        fc.semantic.parse_pdf(fc.col("file_path")).cast(fc.StringType)
    ).alias("embedding"),
)
indexed.write.save_as_table("index", mode="overwrite")

# Search phase
results = query.semantic.sim_join(
    session.table("index"),
    left_on=fc.semantic.embed(fc.col("query")),
    right_on=fc.col("embedding"),
    k=5,
)
Pattern 3: Classify → Route → Process
routed = df.select(
    "*",
    fc.semantic.classify("text", ["urgent", "normal", "low"]).alias("class"),
)
urgent = routed.filter(fc.col("class") == "urgent")
normal = routed.filter(fc.col("class") == "normal")
low = routed.filter(fc.col("class") == "low")
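Once the labels exist, routing is just a partition by predicted class; in plain Python it amounts to grouping rows into buckets, with no further LLM calls:

```python
from collections import defaultdict

# Rows as they might look after the classify step
rows = [
    {"text": "Server down!", "class": "urgent"},
    {"text": "Minor typo in docs", "class": "low"},
    {"text": "Slow response times", "class": "normal"},
]

# Group rows by their predicted label, like the three filters above
buckets = defaultdict(list)
for row in rows:
    buckets[row["class"]].append(row)

print([r["text"] for r in buckets["urgent"]])
# → ['Server down!']
```

Each bucket can then feed a different downstream pipeline (page on-call for urgent, batch-process the rest).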
All semantic operations are lazy—they only execute when you call .show(), .collect(), .write.save_as_table(), or other action methods.