
Quickstart

This guide will walk you through creating your first Fenic application: an error log analyzer that extracts structured information from unstructured logs using semantic operations.

Prerequisites

1. Install Fenic

pip install fenic
2. Set your OpenAI API key

export OPENAI_API_KEY="sk-..."
This quickstart uses OpenAI’s GPT-4o-mini model. You can use other providers by following the Installation guide.

Your First Pipeline

Create a file called error_analyzer.py:
error_analyzer.py
from pydantic import BaseModel, Field
import fenic as fc

# Define the structure we want to extract
class ErrorAnalysis(BaseModel):
    root_cause: str = Field(description="The root cause of this error")
    fix_recommendation: str = Field(description="How to fix this error")

# 1. Create session with semantic capabilities
session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="error_analyzer",
        semantic=fc.SemanticConfig(
            language_models={
                "mini": fc.OpenAILanguageModel(
                    model_name="gpt-4o-mini",
                    rpm=500,
                    tpm=200_000
                )
            }
        )
    )
)

# 2. Load sample error logs
error_logs = [
    {
        "timestamp": "2024-01-20 14:23:45",
        "service": "api-gateway",
        "error_log": """
ERROR: NullPointerException in UserService.getProfile()
    at com.app.UserService.getProfile(UserService.java:45)
User ID: 12345 was not found in cache, attempted DB lookup returned null
        """
    },
    {
        "timestamp": "2024-01-20 14:24:12",
        "service": "auth-service",
        "error_log": """
Error: connect ECONNREFUSED 127.0.0.1:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
Redis connection failed during session validation
        """
    }
]

df = session.create_dataframe(error_logs)

# 3. Analyze errors using semantic operations
df_analyzed = df.select(
    "timestamp",
    "service",
    fc.semantic.classify(
        "error_log",
        ["low", "medium", "high", "critical"]
    ).alias("severity"),
    fc.semantic.extract(
        "error_log",
        ErrorAnalysis
    ).alias("analysis")
)

# 4. Display results
result = df_analyzed.select(
    "timestamp",
    "service",
    "severity",
    df_analyzed.analysis.root_cause.alias("root_cause"),
    df_analyzed.analysis.fix_recommendation.alias("fix")
)

result.show()

# Clean up
session.stop()

Run Your Pipeline

python error_analyzer.py
You should see output like:
┌────────────────────┬──────────────┬──────────┬─────────────────────────────┬─────────────────────────────┐
│ timestamp          │ service      │ severity │ root_cause                  │ fix                         │
├────────────────────┼──────────────┼──────────┼─────────────────────────────┼─────────────────────────────┤
│ 2024-01-20 14:23:45│ api-gateway  │ high     │ User not found in cache/DB  │ Add null checks and logging │
│ 2024-01-20 14:24:12│ auth-service │ critical │ Redis connection refused    │ Check Redis service status  │
└────────────────────┴──────────────┴──────────┴─────────────────────────────┴─────────────────────────────┘

Understanding the Code

Let’s break down what’s happening:
1. Define Schema

Use Pydantic models to define the structure you want to extract:
class ErrorAnalysis(BaseModel):
    root_cause: str = Field(description="The root cause of this error")
    fix_recommendation: str = Field(description="How to fix this error")
Field descriptions guide the LLM on what to extract.
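These descriptions end up in the JSON schema that Pydantic generates, which is what the model ultimately sees. You can inspect this yourself with plain Pydantic (no Fenic required); this sketch assumes Pydantic v2's `model_json_schema()`:

```python
from pydantic import BaseModel, Field

class ErrorAnalysis(BaseModel):
    root_cause: str = Field(description="The root cause of this error")
    fix_recommendation: str = Field(description="How to fix this error")

# The field descriptions are carried into the generated JSON schema
schema = ErrorAnalysis.model_json_schema()
print(schema["properties"]["root_cause"]["description"])
# The root cause of this error
```

Writing precise, specific descriptions is the main lever you have over extraction quality.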
2. Create Session

The session manages configuration and execution:
session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="error_analyzer",
        semantic=fc.SemanticConfig(
            language_models={"mini": fc.OpenAILanguageModel(...)}
        )
    )
)
3. Load Data

Create a DataFrame from your data:
df = session.create_dataframe(error_logs)
Fenic supports CSV, JSON, Parquet, Pandas/Polars DataFrames, and more.
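If your logs live on disk, a plain list of dicts loaded with the standard library is exactly the shape `create_dataframe` accepts. A minimal sketch (the file path and record contents here are illustrative):

```python
import json
import os
import tempfile

records = [
    {"timestamp": "2024-01-20 14:23:45", "service": "api-gateway", "error_log": "ERROR: ..."},
    {"timestamp": "2024-01-20 14:24:12", "service": "auth-service", "error_log": "Error: ..."},
]

# Round-trip through a JSON file: the loaded list of dicts can be passed
# straight to session.create_dataframe()
path = os.path.join(tempfile.mkdtemp(), "logs.json")
with open(path, "w") as f:
    json.dump(records, f)

with open(path) as f:
    loaded = json.load(f)

print(len(loaded))  # 2
```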
4. Apply Semantic Operations

Use semantic operators to process unstructured data:
fc.semantic.classify("error_log", ["low", "medium", "high", "critical"])
fc.semantic.extract("error_log", ErrorAnalysis)
These operations happen outside your agent’s context window.
5. Access Nested Fields

Extract fields from structured data:
df_analyzed.analysis.root_cause.alias("root_cause")
Use dot notation to access Pydantic model fields.

Key Semantic Operations

Fenic provides several semantic operators:

Extract

Extract structured data from text using Pydantic schemas:
fc.semantic.extract(
    fc.col("text"),
    MyPydanticSchema
).alias("structured_data")

Classify

Classify text into predefined categories:
fc.semantic.classify(
    fc.col("text"),
    ["positive", "negative", "neutral"]
).alias("sentiment")

Embed

Generate embeddings for semantic search:
fc.semantic.embed(
    fc.col("text")
).alias("embedding")

Map

Transform text using LLM templates:
fc.semantic.map(
    "Summarize this in one sentence: {{ text }}",
    text=fc.col("content")
).alias("summary")

Combining with Deterministic Operations

Fenic shines when you combine semantic operations with traditional DataFrame operations:
result = (
    df.select(
        fc.col("timestamp"),
        fc.semantic.extract(fc.col("log"), ErrorSchema).alias("error")
    )
    .filter(fc.col("error").severity == "critical")  # Deterministic filter
    .sort(fc.col("timestamp").desc())                # Deterministic sort
    .limit(10)                                        # Deterministic limit
)
Semantic operations (extract, classify, embed) use LLMs and cost tokens. Deterministic operations (filter, sort, limit) are free and fast.

Working with PDFs

Fenic can parse PDFs directly:
from pydantic import BaseModel, Field

class QAPair(BaseModel):
    question: str = Field(description="A question from the document")
    answer: str = Field(description="The answer to the question")

# Parse PDF and extract Q&A pairs
qa_df = (
    session.read.pdf_metadata("policies/*.pdf")
    .select(
        fc.col("file_path").alias("source"),
        fc.semantic.parse_pdf(fc.col("file_path")).alias("content")
    )
    .select(
        fc.col("source"),
        fc.semantic.extract(
            fc.col("content").cast(fc.StringType),
            QAPair
        ).alias("qa")
    )
    .unnest("qa")  # Flatten the nested structure
)

qa_df.show()
Semantic Search

Create a semantic search pipeline:
# 1. Embed your documents
docs = session.create_dataframe([
    {"id": 1, "text": "Python is a programming language"},
    {"id": 2, "text": "Machine learning uses algorithms"},
    {"id": 3, "text": "Databases store structured data"}
])

docs_with_embeddings = docs.select(
    "id",
    "text",
    fc.semantic.embed(fc.col("text")).alias("embedding")
)

# 2. Save as table
docs_with_embeddings.write.save_as_table("documents", mode="overwrite")

# 3. Query with semantic similarity
query = session.create_dataframe([{"query": "What is coding?"}])

results = query.semantic.sim_join(
    session.table("documents"),
    left_on=fc.semantic.embed(fc.col("query")),
    right_on=fc.col("embedding"),
    k=2,  # Top 2 results
    similarity_score_column="score"
).select("text", "score")

results.show()
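Under the hood, a similarity join compares embedding vectors; cosine similarity is the usual measure. Fenic handles this for you, so the following pure-Python sketch is purely for intuition:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real embeddings have hundreds of dimensions)
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {1: [0.8, 0.2, 0.1], 2: [0.0, 0.9, 0.4]}

# Rank documents by similarity to the query, highest first --
# conceptually what a sim_join with k=1 returns
ranked = sorted(doc_vecs, key=lambda i: cosine_similarity(query_vec, doc_vecs[i]), reverse=True)
print(ranked[0])  # 1
```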

Saving Results

Fenic supports multiple output formats:
# Save to Fenic's native storage
df.write.save_as_table("my_table", mode="overwrite")

# Load later
loaded = session.table("my_table")

Working with Multiple Models

You can configure multiple models and choose which to use:
session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="multi_model",
        semantic=fc.SemanticConfig(
            language_models={
                "fast": fc.OpenAILanguageModel(
                    model_name="gpt-4o-mini",
                    rpm=500,
                    tpm=200_000
                ),
                "powerful": fc.OpenAILanguageModel(
                    model_name="gpt-4o",
                    rpm=100,
                    tpm=100_000
                )
            },
            default_language_model="fast"  # Use fast by default
        )
    )
)

# Use default model (fast)
result1 = df.semantic.classify("text", ["a", "b", "c"])

# Use specific model (powerful)
result2 = df.semantic.extract(
    "text",
    ComplexSchema,
    model="powerful"  # Override default
)

Token Tracking

Fenic tracks token usage and costs:
metrics = df_analyzed.write.save_as_table("results", mode="overwrite")

print(f"Total cost: ${metrics.total_lm_metrics.cost:.4f}")
print(f"Input tokens: {metrics.total_lm_metrics.num_uncached_input_tokens}")
print(f"Output tokens: {metrics.total_lm_metrics.num_output_tokens}")
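For budgeting ahead of a run, a back-of-envelope estimate from token counts is often enough. The prices below are assumptions for illustration; check your provider's current per-token rates:

```python
# Assumed pricing in USD per 1M tokens (illustrative, not authoritative)
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost for a batch of LLM calls at the assumed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. 50,000 input tokens and 8,000 output tokens:
print(f"${estimate_cost(50_000, 8_000):.4f}")  # $0.0123
```

Comparing this estimate against the actual `metrics.total_lm_metrics.cost` after a run is a quick sanity check on your pipeline's token usage.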

Next Steps

Core Concepts

Learn about Sessions, DataFrames, and Semantic Operations

API Reference

Explore all available functions and operators

Examples

Browse real-world examples and use cases

MCP Integration

Expose Fenic pipelines as MCP tools for agents

Common Patterns

Pattern 1: Extract → Filter → Aggregate

results = (
    df.semantic.extract("text", MySchema)
    .filter(fc.col("extracted").field == "value")
    .group_by("category")
    .agg(fc.count("*").alias("count"))
)

Pattern 2: Parse → Embed → Search

# Index phase
indexed = (
    session.read.pdf_metadata("docs/*.pdf")
    .semantic.parse_pdf()
    .semantic.embed()
)
indexed.write.save_as_table("index")

# Search phase
results = query.semantic.sim_join(session.table("index"), k=5)

Pattern 3: Classify → Route → Process

routed = df.semantic.classify("text", ["urgent", "normal", "low"])

urgent = routed.filter(fc.col("class") == "urgent")
normal = routed.filter(fc.col("class") == "normal")
low = routed.filter(fc.col("class") == "low")
All semantic operations are lazy—they only execute when you call .show(), .collect(), .write.save_as_table(), or other action methods.
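Python generators give the same lazy flavor, purely for intuition (this is not Fenic's implementation):

```python
calls = []

def classify(line):
    calls.append(line)  # record that "work" happened
    return "urgent" if "fail" in line else "normal"

logs = ["disk fail", "ok", "net fail"]

# Building the pipeline runs nothing yet -- like chaining Fenic transformations
pipeline = (classify(line) for line in logs)
assert calls == []  # no LLM-style work has been done

# Consuming it is the "action" -- like .show() or .collect()
results = list(pipeline)
print(results)  # ['urgent', 'normal', 'urgent']
```

Because execution is deferred, Fenic can see the whole plan before spending any tokens.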
