Quickstart
This guide will walk you through creating your first Fenic application: an error log analyzer that extracts structured information from unstructured logs using semantic operations.
Prerequisites
Set your OpenAI API key
export OPENAI_API_KEY="sk-..."
This quickstart uses OpenAI’s GPT-4o-mini model. You can use other providers by following the Installation guide.
Your First Pipeline
Create a file called error_analyzer.py:
from pydantic import BaseModel, Field
import fenic as fc
# Define the structure we want to extract
class ErrorAnalysis(BaseModel):
    root_cause: str = Field(description="The root cause of this error")
    fix_recommendation: str = Field(description="How to fix this error")

# 1. Create session with semantic capabilities
session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="error_analyzer",
        semantic=fc.SemanticConfig(
            language_models={
                "mini": fc.OpenAILanguageModel(
                    model_name="gpt-4o-mini",
                    rpm=500,
                    tpm=200_000,
                )
            }
        )
    )
)
# 2. Load sample error logs
error_logs = [
    {
        "timestamp": "2024-01-20 14:23:45",
        "service": "api-gateway",
        "error_log": """
ERROR: NullPointerException in UserService.getProfile()
    at com.app.UserService.getProfile(UserService.java:45)
User ID: 12345 was not found in cache, attempted DB lookup returned null
""",
    },
    {
        "timestamp": "2024-01-20 14:24:12",
        "service": "auth-service",
        "error_log": """
Error: connect ECONNREFUSED 127.0.0.1:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
Redis connection failed during session validation
""",
    },
]
df = session.create_dataframe(error_logs)
# 3. Analyze errors using semantic operations
df_analyzed = df.select(
    "timestamp",
    "service",
    fc.semantic.classify(
        "error_log",
        ["low", "medium", "high", "critical"],
    ).alias("severity"),
    fc.semantic.extract(
        "error_log",
        ErrorAnalysis,
    ).alias("analysis"),
)

# 4. Display results
result = df_analyzed.select(
    "timestamp",
    "service",
    "severity",
    df_analyzed.analysis.root_cause.alias("root_cause"),
    df_analyzed.analysis.fix_recommendation.alias("fix"),
)
result.show()

# Clean up
session.stop()
Run Your Pipeline
python error_analyzer.py
You should see output like:
┌────────────────────┬──────────────┬──────────┬─────────────────────────────┬─────────────────────────────┐
│ timestamp │ service │ severity │ root_cause │ fix │
├────────────────────┼──────────────┼──────────┼─────────────────────────────┼─────────────────────────────┤
│ 2024-01-20 14:23:45│ api-gateway │ high │ User not found in cache/DB │ Add null checks and logging │
│ 2024-01-20 14:24:12│ auth-service │ critical │ Redis connection refused │ Check Redis service status │
└────────────────────┴──────────────┴──────────┴─────────────────────────────┴─────────────────────────────┘
Understanding the Code
Let’s break down what’s happening:
Define Schema
Use Pydantic models to define the structure you want to extract:
class ErrorAnalysis(BaseModel):
    root_cause: str = Field(description="The root cause of this error")
    fix_recommendation: str = Field(description="How to fix this error")
Field descriptions guide the LLM on what to extract.
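Pydantic carries those descriptions into the model's JSON schema, which is what a structured-output LLM call ultimately sees, and it validates whatever comes back. A quick way to inspect this with plain Pydantic v2 (no Fenic required):

```python
from pydantic import BaseModel, Field

class ErrorAnalysis(BaseModel):
    root_cause: str = Field(description="The root cause of this error")
    fix_recommendation: str = Field(description="How to fix this error")

# The field descriptions become part of the generated JSON schema
schema = ErrorAnalysis.model_json_schema()
print(schema["properties"]["root_cause"]["description"])
# → The root cause of this error

# The model also validates data on the way back in
parsed = ErrorAnalysis(root_cause="Cache miss", fix_recommendation="Add null checks")
print(parsed.root_cause)
# → Cache miss
```

The more specific the descriptions, the more targeted the extraction.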
Create Session
The session manages configuration and execution:
session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="error_analyzer",
        semantic=fc.SemanticConfig(
            language_models={"mini": fc.OpenAILanguageModel(...)}
        )
    )
)
Load Data
Create a DataFrame from your data: df = session.create_dataframe(error_logs)
Fenic supports CSV, JSON, Parquet, Pandas/Polars DataFrames, and more.
Apply Semantic Operations
Use semantic operators to process unstructured data:
fc.semantic.classify("error_log", ["low", "medium", "high", "critical"])
fc.semantic.extract("error_log", ErrorAnalysis)
These operations happen outside your agent’s context window.
Access Nested Fields
Extract fields from structured data: df_analyzed.analysis.root_cause.alias("root_cause")
Use dot notation to access Pydantic model fields.
Key Semantic Operations
Fenic provides several semantic operators:
Extract
Extract structured data from text using Pydantic schemas:
fc.semantic.extract(
    fc.col("text"),
    MyPydanticSchema,
).alias("structured_data")
Classify
Classify text into predefined categories:
fc.semantic.classify(
    fc.col("text"),
    ["positive", "negative", "neutral"],
).alias("sentiment")
Embed
Generate embeddings for semantic search:
fc.semantic.embed(
    fc.col("text")
).alias("embedding")
Map
Transform text using LLM templates:
fc.semantic.map(
    "Summarize this in one sentence: {{ text }}",
    text=fc.col("content"),
).alias("summary")
Combining with Deterministic Operations
Fenic shines when you combine semantic operations with traditional DataFrame operations:
result = (
    df.select(
        fc.col("timestamp"),
        fc.semantic.extract(fc.col("log"), ErrorSchema).alias("error"),
    )
    .filter(fc.col("error").severity == "critical")  # Deterministic filter
    .sort(fc.col("timestamp").desc())                # Deterministic sort
    .limit(10)                                       # Deterministic limit
)
Semantic operations (extract, classify, embed) call LLMs and cost tokens. Deterministic operations (filter, sort, limit) run locally, so they are fast and free.
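The deterministic steps are ordinary data manipulation. Conceptually, the filter → sort → limit chain above does the same work as this plain-Python sketch over a list of already-extracted rows (no Fenic or LLM involved):

```python
rows = [
    {"timestamp": "2024-01-20 14:23:45", "severity": "high"},
    {"timestamp": "2024-01-20 14:24:12", "severity": "critical"},
    {"timestamp": "2024-01-20 14:25:01", "severity": "critical"},
]

# filter: keep only critical errors
critical = [r for r in rows if r["severity"] == "critical"]
# sort desc: newest first (these timestamps sort correctly as strings)
critical.sort(key=lambda r: r["timestamp"], reverse=True)
# limit: take at most 10
top = critical[:10]
print([r["timestamp"] for r in top])
# → ['2024-01-20 14:25:01', '2024-01-20 14:24:12']
```

The difference is that Fenic plans these steps lazily alongside the semantic ones, so you pay for LLM calls only on rows that survive the pipeline plan.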
Working with PDFs
Fenic can parse PDFs directly:
from pydantic import BaseModel, Field

class QAPair(BaseModel):
    question: str = Field(description="A question from the document")
    answer: str = Field(description="The answer to the question")

# Parse PDF and extract Q&A pairs
qa_df = (
    session.read.pdf_metadata("policies/*.pdf")
    .select(
        fc.col("file_path").alias("source"),
        fc.semantic.parse_pdf(fc.col("file_path")).alias("content"),
    )
    .select(
        fc.col("source"),
        fc.semantic.extract(
            fc.col("content").cast(fc.StringType),
            QAPair,
        ).alias("qa"),
    )
    .unnest("qa")  # Flatten the nested structure
)
qa_df.show()
Semantic Search
Create a semantic search pipeline:
# 1. Embed your documents
docs = session.create_dataframe([
    {"id": 1, "text": "Python is a programming language"},
    {"id": 2, "text": "Machine learning uses algorithms"},
    {"id": 3, "text": "Databases store structured data"},
])
docs_with_embeddings = docs.select(
    "id",
    "text",
    fc.semantic.embed(fc.col("text")).alias("embedding"),
)

# 2. Save as table
docs_with_embeddings.write.save_as_table("documents", mode="overwrite")

# 3. Query with semantic similarity
query = session.create_dataframe([{"query": "What is coding?"}])
results = query.semantic.sim_join(
    session.table("documents"),
    left_on=fc.semantic.embed(fc.col("query")),
    right_on=fc.col("embedding"),
    k=2,  # Top 2 results
    similarity_score_column="score",
).select("text", "score")
results.show()
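Conceptually, a similarity join ranks candidate rows by vector similarity between the query embedding and each stored embedding, then keeps the top k. A minimal pure-Python sketch of the idea using cosine similarity (the toy 3-dimensional vectors are stand-ins for real embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

docs = {
    "Python is a programming language": [0.9, 0.1, 0.0],
    "Machine learning uses algorithms": [0.2, 0.8, 0.1],
    "Databases store structured data": [0.1, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # pretend embedding for "What is coding?"

# Rank documents by similarity and keep the top k=2, like sim_join
ranked = sorted(docs, key=lambda text: cosine(query_vec, docs[text]), reverse=True)
print(ranked[:2])
# → ['Python is a programming language', 'Machine learning uses algorithms']
```

Real embeddings have hundreds or thousands of dimensions, and Fenic handles the embedding calls and indexing for you; the ranking logic is the same.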
Saving Results
Fenic supports multiple output formats: save as a table, export to Parquet or CSV, or collect results into Python objects.
Save as Table
# Save to Fenic's native storage
df.write.save_as_table("my_table", mode="overwrite")

# Load later
loaded = session.table("my_table")
Working with Multiple Models
You can configure multiple models and choose which to use:
session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="multi_model",
        semantic=fc.SemanticConfig(
            language_models={
                "fast": fc.OpenAILanguageModel(
                    model_name="gpt-4o-mini",
                    rpm=500,
                    tpm=200_000,
                ),
                "powerful": fc.OpenAILanguageModel(
                    model_name="gpt-4o",
                    rpm=100,
                    tpm=100_000,
                ),
            },
            default_language_model="fast",  # Use fast by default
        )
    )
)
# Use the default model (fast)
result1 = df.select(
    fc.semantic.classify("text", ["a", "b", "c"]).alias("label")
)

# Use a specific model (powerful)
result2 = df.select(
    fc.semantic.extract(
        "text",
        ComplexSchema,
        model="powerful",  # Override the default
    ).alias("extracted")
)
Token Tracking
Fenic tracks token usage and costs:
metrics = df_analyzed.write.save_as_table("results", mode="overwrite")
print(f"Total cost: ${metrics.total_lm_metrics.cost:.4f}")
print(f"Input tokens: {metrics.total_lm_metrics.num_uncached_input_tokens}")
print(f"Output tokens: {metrics.total_lm_metrics.num_output_tokens}")
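If you want rough estimates before a run, cost is just token counts times the provider's per-token rate. A sketch with illustrative rates (the numbers below are placeholders, not current pricing; check your provider):

```python
# Illustrative per-1M-token rates -- substitute your provider's current pricing
INPUT_RATE_PER_M = 0.15
OUTPUT_RATE_PER_M = 0.60

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough dollar cost for one pipeline run at the rates above."""
    return (
        input_tokens / 1_000_000 * INPUT_RATE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    )

print(f"${estimate_cost(200_000, 10_000):.4f}")
# → $0.0360
```

Fenic's metrics give you actuals after execution; an estimator like this is only useful for budgeting before you run.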
Next Steps
Core Concepts Learn about Sessions, DataFrames, and Semantic Operations
API Reference Explore all available functions and operators
Examples Browse real-world examples and use cases
MCP Integration Expose Fenic pipelines as MCP tools for agents
Common Patterns
Pattern 1: Extract → Filter → Aggregate
results = (
    df.select(
        "category",
        fc.semantic.extract("text", MySchema).alias("extracted"),
    )
    .filter(fc.col("extracted").field == "value")
    .group_by("category")
    .agg(fc.count("*").alias("count"))
)
Pattern 2: Parse → Embed → Search
# Index phase
indexed = session.read.pdf_metadata("docs/*.pdf").select(
    fc.col("file_path"),
    fc.semantic.embed(
        fc.semantic.parse_pdf(fc.col("file_path")).cast(fc.StringType)
    ).alias("embedding"),
)
indexed.write.save_as_table("index", mode="overwrite")

# Search phase
results = query.semantic.sim_join(
    session.table("index"),
    left_on=fc.semantic.embed(fc.col("query")),
    right_on=fc.col("embedding"),
    k=5,
)
Pattern 3: Classify → Route → Process
routed = df.select(
    "*",
    fc.semantic.classify("text", ["urgent", "normal", "low"]).alias("class"),
)
urgent = routed.filter(fc.col("class") == "urgent")
normal = routed.filter(fc.col("class") == "normal")
low = routed.filter(fc.col("class") == "low")
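Once the labels exist, routing is just a partition by predicted class; in plain Python it amounts to grouping rows into buckets, with no further LLM calls:

```python
from collections import defaultdict

# Rows as they might look after the classify step
rows = [
    {"text": "Server down!", "class": "urgent"},
    {"text": "Minor typo in docs", "class": "low"},
    {"text": "Slow response times", "class": "normal"},
]

# Group rows by their predicted label, like the three filters above
buckets = defaultdict(list)
for row in rows:
    buckets[row["class"]].append(row)

print([r["text"] for r in buckets["urgent"]])
# → ['Server down!']
```

Each bucket can then feed a different downstream pipeline (page on-call for urgent, batch-process the rest).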
All semantic operations are lazy—they only execute when you call .show(), .collect(), .write.save_as_table(), or other action methods.