
Overview

Pydantic AI is a Python framework for building production-grade AI applications with type safety and validation. GEPA can optimize prompts and instructions for Pydantic AI agents, improving their performance through evolutionary search and LLM-based reflection.

Why Use GEPA with Pydantic AI?

Pydantic AI provides structured outputs and validation, while GEPA optimizes the prompts that guide agent behavior:
  • Type-safe optimization: Maintain Pydantic’s type safety while improving prompts
  • Automated improvement: Let GEPA discover better instructions through reflection
  • Data-driven: Optimize based on real evaluation metrics
  • Production-ready: Combine Pydantic’s validation with GEPA’s optimization

Setup

Install both Pydantic AI and GEPA:
pip install pydantic-ai gepa

Basic Example

Here’s how to optimize a Pydantic AI agent with GEPA:
import gepa
from pydantic_ai import Agent
from pydantic import BaseModel

# Define your Pydantic model
class MathAnswer(BaseModel):
    reasoning: str
    answer: float

# Example scoring logic: build an agent from a candidate's system prompt and
# check its answer. With `task_lm` below, GEPA's default adapter handles
# evaluation; a function like this is what a custom adapter would run per example.
def evaluate_prompt(candidate, example):
    agent = Agent(
        'openai:gpt-4o-mini',
        output_type=MathAnswer,
        system_prompt=candidate['system_prompt'],
    )
    
    # Run the agent synchronously on one question
    result = agent.run_sync(example['question'])
    
    # Correct if the numeric answer is within a small tolerance
    correct = abs(result.output.answer - example['expected_answer']) < 0.01
    return 1.0 if correct else 0.0

# Sample dataset
trainset = [
    {
        'question': 'What is 15% of 80?',
        'expected_answer': 12.0,
    },
    {
        'question': 'If a rectangle has length 7 and width 3, what is its area?',
        'expected_answer': 21.0,
    },
]

# Optimize the system prompt
result = gepa.optimize(
    seed_candidate={
        'system_prompt': 'You are a math assistant. Solve the problem and explain your reasoning.'
    },
    trainset=trainset,
    valset=trainset,  # reused for brevity; use a held-out valset in practice
    task_lm='openai/gpt-4o-mini',
    reflection_lm='openai/gpt-4o',
    max_metric_calls=50,
)

print("Optimized prompt:", result.best_candidate['system_prompt'])
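
Once optimization finishes, the optimized prompt is just data: you can persist it and rebuild your agent from it at startup, without re-running GEPA. A minimal sketch, assuming `result.best_candidate` is a plain dict mapping component names to prompt strings (the filename and prompt text here are illustrative):

```python
import json

# Shaped like result.best_candidate: component name -> optimized prompt text
best_candidate = {
    'system_prompt': 'You are a careful math assistant. Show your reasoning step by step.'
}

# Persist the optimized prompt so deployments do not re-run optimization
with open('optimized_prompt.json', 'w') as f:
    json.dump(best_candidate, f, indent=2)

# At application startup, load the stored prompt and build the agent from it
with open('optimized_prompt.json') as f:
    candidate = json.load(f)

print(candidate['system_prompt'])
```

The same pattern extends to multi-component candidates: the JSON file simply holds one entry per prompt.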

Advanced: Custom Adapter

For more complex Pydantic AI applications, create a custom adapter:
from gepa.core.adapter import GEPAAdapter, DataInst, EvaluationResult
from pydantic_ai import Agent
from pydantic import BaseModel
from typing import Any

class CustomerSupportResponse(BaseModel):
    sentiment: str  # positive, neutral, negative
    response: str
    confidence: float

class PydanticAIAdapter(GEPAAdapter):
    def __init__(self, model: str = 'openai:gpt-4o-mini'):
        self.model = model
    
    def evaluate(
        self,
        inputs: list[DataInst],
        candidate: dict[str, str],
        capture_traces: bool = False,
    ) -> EvaluationResult:
        # Build an agent from the current candidate's system prompt
        agent = Agent(
            self.model,
            output_type=CustomerSupportResponse,
            system_prompt=candidate['system_prompt'],
        )
        
        outputs = []
        scores = []
        
        for example in inputs:
            # Run the agent on one customer message
            result = agent.run_sync(example['customer_message'])
            outputs.append(result.output)
            
            # Score on sentiment accuracy and model confidence
            sentiment_correct = result.output.sentiment == example['expected_sentiment']
            confidence_good = result.output.confidence > 0.7
            score = 1.0 if (sentiment_correct and confidence_good) else 0.0
            scores.append(score)
        
        return EvaluationResult(
            outputs=outputs,
            scores=scores,
            trajectories=[],
        )

# Use the custom adapter
trainset = [
    {
        'customer_message': 'Your product is amazing! It solved all my problems.',
        'expected_sentiment': 'positive',
    },
    {
        'customer_message': 'I am very disappointed with the service.',
        'expected_sentiment': 'negative',
    },
]

result = gepa.optimize(
    seed_candidate={
        'system_prompt': 'Analyze customer messages and respond appropriately.'
    },
    trainset=trainset,
    valset=trainset,
    adapter=PydanticAIAdapter(),
    reflection_lm='openai/gpt-4o',
    max_metric_calls=50,
)

Multi-Agent Optimization

Optimize multiple agents in a Pydantic AI workflow:
from pydantic_ai import Agent

# Define multiple candidate components
seed_candidate = {
    'analyzer_prompt': 'Analyze the input and extract key information.',
    'responder_prompt': 'Generate an appropriate response based on the analysis.',
}

# Placeholder metric used below: replace with a scorer suited to your task
def compute_score(output: str, expected: str) -> float:
    return 1.0 if expected.lower() in str(output).lower() else 0.0

# Create a custom adapter for the multi-agent workflow
class MultiAgentAdapter(GEPAAdapter):
    def evaluate(self, inputs, candidate, capture_traces=False):
        # Create two agents
        analyzer = Agent(
            'openai:gpt-4o-mini',
            system_prompt=candidate['analyzer_prompt'],
        )
        responder = Agent(
            'openai:gpt-4o-mini',
            system_prompt=candidate['responder_prompt'],
        )
        
        outputs = []
        scores = []
        
        for example in inputs:
            # Run the agents in sequence: analyze, then respond
            analysis = analyzer.run_sync(example['input'])
            response = responder.run_sync(
                f"Analysis: {analysis.output}\nGenerate response for: {example['input']}"
            )
            
            outputs.append(response.output)
            # Score the final response against the expected output
            score = compute_score(response.output, example['expected_output'])
            scores.append(score)
        
        return EvaluationResult(
            outputs=outputs,
            scores=scores,
            trajectories=[],
        )

# A trainset whose keys match the adapter ('input' / 'expected_output')
trainset = [
    {
        'input': 'I need to reset my password but the link is broken.',
        'expected_output': 'password',
    },
]

# Optimize both prompts together
result = gepa.optimize(
    seed_candidate=seed_candidate,
    trainset=trainset,
    valset=trainset,
    adapter=MultiAgentAdapter(),
    reflection_lm='openai/gpt-4o',
    max_metric_calls=100,
)

Best Practices

  1. Define clear metrics: Use Pydantic’s validation to create precise scoring functions
  2. Start simple: Begin with single-agent optimization before moving to multi-agent workflows
  3. Leverage type safety: Use Pydantic models to ensure structured outputs during optimization
  4. Monitor costs: Track API usage during optimization, especially with larger models
  5. Iterate on datasets: Expand your training set as you discover edge cases
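
Practices 1 and 3 can be made concrete: because outputs are Pydantic models, a scoring function can treat validation failures as zero-score rollouts instead of crashes during optimization. A minimal sketch (the model, labels, and helper name here are illustrative, not part of GEPA's API):

```python
from pydantic import BaseModel, ValidationError, field_validator

class SupportReply(BaseModel):
    sentiment: str
    confidence: float

    @field_validator('sentiment')
    @classmethod
    def known_label(cls, v: str) -> str:
        # Reject labels outside the expected set at validation time
        if v not in {'positive', 'neutral', 'negative'}:
            raise ValueError('unknown sentiment label')
        return v

def score_output(raw: dict, expected_sentiment: str) -> float:
    # An output that fails validation scores 0.0 rather than aborting the run
    try:
        reply = SupportReply(**raw)
    except ValidationError:
        return 0.0
    return 1.0 if reply.sentiment == expected_sentiment else 0.0

print(score_output({'sentiment': 'positive', 'confidence': 0.9}, 'positive'))
```

Catching `ValidationError` inside the metric keeps a single malformed rollout from halting a long optimization run, while still penalizing the candidate that produced it.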

Integration Benefits

Type Safety

Maintain Pydantic’s type validation throughout optimization

Automated Discovery

Let GEPA find better prompts through reflection and evolution

Production Ready

Deploy optimized prompts with confidence using Pydantic’s validation

Cost Effective

Optimize with 100-500 evaluations instead of thousands

External Resources

Pydantic AI Prompt Optimization Guide

Official guide for optimizing Pydantic AI with GEPA

Pydantic AI Documentation

Learn more about Pydantic AI

Code Examples

Complete code examples on GitHub
