
Simple Prompt Optimization Tutorial

Learn the fundamentals of prompt optimization with GEPA through a minimal, easy-to-understand example. This tutorial walks you through optimizing a system prompt in just a few lines of code.

Overview

GEPA (Genetic-Pareto) uses LLM-based reflection and evolutionary search to optimize text parameters. Unlike traditional methods that only see scalar scores, GEPA reads full execution traces to understand why candidates fail and propose targeted improvements.
Step 1: Install GEPA

pip install gepa
GEPA works with any LLM provider supported by LiteLLM (OpenAI, Anthropic, local models via Ollama, etc.).
Step 2: Prepare Your Data

Create training and validation datasets. Each example should have an input and expected output:
trainset = [
    {
        "input": "What is machine learning?",
        "answer": "Machine learning is a method of data analysis that automates "
                  "analytical model building..."
    },
    {
        "input": "Explain neural networks",
        "answer": "Neural networks are computing systems inspired by biological "
                  "neural networks..."
    },
    # Add more examples...
]

valset = [
    {
        "input": "What is deep learning?",
        "answer": "Deep learning is a subset of machine learning based on "
                  "artificial neural networks..."
    },
    # Add validation examples...
]
Best Practices:
  • Use 10-50 training examples for good results
  • Keep 20-30% of data for validation
  • Ensure examples cover diverse aspects of your task
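The 20-30% validation split can be done with a small helper. This is an illustrative sketch, not part of the GEPA API; the function name and signature are made up for this tutorial:

```python
import random

def train_val_split(examples, val_fraction=0.25, seed=0):
    """Shuffle and split examples, holding out val_fraction for validation."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

examples = [{"input": f"q{i}", "answer": f"a{i}"} for i in range(20)]
trainset, valset = train_val_split(examples)
print(len(trainset), len(valset))  # 15 5
```

Shuffling before splitting matters: if your examples are grouped by topic, an unshuffled split would give you a validation set that covers only the last topics.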
Step 3: Define the Seed Prompt

Start with a basic prompt as your baseline:
seed_prompt = {
    "system_prompt": "You are a helpful AI assistant. Answer questions clearly and accurately."
}
GEPA will evolve this into a more effective, task-specific prompt.
Step 4: Run GEPA Optimization

Optimize your prompt with a single function call:
import gepa

result = gepa.optimize(
    seed_candidate=seed_prompt,
    trainset=trainset,
    valset=valset,
    task_lm="openai/gpt-4o-mini",      # Model being optimized
    max_metric_calls=50,                 # Total evaluation budget (metric calls)
    reflection_lm="openai/gpt-4o",      # Model generating improvements
)

print("Optimized prompt:")
print(result.best_candidate['system_prompt'])
print(f"\nValidation score: {result.val_aggregate_scores[result.best_idx]:.3f}")
What’s happening:
  1. GEPA evaluates the seed prompt on training examples
  2. An LLM reflects on failures and proposes improvements
  3. Better prompts are selected using Pareto-efficient search
  4. The process repeats until the max_metric_calls evaluation budget is exhausted
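The loop above can be sketched in plain Python. The stub functions below stand in for the real LLM calls, and greedy acceptance stands in for GEPA's Pareto selection; this illustrates the control flow only, not GEPA's actual implementation:

```python
# Stub "LLM" pieces: the metric rewards prompts that mention each example's
# topic keyword, and "reflection" appends the worst-scoring example's topic.
def evaluate(candidate, examples):
    return [1.0 if ex["topic"] in candidate["system_prompt"] else 0.0
            for ex in examples]

def reflect_and_propose(candidate, examples, scores):
    worst = examples[min(range(len(scores)), key=scores.__getitem__)]
    child = dict(candidate)
    child["system_prompt"] += " " + worst["topic"]
    return child

def optimize(seed, examples, budget):
    best, best_scores = seed, evaluate(seed, examples)
    calls = len(examples)                    # one metric call per example evaluated
    while calls + len(examples) <= budget:
        child = reflect_and_propose(best, examples, best_scores)
        child_scores = evaluate(child, examples)
        calls += len(examples)
        if sum(child_scores) >= sum(best_scores):  # greedy stand-in for Pareto selection
            best, best_scores = child, child_scores
    return best, best_scores

examples = [{"input": "q1", "topic": "math"}, {"input": "q2", "topic": "history"}]
seed = {"system_prompt": "You are a helpful assistant."}
best, scores = optimize(seed, examples, budget=8)
print(best["system_prompt"], scores)
```

Even in this toy version, the budget is counted in metric calls rather than loop iterations, which mirrors how max_metric_calls works.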
Step 5: Understand the Output

GEPA returns a GEPAResult object containing:
# Best optimized prompt
best_prompt = result.best_candidate

# Validation scores for all candidates
val_scores = result.val_aggregate_scores

# Index of best candidate
best_idx = result.best_idx

# Total optimization iterations
total_calls = result.total_metric_calls

print(f"Spent {total_calls} metric calls")
print(f"Best validation score: {val_scores[best_idx]:.3f}")
print(f"Improvement: {val_scores[best_idx] - val_scores[0]:.3f}")
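Since val_aggregate_scores holds one aggregate score per candidate, a quick loop shows how scores evolved across the run (the scores below are made up for illustration):

```python
# Hypothetical score list, shaped like result.val_aggregate_scores
val_scores = [0.42, 0.55, 0.51, 0.63]

for i, score in enumerate(val_scores):
    marker = " <-- best" if score == max(val_scores) else ""
    print(f"candidate {i}: {score:.3f}{marker}")
```

Non-monotonic progressions like candidate 2 above are normal: GEPA keeps exploring rather than only accepting strict improvements.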
Step 6: Use the Optimized Prompt

Deploy your optimized prompt in production:
import litellm

def answer_question(question: str, optimized_prompt: dict) -> str:
    """Use the optimized prompt to answer questions"""
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": optimized_prompt["system_prompt"]},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

# Use it
answer = answer_question(
    "What is reinforcement learning?",
    result.best_candidate
)
print(answer)

Complete Example

Here’s a full working example:
import gepa

# 1. Prepare data
trainset = [
    {"input": "What is AI?", "answer": "Artificial Intelligence..."},
    {"input": "What is ML?", "answer": "Machine Learning..."},
    {"input": "What is DL?", "answer": "Deep Learning..."},
]

valset = [
    {"input": "What is NLP?", "answer": "Natural Language Processing..."},
]

# 2. Define seed prompt
seed_prompt = {
    "system_prompt": "You are a helpful AI assistant."
}

# 3. Optimize
result = gepa.optimize(
    seed_candidate=seed_prompt,
    trainset=trainset,
    valset=valset,
    task_lm="openai/gpt-4o-mini",
    max_metric_calls=30,
    reflection_lm="openai/gpt-4o",
)

# 4. Review results
print("Original:", seed_prompt["system_prompt"])
print("\nOptimized:", result.best_candidate["system_prompt"])
scores = result.val_aggregate_scores
print(f"\nScore improvement: {scores[result.best_idx] - scores[0]:+.3f}")

Key Concepts

Pareto-Efficient Search

GEPA maintains a frontier of candidates, keeping any that excel on specific examples—even if their average score is lower.
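A minimal sketch of that selection rule (illustrative only, not GEPA's internal code): a candidate survives if it achieves the best score on at least one example, even when its average is not the highest.

```python
def pareto_frontier(score_table):
    """score_table[i][j] = score of candidate i on example j.
    Keep every candidate that is best on at least one example."""
    n_examples = len(score_table[0])
    best_per_example = [max(row[j] for row in score_table) for j in range(n_examples)]
    return [i for i, row in enumerate(score_table)
            if any(row[j] == best_per_example[j] for j in range(n_examples))]

# Candidate 0 is a specialist that wins on example 0, candidate 2 is the
# strong generalist; candidate 1 is best nowhere and is dropped.
scores = [
    [0.9, 0.1, 0.2],  # candidate 0
    [0.4, 0.4, 0.4],  # candidate 1
    [0.5, 0.8, 0.7],  # candidate 2
]
print(pareto_frontier(scores))  # [0, 2]
```

Keeping the specialist matters: its prompt may contain an insight the reflection LLM can later merge into a generalist candidate.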

Actionable Side Information

Unlike methods that only see pass/fail scores, GEPA reads error messages, reasoning traces, and execution details.

LLM-Based Reflection

A reflection LLM analyzes failures, diagnoses root causes, and proposes targeted improvements—not random mutations.

Few Evaluations

Achieves strong results with 50-150 evaluations vs. 5,000-25,000+ for reinforcement learning methods.

Configuration Options

Model Selection

# Use different models for task and reflection
result = gepa.optimize(
    seed_candidate=seed_prompt,
    trainset=trainset,
    valset=valset,
    task_lm="openai/gpt-4o-mini",       # Cheaper model being optimized
    reflection_lm="openai/o1",           # Smarter model for reflection
    max_metric_calls=100,
)

Custom Metrics

def custom_metric(prediction, ground_truth, trace=None):
    """Define how to score predictions"""
    # Exact match
    if prediction.strip().lower() == ground_truth.strip().lower():
        return 1.0

    # Partial credit for keyword overlap
    pred_words = set(prediction.lower().split())
    true_words = set(ground_truth.lower().split())
    if not true_words:
        return 0.0  # guard against division by zero on empty ground truth
    overlap = len(pred_words & true_words) / len(true_words)

    return overlap

# Use custom metric
from gepa.adapters.default_adapter import DefaultAdapter

adapter = DefaultAdapter(
    task_lm="openai/gpt-4o-mini",
    metric=custom_metric
)

result = gepa.optimize(
    seed_candidate=seed_prompt,
    trainset=trainset,
    valset=valset,
    adapter=adapter,
    reflection_lm="openai/gpt-4o",
    max_metric_calls=50,
)
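It is worth sanity-checking a metric like this on a few hand-written cases before spending an optimization budget on it. The metric logic is restated below (without error handling) so the snippet runs on its own:

```python
def custom_metric(prediction, ground_truth, trace=None):
    # Exact match earns full credit
    if prediction.strip().lower() == ground_truth.strip().lower():
        return 1.0
    # Otherwise, fraction of ground-truth words that appear in the prediction
    pred_words = set(prediction.lower().split())
    true_words = set(ground_truth.lower().split())
    return len(pred_words & true_words) / len(true_words)

print(custom_metric("Paris", "paris"))                                     # 1.0
print(custom_metric("machine learning rocks", "machine learning is fun"))  # 0.5
print(custom_metric("unrelated", "machine learning"))                      # 0.0
```

If the scores do not match your intuition about answer quality, fix the metric first; GEPA optimizes whatever the metric rewards.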

Local Models with Ollama

# Use local models via Ollama (requires ollama running locally)
result = gepa.optimize(
    seed_candidate=seed_prompt,
    trainset=trainset,
    valset=valset,
    task_lm="ollama/llama3.1:8b",
    reflection_lm="ollama/llama3.1:70b",
    max_metric_calls=50,
)

Troubleshooting

Low or inconsistent scores?
  • Increase budget: Try max_metric_calls=100 or higher
  • Better reflection model: Use GPT-4o or o1 for reflection
  • More diverse examples: Ensure trainset covers edge cases
  • Check metric: Verify your evaluation metric is meaningful

Rate limits and costs?
  • Rate limits: GEPA respects provider rate limits automatically
  • Use tier-appropriate limits: Set max_metric_calls based on your API tier
  • Monitor costs: Each metric call uses the task_lm once

Overfitting concerns?
  • More validation data: Use at least 5-10 validation examples
  • Regularization: GEPA’s Pareto frontier naturally prevents overfitting
  • Data quality: Ensure validation set represents real usage

Next Steps

Math Optimization

Optimize prompts for complex mathematical reasoning tasks

RAG Pipeline

Optimize entire RAG systems with multiple vector stores

Agent Architecture

Evolve complete agent systems beyond just prompts

API Reference

Explore all configuration options and advanced features
