Simple Prompt Optimization Tutorial
Learn the fundamentals of prompt optimization with GEPA through a minimal, easy-to-understand example. This tutorial walks you through optimizing a system prompt in just a few lines of code.
Overview
GEPA (Genetic-Pareto) uses LLM-based reflection and evolutionary search to optimize text parameters. Unlike traditional methods that only see scalar scores, GEPA reads full execution traces to understand why candidates fail and proposes targeted improvements.
Install GEPA
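GEPA is distributed on PyPI, so a standard pip install should suffice:

```shell
pip install gepa
```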
Prepare Your Data
Create training and validation datasets. Each example should have an input and expected output.
Best Practices:
- Use 10-50 training examples for good results
- Keep 20-30% of data for validation
- Ensure examples cover diverse aspects of your task
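For instance, a tiny question-answering dataset might look like the following sketch. The `input`/`answer` field names are an assumption about what GEPA's default adapter expects; adjust them to match your adapter.

```python
# Illustrative QA dataset; field names are an assumption about the
# default GEPA adapter's expected schema.
trainset = [
    {"input": "What is the capital of France?", "answer": "Paris"},
    {"input": "What is 7 * 8?", "answer": "56"},
    {"input": "Who wrote 'Hamlet'?", "answer": "William Shakespeare"},
]

# Hold out a separate slice for validation (here ~25% of the data).
valset = [
    {"input": "What is the capital of Japan?", "answer": "Tokyo"},
]

print(len(trainset), len(valset))  # → 3 1
```

In a real run you would use 10-50 training examples, as recommended above, rather than three.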
Define the Seed Prompt
Start with a basic prompt as your baseline.
GEPA will evolve this into a more effective, task-specific prompt.
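As a concrete starting point, the seed can be a single generic instruction. Representing the candidate as a dict keyed by `system_prompt` is an assumption about the adapter's expected component name:

```python
# The candidate is a dict of named text components; using "system_prompt"
# as the sole component is an assumption -- match your adapter's keys.
seed_prompt = {
    "system_prompt": "You are a helpful assistant. Answer the question concisely."
}
```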
Run GEPA Optimization
Optimize your prompt with a single function call.
What’s happening:
- GEPA evaluates the seed prompt on training examples
- An LLM reflects on failures and proposes improvements
- Better prompts are selected using Pareto-efficient search
- Process repeats for max_metric_calls iterations
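The steps above can be sketched as a single call. The `gepa.optimize` keyword names and the LiteLLM-style model identifiers below are assumptions; verify them against the API reference before running.

```python
import os

# Toy data and seed candidate (see the sections above for guidance).
trainset = [
    {"input": "What is the capital of France?", "answer": "Paris"},
    {"input": "What is 7 * 8?", "answer": "56"},
]
valset = [{"input": "What is the capital of Japan?", "answer": "Tokyo"}]
seed_prompt = {"system_prompt": "You are a helpful assistant."}

# Running the optimizer requires API access; guard so the script is
# importable without a key.
if os.environ.get("OPENAI_API_KEY"):
    import gepa

    # Sketch of the call -- parameter names are assumptions.
    result = gepa.optimize(
        seed_candidate=seed_prompt,
        trainset=trainset,
        valset=valset,
        task_lm="openai/gpt-4o-mini",   # model that executes the task
        reflection_lm="openai/gpt-4o",  # model that analyzes failures
        max_metric_calls=50,            # total evaluation budget
    )
    print(result.best_candidate["system_prompt"])
```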
Complete Example
A full working example combines the data preparation, seed prompt, and optimization call from the sections above into one script.
Key Concepts
Pareto-Efficient Search
GEPA maintains a frontier of candidates, keeping any that excel on specific examples—even if their average score is lower.
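A toy illustration of this retention rule (not GEPA's implementation): a candidate survives if it achieves the best score on at least one example, even when its average is unimpressive.

```python
# Keep any candidate that is best on at least one example.
def pareto_frontier(scores: dict) -> set:
    """scores maps candidate name -> list of per-example scores."""
    n_examples = len(next(iter(scores.values())))
    keep = set()
    for i in range(n_examples):
        best = max(s[i] for s in scores.values())
        for name, s in scores.items():
            if s[i] == best:
                keep.add(name)
    return keep

scores = {
    "A": [0.9, 0.2, 0.3],  # specialist: wins example 0, low average
    "B": [0.5, 0.6, 0.5],  # generalist: wins examples 1 and 2
    "C": [0.4, 0.1, 0.2],  # dominated: never best anywhere
}
print(sorted(pareto_frontier(scores)))  # → ['A', 'B']
```

Candidate A survives despite the lowest average of the two kept, because it is the only one that solves example 0 well.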
Actionable Side Information
Unlike methods that only see pass/fail scores, GEPA reads error messages, reasoning traces, and execution details.
LLM-Based Reflection
A reflection LLM analyzes failures, diagnoses root causes, and proposes targeted improvements—not random mutations.
Few Evaluations
Achieves strong results with 50-150 evaluations vs. 5,000-25,000+ for reinforcement learning methods.
Configuration Options
Model Selection
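A common split is a cheap model for task execution and a stronger one for reflection. The LiteLLM-style identifier strings below are assumptions; use whatever model names your setup accepts.

```python
# Illustrative model choices (identifier format is an assumption).
task_lm = "openai/gpt-4o-mini"   # cheap, fast model that runs the task
reflection_lm = "openai/gpt-4o"  # stronger model that proposes improvements

# These would be passed to gepa.optimize(task_lm=..., reflection_lm=...).
```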
Custom Metrics
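GEPA benefits most when the metric returns textual feedback alongside the score, so the reflection LLM has something to read. The `(example, prediction) -> (score, feedback)` shape below is a hypothetical signature; consult the adapter interface for what GEPA actually expects.

```python
# Hypothetical metric: exact match with textual feedback for reflection.
def exact_match_metric(example: dict, prediction: str):
    expected = example["answer"].strip().lower()
    got = prediction.strip().lower()
    if got == expected:
        return 1.0, "Correct."
    return 0.0, f"Expected '{example['answer']}' but got '{prediction}'."

score, feedback = exact_match_metric({"answer": "Paris"}, "paris ")
print(score, feedback)  # → 1.0 Correct.
```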
Local Models with Ollama
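For local inference, the sketch below assumes GEPA accepts LiteLLM-style `ollama/...` identifiers pointing at a running Ollama server; verify against your setup.

```python
# Local-model sketch; identifier format is an assumption.
task_lm = "ollama/llama3.1"        # served locally via `ollama serve`
reflection_lm = "ollama/llama3.1"  # a stronger local model is preferable
```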
Troubleshooting
Low improvement or stagnation
- Increase budget: Try max_metric_calls=100 or higher
- Better reflection model: Use GPT-4o or o1 for reflection
- More diverse examples: Ensure trainset covers edge cases
- Check metric: Verify your evaluation metric is meaningful
API errors or rate limits
- Add delays: GEPA respects rate limits automatically
- Use tier-appropriate limits: Set max_metric_calls based on your API tier
- Monitor costs: Each metric call uses the task_lm once
Poor generalization to validation set
- More validation data: Use at least 5-10 validation examples
- Regularization: GEPA’s Pareto frontier naturally prevents overfitting
- Data quality: Ensure validation set represents real usage
Next Steps
Math Optimization
Optimize prompts for complex mathematical reasoning tasks
RAG Pipeline
Optimize entire RAG systems with multiple vector stores
Agent Architecture
Evolve complete agent systems beyond just prompts
API Reference
Explore all configuration options and advanced features
Learn More
- GEPA Paper - Research paper with detailed methodology
- DSPy Integration - Use GEPA within DSPy pipelines
- Use Cases - Real-world applications across industries
- Callbacks Guide - Monitor and customize optimization