Key Results
AIME 2025
46.6% → 56.6% accuracy with GPT-4.1 Mini (+10 percentage points)
HotpotQA
Multi-hop retrieval optimization with detailed query generation strategies
Enterprise Tasks
3-7% performance gains across all model types at Databricks
Sample Efficient
Works with as few as 3 examples — no large training sets required
Simple Prompt Optimization
Optimize a system prompt for math problems from the AIME benchmark:
With DSPy (Recommended)
The most powerful way to use GEPA for prompt optimization is within DSPy, where it’s available as dspy.GEPA:
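A minimal sketch of wiring dspy.GEPA into a small program, assuming a recent DSPy release that ships the optimizer. The model names, the `auto="light"` budget, the `question -> answer` signature, and the `grade_answer` helper are illustrative assumptions and do not come from this page; argument names should be checked against the DSPy API reference for the installed version.

```python
def grade_answer(gold: str, predicted: str) -> tuple[float, str]:
    """Score an answer and explain the result in text GEPA can reflect on."""
    if predicted.strip() == gold.strip():
        return 1.0, "Correct final answer."
    return 0.0, f"Expected '{gold}' but got '{predicted}'. Re-check the final computation."


def metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    """Feedback metric: returns a score plus textual feedback."""
    import dspy  # imported lazily so grade_answer stays usable without DSPy

    score, feedback = grade_answer(gold.answer, pred.answer)
    return dspy.Prediction(score=score, feedback=feedback)


def optimize_program(trainset, valset):
    """Compile a chain-of-thought program with GEPA (needs provider API keys)."""
    import dspy

    dspy.configure(lm=dspy.LM("openai/gpt-4.1-mini"))  # task model (assumed)
    program = dspy.ChainOfThought("question -> answer")
    optimizer = dspy.GEPA(
        metric=metric,
        auto="light",  # preset budget (assumed setting)
        reflection_lm=dspy.LM("openai/gpt-5", temperature=1.0, max_tokens=32000),
    )
    return optimizer.compile(program, trainset=trainset, valset=valset)
```

The grading logic is kept in a plain function so it can be unit-tested without any model calls; only `optimize_program` touches the network.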
Real-World Results with DSPy
- MATH benchmark: 67% → 93% with DSPy Full Program optimization
- Structured extraction: 20+ percentage point improvement in exact match accuracy
- Contact extraction: 86% → 97% accuracy with Pydantic AI integration
Example: AIME Math Optimization
From the AIME 2025 optimization case study:
Optimization Trajectory
Starting from a simple prompt, GEPA evolves detailed problem-solving strategies:
- Initial: “You are a helpful assistant. Answer the question.”
- After 150 calls: Detailed instructions for base conversions, palindromes, symmetric sums, intersecting families, and more
- Final accuracy: 46.67% → 60.00% on AIME 2025
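A run like the one summarized above might be launched roughly as follows. This is a sketch based on the `gepa.optimize` entry point and the bundled AIME loader as shown in the GEPA README; the choice of reflection model is an assumption, and the exact parameters should be verified against the installed version.

```python
# Seed prompt quoted in the trajectory above.
SEED_CANDIDATE = {"system_prompt": "You are a helpful assistant. Answer the question."}


def run_aime_optimization(max_metric_calls: int = 150) -> str:
    """Optimize the seed prompt on AIME and return the best system prompt found."""
    import gepa  # requires `pip install gepa` and provider API keys

    trainset, valset, _ = gepa.examples.aime.init_dataset()
    result = gepa.optimize(
        seed_candidate=SEED_CANDIDATE,
        trainset=trainset,
        valset=valset,
        task_lm="openai/gpt-4.1-mini",      # model whose prompt is optimized
        reflection_lm="openai/gpt-5",       # assumed: a stronger model for reflection
        max_metric_calls=max_metric_calls,  # evaluation budget
    )
    return result.best_candidate["system_prompt"]
```

Nothing is executed until `run_aime_optimization()` is called, so the snippet can be imported safely before credentials are configured.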
How It Works
GEPA’s prompt optimization follows this cycle:
- Select a candidate from the Pareto frontier
- Execute on a minibatch, capturing full execution traces
- Reflect — LLM reads traces and diagnoses failures
- Mutate — Generate improved candidate informed by lessons from ancestors
- Accept — Add to pool if improved, update Pareto front
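The cycle above can be sketched as a toy loop. This is not GEPA's implementation: `reflect` and `mutate` are simple stand-ins for the LLM calls, the "task" only checks whether a prompt mentions a topic, and every name here is hypothetical.

```python
import random

# Toy task: an example "passes" when the prompt mentions its required topic.
EXAMPLES = ["base conversion", "palindromes", "symmetric sums"]


def execute(prompt: str, example: str) -> dict:
    """Run one example and return a trace (score plus a textual record)."""
    passed = example in prompt
    return {"example": example, "score": float(passed),
            "log": "ok" if passed else f"no guidance on {example}"}


def reflect(traces: list[dict]) -> list[str]:
    """Stand-in for the reflection LLM: diagnose failures from the traces."""
    return [t["example"] for t in traces if t["score"] == 0.0]


def mutate(prompt: str, lessons: list[str]) -> str:
    """Stand-in for the proposal step: fold the lessons into a new prompt."""
    return prompt + "".join(f" Explain how to handle {topic}." for topic in lessons)


def pareto_front(pool: dict[str, list[float]]) -> list[str]:
    """Keep candidates that match the best score on at least one example."""
    best = [max(scores[i] for scores in pool.values()) for i in range(len(EXAMPLES))]
    return [p for p, scores in pool.items()
            if any(s == b for s, b in zip(scores, best))]


def optimize(seed: str, budget: int = 10, batch_size: int = 2, rng_seed: int = 0) -> str:
    rng = random.Random(rng_seed)
    pool = {seed: [execute(seed, e)["score"] for e in EXAMPLES]}
    for _ in range(budget):
        parent = rng.choice(pareto_front(pool))           # 1. select
        batch = rng.sample(EXAMPLES, batch_size)
        traces = [execute(parent, e) for e in batch]      # 2. execute
        lessons = reflect(traces)                         # 3. reflect
        if not lessons:
            continue
        child = mutate(parent, lessons)                   # 4. mutate
        before = sum(t["score"] for t in traces)
        after = sum(execute(child, e)["score"] for e in batch)
        if after > before:                                # 5. accept
            pool[child] = [execute(child, e)["score"] for e in EXAMPLES]
    return max(pool, key=lambda p: sum(pool[p]))


best = optimize("You are a helpful assistant.")
print(best)
```

Because the seed fails every example, the first accepted child already covers the two topics in its minibatch; later iterations can fill in the remaining one.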
Actionable Side Information (ASI)
Unlike gradient-based methods, GEPA uses Actionable Side Information:
Evolved Prompts
GEPA discovers detailed, domain-specific prompts. Here are examples:
AIME Prompt (Excerpt)
HotpotQA Multi-Hop Retrieval Prompt (Excerpt)
Production Use Cases
Databricks: Enterprise Agents
90x cost reduction while maintaining or improving performance by optimizing enterprise agents with GEPA.
- Open-source models + GEPA outperform Claude Opus 4.1, Claude Sonnet 4, and GPT-5
- Consistent 3-7% performance gains across all model types
- At 100,000 requests, serving costs represent 95%+ of AI expenditure
Pydantic AI: Contact Extraction
Contact extraction improved from 86% → 97% accuracy using GEPA with Pydantic AI.
HuggingFace: Structured Extraction
20+ percentage point improvement in exact match accuracy for structured extraction tasks.
View the cookbook →
Advantages Over RL
35x Faster
100–500 evaluations vs. 5,000–25,000+ for GRPO
Interpretable
Human-readable traces show why each prompt changed
Sample Efficient
Works with as few as 3 examples
API-Only Models
No weights access needed — works with GPT-5, Claude, Gemini
Comparison with RL Methods
From the GEPA paper:
- GRPO (Group Relative Policy Optimization): Requires 5,000–25,000+ evaluations
- GEPA: Achieves comparable or better results with 100–500 evaluations
- Key insight: Reading full traces is more informative than scalar rewards
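As a quick check on what these ranges imply, the short calculation below compares the two evaluation budgets. It uses only the figures quoted above and treats 25,000 as the upper end even though the source says "25,000+".

```python
grpo_evals = (5_000, 25_000)   # range quoted above for GRPO
gepa_evals = (100, 500)        # range quoted above for GEPA

# Most conservative comparison: cheapest GRPO run vs. most expensive GEPA run.
lowest_ratio = grpo_evals[0] / gepa_evals[1]
# Most favorable comparison: most expensive GRPO run vs. cheapest GEPA run.
highest_ratio = grpo_evals[1] / gepa_evals[0]

print(f"GRPO needs {lowest_ratio:.0f}x to {highest_ratio:.0f}x more evaluations")
```

The 35x figure quoted under "Advantages Over RL" falls inside this 10x to 250x range.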
Integration Examples
MLflow Integration
Comet ML Opik
GEPA is the core optimization algorithm in Opik Agent Optimizer:
Best Practices
Start with a simple seed
Begin with a minimal prompt like “You are a helpful assistant.” GEPA will evolve the complexity.
Use informative evaluation metrics
Return structured feedback in your evaluator to help GEPA understand failure modes.
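A minimal sketch of an evaluator that reports why an output failed, not just a score. The contact fields and the returned dictionary shape (`score` plus `feedback`) are illustrative assumptions; the exact format a given GEPA adapter expects is not shown on this page.

```python
def evaluate_contact(expected: dict, predicted: dict) -> dict:
    """Compare extracted fields and report which ones went wrong and why."""
    problems = []
    for field, want in expected.items():
        got = predicted.get(field)
        if got is None:
            problems.append(f"missing field '{field}' (expected {want!r})")
        elif got != want:
            problems.append(f"field '{field}' was {got!r}, expected {want!r}")
    # Fraction of fields that were extracted correctly.
    score = 1.0 - len(problems) / len(expected) if expected else 1.0
    feedback = "; ".join(problems) if problems else "all fields correct"
    return {"score": score, "feedback": feedback}


result = evaluate_contact(
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.org"},
)
print(result["score"])     # 0.5
print(result["feedback"])  # names the email field and both values
```

A per-field message like this gives the reflection step something concrete to act on, whereas a bare 0.5 would not say which field to fix.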
Leverage Pareto selection
When you have multiple aspects to optimize (accuracy, brevity, tone), GEPA’s Pareto frontier preserves candidates that excel at different objectives.
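To illustrate what a Pareto frontier preserves, the sketch below filters a set of candidates scored on several objectives. The candidate names and scores are hypothetical, and this is a generic dominance check, not GEPA's internal selection code.

```python
def dominates(a: dict, b: dict) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_frontier(candidates: dict[str, dict]) -> list[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name for name, scores in candidates.items()
        if not any(dominates(other, scores)
                   for other_name, other in candidates.items() if other_name != name)
    ]


# Hypothetical scores (higher is better on every objective).
candidates = {
    "terse":    {"accuracy": 0.70, "brevity": 0.95, "tone": 0.60},
    "thorough": {"accuracy": 0.92, "brevity": 0.40, "tone": 0.70},
    "sloppy":   {"accuracy": 0.65, "brevity": 0.90, "tone": 0.55},
}
print(pareto_frontier(candidates))  # ['terse', 'thorough']
```

Both "terse" and "thorough" survive because each wins on a different objective, while "sloppy" is dropped because "terse" beats it on all three.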
Use validation sets for generalization
Always provide a valset to ensure your optimized prompt generalizes to unseen examples.
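One simple way to obtain a held-out set is a deterministic shuffle and split, sketched below. The helper name, the 30% fraction, and the example records are assumptions for illustration; the resulting lists would then be passed as the training and validation sets of whichever optimizer API is in use.

```python
import random


def split_examples(examples: list, val_fraction: float = 0.3, seed: int = 0):
    """Shuffle deterministically, then hold out a fraction for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


examples = [{"question": f"q{i}", "answer": str(i)} for i in range(10)]
trainset, valset = split_examples(examples)
print(len(trainset), len(valset))  # 7 3
```

Fixing the seed keeps the split stable across runs, so changes in validation score reflect the prompt and not a reshuffled dataset.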
Combine with DSPy for complex pipelines
For multi-step AI pipelines, use DSPy with GEPA to optimize entire programs, not just prompts.
Next Steps
Quick Start
Get started with GEPA in 5 minutes
Code Optimization
Learn about optimizing code with GEPA
DSPy Tutorials
Step-by-step DSPy + GEPA tutorials
API Reference
Complete API documentation