What is GEPA?
GEPA (Genetic-Pareto) is a Python framework for optimizing any system with textual parameters against any evaluation metric. Unlike RL or gradient-based methods that collapse execution traces into a single scalar reward, GEPA uses LLMs to read full execution traces — error messages, profiling data, reasoning logs — to diagnose why a candidate failed and propose targeted fixes. Through iterative reflection, mutation, and Pareto-aware selection, GEPA evolves high-performing variants with minimal evaluations. If you can measure it, you can optimize it: prompts, code, agent architectures, scheduling policies, vector graphics, and more.
Key Results
90x cheaper
Open-source models + GEPA beat Claude Opus 4.1 at Databricks
35x faster than RL
100–500 evaluations vs. 5,000–25,000+ for GRPO (paper)
32% → 89%
ARC-AGI agent accuracy via architecture discovery
40.2% cost savings
Cloud scheduling policy discovered by GEPA, beating expert heuristics
55% → 82%
Coding agent resolve rate on Jinja via auto-learned skills
50+ production uses
Across Shopify, Databricks, Dropbox, OpenAI, Pydantic, MLflow, and more
“Both DSPy and (especially) GEPA are currently severely under hyped in the AI context engineering world” — Tobi Lütke, CEO, Shopify
How It Works
Traditional optimizers know that a candidate failed but not why. GEPA takes a different approach:
Reflect
An LLM reads the traces (error messages, profiler output, reasoning logs) and diagnoses failures
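After reflection produces mutated candidates, GEPA scores each one per task and keeps those on the per-task Pareto frontier rather than just the best-average candidate. The selection step can be sketched in a few lines of Python; this is an illustrative toy, not GEPA's internal implementation, and the `pareto_frontier` function and its dict-of-scores input shape are assumptions made here:

```python
def pareto_frontier(scores: dict[str, list[float]]) -> set[str]:
    """Keep every candidate that is best (ties included) on at least one task.

    `scores` maps a candidate id to its per-task score list (same task order
    for every candidate). Any candidate that wins at least one task survives,
    which preserves diverse strategies for later reflection and mutation.
    """
    n_tasks = len(next(iter(scores.values())))
    # Best score achieved by any candidate, per task.
    best = [max(s[t] for s in scores.values()) for t in range(n_tasks)]
    return {
        cand
        for cand, s in scores.items()
        if any(s[t] == best[t] for t in range(n_tasks))
    }
```

With `{"A": [0.9, 0.1], "B": [0.2, 0.8], "C": [0.5, 0.5]}`, all three candidates have the same average score, yet the frontier is `{"A", "B"}`: C wins no individual task, so keeping only the best-average candidate would discard exactly the specialist strategies Pareto-aware selection is designed to retain.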
When GEPA Shines
Expensive rollouts
Scientific simulations, complex agents with tool calls, slow compilation. GEPA needs 100–500 evals vs 10K+ for RL.
Scarce data
Works with as few as 3 examples. No large training sets required.
API-only models
No weights access needed. Optimize GPT-5, Claude, Gemini directly through their APIs.
Interpretability
Human-readable optimization traces show why each prompt changed.
Complements RL
Use GEPA for rapid initial optimization, then apply RL/fine-tuning for additional gains.
Get Started
Installation
Install GEPA via pip and get set up in seconds
Quick Start
Run your first optimization in minutes
GitHub
View source code and contribute
Paper
Read the research paper
Discord
Join the community
Blog
Read tutorials and case studies
Key Features
Three Optimization Modes
GEPA supports three distinct optimization paradigms:
- Single-Task Search — Solve one hard problem. The candidate is the solution.
- Multi-Task Search — Solve a batch of related problems with cross-task transfer.
- Generalization — Build a skill that transfers to unseen problems.
Built-in Adapters
GEPA connects to your system via the GEPAAdapter interface:
- DefaultAdapter — System prompt optimization for single-turn LLM tasks
- DSPy Full Program — Evolves entire DSPy programs (signatures, modules, control flow). 67% → 93% on MATH.
- Generic RAG — Vector store-agnostic RAG optimization (ChromaDB, Weaviate, Qdrant, Pinecone)
- MCP Adapter — Optimize MCP tool descriptions and system prompts
- TerminalBench — Optimize the Terminus terminal-use agent
- AnyMaths — Mathematical problem-solving and reasoning tasks
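To connect a system none of the built-in adapters covers, you implement the adapter interface yourself. The sketch below is a self-contained stand-in: the `ToyAdapter` class, its string-matching scoring rule, and the exact method signatures are assumptions made for illustration, so consult the GEPAAdapter protocol in the repo for the real contract.

```python
class ToyAdapter:
    """Toy adapter sketch: evaluates a candidate (a dict of named text
    components) on a batch of examples and packages failing traces for
    reflection. Signatures are illustrative, not the library's exact API."""

    def evaluate(self, batch, candidate, capture_traces=False):
        """Run the candidate on each example; return outputs, scores, traces."""
        outputs, scores, traces = [], [], []
        prompt = candidate["system_prompt"]
        for example in batch:
            # Stand-in for a real LLM call: "succeed" when the prompt
            # mentions the example's topic.
            out = f"{prompt} :: {example['question']}"
            score = 1.0 if example["topic"] in prompt else 0.0
            outputs.append(out)
            scores.append(score)
            if capture_traces:
                traces.append({"prompt": prompt, "output": out, "score": score})
        return {"outputs": outputs, "scores": scores,
                "trajectories": traces or None}

    def make_reflective_dataset(self, candidate, eval_result, components_to_update):
        """Collect failing traces per component so a reflection LLM can
        diagnose why the current text failed and propose a targeted fix."""
        return {
            comp: [
                {"component_text": candidate[comp], "trace": t}
                for t in (eval_result["trajectories"] or [])
                if t["score"] < 1.0
            ]
            for comp in components_to_update
        }
```

The key design point the adapters above share is that `evaluate` returns rich traces, not just scalar scores, so the reflection step has error context to reason over.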
Framework Integrations
GEPA is integrated into several major frameworks:
- DSPy — dspy.GEPA for optimizing DSPy programs
- MLflow — mlflow.genai.optimize_prompts() for automatic prompt improvement
- Comet ML Opik — Core optimization algorithm in Opik Agent Optimizer
- Pydantic — Prompt optimization for Pydantic AI
- OpenAI Cookbook — Self-evolving agents with GEPA
- HuggingFace Cookbook — Prompt optimization guide