Why GEPA?
If you can measure it, you can optimize it: prompts, code, agent architectures, scheduling policies, vector graphics, and more. Unlike RL or gradient-based methods that collapse execution traces into a single scalar reward, GEPA uses LLMs to read full execution traces — error messages, profiling data, reasoning logs — to diagnose why a candidate failed and propose targeted fixes.

Key Results at a Glance
- 90x Cheaper: Open-source models + GEPA beat Claude Opus 4.1 at Databricks
- 35x Faster than RL: 100–500 evaluations vs. 5,000–25,000+ for GRPO
- 32% → 89%: ARC-AGI agent accuracy via architecture discovery
- 40.2% Cost Savings: Cloud scheduling policy discovered by GEPA, beating expert heuristics
- 55% → 82%: Coding agent resolve rate on Jinja via auto-learned skills
- 50+ Production Uses: Across Shopify, Databricks, Dropbox, OpenAI, Pydantic, MLflow, Comet ML
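To make the reflective idea from the introduction concrete, here is a toy sketch of a GEPA-style loop in plain Python. All names (`evaluate`, `reflect_and_mutate`, `optimize`) are illustrative stand-ins, not the real gepa API; in the real system an LLM reads the execution trace and proposes the edit.

```python
# Toy sketch of GEPA-style reflective evolution (hypothetical names, not the
# real gepa API). A candidate is a prompt string; evaluation returns a score
# plus a textual trace that a reflection step can read.

def evaluate(prompt: str) -> tuple[float, str]:
    # Stand-in metric: reward prompts that ask for step-by-step reasoning.
    score = 1.0 if "step by step" in prompt else 0.0
    trace = f"prompt={prompt!r} score={score}"  # a full trace, not just a scalar
    return score, trace

def reflect_and_mutate(prompt: str, trace: str) -> str:
    # GEPA would have an LLM diagnose the trace and propose a targeted fix;
    # here we hard-code one plausible edit.
    if "score=0.0" in trace:
        return prompt + " Think step by step."
    return prompt

def optimize(seed: str, budget: int) -> str:
    best, best_score = seed, evaluate(seed)[0]
    candidate = seed
    for _ in range(budget):
        score, trace = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
        candidate = reflect_and_mutate(candidate, trace)
    return best

print(optimize("Solve the problem.", budget=5))
# → Solve the problem. Think step by step.
```

The key contrast with scalar-reward methods is that `reflect_and_mutate` sees *why* the candidate scored poorly, so the proposed edit can be targeted rather than random.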
Use Case Categories
Prompt Optimization
Improve LLM accuracy through reflective prompt evolution. Achieve 46.6% → 56.6% on AIME with GPT-4.1 Mini.
Code Optimization
Generate and optimize code including CUDA kernels, scheduling policies, and system algorithms.
Agent Architecture
Discover optimal agent designs through evolutionary search. Nearly triple accuracy on ARC-AGI.
RAG Optimization
Optimize retrieval pipelines, reranking strategies, and query generation for better context.
When GEPA Shines
Expensive Rollouts
Scientific simulations, complex agents with tool calls, slow compilation. GEPA needs 100–500 evals vs 10K+ for RL.
Scarce Data
Works with as few as 3 examples. No large training sets required.
API-Only Models
No weights access needed. Optimize GPT-5, Claude, Gemini directly through their APIs.
Interpretability
Human-readable optimization traces show why each prompt changed.
Complements RL
Use GEPA for rapid initial optimization, then apply RL/fine-tuning for additional gains.
Real-World Impact
“Both DSPy and (especially) GEPA are currently severely under hyped in the AI context engineering world” — Tobi Lutke, CEO, Shopify
Production Deployments
GEPA is used in production at:
- Databricks: 90x cost reduction with enterprise agents
- Shopify: Optimizing AI context engineering
- Dropbox: Production AI systems
- OpenAI: Featured in official cookbook for self-evolving agents
- Pydantic: Contact extraction improved from 86% → 97%
- MLflow: Integrated as `mlflow.genai.optimize_prompts()`
- Comet ML: Core algorithm in Opik Agent Optimizer
Research Impact
- 35x faster than RL: 100–500 evaluations vs. 5,000–25,000+ for GRPO (paper)
- State-of-the-art results: MATH benchmark 67% → 93% with DSPy Full Program optimization
- Sample efficiency: Works with minimal data where RL requires thousands of examples
Getting Started
Ready to optimize your AI system? Choose your use case:

Quick Start
Get up and running with GEPA in minutes
Adapters Guide
Learn how to integrate GEPA with your system
API Reference
Complete API documentation
Community
Join our Discord community
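As a taste of what integration involves: GEPA needs a way to run a candidate on your examples and report a score plus a readable trace it can reflect on. The sketch below is a hypothetical adapter shape, not the real interface; see the Adapters Guide for the actual one.

```python
from dataclasses import dataclass

# Hypothetical adapter shape (not the real gepa interface): the optimizer
# only needs to run a candidate on an example and get back a score plus a
# human-readable trace.

@dataclass
class EvalResult:
    score: float
    trace: str

class MyAdapter:
    def evaluate(self, candidate: dict[str, str], example: dict) -> EvalResult:
        # "candidate" maps component names (e.g. a system prompt) to text.
        # Stand-in for actually calling your LLM system with the candidate:
        answer = example["question"].upper()
        score = float(answer == example["expected"])
        return EvalResult(score, f"got {answer!r}, expected {example['expected']!r}")

adapter = MyAdapter()
result = adapter.evaluate({"system_prompt": "Answer in caps."},
                          {"question": "hi", "expected": "HI"})
print(result.score)  # 1.0
```

The trace string is what makes reflection possible: it tells the optimizer what the system produced and what was expected, not just whether it passed.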
Optimization Modes
GEPA supports three optimization paradigms:
- Single-Task Search: Solve one hard problem (e.g., circle packing, blackbox optimization)
- Multi-Task Search: Solve a batch of related problems with cross-transfer (e.g., CUDA kernels)
- Generalization: Build a skill that transfers to unseen problems (e.g., prompt optimization, agent architecture)
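In the multi-task mode, GEPA's Pareto-based selection keeps any candidate that is best on at least one task, so strengths discovered on one problem can transfer to others. A minimal sketch of that bookkeeping (illustrative only, not the library's internals):

```python
# Toy Pareto bookkeeping for multi-task search (illustrative only):
# a candidate stays on the front if it achieves the top score on some task.

def pareto_front(scores: dict[str, dict[str, float]]) -> set[str]:
    # scores[candidate][task] -> score
    tasks = {t for per_task in scores.values() for t in per_task}
    front = set()
    for task in tasks:
        best = max(scores, key=lambda c: scores[c].get(task, float("-inf")))
        front.add(best)
    return front

scores = {
    "cand_a": {"kernel_1": 0.9, "kernel_2": 0.2},
    "cand_b": {"kernel_1": 0.5, "kernel_2": 0.8},
    "cand_c": {"kernel_1": 0.4, "kernel_2": 0.3},
}
print(pareto_front(scores))  # cand_a and cand_b each win a task; cand_c is dominated
```

Keeping per-task winners rather than a single global best preserves diverse partial solutions, which is what makes cross-transfer between related problems possible.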