
What is GEPA?

GEPA (Genetic-Pareto) is a Python framework for optimizing any system with textual parameters against any evaluation metric. Unlike RL or gradient-based methods that collapse execution traces into a single scalar reward, GEPA uses LLMs to read full execution traces — error messages, profiling data, reasoning logs — to diagnose why a candidate failed and propose targeted fixes. Through iterative reflection, mutation, and Pareto-aware selection, GEPA evolves high-performing variants with minimal evaluations. If you can measure it, you can optimize it: prompts, code, agent architectures, scheduling policies, vector graphics, and more.

Key Results

  • 90x cheaper — Open-source models + GEPA beat Claude Opus 4.1 at Databricks
  • 35x faster than RL — 100–500 evaluations vs. 5,000–25,000+ for GRPO (paper)
  • 32% → 89% — ARC-AGI agent accuracy via architecture discovery
  • 40.2% cost savings — Cloud scheduling policy discovered by GEPA, beating expert heuristics
  • 55% → 82% — Coding agent resolve rate on Jinja via auto-learned skills
  • 50+ production uses — Across Shopify, Databricks, Dropbox, OpenAI, Pydantic, MLflow, and more

“Both DSPy and (especially) GEPA are currently severely under hyped in the AI context engineering world” — Tobi Lütke, CEO, Shopify

How It Works

Traditional optimizers know that a candidate failed but not why. GEPA takes a different approach:
  1. Select — Choose a candidate from the Pareto frontier (candidates excelling on different task subsets)
  2. Execute — Run it on a minibatch, capturing full execution traces
  3. Reflect — An LLM reads the traces (error messages, profiler output, reasoning logs) and diagnoses failures
  4. Mutate — Generate an improved candidate informed by accumulated lessons from all ancestors
  5. Accept — Add the candidate to the pool if it improves, and update the Pareto front
GEPA also supports system-aware merge — combining the strengths of two Pareto-optimal candidates that excel on different tasks. The key concept is Actionable Side Information (ASI): diagnostic feedback returned by evaluators that serves as the text-optimization analogue of a gradient.
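The five steps above can be sketched in a few lines of plain Python. Everything here is illustrative: `evaluate` and `reflect_and_mutate` stand in for your metric and an LLM reflection call, and none of the names are GEPA's actual API.

```python
import random

def pareto_front(pool):
    """Step 1's selection pool: candidates no other candidate beats on every task."""
    def dominates(a, b):
        return all(a["scores"][t] >= b["scores"][t] for t in b["scores"]) and \
               any(a["scores"][t] > b["scores"][t] for t in b["scores"])
    return [c for c in pool if not any(dominates(o, c) for o in pool if o is not c)]

def gepa_step(pool, tasks, evaluate, reflect_and_mutate, rng=random):
    parent = rng.choice(pareto_front(pool))                   # 1. Select
    traces = {t: evaluate(parent["text"], t) for t in tasks}  # 2. Execute: (score, trace) per task
    child_text = reflect_and_mutate(parent["text"], traces)   # 3-4. Reflect on traces, Mutate
    child = {"text": child_text,
             "scores": {t: evaluate(child_text, t)[0] for t in tasks}}
    if any(child["scores"][t] > parent["scores"][t] for t in tasks):  # 5. Accept if improved
        pool.append(child)
    return pool
```

Because acceptance compares per-task scores rather than a single averaged reward, a child that improves on even one task subset can survive on the frontier, which is what enables cross-task transfer.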

When GEPA Shines

Expensive rollouts

Scientific simulations, complex agents with tool calls, slow compilation. GEPA needs 100–500 evals vs 10K+ for RL.

Scarce data

Works with as few as 3 examples. No large training sets required.

API-only models

No weights access needed. Optimize GPT-5, Claude, Gemini directly through their APIs.

Interpretability

Human-readable optimization traces show why each prompt changed.

Complements RL

Use GEPA for rapid initial optimization, then apply RL/fine-tuning for additional gains.

Get Started

Installation

Install GEPA via pip and get set up in seconds

Quick Start

Run your first optimization in minutes

GitHub

View source code and contribute

Paper

Read the research paper

Discord

Join the community

Blog

Read tutorials and case studies

Key Features

Three Optimization Modes

GEPA supports three distinct optimization paradigms:
  1. Single-Task Search — Solve one hard problem. The candidate is the solution.
  2. Multi-Task Search — Solve a batch of related problems with cross-task transfer.
  3. Generalization — Build a skill that transfers to unseen problems.

Built-in Adapters

GEPA connects to your system via the GEPAAdapter interface:
  • DefaultAdapter — System prompt optimization for single-turn LLM tasks
  • DSPy Full Program — Evolves entire DSPy programs (signatures, modules, control flow). 67% → 93% on MATH.
  • Generic RAG — Vector store-agnostic RAG optimization (ChromaDB, Weaviate, Qdrant, Pinecone)
  • MCP Adapter — Optimize MCP tool descriptions and system prompts
  • TerminalBench — Optimize the Terminus terminal-use agent
  • AnyMaths — Mathematical problem-solving and reasoning tasks
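A custom adapter boils down to: run your system on an example, then return a score plus diagnostic text (the ASI) for the reflection LLM to read. The sketch below is illustrative only — `EvalResult`, `evaluate`, and all field names are assumptions made for this example, not the actual `GEPAAdapter` signatures.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class EvalResult:
    score: float     # scalar metric used for Pareto selection
    trace: str = ""  # Actionable Side Information: errors, logs, diagnostics

class Adapter(Protocol):
    """Illustrative stand-in for an adapter interface (names are assumptions)."""
    def evaluate(self, candidate: str, example: dict) -> EvalResult: ...

class ExprAdapter:
    """Toy adapter: the candidate is a Python expression in x; the trace explains failures."""
    def evaluate(self, candidate, example):
        try:
            got = eval(candidate, {"x": example["x"]})  # stand-in for running your system
        except Exception as e:
            return EvalResult(0.0, f"raised {type(e).__name__}: {e}")
        if got == example["y"]:
            return EvalResult(1.0)
        return EvalResult(0.0, f"x={example['x']}: expected {example['y']}, got {got}")
```

The trace is what makes reflection work: a score of 0.0 alone says a candidate failed, while the trace says why.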

Framework Integrations

GEPA is integrated into several major frameworks:
  • DSPy — dspy.GEPA for optimizing DSPy programs
  • MLflow — mlflow.genai.optimize_prompts() for automatic prompt improvement
  • Comet ML Opik — Core optimization algorithm in Opik Agent Optimizer
  • Pydantic — Prompt optimization for Pydantic AI
  • OpenAI Cookbook — Self-evolving agents with GEPA
  • HuggingFace Cookbook — Prompt optimization guide

Citation

If you use GEPA in your research, please cite:
@misc{agrawal2025gepareflectivepromptevolution,
      title={GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning},
      author={Lakshya A Agrawal and Shangyin Tan and Dilara Soylu and Noah Ziems and Rishi Khare and Krista Opsahl-Ong and Arnav Singhvi and Herumb Shandilya and Michael J Ryan and Meng Jiang and Christopher Potts and Koushik Sen and Alexandros G. Dimakis and Ion Stoica and Dan Klein and Matei Zaharia and Omar Khattab},
      year={2025},
      eprint={2507.19457},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.19457},
}
