
What is GEPA?

GEPA (Genetic-Pareto) is a Python framework for optimizing any system with textual parameters against any evaluation metric. Unlike RL or gradient-based methods that collapse execution traces into a single scalar reward, GEPA uses LLMs to read full execution traces — error messages, profiling data, reasoning logs — to diagnose why a candidate failed and propose targeted fixes. Through iterative reflection, mutation, and Pareto-aware selection, GEPA evolves high-performing variants with minimal evaluations. If you can measure it, you can optimize it: prompts, code, agent architectures, scheduling policies, vector graphics, and more.

Key Results

  • 90x cheaper — Open-source models + GEPA beat Claude Opus 4.1 at Databricks
  • 35x faster than RL — 100–500 evaluations vs. 5,000–25,000+ for GRPO (paper)
  • 32% → 89% — ARC-AGI agent accuracy via architecture discovery
  • 40.2% cost savings — Cloud scheduling policy discovered by GEPA, beating expert heuristics
  • 55% → 82% — Coding agent resolve rate on Jinja via auto-learned skills
  • 50+ production uses — Across Shopify, Databricks, Dropbox, OpenAI, Pydantic, MLflow, and more

“Both DSPy and (especially) GEPA are currently severely under hyped in the AI context engineering world” — Tobi Lütke, CEO, Shopify

How It Works

Traditional optimizers know that a candidate failed but not why. GEPA takes a different approach:
  1. Select — Choose a candidate from the Pareto frontier (candidates excelling on different task subsets)
  2. Execute — Run it on a minibatch, capturing full execution traces
  3. Reflect — An LLM reads the traces (error messages, profiler output, reasoning logs) and diagnoses failures
  4. Mutate — Generate an improved candidate informed by accumulated lessons from all ancestors
  5. Accept — Add the candidate to the pool if it improves, and update the Pareto front
GEPA also supports system-aware merge — combining the strengths of two Pareto-optimal candidates that excel on different tasks. The key concept is Actionable Side Information (ASI): diagnostic feedback returned by evaluators that serves as the text-optimization analogue of a gradient.
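The five steps above can be sketched in a few lines of plain Python. Everything here is illustrative: `evaluate` and `reflect_and_mutate` stand in for your metric and an LLM reflection call, and none of the names are GEPA's actual API.

```python
import random

def pareto_front(pool):
    """Step 1's selection pool: candidates no other candidate beats on every task."""
    def dominates(a, b):
        return all(a["scores"][t] >= b["scores"][t] for t in b["scores"]) and \
               any(a["scores"][t] > b["scores"][t] for t in b["scores"])
    return [c for c in pool if not any(dominates(o, c) for o in pool if o is not c)]

def gepa_step(pool, tasks, evaluate, reflect_and_mutate, rng=random):
    parent = rng.choice(pareto_front(pool))                   # 1. Select
    traces = {t: evaluate(parent["text"], t) for t in tasks}  # 2. Execute: (score, trace) per task
    child_text = reflect_and_mutate(parent["text"], traces)   # 3-4. Reflect on traces, Mutate
    child = {"text": child_text,
             "scores": {t: evaluate(child_text, t)[0] for t in tasks}}
    if any(child["scores"][t] > parent["scores"][t] for t in tasks):  # 5. Accept if improved
        pool.append(child)
    return pool
```

Because acceptance compares per-task scores rather than a single averaged reward, a child that improves on even one task subset can survive on the frontier, which is what enables cross-task transfer.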

When GEPA Shines

Expensive rollouts

Scientific simulations, complex agents with tool calls, slow compilation. GEPA needs 100–500 evals vs 10K+ for RL.

Scarce data

Works with as few as 3 examples. No large training sets required.

API-only models

No weights access needed. Optimize GPT-5, Claude, Gemini directly through their APIs.

Interpretability

Human-readable optimization traces show why each prompt changed.

Complements RL

Use GEPA for rapid initial optimization, then apply RL/fine-tuning for additional gains.

Get Started

Installation

Install GEPA via pip and get set up in seconds

Quick Start

Run your first optimization in minutes

GitHub

View source code and contribute

Paper

Read the research paper

Discord

Join the community

Blog

Read tutorials and case studies

Key Features

Three Optimization Modes

GEPA supports three distinct optimization paradigms:
  1. Single-Task Search — Solve one hard problem. The candidate is the solution.
  2. Multi-Task Search — Solve a batch of related problems with cross-task transfer.
  3. Generalization — Build a skill that transfers to unseen problems.

Built-in Adapters

GEPA connects to your system via the GEPAAdapter interface:
  • DefaultAdapter — System prompt optimization for single-turn LLM tasks
  • DSPy Full Program — Evolves entire DSPy programs (signatures, modules, control flow). 67% → 93% on MATH.
  • Generic RAG — Vector store-agnostic RAG optimization (ChromaDB, Weaviate, Qdrant, Pinecone)
  • MCP Adapter — Optimize MCP tool descriptions and system prompts
  • TerminalBench — Optimize the Terminus terminal-use agent
  • AnyMaths — Mathematical problem-solving and reasoning tasks
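A custom adapter boils down to: run your system on an example, then return a score plus diagnostic text (the ASI) for the reflection LLM to read. The sketch below is illustrative only — `EvalResult`, `evaluate`, and all field names are assumptions made for this example, not the actual `GEPAAdapter` signatures.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class EvalResult:
    score: float     # scalar metric used for Pareto selection
    trace: str = ""  # Actionable Side Information: errors, logs, diagnostics

class Adapter(Protocol):
    """Illustrative stand-in for an adapter interface (names are assumptions)."""
    def evaluate(self, candidate: str, example: dict) -> EvalResult: ...

class ExprAdapter:
    """Toy adapter: the candidate is a Python expression in x; the trace explains failures."""
    def evaluate(self, candidate, example):
        try:
            got = eval(candidate, {"x": example["x"]})  # stand-in for running your system
        except Exception as e:
            return EvalResult(0.0, f"raised {type(e).__name__}: {e}")
        if got == example["y"]:
            return EvalResult(1.0)
        return EvalResult(0.0, f"x={example['x']}: expected {example['y']}, got {got}")
```

The trace is what makes reflection work: a score of 0.0 alone says a candidate failed, while the trace says why.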

Framework Integrations

GEPA is integrated into several major frameworks:
  • DSPy — dspy.GEPA for optimizing DSPy programs
  • MLflow — mlflow.genai.optimize_prompts() for automatic prompt improvement
  • Comet ML Opik — Core optimization algorithm in Opik Agent Optimizer
  • Pydantic — Prompt optimization for Pydantic AI
  • OpenAI Cookbook — Self-evolving agents with GEPA
  • HuggingFace Cookbook — Prompt optimization guide

Citation

If you use GEPA in your research, please cite:
@misc{agrawal2025gepareflectivepromptevolution,
      title={GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning},
      author={Lakshya A Agrawal and Shangyin Tan and Dilara Soylu and Noah Ziems and Rishi Khare and Krista Opsahl-Ong and Arnav Singhvi and Herumb Shandilya and Michael J Ryan and Meng Jiang and Christopher Potts and Koushik Sen and Alexandros G. Dimakis and Ion Stoica and Dan Klein and Matei Zaharia and Omar Khattab},
      year={2025},
      eprint={2507.19457},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.19457},
}
