
Overview

The DspyAdapter (full program variant) enables GEPA to evolve complete DSPy programs, not just instructions. This includes:
  • Entire program structure (classes, methods)
  • Module composition and control flow
  • Signature definitions
  • Module interactions
This adapter achieves 93% accuracy on the MATH benchmark (vs 67% with basic DSPy ChainOfThought).

Installation

pip install dspy-ai gepa

Quick Start

import dspy
from gepa.adapters.dspy_full_program_adapter import DspyAdapter
import gepa

# Configure DSPy
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Define seed program as code string
seed_program = '''
import dspy

class MyProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought('question -> answer')
    
    def forward(self, question):
        return self.predictor(question=question)

program = MyProgram()
'''

# Define metric
def my_metric(example, prediction, trace=None):
    return float(example.answer == prediction.answer)

# Create adapter
adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),
    metric_fn=my_metric,
    reflection_lm=dspy.LM('openai/gpt-4')
)

# Optimize
result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=150
)

print('Optimized program:')
print(result.best_candidate['program'])

Class Signature

Defined in src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:13:
class DspyAdapter(GEPAAdapter[Example, TraceData, Prediction]):
    def __init__(
        self,
        task_lm: dspy.LM,
        metric_fn: Callable,
        reflection_lm: dspy.LM,
        failure_score=0.0,
        num_threads: int | None = None,
        add_format_failure_as_feedback: bool = False,
        rng: random.Random | None = None,
    )

Parameters

task_lm (dspy.LM, required)
  Language model for executing the task (running the program).

metric_fn (Callable, required)
  Evaluation metric. Signature:

  def metric(example: Example, prediction: Prediction, trace=None) -> float | dict:
      return score

reflection_lm (dspy.LM, required)
  Language model for proposing program improvements. Must be provided (cannot be None).

failure_score (float, default: 0.0)
  Score assigned when the program fails to execute or raises an exception.

num_threads (int | None, default: None)
  Number of threads for parallel evaluation. None uses the DSPy default.

add_format_failure_as_feedback (bool, default: False)
  Include format failures (parsing errors) in the reflective dataset.

rng (random.Random | None, default: None)
  Random number generator for reproducible trace sampling.

Program Format

Programs must be valid Python code strings with:
  1. Required imports: import dspy
  2. Class definition: Define a dspy.Module subclass
  3. Program variable: Assign instance to variable named program

Valid Program Example

program_code = '''
import dspy

class ComplexQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.decompose = dspy.ChainOfThought('question -> subquestions')
        self.answer = dspy.ChainOfThought('subquestions -> answer')
    
    def forward(self, question):
        subq = self.decompose(question=question)
        return self.answer(subquestions=subq.subquestions)

program = ComplexQA()
'''

Common Errors

Missing program variable:
# ERROR: No program variable
import dspy
class MyProgram(dspy.Module):
    pass
# Missing: program = MyProgram()
Wrong type:
# ERROR: program is not a dspy.Module instance
import dspy
program = "some string"  # Should be dspy.Module instance

Methods

evaluate()

Evaluates a candidate program on a batch of examples.
def evaluate(
    self,
    batch: list[Example],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[TraceData, Prediction]
Implementation: src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:82

Behavior

  1. Calls build_program() to construct the program from its code string
  2. If the program fails to build: returns failure scores for the entire batch, with the error message in trajectories
  3. If capture_traces=True: uses bootstrap_trace_data() for detailed traces
  4. If capture_traces=False: uses dspy.Evaluate() for faster evaluation
  5. Returns an EvaluationBatch with outputs, scores, and optional trajectories

build_program()

Compiles program code into an executable DSPy module.
def build_program(self, candidate: dict[str, str]) -> tuple[dspy.Module, None] | tuple[None, str]
Implementation: src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:35

Returns

  • Success: (dspy.Module instance, None)
  • Failure: (None, error_message)

Validation Steps

  1. Syntax check: Compiles code to check for Python syntax errors
  2. Execution: Runs code to execute class definitions
  3. Program extraction: Checks for program variable in namespace
  4. Type validation: Ensures program is a dspy.Module instance
  5. LM assignment: Sets task_lm on the program
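The validation pipeline above can be sketched in a few lines. This is a minimal illustration, not the adapter's actual implementation; a placeholder BaseModule stands in for dspy.Module so the sketch runs without dspy installed.

```python
# Sketch of build_program()'s validation steps (illustrative only).
class BaseModule:  # stand-in for dspy.Module
    pass

def build_program(code: str, base=BaseModule):
    # 1. Syntax check: compile the code before running it
    try:
        compiled = compile(code, "<candidate>", "exec")
    except SyntaxError as e:
        return None, f"Syntax Error in code: {e}"
    # 2. Execution: run the code to execute class definitions
    namespace = {"BaseModule": base}
    try:
        exec(compiled, namespace)
    except Exception as e:
        return None, f"Error executing code: {e}"
    # 3. Program extraction: look for the `program` variable
    program = namespace.get("program")
    if program is None:
        return None, "Your code did not define a `program` object"
    # 4. Type validation: `program` must be a Module instance
    if not isinstance(program, base):
        return None, "`program` is not a Module instance"
    return program, None
```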

make_reflective_dataset()

Generates a reflective dataset from evaluation traces.
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[TraceData, Prediction],
    components_to_update: list[str],
) -> dict[str, list[dict[str, Any]]]
Implementation: src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:130

Behavior

  1. If the program failed to build: returns a simple feedback dict with the error message
  2. Otherwise: builds reflective examples with:
    • Program inputs (example fields)
    • Program outputs (prediction fields)
    • Program trace (all predictor calls with inputs/outputs)
    • Feedback (from the metric or error messages)
  3. Returns a dict with a 'program' key mapping to a list of examples

propose_new_texts()

Proposes improved program code.
def propose_new_texts(
    self,
    candidate: dict[str, str],
    reflective_dataset: dict[str, list[dict[str, Any]]],
    components_to_update: list[str],
) -> dict[str, str]
Implementation: src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:250

Behavior

Uses DSPyProgramProposalSignature to generate new program code based on:
  • Current program code
  • Reflective dataset with failures and successes
  • Feedback from metric

Usage Examples

Basic Math Problem Solving

import dspy
from gepa.adapters.dspy_full_program_adapter import DspyAdapter
import gepa

# Configure
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Seed program
seed_program = '''
import dspy

class MathSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.solve = dspy.ChainOfThought('problem -> solution')
    
    def forward(self, problem):
        return self.solve(problem=problem)

program = MathSolver()
'''

# Metric
def math_metric(example, prediction, trace=None):
    # Extract answer from solution
    answer = extract_answer(prediction.solution)
    return float(answer == example.answer)

# Optimize
adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),
    metric_fn=math_metric,
    reflection_lm=dspy.LM('openai/gpt-4')
)

result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=math_train,
    valset=math_val,
    adapter=adapter,
    max_metric_calls=150
)

# GEPA might evolve this into a multi-step program:
# - Step 1: Parse the problem
# - Step 2: Identify required operations
# - Step 3: Execute calculations
# - Step 4: Verify and format answer

Multi-Hop Question Answering

seed_program = '''
import dspy

class MultiHopQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought('context, question -> answer')
    
    def forward(self, context, question):
        return self.answer(context=context, question=question)

program = MultiHopQA()
'''

adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),
    metric_fn=qa_metric,
    reflection_lm=dspy.LM('openai/gpt-4')
)

result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=hotpot_train,
    valset=hotpot_val,
    adapter=adapter,
    max_metric_calls=200
)

# GEPA might discover a multi-hop retrieval strategy:
# - Step 1: Extract entities from question
# - Step 2: Retrieve supporting facts for each entity  
# - Step 3: Chain reasoning across facts
# - Step 4: Generate answer with citations

With Metric Feedback

Provide detailed feedback in your metric:
def detailed_metric(example, prediction, trace=None):
    correct = example.answer == prediction.answer
    
    feedback = ""
    if not correct:
        feedback = f"Expected '{example.answer}', got '{prediction.answer}'. "
        if hasattr(prediction, 'reasoning'):
            feedback += f"Reasoning was: {prediction.reasoning}"
    
    return {
        'score': float(correct),
        'feedback': feedback
    }

adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),
    metric_fn=detailed_metric,
    reflection_lm=dspy.LM('openai/gpt-4')
)
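Because a metric may return either a bare float or a dict with 'score' and 'feedback' keys, it can help to normalize the two shapes. An illustrative helper (not the adapter's exact code):

```python
def normalize_metric_output(result) -> tuple[float, str]:
    # Accept a bare float or a {'score': ..., 'feedback': ...} dict
    # and return a uniform (score, feedback) pair.
    if isinstance(result, dict):
        return float(result["score"]), result.get("feedback", "")
    return float(result), ""
```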

Reflective Dataset Structure

Each example in the reflective dataset contains:
{
    'Program Inputs': {
        'question': 'What is 2+2?',
        # ... all example input fields
    },
    'Program Outputs': {
        'answer': '4',
        # ... all prediction output fields  
    },
    'Program Trace': [
        {
            'Called Module': 'solve',
            'Inputs': {'problem': 'What is 2+2?'},
            'Generated Outputs': {'solution': 'The answer is 4'}
        },
        # ... more predictor calls
    ],
    'Feedback': 'The answer is correct.' # or error description
}
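The structure above can be assembled from raw predictor calls with a small helper. This is an illustrative sketch: the field names follow the docs, but the function itself is not the adapter's implementation.

```python
def make_reflective_example(inputs, outputs, trace, feedback):
    """Assemble one reflective-dataset entry in the shape shown above.

    `trace` is a list of (module_name, call_inputs, call_outputs) tuples.
    """
    return {
        "Program Inputs": dict(inputs),
        "Program Outputs": dict(outputs),
        "Program Trace": [
            {
                "Called Module": name,
                "Inputs": dict(call_in),
                "Generated Outputs": dict(call_out),
            }
            for name, call_in, call_out in trace
        ],
        "Feedback": feedback,
    }
```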

Error Handling

Syntax Errors

If the proposed program has syntax errors:
# Proposed code with syntax error
program_code = '''
import dspy
class MyProgram(dspy.Module):
    def forward(self, x
        return x  # Missing closing parenthesis
program = MyProgram()
'''

# Adapter returns:
EvaluationBatch(
    outputs=None,
    scores=[0.0] * len(batch),  # Failure scores
    trajectories='Syntax Error in code: ...'
)
The error message is passed to the reflection LM to fix the syntax.

Runtime Errors

If the program executes but throws exceptions:
# Program with runtime error
program_code = '''
import dspy
class MyProgram(dspy.Module):
    def forward(self, x):
        return undefined_var  # NameError
program = MyProgram()
'''

# Adapter captures exception in trajectories
The exception is included in the reflective dataset for the reflection LM to address.

Missing Program Variable

program_code = '''
import dspy
class MyProgram(dspy.Module):
    pass
# Missing: program = MyProgram()
'''

# Returns error:
# "Your code did not define a `program` object..."

Best Practices

1. Start Simple

Begin with a basic program structure:
seed_program = '''
import dspy

class SimpleProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought('input -> output')
    
    def forward(self, input):
        return self.predictor(input=input)

program = SimpleProgram()
'''
Let GEPA evolve complexity as needed.

2. Provide Clear Feedback

Detailed metric feedback helps GEPA understand what to improve:
def metric_with_feedback(example, prediction, trace=None):
    score = compute_score(example, prediction)
    
    feedback_parts = []
    if score < 1.0:
        feedback_parts.append(f"Expected: {example.answer}")
        feedback_parts.append(f"Got: {prediction.answer}")
        if trace:
            feedback_parts.append(f"Trace had {len(trace)} steps")
    
    return {
        'score': score,
        'feedback': ' '.join(feedback_parts)
    }

3. Use Stronger Reflection LM

Program evolution requires sophisticated reasoning:
adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),  # Cheaper for execution
    reflection_lm=dspy.LM('openai/gpt-4')   # Stronger for program design
)

4. Monitor Program Evolution

Track how programs change:
result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=150,
    log_dir='./gepa_logs'  # Save evolution history
)

# Review intermediate programs
for candidate in result.history:
    print(candidate['program'])
    print('Score:', candidate['score'])
    print('---')

5. Set Appropriate Budget

Program evolution requires more iterations than instruction optimization:
# Instruction optimization: 50-150 calls often sufficient
# Program evolution: 150-500 calls recommended

result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=300  # Higher budget for program evolution
)

Performance Results

From the GEPA paper:
Benchmark    Baseline    GEPA Full Program    Improvement
MATH         67%         93%                  +26 pp
HotpotQA     45%         72%                  +27 pp
MultiHop     38%         61%                  +23 pp

Limitations

  1. Python only: Programs must be valid Python code
  2. Single file: Cannot span multiple files (yet)
  3. Module structure: Must follow DSPy Module pattern
  4. No external dependencies: Cannot import custom libraries (only dspy and standard library)
  5. Higher cost: Program evolution uses more LLM calls than instruction optimization

Advanced Features

Custom DSPy Signatures

GEPA can evolve custom signature definitions:
seed_program = '''
import dspy
from dspy import InputField, OutputField

class CustomSignature(dspy.Signature):
    """Answer math problems step by step."""
    problem = InputField(desc="Math problem to solve")
    reasoning = OutputField(desc="Step-by-step reasoning")
    answer = OutputField(desc="Final numerical answer")

class MathSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.solve = dspy.ChainOfThought(CustomSignature)
    
    def forward(self, problem):
        return self.solve(problem=problem)

program = MathSolver()
'''
GEPA can modify signature docstrings, field descriptions, and even add/remove fields.

Module Composition

GEPA can discover multi-module compositions:
# GEPA might evolve this into:
'''
import dspy

class AdvancedSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.parse = dspy.Predict('problem -> problem_type, variables')
        self.solve = dspy.ChainOfThought('problem_type, variables -> steps')
        self.verify = dspy.Predict('steps, problem -> is_correct, answer')
    
    def forward(self, problem):
        parsed = self.parse(problem=problem)
        steps = self.solve(
            problem_type=parsed.problem_type,
            variables=parsed.variables
        )
        result = self.verify(steps=steps.steps, problem=problem)
        return dspy.Prediction(answer=result.answer)

program = AdvancedSolver()
'''

Control Flow

GEPA can add conditional logic and loops:
# GEPA might evolve this into:
'''
import dspy

class AdaptiveSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classify = dspy.Predict('problem -> difficulty')
        self.simple_solver = dspy.Predict('problem -> answer')
        self.complex_solver = dspy.ChainOfThought('problem -> answer')
    
    def forward(self, problem):
        difficulty = self.classify(problem=problem).difficulty
        
        if 'hard' in difficulty.lower():
            return self.complex_solver(problem=problem)
        else:
            return self.simple_solver(problem=problem)

program = AdaptiveSolver()
'''

Comparison: Instruction vs Full Program

Aspect                Instruction Optimization    Full Program Evolution
What's optimized      Signature instructions      Entire program structure
Program structure     Fixed                       Can change
Module count          Fixed                       Can increase/decrease
Control flow          Fixed                       Can evolve
Speed                 Fast (50-150 calls)         Slower (150-500 calls)
Complexity            Simple                      Complex
Use case              Refine existing programs    Discover novel architectures
