Overview
The DspyAdapter (full program variant) enables GEPA to evolve complete DSPy programs, not just instructions. This includes:
- Entire program structure (classes, methods)
- Module composition and control flow
- Signature definitions
- Module interactions
This adapter achieves 93% accuracy on the MATH benchmark (vs 67% with basic DSPy ChainOfThought).
Installation

The examples below import the `gepa` package, so installation via pip is assumed:

```shell
pip install gepa
```
Quick Start
```python
import dspy
from gepa.adapters.dspy_full_program_adapter import DspyAdapter
import gepa

# Configure DSPy
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Define seed program as code string
seed_program = '''
import dspy

class MyProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought('question -> answer')

    def forward(self, question):
        return self.predictor(question=question)

program = MyProgram()
'''

# Define metric
def my_metric(example, prediction, trace=None):
    return float(example.answer == prediction.answer)

# Create adapter
adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),
    metric_fn=my_metric,
    reflection_lm=dspy.LM('openai/gpt-4')
)

# Optimize
result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=150
)

print('Optimized program:')
print(result.best_candidate['program'])
```
Class Signature
Defined in src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:13:
```python
class DspyAdapter(GEPAAdapter[Example, TraceData, Prediction]):
    def __init__(
        self,
        task_lm: dspy.LM,
        metric_fn: Callable,
        reflection_lm: dspy.LM,
        failure_score=0.0,
        num_threads: int | None = None,
        add_format_failure_as_feedback: bool = False,
        rng: random.Random | None = None,
    )
```
Parameters
- `task_lm` (dspy.LM): Language model for executing the task (running the program).
- `metric_fn` (Callable): Evaluation metric. Signature: `def metric(example: Example, prediction: Prediction, trace=None) -> float | dict`, returning the score (or a dict containing it).
- `reflection_lm` (dspy.LM): Language model for proposing program improvements. Must be provided (cannot be None).
- `failure_score` (default: 0.0): Score assigned when the program fails to execute or raises an exception.
- `num_threads` (int | None, default: None): Number of threads for parallel evaluation. None uses the DSPy default.
- `add_format_failure_as_feedback` (bool, default: False): Include format failures (parsing errors) in the reflective dataset.
- `rng` (random.Random | None, default: None): Random number generator for reproducible trace sampling.
Program Requirements

Programs must be valid Python code strings with:
- Required imports: `import dspy`
- Class definition: Define a `dspy.Module` subclass
- Program variable: Assign an instance to a variable named `program`
Valid Program Example
```python
program_code = '''
import dspy

class ComplexQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.decompose = dspy.ChainOfThought('question -> subquestions')
        self.answer = dspy.ChainOfThought('subquestions -> answer')

    def forward(self, question):
        subq = self.decompose(question=question)
        return self.answer(subquestions=subq.subquestions)

program = ComplexQA()
'''
```
Common Errors
Missing program variable:

```python
# ERROR: No program variable
import dspy

class MyProgram(dspy.Module):
    pass

# Missing: program = MyProgram()
```

Wrong type:

```python
# ERROR: program is not a dspy.Module instance
import dspy

program = "some string"  # Should be a dspy.Module instance
```
Methods
evaluate()
Evaluates a candidate program on a batch of examples.
```python
def evaluate(
    self,
    batch: list[Example],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[TraceData, Prediction]
```
Implementation: src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:82
Behavior
- Calls build_program() to construct the program from code
- If the program fails to build: returns failure scores for the entire batch, with the error message in the trajectories
- If capture_traces=True: uses bootstrap_trace_data() for detailed traces
- If capture_traces=False: uses dspy.Evaluate() for faster evaluation
- Returns an EvaluationBatch with outputs, scores, and optional trajectories
build_program()
Compiles program code into executable DSPy module.
```python
def build_program(self, candidate: dict[str, str]) -> tuple[dspy.Module, None] | tuple[None, str]
```
Implementation: src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:35
Returns
- Success: (dspy.Module instance, None)
- Failure: (None, error_message)
Validation Steps
- Syntax check: compiles the code to catch Python syntax errors
- Execution: runs the code to execute class definitions
- Program extraction: checks for a program variable in the namespace
- Type validation: ensures program is a dspy.Module instance
- LM assignment: sets task_lm on the program
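The validation steps can be illustrated with a self-contained sketch. BaseModule stands in for dspy.Module, and the function is a hypothetical simplification, not the adapter's actual implementation:

```python
class BaseModule:
    # Stand-in for dspy.Module
    pass

def build_program_sketch(code: str):
    """Return (program, None) on success or (None, error_message) on failure."""
    # 1. Syntax check: compile the code to catch syntax errors
    try:
        compiled = compile(code, "<candidate>", "exec")
    except SyntaxError as e:
        return None, f"Syntax Error in code: {e}"
    # 2. Execution: run class definitions and assignments
    namespace = {"BaseModule": BaseModule}
    try:
        exec(compiled, namespace)
    except Exception as e:
        return None, f"Error while executing code: {e}"
    # 3. Program extraction: look for the `program` variable
    if "program" not in namespace:
        return None, "Your code did not define a `program` object."
    # 4. Type validation: must be an instance of the module base class
    program = namespace["program"]
    if not isinstance(program, BaseModule):
        return None, "`program` must be a BaseModule instance."
    return program, None

good_code = "class P(BaseModule):\n    pass\nprogram = P()"
bad_code = "class P(BaseModule):\n    pass"  # no `program` variable
```

(The real adapter additionally assigns task_lm to the extracted program, which is omitted here.)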
make_reflective_dataset()
Generates reflective dataset from evaluation traces.
```python
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[TraceData, Prediction],
    components_to_update: list[str],
) -> dict[str, list[dict[str, Any]]]
```
Implementation: src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:130
Behavior
- If the program failed to build: returns a simple feedback dict with the error message
- Otherwise: builds reflective examples containing:
  - Program inputs (example fields)
  - Program outputs (prediction fields)
  - Program trace (all predictor calls with inputs/outputs)
  - Feedback (from the metric or error messages)
- Returns a dict with a 'program' key mapping to the list of examples
propose_new_texts()
Proposes improved program code.
```python
def propose_new_texts(
    self,
    candidate: dict[str, str],
    reflective_dataset: dict[str, list[dict[str, Any]]],
    components_to_update: list[str],
) -> dict[str, str]
```
Implementation: src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:250
Behavior
Uses DSPyProgramProposalSignature to generate new program code based on:
- Current program code
- Reflective dataset with failures and successes
- Feedback from metric
Usage Examples
Basic Math Problem Solving
```python
import dspy
from gepa.adapters.dspy_full_program_adapter import DspyAdapter
import gepa

# Configure
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Seed program
seed_program = '''
import dspy

class MathSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.solve = dspy.ChainOfThought('problem -> solution')

    def forward(self, problem):
        return self.solve(problem=problem)

program = MathSolver()
'''

# Metric
def math_metric(example, prediction, trace=None):
    # extract_answer is a user-supplied helper that pulls the final
    # answer out of the solution text
    answer = extract_answer(prediction.solution)
    return float(answer == example.answer)

# Optimize
adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),
    metric_fn=math_metric,
    reflection_lm=dspy.LM('openai/gpt-4')
)

result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=math_train,
    valset=math_val,
    adapter=adapter,
    max_metric_calls=150
)

# GEPA might evolve this into a multi-step program:
# - Step 1: Parse the problem
# - Step 2: Identify required operations
# - Step 3: Execute calculations
# - Step 4: Verify and format answer
```
Multi-Hop Question Answering
```python
seed_program = '''
import dspy

class MultiHopQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought('context, question -> answer')

    def forward(self, context, question):
        return self.answer(context=context, question=question)

program = MultiHopQA()
'''

adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),
    metric_fn=qa_metric,
    reflection_lm=dspy.LM('openai/gpt-4')
)

result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=hotpot_train,
    valset=hotpot_val,
    adapter=adapter,
    max_metric_calls=200
)

# GEPA might discover a multi-hop retrieval strategy:
# - Step 1: Extract entities from question
# - Step 2: Retrieve supporting facts for each entity
# - Step 3: Chain reasoning across facts
# - Step 4: Generate answer with citations
```
With Metric Feedback
Provide detailed feedback in your metric:

```python
def detailed_metric(example, prediction, trace=None):
    correct = example.answer == prediction.answer
    feedback = ""
    if not correct:
        feedback = f"Expected '{example.answer}', got '{prediction.answer}'. "
        if hasattr(prediction, 'reasoning'):
            feedback += f"Reasoning was: {prediction.reasoning}"
    return {
        'score': float(correct),
        'feedback': feedback
    }
```
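A quick sanity check of such a metric, using SimpleNamespace as a lightweight stand-in for DSPy Example and Prediction objects (the metric is repeated so the snippet runs standalone):

```python
from types import SimpleNamespace

# Metric repeated from above so this snippet is self-contained
def detailed_metric(example, prediction, trace=None):
    correct = example.answer == prediction.answer
    feedback = ""
    if not correct:
        feedback = f"Expected '{example.answer}', got '{prediction.answer}'. "
        if hasattr(prediction, 'reasoning'):
            feedback += f"Reasoning was: {prediction.reasoning}"
    return {'score': float(correct), 'feedback': feedback}

# SimpleNamespace stands in for real Example/Prediction objects
example = SimpleNamespace(answer="4")
wrong = SimpleNamespace(answer="5", reasoning="2 + 2 = 5")
right = SimpleNamespace(answer="4")

bad = detailed_metric(example, wrong)   # score 0.0, explanatory feedback
good = detailed_metric(example, right)  # score 1.0, empty feedback
```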
```python
adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),
    metric_fn=detailed_metric,
    reflection_lm=dspy.LM('openai/gpt-4')
)
```
Reflective Dataset Structure
Each example in the reflective dataset contains:
```python
{
    'Program Inputs': {
        'question': 'What is 2+2?',
        # ... all example input fields
    },
    'Program Outputs': {
        'answer': '4',
        # ... all prediction output fields
    },
    'Program Trace': [
        {
            'Called Module': 'solve',
            'Inputs': {'problem': 'What is 2+2?'},
            'Generated Outputs': {'solution': 'The answer is 4'}
        },
        # ... more predictor calls
    ],
    'Feedback': 'The answer is correct.'  # or error description
}
```
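As an illustration of how such a record might be flattened into text for the reflection LM, here is a hypothetical renderer; it is a sketch, not the adapter's actual prompt format:

```python
record = {
    'Program Inputs': {'question': 'What is 2+2?'},
    'Program Outputs': {'answer': '4'},
    'Program Trace': [
        {'Called Module': 'solve',
         'Inputs': {'problem': 'What is 2+2?'},
         'Generated Outputs': {'solution': 'The answer is 4'}},
    ],
    'Feedback': 'The answer is correct.',
}

def render_record(record: dict) -> str:
    """Flatten one reflective example into readable text (illustrative only)."""
    lines = []
    for key in ('Program Inputs', 'Program Outputs'):
        for field, value in record[key].items():
            lines.append(f"{key} / {field}: {value}")
    for step in record['Program Trace']:
        lines.append(f"Module {step['Called Module']}: "
                     f"{step['Inputs']} -> {step['Generated Outputs']}")
    lines.append(f"Feedback: {record['Feedback']}")
    return "\n".join(lines)

text = render_record(record)
```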
Error Handling
Syntax Errors
If the proposed program has syntax errors:
```python
# Proposed code with syntax error
program_code = '''
import dspy

class MyProgram(dspy.Module):
    def forward(self, x
        return x  # Missing closing parenthesis

program = MyProgram()
'''

# Adapter returns:
EvaluationBatch(
    outputs=None,
    scores=[0.0] * len(batch),  # Failure scores
    trajectories='Syntax Error in code: ...'
)
```
The error message is passed to the reflection LM to fix the syntax.
Runtime Errors
If the program executes but throws exceptions:
```python
# Program with runtime error
program_code = '''
import dspy

class MyProgram(dspy.Module):
    def forward(self, x):
        return undefined_var  # NameError

program = MyProgram()
'''

# Adapter captures the exception in trajectories
```
The exception is included in the reflective dataset for the reflection LM to address.
Missing Program Variable
```python
program_code = '''
import dspy

class MyProgram(dspy.Module):
    pass

# Missing: program = MyProgram()
'''

# Returns error:
# "Your code did not define a `program` object..."
```
Best Practices
1. Start Simple
Begin with a basic program structure:
```python
seed_program = '''
import dspy

class SimpleProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought('input -> output')

    def forward(self, input):
        return self.predictor(input=input)

program = SimpleProgram()
'''
```
Let GEPA evolve complexity as needed.
2. Provide Clear Feedback
Detailed metric feedback helps GEPA understand what to improve:
```python
def metric_with_feedback(example, prediction, trace=None):
    # compute_score is a user-supplied scoring helper
    score = compute_score(example, prediction)
    feedback_parts = []
    if score < 1.0:
        feedback_parts.append(f"Expected: {example.answer}")
        feedback_parts.append(f"Got: {prediction.answer}")
    if trace:
        feedback_parts.append(f"Trace had {len(trace)} steps")
    return {
        'score': score,
        'feedback': ' '.join(feedback_parts)
    }
```
3. Use Stronger Reflection LM
Program evolution requires sophisticated reasoning:
```python
adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),   # Cheaper for execution
    metric_fn=my_metric,                     # Your task metric (required)
    reflection_lm=dspy.LM('openai/gpt-4')    # Stronger for program design
)
```
4. Monitor Program Evolution
Track how programs change:
```python
result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=150,
    log_dir='./gepa_logs'  # Save evolution history
)

# Review intermediate programs
for candidate in result.history:
    print(candidate['program'])
    print('Score:', candidate['score'])
    print('---')
```
5. Set Appropriate Budget
Program evolution requires more iterations than instruction optimization:
```python
# Instruction optimization: 50-150 calls often sufficient
# Program evolution: 150-500 calls recommended
result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=300  # Higher budget for program evolution
)
```
From the GEPA paper:
| Benchmark | Baseline | GEPA Full Program | Improvement |
|---|---|---|---|
| MATH | 67% | 93% | +26 pp |
| HotpotQA | 45% | 72% | +27 pp |
| MultiHop | 38% | 61% | +23 pp |
Limitations
- Python only: Programs must be valid Python code
- Single file: Cannot span multiple files (yet)
- Module structure: Must follow the DSPy Module pattern
- No external dependencies: Cannot import custom libraries (only dspy and the standard library)
- Higher cost: Program evolution uses more LLM calls than instruction optimization
Advanced Features
Custom DSPy Signatures
GEPA can evolve custom signature definitions:
```python
seed_program = '''
import dspy
from dspy import InputField, OutputField

class CustomSignature(dspy.Signature):
    """Answer math problems step by step."""
    problem = InputField(desc="Math problem to solve")
    reasoning = OutputField(desc="Step-by-step reasoning")
    answer = OutputField(desc="Final numerical answer")

class MathSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.solve = dspy.ChainOfThought(CustomSignature)

    def forward(self, problem):
        return self.solve(problem=problem)

program = MathSolver()
'''
```
GEPA can modify signature docstrings, field descriptions, and even add/remove fields.
Module Composition
GEPA can discover multi-module compositions:
```python
# GEPA might evolve this into:
'''
import dspy

class AdvancedSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.parse = dspy.Predict('problem -> problem_type, variables')
        self.solve = dspy.ChainOfThought('problem_type, variables -> steps')
        self.verify = dspy.Predict('steps, problem -> is_correct, answer')

    def forward(self, problem):
        parsed = self.parse(problem=problem)
        steps = self.solve(
            problem_type=parsed.problem_type,
            variables=parsed.variables
        )
        result = self.verify(steps=steps.steps, problem=problem)
        return dspy.Prediction(answer=result.answer)

program = AdvancedSolver()
'''
```
Control Flow
GEPA can add conditional logic and loops:
```python
# GEPA might evolve this into:
'''
import dspy

class AdaptiveSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classify = dspy.Predict('problem -> difficulty')
        self.simple_solver = dspy.Predict('problem -> answer')
        self.complex_solver = dspy.ChainOfThought('problem -> answer')

    def forward(self, problem):
        difficulty = self.classify(problem=problem).difficulty
        if 'hard' in difficulty.lower():
            return self.complex_solver(problem=problem)
        else:
            return self.simple_solver(problem=problem)

program = AdaptiveSolver()
'''
```
Comparison: Instruction vs Full Program
| Aspect | Instruction Optimization | Full Program Evolution |
|---|---|---|
| What’s optimized | Signature instructions | Entire program structure |
| Program structure | Fixed | Can change |
| Module count | Fixed | Can increase/decrease |
| Control flow | Fixed | Can evolve |
| Speed | Fast (50-150 calls) | Slower (150-500 calls) |
| Complexity | Simple | Complex |
| Use case | Refine existing programs | Discover novel architectures |
See Also