GEPA provides seamless integration with DSPy through the `DspyAdapter`, enabling you to optimize instructions and prompts in your DSPy programs using LLM-guided evolution.

## Overview

The `DspyAdapter` allows GEPA to:

- Optimize DSPy predictor instructions
- Provide per-predictor feedback for targeted improvements
- Support tool-using modules (ReAct)
- Capture DSPy traces for reflection
- Leverage DSPy's evaluation framework
## Quick Start

```python
import dspy

from gepa import optimize
from gepa.adapters.dspy_adapter import DspyAdapter

# 1. Define your DSPy program
class MathSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.solve = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.solve(question=question)

# 2. Create metric and feedback functions
def metric(example, prediction, trace=None):
    return 1.0 if example.answer in prediction.answer else 0.0

def provide_feedback(predictor_output, predictor_inputs,
                     module_inputs, module_outputs, captured_trace):
    correct = module_inputs.answer in predictor_output.get("answer", "")
    if correct:
        return {"score": 1.0, "feedback": "Correct answer!"}
    else:
        return {
            "score": 0.0,
            "feedback": f"Wrong. Expected: {module_inputs.answer}",
        }

# 3. Set up the adapter
student = MathSolver()
feedback_map = {"solve": provide_feedback}

adapter = DspyAdapter(
    student_module=student,
    metric_fn=metric,
    feedback_map=feedback_map,
)

# 4. Optimize
result = optimize(
    trainset=train_examples,
    valset=val_examples,
    adapter=adapter,
    reflection_lm=dspy.LM("openai/gpt-4o"),
    max_metric_calls=50,
)

# 5. Use the optimized program
optimized_program = adapter.build_program(result.best_candidate)
```
## DspyAdapter API

### Constructor

```python
DspyAdapter(
    student_module: dspy.Module,
    metric_fn: Callable,
    feedback_map: dict[str, Callable],
    failure_score: float = 0.0,
    num_threads: int | None = None,
    add_format_failure_as_feedback: bool = False,
    rng: random.Random | None = None,
    reflection_lm: dspy.LM | None = None,
    custom_instruction_proposer: ProposalFn | None = None,
    warn_on_score_mismatch: bool = True,
    enable_tool_optimization: bool = False,
    reflection_minibatch_size: int | None = None,
)
```
### Parameters

#### `student_module`

The DSPy program to optimize.

```python
class QASystem(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

student = QASystem()
```
#### `metric_fn`

Program-level metric function. Must return a score (higher is better), or a dict with a `score` key for multi-objective metrics.

```python
def metric(example, prediction, trace=None):
    """Simple exact-match metric."""
    return 1.0 if example.answer == prediction.answer else 0.0

# Multi-objective metric
def advanced_metric(example, prediction, trace=None):
    exact = 1.0 if example.answer == prediction.answer else 0.0
    contains = 1.0 if example.answer in prediction.answer else 0.0
    return {
        "score": exact,
        "subscores": {
            "exact_match": exact,
            "contains": contains,
        },
    }
```
#### `feedback_map`

Mapping from predictor names to feedback functions. Each feedback function provides per-predictor diagnostic information.

```python
def generate_feedback(predictor_output, predictor_inputs,
                      module_inputs, module_outputs, captured_trace):
    """
    Args:
        predictor_output: Output of this specific predictor
        predictor_inputs: Inputs to this specific predictor
        module_inputs: Original program inputs (Example)
        module_outputs: Final program outputs (Prediction)
        captured_trace: Full execution trace

    Returns:
        dict with 'score' and 'feedback' keys
    """
    is_correct = module_inputs.answer in predictor_output.get("answer", "")
    if is_correct:
        return {
            "score": 1.0,
            "feedback": "Generated correct answer.",
        }
    else:
        return {
            "score": 0.0,
            "feedback": f"Expected '{module_inputs.answer}' but got '{predictor_output.get('answer', '')}'.",
        }

feedback_map = {
    "generate": generate_feedback,
    # Add feedback for each predictor you want to optimize
}
```
### Additional Parameters

- `failure_score`: Default score for failed predictions (default: `0.0`)
- `num_threads`: Number of parallel evaluation threads for DSPy's `Evaluate`
- `add_format_failure_as_feedback`: Include parsing failures in feedback
- `reflection_lm`: LM for proposing new instructions (defaults to `dspy.settings.lm`)
- `enable_tool_optimization`: Enable optimization of tool descriptions in ReAct modules
## Feedback Functions

Feedback functions are the key to effective DSPy optimization. They provide per-predictor guidance to the reflection LM.

### Basic Feedback

```python
def basic_feedback(predictor_output, predictor_inputs,
                   module_inputs, module_outputs, captured_trace):
    correct = module_inputs.expected in predictor_output.get("answer", "")
    return {
        "score": 1.0 if correct else 0.0,
        "feedback": "Correct!" if correct else f"Expected: {module_inputs.expected}",
    }
```
### Detailed Feedback

```python
def detailed_feedback(predictor_output, predictor_inputs,
                      module_inputs, module_outputs, captured_trace):
    answer = predictor_output.get("answer", "")
    expected = module_inputs.answer

    if expected in answer:
        feedback = f"✓ Correct answer found: {expected}"
        score = 1.0
    else:
        feedback = f"✗ Wrong answer.\nExpected: {expected}\nGot: {answer}"
        score = 0.0

    # Add a reasoning-quality assessment
    reasoning = predictor_output.get("reasoning", "")
    if len(reasoning) < 50:
        feedback += "\nReasoning is too brief. Provide more detailed steps."

    return {"score": score, "feedback": feedback}
```
### Context-Aware Feedback

```python
def context_aware_feedback(predictor_output, predictor_inputs,
                           module_inputs, module_outputs, captured_trace):
    # Access the full trace to understand context
    all_steps = [(p.signature, inputs, outputs)
                 for p, inputs, outputs in captured_trace]

    answer = predictor_output.get("answer", "")
    expected = module_inputs.answer

    # Check whether retrieval provided relevant context
    context_str = str(predictor_inputs.get("context", ""))
    has_relevant_context = expected in context_str

    if expected in answer:
        feedback = "Correct answer."
        score = 1.0
    else:
        if not has_relevant_context:
            feedback = f"The retrieved context didn't contain the answer '{expected}'. Consider using different keywords."
        else:
            feedback = f"The answer '{expected}' was in the context but not extracted correctly."
        score = 0.0

    return {"score": score, "feedback": feedback}
```
## Complete Example

### Multi-Hop Question Answering

```python
import dspy

from gepa import optimize
from gepa.adapters.dspy_adapter import DspyAdapter

# Configure DSPy
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Define the program
class MultiHopQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.generate_query = dspy.ChainOfThought(
            "question -> search_query"
        )
        self.answer = dspy.ChainOfThought(
            "context, question -> answer"
        )

    def forward(self, question):
        # First hop: generate a search query
        query_pred = self.generate_query(question=question)

        # Retrieve context
        context = self.retrieve(query_pred.search_query).passages

        # Second hop: answer based on the context
        return self.answer(context=context, question=question)

# Metric function
def multihop_metric(example, prediction, trace=None):
    if example.answer.lower() in prediction.answer.lower():
        return 1.0
    return 0.0

# Feedback functions for each predictor
def query_feedback(predictor_output, predictor_inputs,
                   module_inputs, module_outputs, captured_trace):
    query = predictor_output.get("search_query", "")

    # Check whether the final answer was correct
    final_correct = module_inputs.answer.lower() in module_outputs.answer.lower()

    if final_correct:
        return {
            "score": 1.0,
            "feedback": f"Good query: '{query}' led to correct answer.",
        }
    else:
        return {
            "score": 0.0,
            "feedback": f"Query '{query}' didn't retrieve relevant info for answer: {module_inputs.answer}",
        }

def answer_feedback(predictor_output, predictor_inputs,
                    module_inputs, module_outputs, captured_trace):
    answer = predictor_output.get("answer", "")
    expected = module_inputs.answer
    context = str(predictor_inputs.get("context", ""))

    if expected.lower() in answer.lower():
        return {"score": 1.0, "feedback": "Correct answer extracted."}
    elif expected.lower() in context.lower():
        return {
            "score": 0.0,
            "feedback": f"Answer '{expected}' was in context but not extracted.",
        }
    else:
        return {
            "score": 0.0,
            "feedback": f"Answer '{expected}' not in retrieved context.",
        }

# Create datasets
train_examples = [
    dspy.Example(
        question="What is the capital of France?",
        answer="Paris",
    ).with_inputs("question"),
    # ... more examples
]

val_examples = [
    dspy.Example(
        question="Who wrote Romeo and Juliet?",
        answer="Shakespeare",
    ).with_inputs("question"),
    # ... more examples
]

# Set up and optimize
student = MultiHopQA()
adapter = DspyAdapter(
    student_module=student,
    metric_fn=multihop_metric,
    feedback_map={
        "generate_query": query_feedback,
        "answer": answer_feedback,
    },
    num_threads=4,
)

result = optimize(
    trainset=train_examples,
    valset=val_examples,
    adapter=adapter,
    reflection_lm=dspy.LM("openai/gpt-4o"),
    max_metric_calls=100,
)

# Get the optimized program
optimized = adapter.build_program(result.best_candidate)

# Test it
test_question = "What is machine learning?"
prediction = optimized(question=test_question)
print(prediction.answer)
```
## Tool Optimization

GEPA can optimize tool descriptions in ReAct modules:

```python
adapter = DspyAdapter(
    student_module=student,
    metric_fn=metric,
    feedback_map=feedback_map,
    enable_tool_optimization=True,  # Enable tool optimization
)
```

Tool optimization improves:

- Tool descriptions
- Argument descriptions
- When to use each tool
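To make the target concrete, here is a minimal sketch of a ReAct tool, assuming DSPy's convention that a plain function's docstring acts as the tool description. `search_wikipedia` is a hypothetical tool (not part of GEPA or DSPy), and the `dspy.ReAct` wiring is left as a comment since it requires a configured LM:

```python
# Hypothetical tool for a ReAct agent. The docstring below is the tool
# description -- the text GEPA rewrites when enable_tool_optimization=True.
def search_wikipedia(query: str) -> list[str]:
    """Search Wikipedia and return the top matching passages."""
    # A real tool would call a retriever; stubbed here for illustration.
    return [f"Stub passage for: {query}"]

# With an LM configured, the tool would be attached to a ReAct module, e.g.:
#   agent = dspy.ReAct("question -> answer", tools=[search_wikipedia])
```

A vague docstring like "searches stuff" gives the agent little signal about when to call the tool; tool optimization aims to evolve it into something more discriminative.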
## Custom Instruction Proposers

You can provide custom logic for proposing new instructions:

```python
from gepa.core.adapter import ProposalFn

def custom_proposer(
    candidate: dict[str, str],
    reflective_dataset: dict[str, list[dict]],
    components_to_update: list[str],
) -> dict[str, str]:
    """
    Custom logic to propose improved instructions.

    Args:
        candidate: Current instruction values
        reflective_dataset: Feedback data per component
        components_to_update: Which components to update

    Returns:
        dict mapping component names to new instructions
    """
    # Your custom proposal logic here
    new_instructions = {}
    for comp in components_to_update:
        feedback = reflective_dataset[comp]
        # Analyze the feedback and generate a new instruction
        new_instructions[comp] = generate_improved_instruction(feedback)
    return new_instructions

adapter = DspyAdapter(
    student_module=student,
    metric_fn=metric,
    feedback_map=feedback_map,
    custom_instruction_proposer=custom_proposer,
)
```
## Reflective Dataset Structure

The adapter creates reflective examples in this format:

```python
{
    "Inputs": {
        "question": "What is ML?",
        "context": "Machine learning is...",
    },
    "Generated Outputs": {
        "answer": "ML is a subset of AI",
        "reasoning": "Based on the context...",
    },
    "Feedback": "Correct answer. Good reasoning.",
}
```

For format failures:

```python
{
    "Inputs": {...},
    "Generated Outputs": "Couldn't parse the output...",
    "Feedback": "Your output failed to parse. Follow this structure:\n...",
}
```
## Best Practices

- **Provide specific, actionable feedback.** Generic feedback like "Wrong answer" doesn't help the LLM improve. Explain why it's wrong and how to fix it.
- Create feedback functions for all predictors you want to optimize
- Use the full trace in feedback functions to provide context
- Include expected outputs in feedback when predictions are wrong
- Test your metric independently before optimization
- Start with a small dataset to iterate quickly
- Monitor progress by checking intermediate candidates
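One way to test a metric independently, as suggested above, is to run it on hand-built example/prediction pairs before any optimization. The sketch below uses `SimpleNamespace` stand-ins for `dspy.Example` and `dspy.Prediction`, which works because the metric only reads the `.answer` attribute:

```python
from types import SimpleNamespace

def metric(example, prediction, trace=None):
    # Case-insensitive containment check, as in the multi-hop example
    return 1.0 if example.answer.lower() in prediction.answer.lower() else 0.0

# Stand-ins for dspy.Example / dspy.Prediction
example = SimpleNamespace(answer="Paris")
good = SimpleNamespace(answer="The capital is Paris.")
bad = SimpleNamespace(answer="The capital is Lyon.")

assert metric(example, good) == 1.0
assert metric(example, bad) == 0.0
```

Catching a buggy metric this way is far cheaper than discovering it mid-run, after it has silently scored every candidate 0.0.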
## Troubleshooting

### Score Mismatch Warning

If you see warnings about score mismatches:

```python
adapter = DspyAdapter(
    ...,
    warn_on_score_mismatch=False,  # Disable if using LLM-as-judge
)
```

This is normal when:

- Using non-deterministic metrics (LLM-as-judge)
- Providing predictor-specific scores that differ from program-level scores

### No Valid Predictions

If you get "No valid predictions found":

- Check that your feedback functions return the correct format
- Enable format-failure feedback:

  ```python
  adapter = DspyAdapter(..., add_format_failure_as_feedback=True)
  ```

- Verify your program actually calls the predictors you're optimizing
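To check the feedback return format directly, you can call each feedback function on a sample input and validate its shape before launching a run. `check_feedback_shape` below is a hypothetical helper, not part of GEPA:

```python
def check_feedback_shape(result):
    """Verify a feedback function's return value is a {'score', 'feedback'} dict."""
    assert isinstance(result, dict), f"Expected dict, got {type(result).__name__}"
    assert "score" in result and "feedback" in result, f"Missing keys: {set(result)}"
    assert isinstance(result["score"], (int, float)), "'score' must be numeric"
    assert isinstance(result["feedback"], str), "'feedback' must be a string"
    return True

# Example: validate a feedback dict by hand before optimizing
check_feedback_shape({"score": 1.0, "feedback": "Correct answer!"})
```

Running this over a few real predictor outputs usually surfaces the common mistakes: returning a bare float, or misspelling one of the keys.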
## Next Steps

- **Adapter System**: Learn about the adapter architecture
- **Custom Adapters**: Create adapters for other frameworks
- **Evaluation Metrics**: Design better feedback functions
- **Configuration**: Fine-tune optimization parameters