## Overview
The DspyAdapter enables GEPA to optimize DSPy programs by evolving module signature instructions. This is the adapter used in the official DSPy integration (dspy.GEPA).
## Features
- Optimize signature instructions for any DSPy predictor
- Support for multiple predictors in complex programs
- Tool description optimization for ReAct modules
- Custom instruction proposal logic
- Multi-objective optimization with subscores
- Automatic trace capture and feedback generation
## Installation

Install DSPy with GEPA support:
## Quick Start

```python
import dspy
import gepa
from gepa.adapters.dspy_adapter import DspyAdapter

# Define your DSPy program
class MyProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought('question -> answer')

    def forward(self, question):
        return self.predictor(question=question)

# Define metric
def my_metric(example, prediction, trace=None):
    return example.answer in prediction.answer

# Create adapter
adapter = DspyAdapter(
    student_module=MyProgram(),
    metric_fn=my_metric,
    feedback_map={'predictor': lambda **kwargs: {'score': 1.0, 'feedback': 'Good'}},
    reflection_lm=dspy.LM('openai/gpt-4')
)

# Optimize
result = gepa.optimize(
    seed_candidate={'predictor': 'Answer the question accurately.'},
    trainset=train_examples,
    valset=val_examples,
    adapter=adapter,
    max_metric_calls=150
)
```
## Class Signature

Defined in src/gepa/adapters/dspy_adapter/dspy_adapter.py:89:

```python
class DspyAdapter(GEPAAdapter[Example, TraceData, Prediction]):
    def __init__(
        self,
        student_module,
        metric_fn: Callable,
        feedback_map: dict[str, Callable],
        failure_score=0.0,
        num_threads: int | None = None,
        add_format_failure_as_feedback: bool = False,
        rng: random.Random | None = None,
        reflection_lm=None,
        custom_instruction_proposer: ProposalFn | None = None,
        warn_on_score_mismatch: bool = True,
        enable_tool_optimization: bool = False,
        reflection_minibatch_size: int | None = None,
    )
```
## Parameters

- `student_module` (required): The DSPy program to optimize. Should be an instance of `dspy.Module`.
- `metric_fn` (`Callable`, required): Evaluation metric. Signature:

  ```python
  def metric(example: Example, prediction: Prediction, trace=None) -> float | dict:
      return score  # or {'score': float, 'subscores': dict}
  ```

- `feedback_map` (`dict[str, Callable]`, required): Maps predictor names to feedback functions. Signature:

  ```python
  def feedback_fn(
      predictor_output: dict[str, Any],
      predictor_inputs: dict[str, Any],
      module_inputs: Example,
      module_outputs: Prediction,
      captured_trace: list,
  ) -> ScoreWithFeedback:
      return ScoreWithFeedback(
          score=1.0,
          feedback='Feedback text',
          subscores={'accuracy': 1.0},
      )
  ```

- `failure_score` (default `0.0`): Score assigned when a prediction fails or throws an exception.
- `num_threads` (`int | None`, default `None`): Number of threads for parallel evaluation. `None` uses the DSPy default.
- `add_format_failure_as_feedback` (`bool`, default `False`): Include format failures (parsing errors) in the reflective dataset.
- `rng` (`random.Random | None`, default `None`): Random number generator for reproducible trace sampling.
- `reflection_lm` (`dspy.LM | None`, default `None`): Language model for instruction proposal. Uses `dspy.settings.lm` if `None`.
- `custom_instruction_proposer` (`ProposalFn | None`, default `None`): Custom proposal function that overrides the default instruction proposal logic.
- `warn_on_score_mismatch` (`bool`, default `True`): Warn when the feedback score differs from the module score (e.g., LLM-as-judge metrics).
- `enable_tool_optimization` (`bool`, default `False`): Enable optimization of tool descriptions in ReAct modules.
- `reflection_minibatch_size` (`int | None`, default `None`): Override the default minibatch size for reflection. Useful for controlling memory usage.
## Data Types

### ReflectiveExample

Structure of examples in the reflective dataset (src/gepa/adapters/dspy_adapter/dspy_adapter.py:41):

```python
class ReflectiveExample(TypedDict):
    Inputs: dict[str, Any]                   # Predictor inputs
    Generated Outputs: dict[str, Any] | str  # Predictor outputs
    Feedback: str                            # Evaluation feedback
```

### ScoreWithFeedback

Feedback function return type (src/gepa/adapters/dspy_adapter/dspy_adapter.py:57):

```python
class ScoreWithFeedback(Prediction):
    score: float
    feedback: str | None = None
    subscores: dict[str, float] | None = None
```
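As a rough illustration, each reflective dataset entry is a plain mapping with these three fields, and `make_reflective_dataset()` groups such entries by component name (the values below are made up):

```python
# One reflective example, matching the ReflectiveExample field names
reflective_example = {
    "Inputs": {"question": "What is the capital of France?"},
    "Generated Outputs": {"reasoning": "...", "answer": "Paris"},
    "Feedback": "Answer is correct.",
}

# The reflective dataset maps each component name to a list of such entries
reflective_dataset = {"predictor": [reflective_example]}
print(sorted(reflective_example.keys()))
```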
## Methods

### evaluate()

Evaluates a candidate program on a batch of examples.

```python
def evaluate(
    self,
    batch: list[Example],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[TraceData, Prediction]
```

Implementation: src/gepa/adapters/dspy_adapter/dspy_adapter.py:257

Behavior:

- Builds the program with candidate instructions using `build_program()`
- If `capture_traces=True`: uses `bootstrap_trace_data()` to capture full execution traces
- If `capture_traces=False`: uses `dspy.Evaluate()` for faster evaluation
- Extracts scores and subscores from metric results
- Returns an `EvaluationBatch` with outputs, scores, and optional trajectories
### make_reflective_dataset()

Generates a reflective dataset from evaluation traces.

```python
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[TraceData, Prediction],
    components_to_update: list[str],
) -> dict[str, list[ReflectiveExample]]
```

Implementation: src/gepa/adapters/dspy_adapter/dspy_adapter.py:341

Behavior:

- For each component in `components_to_update`:
  - Finds all trace instances for that predictor
  - Extracts inputs and outputs and formats them
  - Calls the corresponding feedback function from `feedback_map`
- Handles format failures with parsing error messages
- Returns a dict mapping component names to reflective examples
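The grouping step above can be sketched without DSPy: walk a flat trace, bucket each step by predictor name, and attach feedback. All names and the trace shape here are hypothetical simplifications:

```python
# Each trace step: (predictor_name, inputs, outputs) — hypothetical flat trace
trace = [
    ("decompose", {"question": "Q"}, {"subquestions": "Q1; Q2"}),
    ("answer", {"question": "Q", "subquestions": "Q1; Q2"}, {"answer": "A"}),
]

def group_by_component(trace, components_to_update, feedback_map):
    """Bucket trace steps by predictor and attach feedback (simplified)."""
    dataset = {}
    for name, inputs, outputs in trace:
        if name not in components_to_update:
            continue  # only requested components are collected
        fb = feedback_map[name](outputs)  # real feedback fns take more args
        dataset.setdefault(name, []).append(
            {"Inputs": inputs, "Generated Outputs": outputs, "Feedback": fb}
        )
    return dataset

ds = group_by_component(trace, ["answer"], {"answer": lambda out: "looks good"})
print(list(ds))  # ['answer']
```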
### propose_new_texts()

Proposes new instructions based on the reflective dataset.

```python
def propose_new_texts(
    self,
    candidate: dict[str, str],
    reflective_dataset: dict[str, list[dict[str, Any]]],
    components_to_update: list[str],
) -> dict[str, str]
```

Implementation: src/gepa/adapters/dspy_adapter/dspy_adapter.py:117

Behavior:

- If `custom_instruction_proposer` is provided: uses that
- Otherwise, routes to the appropriate proposer:
  - Regular predictors: `InstructionProposalSignature`
  - Tool modules (ReAct): `ToolProposer`
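The routing can be pictured roughly as follows. The `tool_module:` prefix check is an assumption inferred from the seed-candidate keys shown later on this page, not the adapter's actual dispatch code:

```python
def route_component(name: str) -> str:
    """Hypothetical dispatch: tool components vs. regular predictors."""
    if name.startswith("tool_module:"):
        return "ToolProposer"
    return "InstructionProposalSignature"

print(route_component("predictor"))          # InstructionProposalSignature
print(route_component("tool_module:react"))  # ToolProposer
```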
### build_program()

Constructs a DSPy program from candidate instructions.

```python
def build_program(self, candidate: dict[str, str]) -> dspy.Module
```

Implementation: src/gepa/adapters/dspy_adapter/dspy_adapter.py:177

Behavior:

- Deep copies the student module
- Updates each predictor's signature with its new instruction
- If `enable_tool_optimization=True`: updates tool descriptions
- Returns the modified program
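The deep-copy-and-update pattern is ordinary Python; a minimal stand-in with no DSPy types involved (class names are invented for illustration):

```python
import copy

class PredictorStub:
    """Illustrative stand-in for a DSPy predictor holding an instruction."""
    def __init__(self, instructions):
        self.instructions = instructions

class ProgramStub:
    """Illustrative stand-in for a dspy.Module with one predictor."""
    def __init__(self):
        self.predictor = PredictorStub("Answer the question accurately.")

def build_program(student, candidate):
    program = copy.deepcopy(student)  # never mutate the original module
    for name, instruction in candidate.items():
        getattr(program, name).instructions = instruction
    return program

student = ProgramStub()
new = build_program(student, {"predictor": "Answer concisely and cite sources."})
print(student.predictor.instructions)  # original left unchanged
print(new.predictor.instructions)
```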
## Usage Examples

### Basic Optimization

```python
import dspy
import gepa
from gepa.adapters.dspy_adapter import DspyAdapter, ScoreWithFeedback

# Configure DSPy
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Define program
class SimpleQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought('question -> answer')

    def forward(self, question):
        return self.predictor(question=question)

# Define metric
def exact_match(example, prediction, trace=None):
    return example.answer.lower() == prediction.answer.lower()

# Define feedback function
def qa_feedback(
    predictor_output,
    predictor_inputs,
    module_inputs,
    module_outputs,
    captured_trace,
):
    correct = module_inputs.answer.lower() == predictor_output['answer'].lower()
    feedback = f"Answer is {'correct' if correct else 'incorrect'}. Expected: {module_inputs.answer}"
    return ScoreWithFeedback(
        score=1.0 if correct else 0.0,
        feedback=feedback,
    )

# Create adapter
adapter = DspyAdapter(
    student_module=SimpleQA(),
    metric_fn=exact_match,
    feedback_map={'predictor': qa_feedback},
    reflection_lm=dspy.LM('openai/gpt-4')
)

# Optimize
result = gepa.optimize(
    seed_candidate={'predictor': 'Answer the question accurately.'},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=150
)
```
### Multi-Predictor Program

```python
from gepa.adapters.dspy_adapter import ScoreWithFeedback

class MultiStepQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.decompose = dspy.ChainOfThought('question -> subquestions')
        self.answer = dspy.ChainOfThought('question, subquestions -> answer')

    def forward(self, question):
        subquestions = self.decompose(question=question).subquestions
        return self.answer(question=question, subquestions=subquestions)

# Define feedback for each predictor
def decompose_feedback(**kwargs):
    # Feedback for the decomposition step
    return ScoreWithFeedback(score=1.0, feedback='Good decomposition')

def answer_feedback(**kwargs):
    # Feedback for the answer step
    correct = kwargs['module_inputs'].answer == kwargs['predictor_output']['answer']
    return ScoreWithFeedback(
        score=1.0 if correct else 0.0,
        feedback=f"Answer is {'correct' if correct else 'incorrect'}",
    )

# Create adapter with multiple feedback functions
adapter = DspyAdapter(
    student_module=MultiStepQA(),
    metric_fn=my_metric,
    feedback_map={
        'decompose': decompose_feedback,
        'answer': answer_feedback,
    },
    reflection_lm=dspy.LM('openai/gpt-4')
)
```
### Tool Optimization

```python
import dspy
from dspy.adapters.types.tool import Tool

# Define tools
search_tool = Tool(
    name='search',
    desc='Search the web',
    args={'query': {'type': 'string', 'description': 'Search query'}}
)

class AgentWithTools(dspy.Module):
    def __init__(self):
        super().__init__()
        self.react = dspy.ReAct(
            'question -> answer',
            tools=[search_tool]
        )

    def forward(self, question):
        return self.react(question=question)

# Create adapter with tool optimization
adapter = DspyAdapter(
    student_module=AgentWithTools(),
    metric_fn=my_metric,
    feedback_map={'react': my_feedback},
    reflection_lm=dspy.LM('openai/gpt-4'),
    enable_tool_optimization=True  # Enable tool optimization
)

# Seed candidate with tool description
result = gepa.optimize(
    seed_candidate={
        'react': 'You are a helpful agent with access to tools.',
        'tool_module:react': '{"tools": {"search": {"desc": "Search for information"}}}'
    },
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=150
)
```
### Custom Proposers

Override the default instruction proposal logic:

```python
def my_custom_proposer(
    candidate: dict[str, str],
    reflective_dataset: dict[str, list[dict[str, Any]]],
    components_to_update: list[str],
) -> dict[str, str]:
    """Custom instruction proposal logic."""
    new_instructions = {}
    for component in components_to_update:
        examples = reflective_dataset[component]
        current_instruction = candidate[component]
        # Your custom logic here
        new_instruction = improve_instruction(current_instruction, examples)
        new_instructions[component] = new_instruction
    return new_instructions

adapter = DspyAdapter(
    student_module=my_program,
    metric_fn=my_metric,
    feedback_map=my_feedback_map,
    custom_instruction_proposer=my_custom_proposer
)
```
### Multi-Objective Optimization

Return subscores from your metric:

```python
def multi_objective_metric(example, prediction, trace=None):
    correctness = float(example.answer == prediction.answer)
    conciseness = 1.0 / (len(prediction.answer) + 1)
    return {
        'score': correctness,  # Primary score
        'subscores': {
            'correctness': correctness,
            'conciseness': conciseness,
        }
    }

adapter = DspyAdapter(
    student_module=my_program,
    metric_fn=multi_objective_metric,
    feedback_map=my_feedback_map
)
```
GEPA will maintain a Pareto front across all subscores.
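To make "Pareto front" concrete: candidate A dominates B when A is at least as good on every subscore and strictly better on at least one; the front is the set of non-dominated candidates. A small self-contained check (not gepa's implementation):

```python
def dominates(a: dict, b: dict) -> bool:
    """True if a is >= b on every subscore and > b on at least one."""
    keys = a.keys()
    return all(a[k] >= b[k] for k in keys) and any(a[k] > b[k] for k in keys)

def pareto_front(candidates: dict) -> list:
    """Names of candidates not dominated by any other candidate."""
    return [
        name for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]

front = pareto_front({
    "A": {"correctness": 1.0, "conciseness": 0.2},
    "B": {"correctness": 0.8, "conciseness": 0.9},
    "C": {"correctness": 0.8, "conciseness": 0.1},  # dominated by A and B
})
print(front)  # ['A', 'B']
```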
## Best Practices

- **Feedback functions**: Provide specific, actionable feedback that states what went wrong
- **Module names**: Use descriptive names for predictors (helps in debugging)
- **Trace sampling**: Set `rng` for reproducible trace sampling in large programs
- **Tool optimization**: Only enable it if you have ReAct modules with tools
- **Reflection LM**: Use a stronger model (e.g., GPT-4) for reflection than for task execution
## Integration with dspy.GEPA

The official DSPy integration uses this adapter:

```python
import dspy

# Configure
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Use dspy.GEPA (recommended)
optimizer = dspy.GEPA(
    metric=my_metric,
    max_metric_calls=150,
    reflection_lm=dspy.LM('openai/gpt-4')
)

optimized_program = optimizer.compile(
    student=MyProgram(),
    trainset=train_data,
    valset=val_data
)
```
See DSPy documentation for more details.
## Advanced Features

### History Handling

The adapter automatically handles History inputs in ReAct modules, formatting them for reflection.

### Format Failure Feedback

Enable feedback for parsing errors:

```python
adapter = DspyAdapter(
    student_module=my_program,
    metric_fn=my_metric,
    feedback_map=my_feedback_map,
    add_format_failure_as_feedback=True
)
```
### Score Mismatch Warnings

Disable warnings for non-deterministic metrics:

```python
adapter = DspyAdapter(
    student_module=my_program,
    metric_fn=llm_as_judge_metric,
    feedback_map=my_feedback_map,
    warn_on_score_mismatch=False  # Disable warnings
)
```
## See Also