Overview

The DspyAdapter enables GEPA to optimize DSPy programs by evolving module signature instructions. This is the adapter used in the official DSPy integration (dspy.GEPA).
The most up-to-date version is maintained in the DSPy repository.

Features

  • Optimize signature instructions for any DSPy predictor
  • Support for multiple predictors in complex programs
  • Tool description optimization for ReAct modules
  • Custom instruction proposal logic
  • Multi-objective optimization with subscores
  • Automatic trace capture and feedback generation

Installation

Install DSPy with GEPA support:
pip install dspy-ai gepa

Quick Start

import dspy
from gepa.adapters.dspy_adapter import DspyAdapter
import gepa

# Define your DSPy program
class MyProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought('question -> answer')

    def forward(self, question):
        return self.predictor(question=question)

# Define metric
def my_metric(example, prediction, trace=None):
    return float(example.answer in prediction.answer)

# Create adapter
adapter = DspyAdapter(
    student_module=MyProgram(),
    metric_fn=my_metric,
    feedback_map={'predictor': lambda **kwargs: {'score': 1.0, 'feedback': 'Good'}},
    reflection_lm=dspy.LM('openai/gpt-4')
)

# Optimize
result = gepa.optimize(
    seed_candidate={'predictor': 'Answer the question accurately.'},
    trainset=train_examples,
    valset=val_examples,
    adapter=adapter,
    max_metric_calls=150
)

Class Signature

Defined in src/gepa/adapters/dspy_adapter/dspy_adapter.py:89:
class DspyAdapter(GEPAAdapter[Example, TraceData, Prediction]):
    def __init__(
        self,
        student_module,
        metric_fn: Callable,
        feedback_map: dict[str, Callable],
        failure_score=0.0,
        num_threads: int | None = None,
        add_format_failure_as_feedback: bool = False,
        rng: random.Random | None = None,
        reflection_lm=None,
        custom_instruction_proposer: ProposalFn | None = None,
        warn_on_score_mismatch: bool = True,
        enable_tool_optimization: bool = False,
        reflection_minibatch_size: int | None = None,
    )

Parameters

student_module
dspy.Module
required
The DSPy program to optimize. Should be an instance of dspy.Module.
metric_fn
Callable
required
Evaluation metric. Signature:
def metric(example: Example, prediction: Prediction, trace=None) -> float | dict:
    return score  # or {'score': float, 'subscores': dict}
feedback_map
dict[str, Callable]
required
Maps predictor names to feedback functions. Signature:
def feedback_fn(
    predictor_output: dict[str, Any],
    predictor_inputs: dict[str, Any],
    module_inputs: Example,
    module_outputs: Prediction,
    captured_trace: list
) -> ScoreWithFeedback:
    return ScoreWithFeedback(
        score=1.0,
        feedback='Feedback text',
        subscores={'accuracy': 1.0}
    )
failure_score
float
default:"0.0"
Score assigned when a prediction fails or raises an exception.
num_threads
int | None
default:"None"
Number of threads for parallel evaluation. None uses DSPy default.
add_format_failure_as_feedback
bool
default:"False"
Include format failures (parsing errors) in reflective dataset.
rng
random.Random | None
default:"None"
Random number generator for reproducible trace sampling.
reflection_lm
dspy.LM | None
default:"None"
Language model for instruction proposal. Uses dspy.settings.lm if None.
custom_instruction_proposer
ProposalFn | None
default:"None"
Override default instruction proposal logic. See Custom Proposers.
warn_on_score_mismatch
bool
default:"True"
Warn when feedback score differs from module score (e.g., LLM-as-judge metrics).
enable_tool_optimization
bool
default:"False"
Enable optimization of tool descriptions in ReAct modules.
reflection_minibatch_size
int | None
default:"None"
Override default minibatch size for reflection. Useful for controlling memory usage.

Data Types

ReflectiveExample

Structure of examples in reflective dataset (src/gepa/adapters/dspy_adapter/dspy_adapter.py:41):
ReflectiveExample = TypedDict('ReflectiveExample', {
    'Inputs': dict[str, Any],                   # Predictor inputs
    'Generated Outputs': dict[str, Any] | str,  # Predictor outputs
    'Feedback': str,                            # Evaluation feedback
})  # functional syntax: keys contain spaces, so class syntax is invalid

ScoreWithFeedback

Feedback function return type (src/gepa/adapters/dspy_adapter/dspy_adapter.py:57):
class ScoreWithFeedback(Prediction):
    score: float
    feedback: str | None = None
    subscores: dict[str, float] | None = None

Methods

evaluate()

Evaluates a candidate program on a batch of examples.
def evaluate(
    self,
    batch: list[Example],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[TraceData, Prediction]
Implementation: src/gepa/adapters/dspy_adapter/dspy_adapter.py:257

Behavior

  1. Builds program with candidate instructions using build_program()
  2. If capture_traces=True: Uses bootstrap_trace_data() to capture full execution traces
  3. If capture_traces=False: Uses dspy.Evaluate() for faster evaluation
  4. Extracts scores and subscores from metric results
  5. Returns EvaluationBatch with outputs, scores, and optional trajectories

make_reflective_dataset()

Generates reflective dataset from evaluation traces.
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[TraceData, Prediction],
    components_to_update: list[str],
) -> dict[str, list[ReflectiveExample]]
Implementation: src/gepa/adapters/dspy_adapter/dspy_adapter.py:341

Behavior

  1. For each component in components_to_update:
    • Finds all trace instances for that predictor
    • Extracts inputs, outputs, and formats them
    • Calls corresponding feedback function from feedback_map
    • Handles format failures with parsing error messages
  2. Returns dict mapping component names to reflective examples
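To make the output shape concrete, here is a toy reflective dataset (hand-written, not from a real run) following the ReflectiveExample structure:

```python
# Toy reflective dataset: component name -> list of ReflectiveExample dicts.
reflective_dataset = {
    'predictor': [
        {
            'Inputs': {'question': 'What is 2 + 2?'},
            'Generated Outputs': {'answer': '4'},
            'Feedback': 'Answer is correct.',
        },
        {
            'Inputs': {'question': 'Capital of France?'},
            'Generated Outputs': {'answer': 'Lyon'},
            'Feedback': 'Answer is incorrect. Expected: Paris',
        },
    ],
}

# Every example carries the three ReflectiveExample keys.
for examples in reflective_dataset.values():
    for ex in examples:
        assert {'Inputs', 'Generated Outputs', 'Feedback'} <= ex.keys()
```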

propose_new_texts()

Proposes new instructions based on reflective dataset.
def propose_new_texts(
    self,
    candidate: dict[str, str],
    reflective_dataset: dict[str, list[dict[str, Any]]],
    components_to_update: list[str],
) -> dict[str, str]
Implementation: src/gepa/adapters/dspy_adapter/dspy_adapter.py:117

Behavior

  • If custom_instruction_proposer provided: Uses that
  • Otherwise: Routes to appropriate proposers:
    • Regular predictors: Uses InstructionProposalSignature
    • Tool modules (ReAct): Uses ToolProposer

build_program()

Constructs a DSPy program from candidate instructions.
def build_program(self, candidate: dict[str, str]) -> dspy.Module
Implementation: src/gepa/adapters/dspy_adapter/dspy_adapter.py:177

Behavior

  1. Deep copies the student module
  2. Updates each predictor’s signature with new instruction
  3. If enable_tool_optimization=True: Updates tool descriptions
  4. Returns modified program
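The steps above can be sketched in plain Python with stand-in classes (the Fake* names are illustrative, not DSPy's):

```python
import copy

# Stand-ins for a predictor whose signature carries an instruction string.
class FakeSignature:
    def __init__(self, instructions):
        self.instructions = instructions

class FakePredictor:
    def __init__(self, instructions):
        self.signature = FakeSignature(instructions)

class FakeModule:
    def __init__(self):
        self.predictor = FakePredictor('Answer the question.')

def build_program_sketch(student, candidate):
    program = copy.deepcopy(student)  # 1. deep copy the student module
    for name, instruction in candidate.items():
        # 2. update each named predictor's signature instruction
        getattr(program, name).signature.instructions = instruction
    return program                    # 4. return the modified copy

student = FakeModule()
built = build_program_sketch(student, {'predictor': 'Answer concisely.'})
# The original stays untouched; only the copy carries the new instruction.
```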

Usage Examples

Basic Optimization

import dspy
from gepa.adapters.dspy_adapter import DspyAdapter
import gepa

# Configure DSPy
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Define program
class SimpleQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought('question -> answer')

    def forward(self, question):
        return self.predictor(question=question)

# Define metric
def exact_match(example, prediction, trace=None):
    return float(example.answer.lower() == prediction.answer.lower())

# Define feedback function
def qa_feedback(
    predictor_output,
    predictor_inputs,
    module_inputs,
    module_outputs,
    captured_trace
):
    correct = module_inputs.answer.lower() == predictor_output['answer'].lower()
    feedback = f"Answer is {'correct' if correct else 'incorrect'}. Expected: {module_inputs.answer}"
    
    from gepa.adapters.dspy_adapter import ScoreWithFeedback
    return ScoreWithFeedback(
        score=1.0 if correct else 0.0,
        feedback=feedback
    )

# Create adapter
adapter = DspyAdapter(
    student_module=SimpleQA(),
    metric_fn=exact_match,
    feedback_map={'predictor': qa_feedback},
    reflection_lm=dspy.LM('openai/gpt-4')
)

# Optimize
result = gepa.optimize(
    seed_candidate={'predictor': 'Answer the question accurately.'},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=150
)

Multi-Predictor Program

class MultiStepQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.decompose = dspy.ChainOfThought('question -> subquestions')
        self.answer = dspy.ChainOfThought('question, subquestions -> answer')

    def forward(self, question):
        subquestions = self.decompose(question=question).subquestions
        return self.answer(question=question, subquestions=subquestions)

# Define feedback for each predictor
from gepa.adapters.dspy_adapter import ScoreWithFeedback

def decompose_feedback(**kwargs):
    # Feedback for the decomposition step
    return ScoreWithFeedback(score=1.0, feedback='Good decomposition')

def answer_feedback(**kwargs):
    # Feedback for answer step
    correct = kwargs['module_inputs'].answer == kwargs['predictor_output']['answer']
    return ScoreWithFeedback(
        score=1.0 if correct else 0.0,
        feedback=f"Answer is {'correct' if correct else 'incorrect'}"
    )

# Create adapter with multiple feedback functions
adapter = DspyAdapter(
    student_module=MultiStepQA(),
    metric_fn=my_metric,
    feedback_map={
        'decompose': decompose_feedback,
        'answer': answer_feedback
    },
    reflection_lm=dspy.LM('openai/gpt-4')
)

Tool Optimization (ReAct)

import dspy
from dspy.adapters.types.tool import Tool

# Define tools
search_tool = Tool(
    name='search',
    desc='Search the web',
    args={'query': {'type': 'string', 'description': 'Search query'}}
)

class AgentWithTools(dspy.Module):
    def __init__(self):
        super().__init__()
        self.react = dspy.ReAct(
            'question -> answer',
            tools=[search_tool]
        )

    def forward(self, question):
        return self.react(question=question)

# Create adapter with tool optimization
adapter = DspyAdapter(
    student_module=AgentWithTools(),
    metric_fn=my_metric,
    feedback_map={'react': my_feedback},
    reflection_lm=dspy.LM('openai/gpt-4'),
    enable_tool_optimization=True  # Enable tool optimization
)

# Seed candidate with tool description
result = gepa.optimize(
    seed_candidate={
        'react': 'You are a helpful agent with access to tools.',
        'tool_module:react': '{"tools": {"search": {"desc": "Search for information"}}}'
    },
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=150
)

Custom Proposers

Override default instruction proposal logic:
def my_custom_proposer(
    candidate: dict[str, str],
    reflective_dataset: dict[str, list[dict[str, Any]]],
    components_to_update: list[str]
) -> dict[str, str]:
    """Custom instruction proposal logic."""
    new_instructions = {}
    
    for component in components_to_update:
        examples = reflective_dataset[component]
        current_instruction = candidate[component]
        
        # Your custom logic here (improve_instruction is a placeholder)
        new_instruction = improve_instruction(current_instruction, examples)
        new_instructions[component] = new_instruction
    
    return new_instructions
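As a concrete toy stand-in for the improve_instruction placeholder, a proposer might simply fold recent feedback back into each instruction (purely illustrative):

```python
def append_feedback_proposer(candidate, reflective_dataset, components_to_update):
    """Toy proposer: append the most recent feedback to each instruction."""
    new_instructions = {}
    for component in components_to_update:
        examples = reflective_dataset.get(component, [])
        # Take the last couple of feedback strings from the reflective dataset.
        feedback = '; '.join(ex['Feedback'] for ex in examples[-2:])
        new_instructions[component] = (
            candidate[component] + '\nRecent feedback: ' + feedback
        )
    return new_instructions

proposed = append_feedback_proposer(
    {'predictor': 'Answer the question.'},
    {'predictor': [{'Feedback': 'Be more concise.'}]},
    ['predictor'],
)
# proposed['predictor'] now ends with the appended feedback line
```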

adapter = DspyAdapter(
    student_module=my_program,
    metric_fn=my_metric,
    feedback_map=my_feedback_map,
    custom_instruction_proposer=my_custom_proposer
)

Multi-Objective Optimization

Return subscores from your metric:
def multi_objective_metric(example, prediction, trace=None):
    correctness = float(example.answer == prediction.answer)
    conciseness = 1.0 / (len(prediction.answer) + 1)
    
    return {
        'score': correctness,  # Primary score
        'subscores': {
            'correctness': correctness,
            'conciseness': conciseness
        }
    }

adapter = DspyAdapter(
    student_module=my_program,
    metric_fn=multi_objective_metric,
    feedback_map=my_feedback_map
)
GEPA will maintain a Pareto front across all subscores.
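Such a metric is plain Python and can be exercised standalone; SimpleNamespace here is just a mock for the dspy Example and Prediction objects:

```python
from types import SimpleNamespace

def multi_objective_metric(example, prediction, trace=None):
    correctness = float(example.answer == prediction.answer)
    conciseness = 1.0 / (len(prediction.answer) + 1)
    return {
        'score': correctness,  # Primary score
        'subscores': {'correctness': correctness, 'conciseness': conciseness},
    }

scored = multi_objective_metric(
    SimpleNamespace(answer='Paris'),  # mock example
    SimpleNamespace(answer='Paris'),  # mock prediction
)
# score is 1.0; conciseness is 1 / (len('Paris') + 1) = 1/6
```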

Best Practices

  1. Feedback Functions: Provide specific, actionable feedback mentioning what went wrong
  2. Module Names: Use descriptive names for predictors (helps in debugging)
  3. Trace Sampling: Set rng for reproducible trace sampling in large programs
  4. Tool Optimization: Only enable if you have ReAct modules with tools
  5. Reflection LM: Use a stronger model (e.g., GPT-4) for reflection than task execution

Integration with dspy.GEPA

The official DSPy integration uses this adapter:
import dspy

# Configure
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Use dspy.GEPA (recommended)
optimizer = dspy.GEPA(
    metric=my_metric,
    max_metric_calls=150,
    reflection_lm=dspy.LM('openai/gpt-4')
)

optimized_program = optimizer.compile(
    student=MyProgram(),
    trainset=train_data,
    valset=val_data
)
See DSPy documentation for more details.

Advanced Features

History Handling

The adapter automatically handles History inputs in ReAct modules, formatting them for reflection.

Format Failure Feedback

Enable feedback for parsing errors:
adapter = DspyAdapter(
    student_module=my_program,
    metric_fn=my_metric,
    feedback_map=my_feedback_map,
    add_format_failure_as_feedback=True
)

Score Mismatch Warnings

Disable warnings for non-deterministic metrics:
adapter = DspyAdapter(
    student_module=my_program,
    metric_fn=llm_as_judge_metric,
    feedback_map=my_feedback_map,
    warn_on_score_mismatch=False  # Disable warnings
)
