Overview

The DspyAdapter enables GEPA to optimize DSPy programs by evolving module signature instructions. This is the adapter used in the official DSPy integration (dspy.GEPA).
The most up-to-date version is maintained in the DSPy repository.

Features

  • Optimize signature instructions for any DSPy predictor
  • Support for multiple predictors in complex programs
  • Tool description optimization for ReAct modules
  • Custom instruction proposal logic
  • Multi-objective optimization with subscores
  • Automatic trace capture and feedback generation

Installation

Install DSPy with GEPA support:
pip install dspy-ai gepa

Quick Start

import dspy
from gepa.adapters.dspy_adapter import DspyAdapter
import gepa

# Define your DSPy program
class MyProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought('question -> answer')

    def forward(self, question):
        return self.predictor(question=question)

# Define metric
def my_metric(example, prediction, trace=None):
    return float(example.answer in prediction.answer)

# Create adapter
adapter = DspyAdapter(
    student_module=MyProgram(),
    metric_fn=my_metric,
    feedback_map={'predictor': lambda **kwargs: {'score': 1.0, 'feedback': 'Good'}},
    reflection_lm=dspy.LM('openai/gpt-4')
)

# Optimize
result = gepa.optimize(
    seed_candidate={'predictor': 'Answer the question accurately.'},
    trainset=train_examples,
    valset=val_examples,
    adapter=adapter,
    max_metric_calls=150
)

Class Signature

Defined in src/gepa/adapters/dspy_adapter/dspy_adapter.py:89:
class DspyAdapter(GEPAAdapter[Example, TraceData, Prediction]):
    def __init__(
        self,
        student_module,
        metric_fn: Callable,
        feedback_map: dict[str, Callable],
        failure_score=0.0,
        num_threads: int | None = None,
        add_format_failure_as_feedback: bool = False,
        rng: random.Random | None = None,
        reflection_lm=None,
        custom_instruction_proposer: ProposalFn | None = None,
        warn_on_score_mismatch: bool = True,
        enable_tool_optimization: bool = False,
        reflection_minibatch_size: int | None = None,
    )

Parameters

student_module
dspy.Module
required
The DSPy program to optimize. Should be an instance of dspy.Module.
metric_fn
Callable
required
Evaluation metric. Signature:
def metric(example: Example, prediction: Prediction, trace=None) -> float | dict:
    return score  # or {'score': float, 'subscores': dict}
feedback_map
dict[str, Callable]
required
Maps predictor names to feedback functions. Signature:
def feedback_fn(
    predictor_output: dict[str, Any],
    predictor_inputs: dict[str, Any],
    module_inputs: Example,
    module_outputs: Prediction,
    captured_trace: list
) -> ScoreWithFeedback:
    return ScoreWithFeedback(
        score=1.0,
        feedback='Feedback text',
        subscores={'accuracy': 1.0}
    )
failure_score
float
default:"0.0"
Score assigned when a prediction fails or raises an exception.
num_threads
int | None
default:"None"
Number of threads for parallel evaluation. None uses DSPy default.
add_format_failure_as_feedback
bool
default:"False"
Include format failures (parsing errors) in reflective dataset.
rng
random.Random | None
default:"None"
Random number generator for reproducible trace sampling.
reflection_lm
dspy.LM | None
default:"None"
Language model for instruction proposal. Uses dspy.settings.lm if None.
custom_instruction_proposer
ProposalFn | None
default:"None"
Override default instruction proposal logic. See Custom Proposers.
warn_on_score_mismatch
bool
default:"True"
Warn when feedback score differs from module score (e.g., LLM-as-judge metrics).
enable_tool_optimization
bool
default:"False"
Enable optimization of tool descriptions in ReAct modules.
reflection_minibatch_size
int | None
default:"None"
Override default minibatch size for reflection. Useful for controlling memory usage.

Data Types

ReflectiveExample

Structure of examples in reflective dataset (src/gepa/adapters/dspy_adapter/dspy_adapter.py:41):
ReflectiveExample = TypedDict('ReflectiveExample', {
    'Inputs': dict[str, Any],                   # Predictor inputs
    'Generated Outputs': dict[str, Any] | str,  # Predictor outputs
    'Feedback': str,                            # Evaluation feedback
})  # functional syntax: keys contain spaces, so class syntax is invalid

ScoreWithFeedback

Feedback function return type (src/gepa/adapters/dspy_adapter/dspy_adapter.py:57):
class ScoreWithFeedback(Prediction):
    score: float
    feedback: str | None = None
    subscores: dict[str, float] | None = None

Methods

evaluate()

Evaluates a candidate program on a batch of examples.
def evaluate(
    self,
    batch: list[Example],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[TraceData, Prediction]
Implementation: src/gepa/adapters/dspy_adapter/dspy_adapter.py:257

Behavior

  1. Builds program with candidate instructions using build_program()
  2. If capture_traces=True: Uses bootstrap_trace_data() to capture full execution traces
  3. If capture_traces=False: Uses dspy.Evaluate() for faster evaluation
  4. Extracts scores and subscores from metric results
  5. Returns EvaluationBatch with outputs, scores, and optional trajectories

make_reflective_dataset()

Generates reflective dataset from evaluation traces.
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[TraceData, Prediction],
    components_to_update: list[str],
) -> dict[str, list[ReflectiveExample]]
Implementation: src/gepa/adapters/dspy_adapter/dspy_adapter.py:341

Behavior

  1. For each component in components_to_update:
    • Finds all trace instances for that predictor
    • Extracts inputs, outputs, and formats them
    • Calls corresponding feedback function from feedback_map
    • Handles format failures with parsing error messages
  2. Returns dict mapping component names to reflective examples
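To make the output shape concrete, here is a toy reflective dataset (hand-written, not from a real run) following the ReflectiveExample structure:

```python
# Toy reflective dataset: component name -> list of ReflectiveExample dicts.
reflective_dataset = {
    'predictor': [
        {
            'Inputs': {'question': 'What is 2 + 2?'},
            'Generated Outputs': {'answer': '4'},
            'Feedback': 'Answer is correct.',
        },
        {
            'Inputs': {'question': 'Capital of France?'},
            'Generated Outputs': {'answer': 'Lyon'},
            'Feedback': 'Answer is incorrect. Expected: Paris',
        },
    ],
}

# Every example carries the three ReflectiveExample keys.
for examples in reflective_dataset.values():
    for ex in examples:
        assert {'Inputs', 'Generated Outputs', 'Feedback'} <= ex.keys()
```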

propose_new_texts()

Proposes new instructions based on reflective dataset.
def propose_new_texts(
    self,
    candidate: dict[str, str],
    reflective_dataset: dict[str, list[dict[str, Any]]],
    components_to_update: list[str],
) -> dict[str, str]
Implementation: src/gepa/adapters/dspy_adapter/dspy_adapter.py:117

Behavior

  • If custom_instruction_proposer provided: Uses that
  • Otherwise: Routes to appropriate proposers:
    • Regular predictors: Uses InstructionProposalSignature
    • Tool modules (ReAct): Uses ToolProposer

build_program()

Constructs a DSPy program from candidate instructions.
def build_program(self, candidate: dict[str, str]) -> dspy.Module
Implementation: src/gepa/adapters/dspy_adapter/dspy_adapter.py:177

Behavior

  1. Deep copies the student module
  2. Updates each predictor’s signature with new instruction
  3. If enable_tool_optimization=True: Updates tool descriptions
  4. Returns modified program
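The steps above can be sketched in plain Python with stand-in classes (the Fake* names are illustrative, not DSPy's):

```python
import copy

# Stand-ins for a predictor whose signature carries an instruction string.
class FakeSignature:
    def __init__(self, instructions):
        self.instructions = instructions

class FakePredictor:
    def __init__(self, instructions):
        self.signature = FakeSignature(instructions)

class FakeModule:
    def __init__(self):
        self.predictor = FakePredictor('Answer the question.')

def build_program_sketch(student, candidate):
    program = copy.deepcopy(student)  # 1. deep copy the student module
    for name, instruction in candidate.items():
        # 2. update each named predictor's signature instruction
        getattr(program, name).signature.instructions = instruction
    return program                    # 4. return the modified copy

student = FakeModule()
built = build_program_sketch(student, {'predictor': 'Answer concisely.'})
# The original stays untouched; only the copy carries the new instruction.
```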

Usage Examples

Basic Optimization

import dspy
from gepa.adapters.dspy_adapter import DspyAdapter
import gepa

# Configure DSPy
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Define program
class SimpleQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought('question -> answer')

    def forward(self, question):
        return self.predictor(question=question)

# Define metric
def exact_match(example, prediction, trace=None):
    return float(example.answer.lower() == prediction.answer.lower())

# Define feedback function
def qa_feedback(
    predictor_output,
    predictor_inputs,
    module_inputs,
    module_outputs,
    captured_trace
):
    correct = module_inputs.answer.lower() == predictor_output['answer'].lower()
    feedback = f"Answer is {'correct' if correct else 'incorrect'}. Expected: {module_inputs.answer}"
    
    from gepa.adapters.dspy_adapter import ScoreWithFeedback
    return ScoreWithFeedback(
        score=1.0 if correct else 0.0,
        feedback=feedback
    )

# Create adapter
adapter = DspyAdapter(
    student_module=SimpleQA(),
    metric_fn=exact_match,
    feedback_map={'predictor': qa_feedback},
    reflection_lm=dspy.LM('openai/gpt-4')
)

# Optimize
result = gepa.optimize(
    seed_candidate={'predictor': 'Answer the question accurately.'},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=150
)

Multi-Predictor Program

class MultiStepQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.decompose = dspy.ChainOfThought('question -> subquestions')
        self.answer = dspy.ChainOfThought('question, subquestions -> answer')

    def forward(self, question):
        subquestions = self.decompose(question=question).subquestions
        return self.answer(question=question, subquestions=subquestions)

# Define feedback for each predictor
from gepa.adapters.dspy_adapter import ScoreWithFeedback

def decompose_feedback(**kwargs):
    # Feedback for the decomposition step
    return ScoreWithFeedback(score=1.0, feedback='Good decomposition')

def answer_feedback(**kwargs):
    # Feedback for answer step
    correct = kwargs['module_inputs'].answer == kwargs['predictor_output']['answer']
    return ScoreWithFeedback(
        score=1.0 if correct else 0.0,
        feedback=f"Answer is {'correct' if correct else 'incorrect'}"
    )

# Create adapter with multiple feedback functions
adapter = DspyAdapter(
    student_module=MultiStepQA(),
    metric_fn=my_metric,
    feedback_map={
        'decompose': decompose_feedback,
        'answer': answer_feedback
    },
    reflection_lm=dspy.LM('openai/gpt-4')
)

Tool Optimization (ReAct)

import dspy
from dspy.adapters.types.tool import Tool

# Define tools
search_tool = Tool(
    name='search',
    desc='Search the web',
    args={'query': {'type': 'string', 'description': 'Search query'}}
)

class AgentWithTools(dspy.Module):
    def __init__(self):
        super().__init__()
        self.react = dspy.ReAct(
            'question -> answer',
            tools=[search_tool]
        )

    def forward(self, question):
        return self.react(question=question)

# Create adapter with tool optimization
adapter = DspyAdapter(
    student_module=AgentWithTools(),
    metric_fn=my_metric,
    feedback_map={'react': my_feedback},
    reflection_lm=dspy.LM('openai/gpt-4'),
    enable_tool_optimization=True  # Enable tool optimization
)

# Seed candidate with tool description
result = gepa.optimize(
    seed_candidate={
        'react': 'You are a helpful agent with access to tools.',
        'tool_module:react': '{"tools": {"search": {"desc": "Search for information"}}}'
    },
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=150
)

Custom Proposers

Override default instruction proposal logic:
def my_custom_proposer(
    candidate: dict[str, str],
    reflective_dataset: dict[str, list[dict[str, Any]]],
    components_to_update: list[str]
) -> dict[str, str]:
    """Custom instruction proposal logic."""
    new_instructions = {}
    
    for component in components_to_update:
        examples = reflective_dataset[component]
        current_instruction = candidate[component]
        
        # Your custom logic here (improve_instruction is a placeholder)
        new_instruction = improve_instruction(current_instruction, examples)
        new_instructions[component] = new_instruction
    
    return new_instructions
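As a concrete toy stand-in for the improve_instruction placeholder, a proposer might simply fold recent feedback back into each instruction (purely illustrative):

```python
def append_feedback_proposer(candidate, reflective_dataset, components_to_update):
    """Toy proposer: append the most recent feedback to each instruction."""
    new_instructions = {}
    for component in components_to_update:
        examples = reflective_dataset.get(component, [])
        # Take the last couple of feedback strings from the reflective dataset.
        feedback = '; '.join(ex['Feedback'] for ex in examples[-2:])
        new_instructions[component] = (
            candidate[component] + '\nRecent feedback: ' + feedback
        )
    return new_instructions

proposed = append_feedback_proposer(
    {'predictor': 'Answer the question.'},
    {'predictor': [{'Feedback': 'Be more concise.'}]},
    ['predictor'],
)
# proposed['predictor'] now ends with the appended feedback line
```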

adapter = DspyAdapter(
    student_module=my_program,
    metric_fn=my_metric,
    feedback_map=my_feedback_map,
    custom_instruction_proposer=my_custom_proposer
)

Multi-Objective Optimization

Return subscores from your metric:
def multi_objective_metric(example, prediction, trace=None):
    correctness = float(example.answer == prediction.answer)
    conciseness = 1.0 / (len(prediction.answer) + 1)
    
    return {
        'score': correctness,  # Primary score
        'subscores': {
            'correctness': correctness,
            'conciseness': conciseness
        }
    }

adapter = DspyAdapter(
    student_module=my_program,
    metric_fn=multi_objective_metric,
    feedback_map=my_feedback_map
)
GEPA will maintain a Pareto front across all subscores.
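Such a metric is plain Python and can be exercised standalone; SimpleNamespace here is just a mock for the dspy Example and Prediction objects:

```python
from types import SimpleNamespace

def multi_objective_metric(example, prediction, trace=None):
    correctness = float(example.answer == prediction.answer)
    conciseness = 1.0 / (len(prediction.answer) + 1)
    return {
        'score': correctness,  # Primary score
        'subscores': {'correctness': correctness, 'conciseness': conciseness},
    }

scored = multi_objective_metric(
    SimpleNamespace(answer='Paris'),  # mock example
    SimpleNamespace(answer='Paris'),  # mock prediction
)
# score is 1.0; conciseness is 1 / (len('Paris') + 1) = 1/6
```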

Best Practices

  1. Feedback Functions: Provide specific, actionable feedback mentioning what went wrong
  2. Module Names: Use descriptive names for predictors (helps in debugging)
  3. Trace Sampling: Set rng for reproducible trace sampling in large programs
  4. Tool Optimization: Only enable if you have ReAct modules with tools
  5. Reflection LM: Use a stronger model (e.g., GPT-4) for reflection than task execution

Integration with dspy.GEPA

The official DSPy integration uses this adapter:
import dspy

# Configure
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Use dspy.GEPA (recommended)
optimizer = dspy.GEPA(
    metric=my_metric,
    max_metric_calls=150,
    reflection_lm=dspy.LM('openai/gpt-4')
)

optimized_program = optimizer.compile(
    student=MyProgram(),
    trainset=train_data,
    valset=val_data
)
See DSPy documentation for more details.

Advanced Features

History Handling

The adapter automatically handles History inputs in ReAct modules, formatting them for reflection.

Format Failure Feedback

Enable feedback for parsing errors:
adapter = DspyAdapter(
    student_module=my_program,
    metric_fn=my_metric,
    feedback_map=my_feedback_map,
    add_format_failure_as_feedback=True
)

Score Mismatch Warnings

Disable warnings for non-deterministic metrics:
adapter = DspyAdapter(
    student_module=my_program,
    metric_fn=llm_as_judge_metric,
    feedback_map=my_feedback_map,
    warn_on_score_mismatch=False  # Disable warnings
)
