Overview
The DspyAdapter (full program variant) enables GEPA to evolve complete DSPy programs, not just instructions. This includes:
- Entire program structure (classes, methods)
- Module composition and control flow
- Signature definitions
- Module interactions
This adapter achieves 93% accuracy on the MATH benchmark (vs 67% with basic DSPy ChainOfThought).
Installation

The examples below import the `gepa` package, so installation via pip is assumed:

```shell
pip install gepa
```
Quick Start
```python
import dspy
from gepa.adapters.dspy_full_program_adapter import DspyAdapter
import gepa

# Configure DSPy
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Define seed program as code string
seed_program = '''
import dspy

class MyProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought('question -> answer')

    def forward(self, question):
        return self.predictor(question=question)

program = MyProgram()
'''

# Define metric
def my_metric(example, prediction, trace=None):
    return float(example.answer == prediction.answer)

# Create adapter
adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),
    metric_fn=my_metric,
    reflection_lm=dspy.LM('openai/gpt-4')
)

# Optimize
result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=150
)

print('Optimized program:')
print(result.best_candidate['program'])
```
Class Signature
Defined in src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:13:
```python
class DspyAdapter(GEPAAdapter[Example, TraceData, Prediction]):
    def __init__(
        self,
        task_lm: dspy.LM,
        metric_fn: Callable,
        reflection_lm: dspy.LM,
        failure_score=0.0,
        num_threads: int | None = None,
        add_format_failure_as_feedback: bool = False,
        rng: random.Random | None = None,
    )
```
Parameters
- `task_lm` (dspy.LM): Language model for executing the task (running the program).
- `metric_fn` (Callable): Evaluation metric. Signature: `def metric(example: Example, prediction: Prediction, trace=None) -> float | dict`, returning the score (or a dict containing it).
- `reflection_lm` (dspy.LM): Language model for proposing program improvements. Must be provided (cannot be None).
- `failure_score` (default: 0.0): Score assigned when the program fails to execute or raises an exception.
- `num_threads` (int | None, default: None): Number of threads for parallel evaluation. None uses the DSPy default.
- `add_format_failure_as_feedback` (bool, default: False): Include format failures (parsing errors) in the reflective dataset.
- `rng` (random.Random | None, default: None): Random number generator for reproducible trace sampling.
Program Requirements

Programs must be valid Python code strings with:
- Required imports: `import dspy`
- Class definition: Define a `dspy.Module` subclass
- Program variable: Assign an instance to a variable named `program`
Valid Program Example
```python
program_code = '''
import dspy

class ComplexQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.decompose = dspy.ChainOfThought('question -> subquestions')
        self.answer = dspy.ChainOfThought('subquestions -> answer')

    def forward(self, question):
        subq = self.decompose(question=question)
        return self.answer(subquestions=subq.subquestions)

program = ComplexQA()
'''
```
Common Errors
Missing program variable:

```python
# ERROR: No program variable
import dspy

class MyProgram(dspy.Module):
    pass

# Missing: program = MyProgram()
```

Wrong type:

```python
# ERROR: program is not a dspy.Module instance
import dspy

program = "some string"  # Should be a dspy.Module instance
```
Methods
evaluate()
Evaluates a candidate program on a batch of examples.
```python
def evaluate(
    self,
    batch: list[Example],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[TraceData, Prediction]
```
Implementation: src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:82
Behavior
- Calls build_program() to construct the program from code
- If the program fails to build: returns failure scores for the entire batch, with the error message in the trajectories
- If capture_traces=True: uses bootstrap_trace_data() for detailed traces
- If capture_traces=False: uses dspy.Evaluate() for faster evaluation
- Returns an EvaluationBatch with outputs, scores, and optional trajectories
build_program()
Compiles program code into executable DSPy module.
```python
def build_program(self, candidate: dict[str, str]) -> tuple[dspy.Module, None] | tuple[None, str]
```
Implementation: src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:35
Returns
- Success: (dspy.Module instance, None)
- Failure: (None, error_message)
Validation Steps
- Syntax check: compiles the code to catch Python syntax errors
- Execution: runs the code to execute class definitions
- Program extraction: checks for a program variable in the namespace
- Type validation: ensures program is a dspy.Module instance
- LM assignment: sets task_lm on the program
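The validation steps can be illustrated with a self-contained sketch. BaseModule stands in for dspy.Module, and the function is a hypothetical simplification, not the adapter's actual implementation:

```python
class BaseModule:
    # Stand-in for dspy.Module
    pass

def build_program_sketch(code: str):
    """Return (program, None) on success or (None, error_message) on failure."""
    # 1. Syntax check: compile the code to catch syntax errors
    try:
        compiled = compile(code, "<candidate>", "exec")
    except SyntaxError as e:
        return None, f"Syntax Error in code: {e}"
    # 2. Execution: run class definitions and assignments
    namespace = {"BaseModule": BaseModule}
    try:
        exec(compiled, namespace)
    except Exception as e:
        return None, f"Error while executing code: {e}"
    # 3. Program extraction: look for the `program` variable
    if "program" not in namespace:
        return None, "Your code did not define a `program` object."
    # 4. Type validation: must be an instance of the module base class
    program = namespace["program"]
    if not isinstance(program, BaseModule):
        return None, "`program` must be a BaseModule instance."
    return program, None

good_code = "class P(BaseModule):\n    pass\nprogram = P()"
bad_code = "class P(BaseModule):\n    pass"  # no `program` variable
```

(The real adapter additionally assigns task_lm to the extracted program, which is omitted here.)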
make_reflective_dataset()
Generates reflective dataset from evaluation traces.
```python
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[TraceData, Prediction],
    components_to_update: list[str],
) -> dict[str, list[dict[str, Any]]]
```
Implementation: src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:130
Behavior
- If the program failed to build: returns a simple feedback dict with the error message
- Otherwise: builds reflective examples containing:
  - Program inputs (example fields)
  - Program outputs (prediction fields)
  - Program trace (all predictor calls with inputs/outputs)
  - Feedback (from the metric or error messages)
- Returns a dict with a 'program' key mapping to the list of examples
propose_new_texts()
Proposes improved program code.
```python
def propose_new_texts(
    self,
    candidate: dict[str, str],
    reflective_dataset: dict[str, list[dict[str, Any]]],
    components_to_update: list[str],
) -> dict[str, str]
```
Implementation: src/gepa/adapters/dspy_full_program_adapter/full_program_adapter.py:250
Behavior
Uses DSPyProgramProposalSignature to generate new program code based on:
- Current program code
- Reflective dataset with failures and successes
- Feedback from metric
Usage Examples
Basic Math Problem Solving
```python
import dspy
from gepa.adapters.dspy_full_program_adapter import DspyAdapter
import gepa

# Configure
dspy.configure(lm=dspy.LM('openai/gpt-4o-mini'))

# Seed program
seed_program = '''
import dspy

class MathSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.solve = dspy.ChainOfThought('problem -> solution')

    def forward(self, problem):
        return self.solve(problem=problem)

program = MathSolver()
'''

# Metric
def math_metric(example, prediction, trace=None):
    # extract_answer is a user-supplied helper that pulls the final
    # answer out of the solution text
    answer = extract_answer(prediction.solution)
    return float(answer == example.answer)

# Optimize
adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),
    metric_fn=math_metric,
    reflection_lm=dspy.LM('openai/gpt-4')
)

result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=math_train,
    valset=math_val,
    adapter=adapter,
    max_metric_calls=150
)

# GEPA might evolve this into a multi-step program:
# - Step 1: Parse the problem
# - Step 2: Identify required operations
# - Step 3: Execute calculations
# - Step 4: Verify and format answer
```
Multi-Hop Question Answering
```python
seed_program = '''
import dspy

class MultiHopQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought('context, question -> answer')

    def forward(self, context, question):
        return self.answer(context=context, question=question)

program = MultiHopQA()
'''

adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),
    metric_fn=qa_metric,
    reflection_lm=dspy.LM('openai/gpt-4')
)

result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=hotpot_train,
    valset=hotpot_val,
    adapter=adapter,
    max_metric_calls=200
)

# GEPA might discover a multi-hop retrieval strategy:
# - Step 1: Extract entities from question
# - Step 2: Retrieve supporting facts for each entity
# - Step 3: Chain reasoning across facts
# - Step 4: Generate answer with citations
```
With Metric Feedback
Provide detailed feedback in your metric:

```python
def detailed_metric(example, prediction, trace=None):
    correct = example.answer == prediction.answer
    feedback = ""
    if not correct:
        feedback = f"Expected '{example.answer}', got '{prediction.answer}'. "
        if hasattr(prediction, 'reasoning'):
            feedback += f"Reasoning was: {prediction.reasoning}"
    return {
        'score': float(correct),
        'feedback': feedback
    }
```
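A quick sanity check of such a metric, using SimpleNamespace as a lightweight stand-in for DSPy Example and Prediction objects (the metric is repeated so the snippet runs standalone):

```python
from types import SimpleNamespace

# Metric repeated from above so this snippet is self-contained
def detailed_metric(example, prediction, trace=None):
    correct = example.answer == prediction.answer
    feedback = ""
    if not correct:
        feedback = f"Expected '{example.answer}', got '{prediction.answer}'. "
        if hasattr(prediction, 'reasoning'):
            feedback += f"Reasoning was: {prediction.reasoning}"
    return {'score': float(correct), 'feedback': feedback}

# SimpleNamespace stands in for real Example/Prediction objects
example = SimpleNamespace(answer="4")
wrong = SimpleNamespace(answer="5", reasoning="2 + 2 = 5")
right = SimpleNamespace(answer="4")

bad = detailed_metric(example, wrong)   # score 0.0, explanatory feedback
good = detailed_metric(example, right)  # score 1.0, empty feedback
```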
```python
adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),
    metric_fn=detailed_metric,
    reflection_lm=dspy.LM('openai/gpt-4')
)
```
Reflective Dataset Structure
Each example in the reflective dataset contains:
```python
{
    'Program Inputs': {
        'question': 'What is 2+2?',
        # ... all example input fields
    },
    'Program Outputs': {
        'answer': '4',
        # ... all prediction output fields
    },
    'Program Trace': [
        {
            'Called Module': 'solve',
            'Inputs': {'problem': 'What is 2+2?'},
            'Generated Outputs': {'solution': 'The answer is 4'}
        },
        # ... more predictor calls
    ],
    'Feedback': 'The answer is correct.'  # or error description
}
```
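As an illustration of how such a record might be flattened into text for the reflection LM, here is a hypothetical renderer; it is a sketch, not the adapter's actual prompt format:

```python
record = {
    'Program Inputs': {'question': 'What is 2+2?'},
    'Program Outputs': {'answer': '4'},
    'Program Trace': [
        {'Called Module': 'solve',
         'Inputs': {'problem': 'What is 2+2?'},
         'Generated Outputs': {'solution': 'The answer is 4'}},
    ],
    'Feedback': 'The answer is correct.',
}

def render_record(record: dict) -> str:
    """Flatten one reflective example into readable text (illustrative only)."""
    lines = []
    for key in ('Program Inputs', 'Program Outputs'):
        for field, value in record[key].items():
            lines.append(f"{key} / {field}: {value}")
    for step in record['Program Trace']:
        lines.append(f"Module {step['Called Module']}: "
                     f"{step['Inputs']} -> {step['Generated Outputs']}")
    lines.append(f"Feedback: {record['Feedback']}")
    return "\n".join(lines)

text = render_record(record)
```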
Error Handling
Syntax Errors
If the proposed program has syntax errors:
```python
# Proposed code with syntax error
program_code = '''
import dspy

class MyProgram(dspy.Module):
    def forward(self, x
        return x  # Missing closing parenthesis

program = MyProgram()
'''

# Adapter returns:
EvaluationBatch(
    outputs=None,
    scores=[0.0] * len(batch),  # Failure scores
    trajectories='Syntax Error in code: ...'
)
```
The error message is passed to the reflection LM to fix the syntax.
Runtime Errors
If the program executes but throws exceptions:
```python
# Program with runtime error
program_code = '''
import dspy

class MyProgram(dspy.Module):
    def forward(self, x):
        return undefined_var  # NameError

program = MyProgram()
'''

# Adapter captures the exception in trajectories
```
The exception is included in the reflective dataset for the reflection LM to address.
Missing Program Variable
```python
program_code = '''
import dspy

class MyProgram(dspy.Module):
    pass

# Missing: program = MyProgram()
'''

# Returns error:
# "Your code did not define a `program` object..."
```
Best Practices
1. Start Simple
Begin with a basic program structure:
```python
seed_program = '''
import dspy

class SimpleProgram(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predictor = dspy.ChainOfThought('input -> output')

    def forward(self, input):
        return self.predictor(input=input)

program = SimpleProgram()
'''
```
Let GEPA evolve complexity as needed.
2. Provide Clear Feedback
Detailed metric feedback helps GEPA understand what to improve:
```python
def metric_with_feedback(example, prediction, trace=None):
    # compute_score is a user-supplied scoring helper
    score = compute_score(example, prediction)
    feedback_parts = []
    if score < 1.0:
        feedback_parts.append(f"Expected: {example.answer}")
        feedback_parts.append(f"Got: {prediction.answer}")
    if trace:
        feedback_parts.append(f"Trace had {len(trace)} steps")
    return {
        'score': score,
        'feedback': ' '.join(feedback_parts)
    }
```
3. Use Stronger Reflection LM
Program evolution requires sophisticated reasoning:
```python
adapter = DspyAdapter(
    task_lm=dspy.LM('openai/gpt-4o-mini'),   # Cheaper for execution
    metric_fn=my_metric,                     # Your task metric (required)
    reflection_lm=dspy.LM('openai/gpt-4')    # Stronger for program design
)
```
4. Monitor Program Evolution
Track how programs change:
```python
result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=150,
    log_dir='./gepa_logs'  # Save evolution history
)

# Review intermediate programs
for candidate in result.history:
    print(candidate['program'])
    print('Score:', candidate['score'])
    print('---')
```
5. Set Appropriate Budget
Program evolution requires more iterations than instruction optimization:
```python
# Instruction optimization: 50-150 calls often sufficient
# Program evolution: 150-500 calls recommended
result = gepa.optimize(
    seed_candidate={'program': seed_program},
    trainset=train_data,
    valset=val_data,
    adapter=adapter,
    max_metric_calls=300  # Higher budget for program evolution
)
```
From the GEPA paper:
| Benchmark | Baseline | GEPA Full Program | Improvement |
|---|---|---|---|
| MATH | 67% | 93% | +26 pp |
| HotpotQA | 45% | 72% | +27 pp |
| MultiHop | 38% | 61% | +23 pp |
Limitations
- Python only: Programs must be valid Python code
- Single file: Cannot span multiple files (yet)
- Module structure: Must follow the DSPy Module pattern
- No external dependencies: Cannot import custom libraries (only dspy and the standard library)
- Higher cost: Program evolution uses more LLM calls than instruction optimization
Advanced Features
Custom DSPy Signatures
GEPA can evolve custom signature definitions:
```python
seed_program = '''
import dspy
from dspy import InputField, OutputField

class CustomSignature(dspy.Signature):
    """Answer math problems step by step."""
    problem = InputField(desc="Math problem to solve")
    reasoning = OutputField(desc="Step-by-step reasoning")
    answer = OutputField(desc="Final numerical answer")

class MathSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.solve = dspy.ChainOfThought(CustomSignature)

    def forward(self, problem):
        return self.solve(problem=problem)

program = MathSolver()
'''
```
GEPA can modify signature docstrings, field descriptions, and even add/remove fields.
Module Composition
GEPA can discover multi-module compositions:
```python
# GEPA might evolve this into:
'''
import dspy

class AdvancedSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.parse = dspy.Predict('problem -> problem_type, variables')
        self.solve = dspy.ChainOfThought('problem_type, variables -> steps')
        self.verify = dspy.Predict('steps, problem -> is_correct, answer')

    def forward(self, problem):
        parsed = self.parse(problem=problem)
        steps = self.solve(
            problem_type=parsed.problem_type,
            variables=parsed.variables
        )
        result = self.verify(steps=steps.steps, problem=problem)
        return dspy.Prediction(answer=result.answer)

program = AdvancedSolver()
'''
```
Control Flow
GEPA can add conditional logic and loops:
```python
# GEPA might evolve this into:
'''
import dspy

class AdaptiveSolver(dspy.Module):
    def __init__(self):
        super().__init__()
        self.classify = dspy.Predict('problem -> difficulty')
        self.simple_solver = dspy.Predict('problem -> answer')
        self.complex_solver = dspy.ChainOfThought('problem -> answer')

    def forward(self, problem):
        difficulty = self.classify(problem=problem).difficulty
        if 'hard' in difficulty.lower():
            return self.complex_solver(problem=problem)
        else:
            return self.simple_solver(problem=problem)

program = AdaptiveSolver()
'''
```
Comparison: Instruction vs Full Program
| Aspect | Instruction Optimization | Full Program Evolution |
|---|---|---|
| What’s optimized | Signature instructions | Entire program structure |
| Program structure | Fixed | Can change |
| Module count | Fixed | Can increase/decrease |
| Control flow | Fixed | Can evolve |
| Speed | Fast (50-150 calls) | Slower (150-500 calls) |
| Complexity | Simple | Complex |
| Use case | Refine existing programs | Discover novel architectures |
See Also