
Overview

The DefaultAdapter provides a simple way to optimize system prompts for single-turn LLM tasks. It’s ideal for scenarios where:
  • You have a question-answer format
  • The task can be completed in one turn
  • You want to optimize the system prompt
  • You can use any LiteLLM-compatible model or custom callable

Installation

The DefaultAdapter is included with GEPA. No additional dependencies required.

Quick Start

import gepa
from gepa.adapters.default_adapter import DefaultAdapter

# Prepare dataset
train_data = [
    {
        'input': 'What is 2+2?',
        'answer': '4',
        'additional_context': {}
    },
    # ... more examples
]

# Create adapter with OpenAI model
adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    max_litellm_workers=10
)

# Optimize
result = gepa.optimize(
    seed_candidate={'system_prompt': 'You are a helpful assistant.'},
    trainset=train_data[:50],
    valset=train_data[50:],
    adapter=adapter,
    max_metric_calls=150,
    reflection_lm='openai/gpt-4'
)

print('Optimized prompt:', result.best_candidate['system_prompt'])

Class Signature

Defined in src/gepa/adapters/default_adapter/default_adapter.py:87:
class DefaultAdapter(GEPAAdapter[DefaultDataInst, DefaultTrajectory, DefaultRolloutOutput]):
    def __init__(
        self,
        model: str | ChatCompletionCallable,
        evaluator: Evaluator | None = None,
        max_litellm_workers: int = 10,
        litellm_batch_completion_kwargs: dict[str, Any] | None = None,
    )

Parameters

model
str | ChatCompletionCallable
required
Model for task execution. Can be:
  • LiteLLM model string (e.g., 'openai/gpt-4o-mini')
  • Custom callable accepting list[ChatMessage] and returning str
evaluator
Evaluator | None
default:"ContainsAnswerEvaluator()"
Custom evaluator function. Signature:
def evaluator(data: DefaultDataInst, response: str) -> EvaluationResult:
    return EvaluationResult(
        score=1.0,
        feedback='Good answer',
        objective_scores=None
    )
Default evaluator checks if data['answer'] is contained in response.
max_litellm_workers
int
default:"10"
Maximum parallel workers for LiteLLM batch completions.
litellm_batch_completion_kwargs
dict[str, Any] | None
default:"None"
Additional kwargs passed to litellm.batch_completion() (temperature, max_tokens, etc.).

Data Types

DefaultDataInst

Input data structure (src/gepa/adapters/default_adapter/default_adapter.py:11):
class DefaultDataInst(TypedDict):
    input: str                          # User query or question
    additional_context: dict[str, str]  # Extra context for feedback
    answer: str                         # Expected answer

DefaultTrajectory

Execution trace (src/gepa/adapters/default_adapter/default_adapter.py:23):
class DefaultTrajectory(TypedDict):
    data: DefaultDataInst              # Original data instance
    full_assistant_response: str       # Model's complete response
    feedback: str                      # Evaluation feedback

DefaultRolloutOutput

Final output (src/gepa/adapters/default_adapter/default_adapter.py:29):
class DefaultRolloutOutput(TypedDict):
    full_assistant_response: str       # Model's response

Methods

evaluate()

Evaluates a candidate system prompt on a batch of examples.
def evaluate(
    self,
    batch: list[DefaultDataInst],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[DefaultTrajectory, DefaultRolloutOutput]
Implementation: src/gepa/adapters/default_adapter/default_adapter.py:104

Parameters

  • batch: List of data instances to evaluate
  • candidate: Dictionary mapping component names to prompt text; the first value is used as the system prompt
  • capture_traces: Whether to capture detailed trajectories

Returns

EvaluationBatch containing:
  • outputs: List of DefaultRolloutOutput
  • scores: Per-example scores (1.0 for correct, 0.0 for incorrect by default)
  • trajectories: List of DefaultTrajectory if capture_traces=True
  • objective_scores: Optional multi-objective scores

make_reflective_dataset()

Generates reflective dataset for prompt improvement.
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[DefaultTrajectory, DefaultRolloutOutput],
    components_to_update: list[str],
) -> Mapping[str, Sequence[Mapping[str, Any]]]
Implementation: src/gepa/adapters/default_adapter/default_adapter.py:179

Returns

Dictionary mapping component names to reflective examples:
{
    'system_prompt': [
        {
            'Inputs': 'What is 2+2?',
            'Generated Outputs': 'The answer is 4',
            'Feedback': 'The generated response is correct. The response includes the correct answer "4"'
        },
        # ... more examples
    ]
}
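The transformation from trajectories to this mapping can be sketched by hand; the field names below follow the documented trajectory and output shapes, not library internals:

```python
# Sketch: turn captured trajectories into the reflective-dataset shape shown above.
trajectories = [
    {
        "data": {"input": "What is 2+2?", "additional_context": {}, "answer": "4"},
        "full_assistant_response": "The answer is 4",
        "feedback": "The generated response is correct.",
    },
]

reflective = {
    "system_prompt": [
        {
            "Inputs": t["data"]["input"],
            "Generated Outputs": t["full_assistant_response"],
            "Feedback": t["feedback"],
        }
        for t in trajectories
    ]
}

print(reflective["system_prompt"][0]["Inputs"])  # → What is 2+2?
```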

Custom Evaluators

Default: ContainsAnswerEvaluator

Checks if expected answer is in the response (src/gepa/adapters/default_adapter/default_adapter.py:63):
class ContainsAnswerEvaluator:
    def __init__(self, failure_score: float = 0.0):
        self.failure_score = failure_score
    
    def __call__(self, data: DefaultDataInst, response: str) -> EvaluationResult:
        is_correct = data['answer'] in response
        score = 1.0 if is_correct else self.failure_score
        
        if is_correct:
            feedback = f"The generated response is correct. The response includes the correct answer '{data['answer']}'"
        else:
            feedback = f"The generated response is incorrect. The correct answer is '{data['answer']}'. Ensure that the correct answer is included in the response exactly as it is."
        
        return EvaluationResult(score=score, feedback=feedback)

Custom Evaluator Example

Create a semantic similarity evaluator:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

class SemanticEvaluator:
    def __init__(self):
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
    
    def __call__(self, data, response):
        # Compute embeddings
        ref_emb = self.model.encode([data['answer']])
        resp_emb = self.model.encode([response])
        
        # Calculate similarity
        similarity = cosine_similarity(ref_emb, resp_emb)[0][0]
        
        feedback = f"Semantic similarity: {similarity:.2f}"
        if similarity < 0.7:
            feedback += f" - Expected answer: {data['answer']}"
        
        return EvaluationResult(
            score=float(similarity),
            feedback=feedback
        )

# Use custom evaluator
adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    evaluator=SemanticEvaluator()
)

Custom Model Callable

Use a custom model instead of LiteLLM:
def my_custom_model(messages: list[dict]) -> str:
    # Your custom inference logic
    system_prompt = messages[0]['content']
    user_query = messages[1]['content']
    
    # Call your model (`my_model` is a placeholder for your own inference client)
    response = my_model.generate(system_prompt, user_query)
    return response

adapter = DefaultAdapter(
    model=my_custom_model,
    evaluator=ContainsAnswerEvaluator()
)
The callable must:
  • Accept list[ChatMessage] where ChatMessage = TypedDict('ChatMessage', {'role': str, 'content': str})
  • Return str (the model’s response)
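For smoke-testing the adapter wiring without API calls, the callable can be a deterministic stub that satisfies this contract. A minimal sketch (the echo logic is purely illustrative):

```python
def stub_model(messages: list[dict]) -> str:
    """Deterministic stand-in for an LLM: echoes the last user message.

    Satisfies the callable contract (takes a list of role/content dicts,
    returns a string) without any network calls.
    """
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    return f"You asked: {user_turns[-1]}" if user_turns else ""

# Shape check: a (system, user) message pair in the expected format.
reply = stub_model([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
])
print(reply)  # → You asked: What is 2+2?
```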

Multi-Objective Optimization

Return multiple scores from your evaluator:
class MultiObjectiveEvaluator:
    def __call__(self, data, response):
        # Compute multiple metrics
        correctness = 1.0 if data['answer'] in response else 0.0
        brevity = 1.0 / (len(response) + 1)  # Shorter is better
        clarity = compute_clarity_score(response)  # Your metric
        
        return EvaluationResult(
            score=correctness,  # Primary score
            feedback='Feedback message',
            objective_scores={
                'correctness': correctness,
                'brevity': brevity,
                'clarity': clarity
            }
        )

adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    evaluator=MultiObjectiveEvaluator()
)
GEPA will maintain a Pareto front across all objectives.

Examples

AIME Math Problems

From the GEPA AIME tutorial:
import gepa
from gepa.adapters.default_adapter import DefaultAdapter

# Load AIME dataset
trainset, valset, testset = gepa.examples.aime.init_dataset()

# Create adapter
adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    max_litellm_workers=10
)

# Optimize
result = gepa.optimize(
    seed_candidate={
        'system_prompt': 'You are a helpful assistant. Answer the question. '
                        'Put your final answer in the format "### <answer>"'
    },
    trainset=trainset,
    valset=valset,
    adapter=adapter,
    max_metric_calls=150,
    reflection_lm='openai/gpt-4'
)

print('Optimized prompt:', result.best_candidate['system_prompt'])
print('Validation score:', result.best_score)
Result: GPT-4o Mini improves from 46.6% → 56.6% on AIME 2025.

Question Answering with Context

train_data = [
    {
        'input': 'What is the capital of France?',
        'answer': 'Paris',
        'additional_context': {
            'category': 'geography',
            'difficulty': 'easy'
        }
    },
    # ... more examples
]

adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    litellm_batch_completion_kwargs={
        'temperature': 0.0,
        'max_tokens': 100
    }
)

result = gepa.optimize(
    seed_candidate={'system_prompt': 'Answer questions accurately and concisely.'},
    trainset=train_data[:100],
    valset=train_data[100:],
    adapter=adapter,
    max_metric_calls=100
)

Advanced Configuration

Batched Completion Parameters

Control LiteLLM behavior:
adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    max_litellm_workers=20,
    litellm_batch_completion_kwargs={
        'temperature': 0.7,
        'max_tokens': 500,
        'top_p': 0.9,
        'frequency_penalty': 0.0,
        'presence_penalty': 0.0
    }
)

Failure Scores

Customize the score assigned when the expected answer is missing (the default is 0.0):
evaluator = ContainsAnswerEvaluator(failure_score=0.1)  # partial credit for incorrect answers
adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    evaluator=evaluator
)

Best Practices

  1. Dataset Format: Ensure answer field contains the exact string to match
  2. Batch Size: Use max_litellm_workers to control parallelism based on rate limits
  3. Evaluation: Start with ContainsAnswerEvaluator, then customize if needed
  4. Context: Use additional_context to provide hints in feedback without affecting the task
  5. Testing: Validate your evaluator on a few examples before full optimization
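Practice 5 can be done with a short loop before launching a full run. A sketch using the default substring rule (the helper and sample cases below are illustrative, not gepa internals):

```python
# Quick sanity check: run the scoring rule over a few hand-written cases
# before spending metric calls on a full optimization run.
def contains_answer_score(data: dict, response: str) -> float:
    # Mirrors the default ContainsAnswerEvaluator scoring rule.
    return 1.0 if data["answer"] in response else 0.0

samples = [
    ({"input": "What is 2+2?", "answer": "4"}, "The answer is 4", 1.0),
    ({"input": "Capital of France?", "answer": "Paris"}, "Lyon", 0.0),
]

for data, response, expected in samples:
    got = contains_answer_score(data, response)
    assert got == expected, f"{data['input']}: expected {expected}, got {got}"

print("evaluator sanity check passed")
```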

Limitations

  • Single-turn only: Not suitable for multi-turn conversations
  • One component: Only optimizes the system prompt (first value in candidate dict)
  • String matching: Default evaluator uses substring matching, not semantic similarity
