## Overview

The DefaultAdapter provides a simple way to optimize system prompts for single-turn LLM tasks. It's ideal for scenarios where:

- You have a question-answer format
- The task can be completed in one turn
- You want to optimize the system prompt
- You can use any LiteLLM-compatible model or custom callable
## Installation

The DefaultAdapter is included with GEPA. No additional dependencies are required.
## Quick Start

```python
import gepa
from gepa.adapters.default_adapter import DefaultAdapter

# Prepare the dataset
train_data = [
    {
        'input': 'What is 2+2?',
        'answer': '4',
        'additional_context': {}
    },
    # ... more examples
]

# Create an adapter with an OpenAI model
adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    max_litellm_workers=10
)

# Optimize
result = gepa.optimize(
    seed_candidate={'system_prompt': 'You are a helpful assistant.'},
    trainset=train_data[:50],
    valset=train_data[50:],
    adapter=adapter,
    max_metric_calls=150,
    reflection_lm='openai/gpt-4'
)

print('Optimized prompt:', result.best_candidate['system_prompt'])
```
## Class Signature

Defined in `src/gepa/adapters/default_adapter/default_adapter.py:87`:

```python
class DefaultAdapter(GEPAAdapter[DefaultDataInst, DefaultTrajectory, DefaultRolloutOutput]):
    def __init__(
        self,
        model: str | ChatCompletionCallable,
        evaluator: Evaluator | None = None,
        max_litellm_workers: int = 10,
        litellm_batch_completion_kwargs: dict[str, Any] | None = None,
    )
```
## Parameters

**`model`** (`str | ChatCompletionCallable`, required)

Model for task execution. Can be:

- A LiteLLM model string (e.g., `'openai/gpt-4o-mini'`)
- A custom callable accepting `list[ChatMessage]` and returning `str`

**`evaluator`** (`Evaluator | None`, default: `ContainsAnswerEvaluator()`)

Custom evaluator function with this signature:

```python
def evaluator(data: DefaultDataInst, response: str) -> EvaluationResult:
    return EvaluationResult(
        score=1.0,
        feedback='Good answer',
        objective_scores=None
    )
```

The default evaluator checks whether `data['answer']` is contained in the response.

**`max_litellm_workers`** (`int`, default: `10`)

Maximum number of parallel workers for LiteLLM batch completions.

**`litellm_batch_completion_kwargs`** (`dict[str, Any] | None`, default: `None`)

Additional kwargs passed to `litellm.batch_completion()` (`temperature`, `max_tokens`, etc.).
## Data Types

### DefaultDataInst

Input data structure (`src/gepa/adapters/default_adapter/default_adapter.py:11`):

```python
class DefaultDataInst(TypedDict):
    input: str                          # User query or question
    additional_context: dict[str, str]  # Extra context for feedback
    answer: str                         # Expected answer
```

### DefaultTrajectory

Execution trace (`src/gepa/adapters/default_adapter/default_adapter.py:23`):

```python
class DefaultTrajectory(TypedDict):
    data: DefaultDataInst         # Original data instance
    full_assistant_response: str  # Model's complete response
    feedback: str                 # Evaluation feedback
```

### DefaultRolloutOutput

Final output (`src/gepa/adapters/default_adapter/default_adapter.py:29`):

```python
class DefaultRolloutOutput(TypedDict):
    full_assistant_response: str  # Model's response
```
## Methods

### evaluate()

Evaluates a candidate system prompt on a batch of examples.

```python
def evaluate(
    self,
    batch: list[DefaultDataInst],
    candidate: dict[str, str],
    capture_traces: bool = False,
) -> EvaluationBatch[DefaultTrajectory, DefaultRolloutOutput]
```

Implementation: `src/gepa/adapters/default_adapter/default_adapter.py:104`

**Parameters**

- `batch`: List of data instances to evaluate
- `candidate`: Dictionary containing the system prompt (the first value is used)
- `capture_traces`: Whether to capture detailed trajectories

**Returns**

An `EvaluationBatch` containing:

- `outputs`: List of `DefaultRolloutOutput`
- `scores`: Per-example scores (1.0 for correct, 0.0 for incorrect by default)
- `trajectories`: List of `DefaultTrajectory` if `capture_traces=True`
- `objective_scores`: Optional multi-objective scores
### make_reflective_dataset()

Generates a reflective dataset for prompt improvement.

```python
def make_reflective_dataset(
    self,
    candidate: dict[str, str],
    eval_batch: EvaluationBatch[DefaultTrajectory, DefaultRolloutOutput],
    components_to_update: list[str],
) -> Mapping[str, Sequence[Mapping[str, Any]]]
```

Implementation: `src/gepa/adapters/default_adapter/default_adapter.py:179`

**Returns**

A dictionary mapping component names to reflective examples:

```python
{
    'system_prompt': [
        {
            'Inputs': 'What is 2+2?',
            'Generated Outputs': 'The answer is 4',
            'Feedback': 'The generated response is correct. The response includes the correct answer "4"'
        },
        # ... more examples
    ]
}
```
## Custom Evaluators

### Default: ContainsAnswerEvaluator

Checks whether the expected answer appears in the response (`src/gepa/adapters/default_adapter/default_adapter.py:63`):

```python
class ContainsAnswerEvaluator:
    def __init__(self, failure_score: float = 0.0):
        self.failure_score = failure_score

    def __call__(self, data: DefaultDataInst, response: str) -> EvaluationResult:
        is_correct = data['answer'] in response
        score = 1.0 if is_correct else self.failure_score
        if is_correct:
            feedback = f"The generated response is correct. The response includes the correct answer '{data['answer']}'"
        else:
            feedback = f"The generated response is incorrect. The correct answer is '{data['answer']}'. Ensure that the correct answer is included in the response exactly as it is."
        return EvaluationResult(score=score, feedback=feedback)
```
### Custom Evaluator Example

Create a semantic similarity evaluator:

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

class SemanticEvaluator:
    def __init__(self):
        self.model = SentenceTransformer('all-MiniLM-L6-v2')

    def __call__(self, data, response):
        # Compute embeddings
        ref_emb = self.model.encode([data['answer']])
        resp_emb = self.model.encode([response])

        # Calculate similarity
        similarity = cosine_similarity(ref_emb, resp_emb)[0][0]

        feedback = f"Semantic similarity: {similarity:.2f}"
        if similarity < 0.7:
            feedback += f" - Expected answer: {data['answer']}"

        return EvaluationResult(
            score=float(similarity),
            feedback=feedback
        )

# Use the custom evaluator
adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    evaluator=SemanticEvaluator()
)
```
## Custom Model Callable

Use a custom model instead of LiteLLM:

```python
def my_custom_model(messages: list[dict]) -> str:
    # Your custom inference logic
    system_prompt = messages[0]['content']
    user_query = messages[1]['content']

    # Call your model
    response = my_model.generate(system_prompt, user_query)
    return response

adapter = DefaultAdapter(
    model=my_custom_model,
    evaluator=ContainsAnswerEvaluator()
)
```

The callable must:

- Accept `list[ChatMessage]`, where `ChatMessage = TypedDict('ChatMessage', {'role': str, 'content': str})`
- Return `str` (the model's response)
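For illustration, here is a minimal callable satisfying that contract — a deterministic stub (no real model behind it, names are our own) that shows the expected message shapes:

```python
def stub_model(messages: list[dict]) -> str:
    # Each message follows the ChatMessage shape: {'role': str, 'content': str}.
    system_prompt = next(m['content'] for m in messages if m['role'] == 'system')
    user_query = next(m['content'] for m in messages if m['role'] == 'user')
    # A real implementation would run inference here; this stub just echoes.
    return f"(system: {len(system_prompt)} chars) You asked: {user_query}"
```

A stub like this is also handy for dry-running the adapter end to end without spending API credits.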
## Multi-Objective Optimization

Return multiple scores from your evaluator:

```python
class MultiObjectiveEvaluator:
    def __call__(self, data, response):
        # Compute multiple metrics
        correctness = 1.0 if data['answer'] in response else 0.0
        brevity = 1.0 / (len(response) + 1)        # Shorter is better
        clarity = compute_clarity_score(response)  # Your metric

        return EvaluationResult(
            score=correctness,  # Primary score
            feedback='Feedback message',
            objective_scores={
                'correctness': correctness,
                'brevity': brevity,
                'clarity': clarity
            }
        )

adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    evaluator=MultiObjectiveEvaluator()
)
```
GEPA will maintain a Pareto front across all objectives.
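For intuition: a candidate stays on the Pareto front unless some other candidate is at least as good on every objective and strictly better on at least one. A toy dominance check over `objective_scores` dicts (illustrative only, not GEPA's internal implementation):

```python
def dominates(a: dict, b: dict) -> bool:
    # a dominates b if a is >= b on every objective and > b on at least one.
    return all(a[k] >= b[k] for k in b) and any(a[k] > b[k] for k in b)

cand1 = {'correctness': 1.0, 'brevity': 0.2}
cand2 = {'correctness': 1.0, 'brevity': 0.5}
cand3 = {'correctness': 0.8, 'brevity': 0.9}

print(dominates(cand2, cand1))  # True: equal correctness, strictly better brevity
print(dominates(cand2, cand3))  # False: cand3 is better on brevity, so both stay on the front
```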
## Examples

### AIME Math Problems

From the GEPA AIME tutorial:

```python
import gepa
from gepa.adapters.default_adapter import DefaultAdapter

# Load the AIME dataset
trainset, valset, testset = gepa.examples.aime.init_dataset()

# Create the adapter
adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    max_litellm_workers=10
)

# Optimize
result = gepa.optimize(
    seed_candidate={
        'system_prompt': 'You are a helpful assistant. Answer the question. '
                         'Put your final answer in the format "### <answer>"'
    },
    trainset=trainset,
    valset=valset,
    adapter=adapter,
    max_metric_calls=150,
    reflection_lm='openai/gpt-4'
)

print('Optimized prompt:', result.best_candidate['system_prompt'])
print('Validation score:', result.best_score)
```

Result: GPT-4o Mini goes from 46.6% → 56.6% on AIME 2025.
### Question Answering with Context

```python
train_data = [
    {
        'input': 'What is the capital of France?',
        'answer': 'Paris',
        'additional_context': {
            'category': 'geography',
            'difficulty': 'easy'
        }
    },
    # ... more examples
]

adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    litellm_batch_completion_kwargs={
        'temperature': 0.0,
        'max_tokens': 100
    }
)

result = gepa.optimize(
    seed_candidate={'system_prompt': 'Answer questions accurately and concisely.'},
    trainset=train_data[:100],
    valset=train_data[100:],
    adapter=adapter,
    max_metric_calls=100
)
```
## Advanced Configuration

### Batched Completion Parameters

Control LiteLLM behavior:

```python
adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    max_litellm_workers=20,
    litellm_batch_completion_kwargs={
        'temperature': 0.7,
        'max_tokens': 500,
        'top_p': 0.9,
        'frequency_penalty': 0.0,
        'presence_penalty': 0.0
    }
)
```

### Failure Scores

Customize the score assigned to incorrect answers:

```python
evaluator = ContainsAnswerEvaluator(failure_score=0.0)

adapter = DefaultAdapter(
    model='openai/gpt-4o-mini',
    evaluator=evaluator
)
```
## Best Practices

- **Dataset format**: Ensure the `answer` field contains the exact string to match
- **Batch size**: Use `max_litellm_workers` to control parallelism based on your rate limits
- **Evaluation**: Start with `ContainsAnswerEvaluator`, then customize if needed
- **Context**: Use `additional_context` to provide hints in feedback without affecting the task
- **Testing**: Validate your evaluator on a few examples before full optimization
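For the last point, one quick sanity check is to replay the evaluator over a few hand-labeled `(data, response)` pairs before launching a run. A minimal sketch, using a namedtuple stand-in for `EvaluationResult` (and a trivial evaluator of our own) so it runs offline:

```python
from collections import namedtuple

# Stand-in for GEPA's EvaluationResult, just for this offline check.
EvaluationResult = namedtuple('EvaluationResult', ['score', 'feedback'])

def exact_contains_evaluator(data, response):
    # Minimal evaluator with the (data, response) -> EvaluationResult interface.
    ok = data['answer'] in response
    return EvaluationResult(score=1.0 if ok else 0.0,
                            feedback='correct' if ok else f"expected {data['answer']!r}")

# Replay the evaluator over hand-labeled cases before launching optimization.
cases = [
    ({'answer': '4'}, 'The answer is 4', True),
    ({'answer': '4'}, 'I am not sure', False),
]
for data, response, should_pass in cases:
    result = exact_contains_evaluator(data, response)
    assert (result.score == 1.0) == should_pass, (data, response, result)
print('evaluator sanity checks passed')
```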
## Limitations

- **Single-turn only**: Not suitable for multi-turn conversations
- **One component**: Only optimizes the system prompt (the first value in the candidate dict)
- **String matching**: The default evaluator uses substring matching, not semantic similarity
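Because the default check is a plain substring test, short answers can match inside longer tokens or miss on case differences. A quick illustration of the matching behavior (mirroring the core check in `ContainsAnswerEvaluator`):

```python
def contains_answer(answer: str, response: str) -> bool:
    # Same core check as the default evaluator: plain, case-sensitive substring match.
    return answer in response

print(contains_answer('4', 'The answer is 4'))   # True: genuine match
print(contains_answer('4', 'The answer is 14'))  # True: false positive ('4' inside '14')
print(contains_answer('Paris', 'paris'))         # False: case-sensitive miss
```

If this matters for your dataset, normalize inside a custom evaluator or require a delimited answer format, as the AIME example does with its `### <answer>` convention.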
## See Also