# Evaluator Basics
An evaluator is a function that scores a candidate. It can return:

- Just a score: `float`
- A score with side info: `tuple[float, dict]`
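As a concrete sketch, the two shapes might look like this (the string candidate and the keyword-based scoring are purely illustrative, not part of any GEPA API):

```python
def evaluate_simple(candidate: str) -> float:
    """Return just a score (higher is better)."""
    keywords = ["input", "output", "format"]
    return sum(kw in candidate for kw in keywords) / len(keywords)


def evaluate_with_info(candidate: str) -> tuple[float, dict]:
    """Return a score plus a dict of side information."""
    keywords = ["input", "output", "format"]
    missing = [kw for kw in keywords if kw not in candidate]
    score = 1.0 - len(missing) / len(keywords)
    feedback = f"Missing keywords: {missing}" if missing else "All keywords present."
    return score, {"feedback": feedback}
```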
## Score Design Principles

### 1. Higher is Better
GEPA always maximizes scores, so transform lower-is-better metrics accordingly (e.g., `score = 1 / (1 + error)`).

### 2. Meaningful Scale
Use scores on a consistent scale across candidates, e.g., always in [0, 1].

### 3. Granular Feedback
Provide scores that differentiate candidates: a continuous overlap score separates near-misses that a binary pass/fail lumps together.

## Side Information (ASI)
Actionable Side Information (ASI) is the text-optimization analogue of the gradient: more informative ASI → better optimization.

### Basic Structure
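A minimal sketch of the basic structure, assuming a string candidate and using an illustrative `"feedback"` key for the side information:

```python
def evaluator(candidate: str) -> tuple[float, dict]:
    expected = "42"
    got = candidate.strip()
    score = 1.0 if got == expected else 0.0
    # The side info should tell the optimizer *why* the score is what it is.
    feedback = "Correct answer." if score else f"Expected {expected!r} but got {got!r}."
    return score, {"feedback": feedback}
```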
### Multi-Objective Metrics
Track multiple objectives in the `"scores"` field of the side information.
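For illustration, a sketch with three hypothetical objectives rolled up into the single score GEPA maximizes (the metric values and weights are placeholders):

```python
def evaluate(candidate: str) -> tuple[float, dict]:
    # Placeholder metric values; in practice compute these from the candidate.
    accuracy = 0.8
    brevity = 0.6
    safety = 1.0
    # Aggregate into one score; the weighting is a design choice.
    overall = 0.5 * accuracy + 0.2 * brevity + 0.3 * safety
    side_info = {
        "scores": {"accuracy": accuracy, "brevity": brevity, "safety": safety},
        "feedback": "Accurate and safe, but responses run long.",
    }
    return overall, side_info
```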
### Parameter-Specific Info
Provide feedback specific to each parameter, using keys of the form `X_specific_info` for a parameter named `X`.
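A sketch with two hypothetical text parameters (the parameter names and the `*_specific_info` key convention follow the description above; the score is a placeholder):

```python
def evaluate(candidate: dict) -> tuple[float, dict]:
    score = 0.7  # placeholder aggregate score
    side_info = {
        "system_prompt_specific_info": "Tone is right; state the output format explicitly.",
        "user_template_specific_info": "The {question} placeholder is unused in edge cases.",
    }
    return score, side_info
```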
## Common Evaluation Patterns
### Exact Match
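A minimal sketch, normalizing whitespace and case before comparing:

```python
def exact_match(prediction: str, target: str) -> float:
    return float(prediction.strip().lower() == target.strip().lower())
```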
### Contains Match
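A looser variant that gives credit if the target appears anywhere in the prediction:

```python
def contains_match(prediction: str, target: str) -> float:
    return float(target.strip().lower() in prediction.lower())
```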
### Token Overlap (F1)
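A standard token-level F1, treating the token lists as multisets:

```python
from collections import Counter


def token_f1(prediction: str, target: str) -> float:
    pred = prediction.lower().split()
    gold = target.lower().split()
    if not pred or not gold:
        return float(pred == gold)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```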
### Unit Tests
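A sketch for code candidates: run the candidate, then score by the fraction of test cases that pass. The function name `add` and the test cases are hypothetical, and in practice untrusted code should run in a sandbox, not via bare `exec`:

```python
def run_unit_tests(candidate_code: str) -> tuple[float, dict]:
    tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)  # caution: sandbox untrusted code
    except Exception as e:
        return 0.0, {"feedback": f"Code failed to execute: {e}"}
    fn = namespace.get("add")
    if not callable(fn):
        return 0.0, {"feedback": "No function `add` defined."}
    failures = []
    for args, expected in tests:
        try:
            result = fn(*args)
            if result != expected:
                failures.append(f"add{args} -> {result}, expected {expected}")
        except Exception as e:
            failures.append(f"add{args} raised {e}")
    score = 1 - len(failures) / len(tests)
    return score, {"feedback": "; ".join(failures) or "All tests passed."}
```

Reporting the failing cases as feedback gives the optimizer something concrete to fix, rather than just a lower number.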
### LLM-as-Judge
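A sketch where the judge is an injected `call_llm` callable (whatever client you use, assumed to return raw text); the prompt format and the 0-10 scale are illustrative:

```python
def llm_judge_score(response: str, rubric: str, call_llm) -> tuple[float, dict]:
    prompt = (
        "Rate the response below from 0 to 10 against this rubric.\n"
        f"Rubric: {rubric}\nResponse: {response}\n"
        "Reply with 'SCORE: <number>' on the first line, then a critique."
    )
    reply = call_llm(prompt)
    first_line, _, critique = reply.partition("\n")
    try:
        raw = float(first_line.replace("SCORE:", "").strip())
    except ValueError:
        # An unparseable judge reply still yields a score, never an exception.
        return 0.0, {"feedback": f"Unparseable judge reply: {reply!r}"}
    return raw / 10.0, {"feedback": critique.strip() or reply}
```

The judge's critique doubles as ASI: it is exactly the kind of text the optimizer can act on.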
### RAG Evaluation
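A sketch that scores retrieval and generation separately and reports both in the side information (the contains-style checks and the 50/50 weighting are illustrative):

```python
def evaluate_rag(retrieved: list[str], answer: str,
                 gold_passage: str, gold_answer: str) -> tuple[float, dict]:
    # Retrieval: did any retrieved document contain the gold passage?
    retrieval = float(any(gold_passage.lower() in doc.lower() for doc in retrieved))
    # Generation: does the answer contain the gold answer?
    generation = float(gold_answer.lower() in answer.lower())
    score = 0.5 * retrieval + 0.5 * generation
    return score, {"scores": {"retrieval": retrieval, "generation": generation}}
```

Separating the two scores tells the optimizer whether a failure came from retrieval or from generation.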
For RAG systems, evaluate both retrieval and generation.

## Evaluation with State
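A sketch of a stateful evaluator that caches history; the plain-dict state here is illustrative, not a GEPA API:

```python
class HistoryAwareEvaluator:
    """Wrap a base evaluator and keep past scores for warm-starting."""

    def __init__(self, base):
        self.base = base
        self.history: dict[str, float] = {}

    def __call__(self, candidate: str) -> tuple[float, dict]:
        if candidate in self.history:
            # Warm start: reuse the cached score instead of re-evaluating.
            return self.history[candidate], {"feedback": "Cached result."}
        prev_best = max(self.history.values(), default=None)
        score = self.base(candidate)
        self.history[candidate] = score
        note = ("First evaluation." if prev_best is None
                else f"Best previous score was {prev_best:.2f}.")
        return score, {"feedback": note}
```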
Access historical evaluations for warm-starting.

## Multi-Stage Evaluation
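A sketch of a staged evaluator that runs cheap checks first and exits early, so expensive checks only run on promising candidates (the checks and score bands are illustrative):

```python
def evaluate_staged(candidate: str) -> tuple[float, dict]:
    # Stage 1: cheap structural check; exit early on failure.
    if not candidate.strip():
        return 0.0, {"feedback": "Stage 1 failed: candidate is empty."}
    # Stage 2: mid-cost heuristic check.
    if "you are" not in candidate.lower():
        return 0.3, {"feedback": "Stage 2 failed: no role instruction found."}
    # Stage 3: expensive check, stubbed here as a length heuristic.
    quality = min(len(candidate) / 200, 1.0)
    return 0.3 + 0.7 * quality, {"feedback": f"All stages ran; quality={quality:.2f}."}
```

Naming the failing stage in the feedback tells the optimizer which part of the candidate to fix first.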
Break complex evaluations into stages.

## Handling Errors
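A sketch of catching failures and converting them into a worst-case score plus feedback, so one bad candidate never crashes the run (`risky_metric` is a hypothetical stand-in):

```python
def risky_metric(candidate: str) -> float:
    # Stand-in for a metric that can blow up on malformed input.
    return min(1.0, 10.0 / len(candidate))  # ZeroDivisionError on ""


def safe_evaluate(candidate: str) -> tuple[float, dict]:
    try:
        return risky_metric(candidate), {"feedback": "Metric computed normally."}
    except Exception as e:
        # Surface the error as feedback instead of raising.
        return 0.0, {"feedback": f"Evaluation error: {type(e).__name__}: {e}"}
```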
Always return a score; never raise.

## Composite Metrics
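A sketch of a weighted blend of several checks, with the per-metric breakdown reported in the side information (the checks and weights are illustrative):

```python
def composite(candidate: str) -> tuple[float, dict]:
    metrics = {
        "length_ok": float(len(candidate) <= 500),
        "mentions_format": float("format" in candidate.lower()),
        "politeness": float("please" in candidate.lower()),
    }
    weights = {"length_ok": 0.5, "mentions_format": 0.3, "politeness": 0.2}
    score = sum(weights[name] * value for name, value in metrics.items())
    return score, {"scores": metrics}
```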
Combine multiple metrics into a single score.

## Best Practices
## Next Steps

- **optimize_anything**: use your evaluator with `optimize_anything`
- **Custom Adapters**: build adapters with custom evaluation logic
- **Configuration**: configure evaluation caching and parallelization
- **DSPy Integration**: DSPy-specific evaluation patterns