The Problem with Single-Best Selection

Traditional optimizers maintain a single “best” candidate:
# Traditional approach
results = [
    ("Candidate A", [0.9, 0.4, 0.6]),  # Good at task 1
    ("Candidate B", [0.6, 0.9, 0.5]),  # Good at task 2  
    ("Candidate C", [0.7, 0.7, 0.7]),  # Average everywhere
]

# Keep only highest average
best = max(results, key=lambda x: sum(x[1])/len(x[1]))
# Result: Candidate C (avg 0.70)
# Lost: Candidate A (specialist) and B (different specialist)
Problem: By averaging across all tasks, we lose candidates A and B that excel on specific subsets. If we later need to improve task 1 performance, we’ve discarded our best starting point (A).

The Pareto Solution

Pareto optimality: A candidate is Pareto-optimal if no other candidate is at least as good on every objective and strictly better on at least one. GEPA keeps all Pareto-optimal candidates, collectively called the Pareto frontier:
# GEPA approach
pareto_front = [
    ("Candidate A", [0.9, 0.4, 0.6]),  # Best at task 1
    ("Candidate B", [0.6, 0.9, 0.5]),  # Best at task 2
    ("Candidate D", [0.5, 0.6, 0.95]), # Best at task 3
]
# Candidate C is not on the front: it never achieves the best score
# on any task (A, B, and D each beat it on one)
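For contrast with GEPA's per-task frontier, classic Pareto filtering can be sketched in a few lines. This is an illustrative sketch, not GEPA's implementation, and it highlights a subtlety: under classic dominance the "average everywhere" candidate also survives.

```python
# Minimal sketch (not GEPA's code): classic Pareto filtering over
# per-task score vectors, assuming higher is always better.
def dominates(a: list[float], b: list[float]) -> bool:
    """True if `a` is at least as good everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(results):
    return [
        (name, scores)
        for name, scores in results
        if not any(dominates(other, scores) for _, other in results if other is not scores)
    ]

results = [
    ("Candidate A", [0.9, 0.4, 0.6]),
    ("Candidate B", [0.6, 0.9, 0.5]),
    ("Candidate C", [0.7, 0.7, 0.7]),
    ("Candidate D", [0.5, 0.6, 0.95]),
]
# Under classic dominance all four survive: no single candidate beats C
# everywhere. GEPA's instance-level frontier is stricter and drops C,
# because C is never the best on any single task.
```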
Benefits:
  1. Specialization preservation: Candidates that excel on niche tasks survive
  2. Diverse exploration: Select from multiple strong starting points
  3. Cross-pollination: Merge candidates via system-aware combination
  4. Resistance to premature convergence: Far less likely to get stuck in a single local optimum

Frontier Types in GEPA

GEPA supports four frontier tracking strategies via FrontierType:

1. Instance-Level (frontier_type="instance")

Default mode. Track which candidates are best on each validation example:
# From state.py
pareto_front_valset: dict[DataId, float] = {
    "example_1": 0.95,  # Best score seen for example 1
    "example_2": 0.87,  # Best score seen for example 2
    "example_3": 0.92,  # Best score seen for example 3
}

program_at_pareto_front_valset: dict[DataId, set[int]] = {
    "example_1": {3, 7},    # Candidates 3 and 7 both scored 0.95
    "example_2": {5},       # Only candidate 5 achieved 0.87
    "example_3": {3, 8},    # Candidates 3 and 8 tied at 0.92
}
Use when:
  • Each validation example represents a distinct task/scenario
  • You want candidates specialized to different input types
  • Example: Multi-hop QA with different reasoning patterns
Diagram:
Validation Set:
┌─────────────┬─────────────┬─────────────┐
│  Example 1  │  Example 2  │  Example 3  │
├─────────────┼─────────────┼─────────────┤
│ Cand 3: 0.9 │ Cand 3: 0.5 │ Cand 3: 0.7 │  ← Candidate 3
│ Cand 5: 0.6 │ Cand 5: 0.8 │ Cand 5: 0.6 │  ← Candidate 5
│ Cand 7: 0.7 │ Cand 7: 0.7 │ Cand 7: 0.9 │  ← Candidate 7
└─────────────┴─────────────┴─────────────┘
     ↓               ↓               ↓
  Cand 3 wins    Cand 5 wins    Cand 7 wins

Pareto Front: {3, 5, 7}  ← All survive
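The diagram's bookkeeping can be reproduced with a small standalone sketch. The field names mirror GEPA's state, but the code and the score values are purely illustrative:

```python
# Per-example scores for three candidates (from the diagram above)
scores = {  # candidate_idx -> {example_id: score}
    3: {"example_1": 0.9, "example_2": 0.5, "example_3": 0.7},
    5: {"example_1": 0.6, "example_2": 0.8, "example_3": 0.6},
    7: {"example_1": 0.7, "example_2": 0.7, "example_3": 0.9},
}

pareto_front_valset: dict[str, float] = {}          # best score per example
program_at_pareto_front_valset: dict[str, set[int]] = {}  # champions per example
for idx, per_example in scores.items():
    for ex_id, s in per_example.items():
        best = pareto_front_valset.get(ex_id)
        if best is None or s > best:
            pareto_front_valset[ex_id] = s
            program_at_pareto_front_valset[ex_id] = {idx}
        elif s == best:
            program_at_pareto_front_valset[ex_id].add(idx)

# The frontier is the union of per-example champion sets
front = set().union(*program_at_pareto_front_valset.values())
# front == {3, 5, 7}: each candidate wins one example, so all survive
```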

2. Objective-Level (frontier_type="objective")

Track which candidates are best on each objective metric:
# From state.py  
objective_pareto_front: dict[str, float] = {
    "accuracy": 0.89,
    "latency_inv": 15.2,  # Higher is better (1/latency)
    "cost_inv": 8.3,
}

program_at_pareto_front_objectives: dict[str, set[int]] = {
    "accuracy": {4, 6},      # Candidates 4 and 6 tied at 0.89
    "latency_inv": {6, 8},   # Candidates 6 and 8 tied at 15.2
    "cost_inv": {4},         # Candidate 4 alone at 8.3
}
Use when:
  • You have explicit multi-objective metrics (accuracy, speed, cost, etc.)
  • Objectives trade off against each other (faster but less accurate)
  • Example: Compiler optimization (speed vs code size vs energy)
How to provide objective scores: Your evaluator must return per-objective scores under the "scores" key of its side info:
def evaluate(candidate, example):
    result = run_system(candidate, example)
    score = result.accuracy  # Primary score for acceptance
    
    return score, {
        "scores": {  # Multi-objective tracking
            "accuracy": result.accuracy,
            "latency_inv": 1.0 / result.latency_ms,  # Invert: higher = better
            "throughput": result.qps,
        },
        # ... other feedback
    }
All objective scores must follow the “higher is better” convention. Invert metrics such as latency, error rate, or cost.
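One way to enforce that convention is a small normalization helper applied before reporting scores. The helper name and the epsilon guard are illustrative, not part of GEPA's API:

```python
# Hedged sketch: turn raw metrics into "higher is better" objective scores.
def to_objective_scores(accuracy: float, latency_ms: float, cost_usd: float) -> dict[str, float]:
    eps = 1e-9  # guard against division by zero
    return {
        "accuracy": accuracy,                 # already higher-is-better
        "latency_inv": 1.0 / (latency_ms + eps),  # invert: faster -> higher
        "cost_inv": 1.0 / (cost_usd + eps),       # invert: cheaper -> higher
    }
```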

3. Hybrid (frontier_type="hybrid")

Combines instance-level AND objective-level frontiers. A candidate joins the Pareto front if it’s best on any example or any objective:
# Candidate survives if:
# 1. Best on at least one validation example, OR
# 2. Best on at least one objective metric

pareto_programs = (
    state.get_instance_pareto_programs() |
    state.get_objective_pareto_programs()
)
Use when:
  • You care about both per-example performance AND aggregate metrics
  • Want maximum diversity in the candidate pool
  • Example: Multi-task agent (some tasks need specialized handling, overall metrics matter too)

4. Cartesian (frontier_type="cartesian")

Track best candidates on every (example, objective) pair:
# From state.py
pareto_front_cartesian: dict[tuple[DataId, str], float] = {
    ("example_1", "accuracy"): 0.95,
    ("example_1", "latency_inv"): 12.3,
    ("example_2", "accuracy"): 0.88,
    ("example_2", "latency_inv"): 18.7,
    # ...
}

program_at_pareto_front_cartesian: dict[tuple[DataId, str], set[int]] = {
    ("example_1", "accuracy"): {5},
    ("example_1", "latency_inv"): {3, 5},
    # ...
}
Use when:
  • Extreme diversity is critical
  • Different examples have different objective priorities
  • Very large validation sets (100s-1000s of examples)
  • Example: Heterogeneous workload optimization (each query has different perf characteristics)
Cartesian frontiers can grow very large (|examples| × |objectives| keys). Use with validation set subsampling policies.
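A quick back-of-envelope check of the keyspace, with purely illustrative numbers:

```python
from itertools import product

# The cartesian frontier tracks one cell per (example, objective) pair,
# so the keyspace grows multiplicatively with both dimensions.
example_ids = [f"example_{i}" for i in range(500)]
objectives = ["accuracy", "latency_inv", "cost_inv"]

cartesian_keys = list(product(example_ids, objectives))
# 1500 tracked cells; with ties, the union of champion sets across cells
# can hold dozens of candidates -- hence the subsampling recommendation.
```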

Pareto Front Updates

When a new candidate is evaluated on the validation set, GEPA updates the frontier:
# From state.py
def update_state_with_new_program(
    self,
    new_program: dict[str, str],
    valset_evaluation: ValsetEvaluation,
    parent_program_idx: list[int],
) -> int:
    new_idx = len(self.program_candidates)
    self.program_candidates.append(dict(new_program))
    
    # Update per-example scores
    new_subscores = dict(valset_evaluation.scores_by_val_id)
    self.prog_candidate_val_subscores.append(new_subscores)
    
    # Update Pareto front for each example
    for val_id, new_score in new_subscores.items():
        current_best = self.pareto_front_valset[val_id]
        
        if new_score > current_best:
            # New champion for this example
            self.pareto_front_valset[val_id] = new_score
            self.program_at_pareto_front_valset[val_id] = {new_idx}
        
        elif new_score == current_best:
            # Tie: add to front
            self.program_at_pareto_front_valset[val_id].add(new_idx)
        
        # else: new_score < current_best, not on front for this example
    
    # Similarly for objective-level, cartesian, etc.
    # ...
    
    return new_idx
Key behavior:
  • Candidates that improve any example join the front
  • Ties are preserved (multiple candidates at the same score)
  • Dominated candidates are removed from the front but not deleted from program_candidates
  • History is preserved for analysis and potential restoration
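The tie and replacement rules can be exercised with a self-contained sketch. The names echo the state fields above, but this is an illustration, not the library's code:

```python
def update_front(front: dict[str, float], members: dict[str, set[int]],
                 new_idx: int, new_scores: dict[str, float]) -> None:
    # Strict improvement replaces the champion set, an exact tie joins it,
    # and a worse score changes nothing.
    for val_id, s in new_scores.items():
        best = front.get(val_id, float("-inf"))
        if s > best:
            front[val_id] = s
            members[val_id] = {new_idx}
        elif s == best:
            members[val_id].add(new_idx)

front = {"example_1": 0.8}
members = {"example_1": {0}}
update_front(front, members, 1, {"example_1": 0.8})  # tie: candidate 1 joins
update_front(front, members, 2, {"example_1": 0.9})  # improvement: fresh champion set
```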

Candidate Selection from the Front

During each iteration, GEPA selects a candidate from the Pareto front to evolve:

Pareto Selection Strategy (Default)

# From candidate_selector.py
class ParetoCandidateSelector:
    def select_candidate_idx(self, state: GEPAState) -> int:
        # Get all programs on the front
        pareto_programs = state.get_pareto_front_programs()
        
        # Uniform random selection
        return self.rng.choice(list(pareto_programs))
Benefits:
  • Every specialized candidate gets a chance to evolve
  • Exploration across different task subsets
  • Prevents premature convergence

Alternative Strategies

Current Best (Greedy):
config = GEPAConfig(
    engine=EngineConfig(
        candidate_selection_strategy="current_best"
    )
)
# Always select the single highest-scoring candidate
# Faster convergence, but may miss specialized improvements
Epsilon-Greedy:
config = GEPAConfig(
    engine=EngineConfig(
        candidate_selection_strategy="epsilon_greedy"
    )
)
# Select the best candidate with probability 0.9, a random frontier member with probability 0.1
# Balances exploitation and exploration

Validation Evaluation Policies

Control how many validation examples are scored each iteration:

Full Evaluation (Default)

config = GEPAConfig(
    engine=EngineConfig(
        val_evaluation_policy="full_eval"
    )
)
# Evaluate every validation example every time
# Most accurate, but expensive for large validation sets

Incremental Evaluation (Future)

GEPA will support subsampling policies:
# Not yet implemented
from gepa.strategies.eval_policy import IncrementalEvaluationPolicy

policy = IncrementalEvaluationPolicy(
    initial_batch_size=10,
    expansion_rate=1.5,
    convergence_threshold=0.01
)

config = GEPAConfig(
    engine=EngineConfig(
        val_evaluation_policy=policy
    )
)
# Evaluate 10 examples initially, expand if promising
# Reduces evaluations by 3-5x with minimal accuracy loss

Merge: Cross-Pollinating the Frontier

GEPA can merge two Pareto-optimal candidates that excel on different validation subsets:
# From merge.py
class MergeProposer:
    def propose(self, state: GEPAState) -> CandidateProposal:
        # Find two candidates on the front with complementary strengths
        parent1, parent2 = select_merge_parents(state)
        
        # LLM-based merge prompt:
        # "Combine the strengths of these two candidates..."
        merged = merge_lm(parent1, parent2, reflective_context)
        
        return CandidateProposal(
            candidate=merged,
            parent_program_ids=[parent1_idx, parent2_idx]
        )
Enable merge:
config = GEPAConfig(
    merge=MergeConfig(
        max_merge_invocations=5,  # Attempt up to 5 merges
        merge_val_overlap_floor=5  # Require 5+ shared validation examples
    )
)
How it works:
  1. After a successful reflective mutation, GEPA schedules a merge attempt
  2. Select two parents that:
    • Both on Pareto front
    • Excel on different validation subsets
    • Share at least merge_val_overlap_floor examples for comparison
  3. LLM merges them by combining successful strategies
  4. Evaluate on a subsample overlapping both parents’ strengths
  5. Accept if merged candidate beats both parents on the subsample
Example: Candidate A solves algebraic problems well, B solves geometric problems well. Merge produces C that handles both.
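The parent-selection step can be sketched as a complementarity search over shared validation examples. The function names and the scoring rule here are illustrative assumptions, not GEPA's merge.py:

```python
from itertools import combinations

def complementarity(a: dict[str, float], b: dict[str, float],
                    overlap_floor: int = 5):
    shared = set(a) & set(b)
    if len(shared) < overlap_floor:
        return None  # not enough shared evaluations to compare fairly
    # Count examples where each parent clearly beats the other
    a_wins = sum(a[ex] > b[ex] for ex in shared)
    b_wins = sum(b[ex] > a[ex] for ex in shared)
    return min(a_wins, b_wins)  # high only when BOTH sides bring strengths

def select_merge_parents(subscores: dict[int, dict[str, float]]):
    best_pair, best = None, -1
    for i, j in combinations(sorted(subscores), 2):
        c = complementarity(subscores[i], subscores[j])
        if c is not None and c > best:
            best_pair, best = (i, j), c
    return best_pair
```

With this rule, a pair where each parent wins on a different subset scores higher than a pair where one parent wins everywhere.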

Pareto Front Analysis

Inspect the Pareto front in your optimization results:
result = optimize_anything(...)

# Get all candidates on the front
front = result.state.get_pareto_front_programs()
print(f"Pareto front size: {len(front)}")

# Analyze each frontier candidate
for prog_idx in front:
    candidate = result.state.program_candidates[prog_idx]
    scores = result.state.prog_candidate_val_subscores[prog_idx]
    
    print(f"\nCandidate {prog_idx}:")
    print(f"  Average score: {sum(scores.values()) / len(scores):.3f}")
    print(f"  Best examples: {[k for k,v in scores.items() if v > 0.9]}")
    print(f"  Worst examples: {[k for k,v in scores.items() if v < 0.5]}")
Visualize the tradeoff space:
import matplotlib.pyplot as plt

# For objective-level frontiers
objective_scores = result.state.prog_candidate_objective_scores
front = result.state.get_pareto_front_programs()

plt.figure(figsize=(10, 6))
for prog_idx in front:
    scores = objective_scores[prog_idx]
    plt.scatter(
        scores["accuracy"],
        scores["latency_inv"],
        s=100,
        label=f"Candidate {prog_idx}"
    )

plt.xlabel("Accuracy")
plt.ylabel("Speed (1/latency)")
plt.title("Pareto Front: Accuracy vs Speed Tradeoff")
plt.legend()
plt.show()

Choosing the Right Frontier Type

| Use Case | Recommended Type | Why |
|----------|------------------|-----|
| Single metric, diverse tasks | instance | Per-example specialization |
| Multi-objective optimization | objective | Explicit metric tradeoffs |
| Agent with tool specialization | instance | Different tools excel on different queries |
| Compiler optimization | objective | Speed/size/energy tradeoffs |
| Maximum diversity | hybrid or cartesian | Preserve all forms of specialization |
| Small validation set (<20) | instance or objective | Simple and effective |
| Large validation set (100s+) | objective + subsampling | Avoid front explosion |

Performance Implications

Frontier size impact:
  • Instance-level: ~1-10 candidates for small valsets, 5-20 for large
  • Objective-level: ~2-5 candidates per objective
  • Hybrid: Union of the two above (at most their sum)
  • Cartesian: Can grow to 50+ candidates
Larger frontiers:
  • ✅ More diversity, better exploration
  • ✅ More opportunities for merge
  • ❌ More candidates to select from (negligible cost)
  • ❌ More state to serialize (marginal)
Recommendation: Start with instance (default). Switch to objective if you have explicit multi-objective metrics. Use hybrid for maximum diversity at minimal cost.

Next Steps

Actionable Side Information

Learn how rich feedback drives Pareto-efficient search

How GEPA Works

Understand the full optimization loop

Advanced Configuration

Configure frontier types and merge strategies

Evaluation Policies

Optimize validation evaluation for large datasets
