GEPA provides extensive configuration options to control optimization behavior, resource usage, and experiment tracking. This guide covers all configuration settings and how to use them effectively.

Configuration Structure

GEPA uses a hierarchical configuration with GEPAConfig as the root:
from gepa.optimize_anything import (
    GEPAConfig,
    EngineConfig,
    ReflectionConfig,
    TrackingConfig,
    MergeConfig,
    RefinerConfig,
)

config = GEPAConfig(
    engine=EngineConfig(...),        # Core optimization loop
    reflection=ReflectionConfig(...), # LLM reflection settings
    tracking=TrackingConfig(...),     # Experiment tracking
    merge=MergeConfig(...),           # Optional: candidate merging
    refiner=RefinerConfig(...),       # Optional: per-eval refinement
)

EngineConfig

Controls the optimization loop, budget, parallelism, and caching.

Basic Settings

from gepa.optimize_anything import EngineConfig

engine = EngineConfig(
    # Stopping conditions
    max_metric_calls=300,           # Stop after 300 evaluations
    max_candidate_proposals=50,      # Stop after 50 proposal attempts
    
    # Execution control
    seed=42,                         # Random seed for reproducibility
    run_dir="./runs/experiment_1",   # Save checkpoints here
    raise_on_exception=True,         # Raise on errors (vs. log and continue)
    
    # UI
    display_progress_bar=False,      # Whether to show a tqdm progress bar
)

Parameters

1. max_metric_calls

Type: int | None
Default: None
Maximum number of evaluator calls before stopping. This is your evaluation budget.
engine = EngineConfig(
    max_metric_calls=100,  # Stop after 100 evals
)
At least one stopping condition is required (max_metric_calls or max_candidate_proposals).
2. max_candidate_proposals

Type: int | None
Default: None
Stop after this many proposal attempts (including rejected proposals).
engine = EngineConfig(
    max_candidate_proposals=20,  # Try 20 mutations
)
3. parallel

Type: bool
Default: False
Enable parallel evaluation of examples.
engine = EngineConfig(
    parallel=True,
    max_workers=8,  # Use 8 parallel workers
)
4. cache_evaluation

Type: bool
Default: False
Cache evaluation results to avoid re-computing identical candidates.
engine = EngineConfig(
    cache_evaluation=True,
    cache_evaluation_storage="disk",  # "memory", "disk", or "auto"
    run_dir="./runs/exp1",  # Required for disk caching
)
5. capture_stdio

Type: bool
Default: False
Automatically capture print() output during evaluation and include it in side_info.
engine = EngineConfig(
    capture_stdio=True,
)

# Your evaluator
def evaluate(candidate):
    print("Debug info")  # Captured automatically
    return score
Captures Python-level output only; output from C extensions or subprocesses is not captured.
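Conceptually, capture_stdio=True wraps each evaluation the way contextlib.redirect_stdout does, then attaches the captured text to side_info. A minimal sketch of that behavior (not GEPA's actual internals):

```python
import contextlib
import io

def evaluate(candidate):
    print("Debug info")  # would be captured by capture_stdio=True
    return 0.75

# What the engine effectively does around each evaluator call:
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    score = evaluate({"prompt": "..."})

captured = buf.getvalue()  # "Debug info\n" — attached to side_info
```

This is also why only Python-level writes are captured: redirect_stdout swaps sys.stdout, which C extensions and subprocesses bypass.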

Complete EngineConfig

engine = EngineConfig(
    # Stopping conditions
    max_metric_calls=300,
    max_candidate_proposals=None,
    
    # Execution
    run_dir="./runs/experiment_1",
    seed=42,
    raise_on_exception=True,
    use_cloudpickle=True,
    
    # Display
    display_progress_bar=False,
    track_best_outputs=False,
    
    # Parallelization
    parallel=True,
    max_workers=8,
    
    # Caching
    cache_evaluation=True,
    cache_evaluation_storage="auto",  # "memory", "disk", or "auto"
    
    # Capture
    capture_stdio=True,
    
    # Advanced: top-K best evals per example for warm-starting
    best_example_evals_k=30,
    
    # Strategy selection
    candidate_selection_strategy="pareto",  # "pareto", "current_best", "epsilon_greedy"
    frontier_type="hybrid",  # "hybrid", "per_task", "aggregate"
    val_evaluation_policy="full_eval",
)

ReflectionConfig

Controls how the LLM proposes improved candidates.

Basic Settings

from gepa.optimize_anything import ReflectionConfig

reflection = ReflectionConfig(
    reflection_lm="openai/gpt-4o",   # Model for reflection
    reflection_minibatch_size=3,      # Examples per reflection
)

Parameters

1. reflection_lm

Type: LanguageModel | str | None
Default: "openai/gpt-5.1"
LLM for proposing improved candidates. Can be:
  • String: LiteLLM model name (e.g., "openai/gpt-4o")
  • Callable: Custom LM function
# Using LiteLLM
reflection = ReflectionConfig(
    reflection_lm="openai/gpt-4o",
)

# Using custom function
from gepa.optimize_anything import make_litellm_lm

custom_lm = make_litellm_lm("anthropic/claude-3-5-sonnet-20241022")
reflection = ReflectionConfig(
    reflection_lm=custom_lm,
)
2. reflection_minibatch_size

Type: int | None
Default: None (auto: 1 for single-task, 3 for multi-task)
Number of examples shown to the LLM per reflection step.
reflection = ReflectionConfig(
    reflection_minibatch_size=5,  # Show 5 examples at a time
)
Smaller batches → more focused improvements
Larger batches → more context, potentially better generalization
3. reflection_prompt_template

Type: str | dict[str, str] | None
Default: Built-in template
Custom prompt template for reflection. Use <curr_param> and <side_info> placeholders.
custom_template = """
You are optimizing a system parameter.

Current parameter:
<curr_param>

Evaluation results:
<side_info>

Propose an improved parameter based on the results.
Provide ONLY the improved parameter within ``` blocks.
"""

reflection = ReflectionConfig(
    reflection_prompt_template=custom_template,
)
4. module_selector

Type: ReflectionComponentSelector | Literal["round_robin", "all"]
Default: "round_robin"
Strategy for selecting which components to update each iteration:
  • "round_robin": Cycle through components
  • "all": Update all components together
reflection = ReflectionConfig(
    module_selector="all",  # Update all params together
)

Complete ReflectionConfig

reflection = ReflectionConfig(
    # LLM settings
    reflection_lm="openai/gpt-4o",
    reflection_prompt_template=None,  # Use default
    
    # Minibatch settings
    reflection_minibatch_size=3,
    batch_sampler="epoch_shuffled",
    
    # Component selection
    module_selector="round_robin",
    
    # Early stopping
    skip_perfect_score=False,
    perfect_score=None,
    
    # Advanced: custom proposer
    custom_candidate_proposer=None,
)

MergeConfig

Enables cross-pollination between candidates on the Pareto frontier.
from gepa.optimize_anything import MergeConfig

merge = MergeConfig(
    max_merge_invocations=5,      # Try up to 5 merges
    merge_val_overlap_floor=5,    # Min overlap for merge
)

config = GEPAConfig(
    merge=merge,  # Enable merging
)

When to Use

Merging helps when:
  • Multiple candidates excel on different subsets
  • You want to combine their strengths
  • You have enough budget for merge attempts
Merging is disabled by default. Set merge=MergeConfig(...) to enable.

RefinerConfig

Enables automatic per-evaluation candidate refinement.
from gepa.optimize_anything import RefinerConfig

refiner = RefinerConfig(
    refiner_lm="openai/gpt-4o-mini",  # Use cheaper model
    max_refinements=2,                 # Refine up to 2 times
)

config = GEPAConfig(
    refiner=refiner,  # Enable refinement
)

How It Works

  1. Evaluate candidate → get feedback
  2. LLM proposes refined version based on feedback
  3. Re-evaluate refined version
  4. Keep better of (original, refined)
  5. Repeat up to max_refinements times
Refinement multiplies evaluation cost. Budget accordingly.
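The loop above can be sketched in a few lines. This is an illustrative stand-in, not GEPA's internals: refine_loop, the toy evaluator, and the toy proposer are all hypothetical names, and in practice the proposer would be an LLM call using refiner_lm.

```python
def refine_loop(candidate, evaluate, propose_refinement, max_refinements=2):
    """Evaluate, let an LM propose a refinement, keep whichever scores higher."""
    best, best_score = candidate, evaluate(candidate)
    for _ in range(max_refinements):
        refined = propose_refinement(best, best_score)  # LLM call in practice
        refined_score = evaluate(refined)               # the extra eval cost
        if refined_score > best_score:
            best, best_score = refined, refined_score
    return best, best_score

# Toy stand-ins: longer prompts score higher; "refinement" appends detail.
toy_evaluate = lambda c: len(c["prompt"]) / 100
toy_propose = lambda c, s: {"prompt": c["prompt"] + " Show your work."}

best, score = refine_loop({"prompt": "Solve:"}, toy_evaluate, toy_propose)
```

Note that each pass through the loop spends one extra evaluator call, which is why max_refinements directly multiplies evaluation cost.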

TrackingConfig

Experiment tracking and logging.

Basic Logging

from gepa.optimize_anything import TrackingConfig

tracking = TrackingConfig(
    logger=None,  # Use default stdout logger
)

Weights & Biases

tracking = TrackingConfig(
    use_wandb=True,
    wandb_api_key="your-api-key",  # Or set WANDB_API_KEY env var
    wandb_init_kwargs={
        "project": "gepa-optimization",
        "name": "experiment-1",
        "tags": ["math", "gpt-4o"],
    },
)

MLflow

tracking = TrackingConfig(
    use_mlflow=True,
    mlflow_tracking_uri="http://localhost:5000",
    mlflow_experiment_name="gepa-math-optimization",
)

Custom Logger

from gepa.logging.logger import LoggerProtocol

class MyLogger:
    def log(self, message: str):
        # Your logging logic
        print(f"[CUSTOM] {message}")

tracking = TrackingConfig(
    logger=MyLogger(),
)

Custom Stop Conditions

from gepa.utils import StopperProtocol, NoImprovementStopper, TimeoutStopCondition

# Stop if no improvement for 10 iterations
no_improvement = NoImprovementStopper(
    patience=10,
    min_improvement=0.01,
)

# Stop after 1 hour
timeout = TimeoutStopCondition(timeout_seconds=3600)

# Combine multiple stoppers
from gepa.utils import CompositeStopper

stopper = CompositeStopper(no_improvement, timeout)

config = GEPAConfig(
    stop_callbacks=stopper,
)
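If the built-in stoppers don't cover your case, you can write your own. The sketch below assumes the stopper protocol is simply "a callable that returns True when optimization should stop"; check gepa.utils.StopperProtocol in your installed version for the exact signature before relying on this.

```python
import time

class WallClockStopper:
    """Stop once a wall-clock budget is exhausted.

    Assumes the protocol is "callable returning True to stop";
    verify against gepa.utils.StopperProtocol in your version.
    """

    def __init__(self, budget_seconds: float):
        self.deadline = time.monotonic() + budget_seconds

    def __call__(self, *args, **kwargs) -> bool:
        # Accepts any state the engine passes; only the clock matters here.
        return time.monotonic() >= self.deadline
```

A stopper like this could then be passed to stop_callbacks, alone or combined with others via CompositeStopper.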

Complete Configuration Example

from gepa.optimize_anything import (
    optimize_anything,
    GEPAConfig,
    EngineConfig,
    ReflectionConfig,
    TrackingConfig,
    MergeConfig,
    RefinerConfig,
)

config = GEPAConfig(
    # Core optimization loop
    engine=EngineConfig(
        max_metric_calls=500,
        run_dir="./runs/math_optimization",
        seed=42,
        parallel=True,
        max_workers=8,
        cache_evaluation=True,
        cache_evaluation_storage="disk",
        capture_stdio=True,
        display_progress_bar=True,
    ),
    
    # LLM reflection
    reflection=ReflectionConfig(
        reflection_lm="openai/gpt-4o",
        reflection_minibatch_size=3,
        module_selector="round_robin",
    ),
    
    # Experiment tracking
    tracking=TrackingConfig(
        use_wandb=True,
        wandb_init_kwargs={
            "project": "gepa-math",
            "name": "aime-optimization-v1",
        },
    ),
    
    # Optional: Candidate merging
    merge=MergeConfig(
        max_merge_invocations=5,
        merge_val_overlap_floor=5,
    ),
    
    # Optional: Per-eval refinement
    refiner=RefinerConfig(
        refiner_lm="openai/gpt-4o-mini",
        max_refinements=1,
    ),
)

result = optimize_anything(
    seed_candidate={"prompt": "Solve this math problem:"},
    evaluator=my_evaluator,
    dataset=train_problems,
    valset=val_problems,
    objective="Generate prompts that solve AIME-level math problems.",
    config=config,
)

Configuration from Dict/JSON

import json
from gepa.optimize_anything import GEPAConfig

# Save config
config_dict = config.to_dict()
with open("config.json", "w") as f:
    json.dump(config_dict, f, indent=2)

# Load config
with open("config.json") as f:
    config_dict = json.load(f)
    
config = GEPAConfig.from_dict(config_dict)

Environment Variables

GEPA respects these environment variables:
# API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export WANDB_API_KEY="..."

# Execution
export GEPA_CACHE_DIR="/path/to/cache"
export GEPA_RUN_DIR="/path/to/runs"

# Logging
export GEPA_LOG_LEVEL="INFO"  # DEBUG, INFO, WARNING, ERROR

Best Practices

1. Start Small

Begin with low budget to test your setup:
engine = EngineConfig(max_metric_calls=20)
2. Enable Caching

Avoid redundant evaluations:
engine = EngineConfig(
    cache_evaluation=True,
    run_dir="./runs/exp1",
)
3. Use Parallelization

Speed up evaluation when possible:
engine = EngineConfig(
    parallel=True,
    max_workers=8,  # Match your CPU cores
)
4. Track Experiments

Enable W&B or MLflow for long runs:
tracking = TrackingConfig(
    use_wandb=True,
    wandb_init_kwargs={"project": "my-project"},
)
5. Save Checkpoints

Always set run_dir for long experiments:
engine = EngineConfig(
    run_dir="./runs/experiment_1",
    max_metric_calls=1000,
)

Troubleshooting

Out of Memory

Reduce parallelism or disable output tracking:
engine = EngineConfig(
    max_workers=4,  # Reduce from 8
    track_best_outputs=False,
    cache_evaluation_storage="disk",  # Instead of memory
)

Slow Optimization

Enable parallelization and caching:
engine = EngineConfig(
    parallel=True,
    max_workers=16,
    cache_evaluation=True,
)

API Rate Limits

Use cheaper models or add delays:
reflection = ReflectionConfig(
    reflection_lm="openai/gpt-4o-mini",  # Cheaper
)

# Or implement rate limiting in your evaluator
import time

def evaluate(candidate, example):
    time.sleep(0.1)  # Add delay
    return score
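A fixed sleep works but wastes time when the API is healthy. A common alternative is retrying with exponential backoff; this is a generic sketch (with_backoff is a hypothetical helper, not a GEPA API), and you should narrow the except clause to your client's actual rate-limit exception:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn, retrying on failure with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # narrow this to your client's RateLimitError
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Usage inside an evaluator (call_model is your own function):
# score = with_backoff(lambda: call_model(candidate, example))
```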

Next Steps

optimize_anything

Use your configuration with optimize_anything

Evaluation Metrics

Design evaluators that work with your config

Custom Adapters

Build adapters that leverage advanced configs

DSPy Integration

Configure DSPy optimization runs
