Overview

gepa.optimize() is GEPA’s core optimization function for evolving text components (prompts, code, instructions) toward a given metric. It uses evolutionary search with LLM-based reflection and Pareto-efficient tracking.

Function Signature

def optimize(
    seed_candidate: dict[str, str],
    trainset: list[DataInst] | DataLoader[DataId, DataInst],
    valset: list[DataInst] | DataLoader[DataId, DataInst] | None = None,
    adapter: GEPAAdapter[DataInst, Trajectory, RolloutOutput] | None = None,
    task_lm: str | ChatCompletionCallable | None = None,
    evaluator: Evaluator | None = None,
    # Reflection-based configuration
    reflection_lm: LanguageModel | str | None = None,
    candidate_selection_strategy: CandidateSelector | Literal["pareto", "current_best", "epsilon_greedy"] = "pareto",
    frontier_type: FrontierType = "instance",
    skip_perfect_score: bool = True,
    batch_sampler: BatchSampler | Literal["epoch_shuffled"] = "epoch_shuffled",
    reflection_minibatch_size: int | None = None,
    perfect_score: float = 1.0,
    reflection_prompt_template: str | dict[str, str] | None = None,
    custom_candidate_proposer: ProposalFn | None = None,
    # Component selection configuration
    module_selector: ReflectionComponentSelector | str = "round_robin",
    # Merge-based configuration
    use_merge: bool = False,
    max_merge_invocations: int = 5,
    merge_val_overlap_floor: int = 5,
    # Budget and Stop Condition
    max_metric_calls: int | None = None,
    stop_callbacks: StopperProtocol | Sequence[StopperProtocol] | None = None,
    # Logging and Callbacks
    logger: LoggerProtocol | None = None,
    run_dir: str | None = None,
    callbacks: list[GEPACallback] | None = None,
    use_wandb: bool = False,
    wandb_api_key: str | None = None,
    wandb_init_kwargs: dict[str, Any] | None = None,
    use_mlflow: bool = False,
    mlflow_tracking_uri: str | None = None,
    mlflow_experiment_name: str | None = None,
    track_best_outputs: bool = False,
    display_progress_bar: bool = False,
    use_cloudpickle: bool = False,
    # Evaluation caching
    cache_evaluation: bool = False,
    # Reproducibility
    seed: int = 0,
    raise_on_exception: bool = True,
    val_evaluation_policy: EvaluationPolicy[DataId, DataInst] | Literal["full_eval"] | None = None,
) -> GEPAResult[RolloutOutput, DataId]

Required Parameters

seed_candidate
dict[str, str]
required
The initial candidate: a mapping from component names to component text.
trainset
list[DataInst] | DataLoader[DataId, DataInst]
required
Training data supplied as an in-memory sequence or a DataLoader yielding batches for reflective updates.

Optional Parameters

Data & Evaluation

valset
list[DataInst] | DataLoader[DataId, DataInst] | None
default:"None"
Validation data source (sequence or DataLoader) used for tracking Pareto scores. If not provided, GEPA reuses the trainset.
adapter
GEPAAdapter[DataInst, Trajectory, RolloutOutput] | None
default:"None"
A GEPAAdapter instance that implements the adapter interface. This allows GEPA to plug into your system’s environment. If not provided, GEPA will use the default adapter with the model defined by task_lm.
task_lm
str | ChatCompletionCallable | None
default:"None"
The model to use for the task. Only used when adapter is not provided; it is used to initialize the default adapter.
evaluator
Evaluator | None
default:"None"
A custom evaluator to use for evaluating the candidate program. Only used if adapter is not provided.

Reflection Configuration

reflection_lm
LanguageModel | str | None
default:"None"
A LanguageModel instance or model name string that is used to reflect on the performance of the candidate program.
candidate_selection_strategy
CandidateSelector | Literal['pareto', 'current_best', 'epsilon_greedy']
default:"'pareto'"
The strategy to use for selecting the candidate to update. Supported strategies: 'pareto', 'current_best', 'epsilon_greedy'.
frontier_type
FrontierType
default:"'instance'"
Strategy for tracking Pareto frontiers:
  • 'instance': tracks per validation example
  • 'objective': tracks per objective metric
  • 'hybrid': combines both
  • 'cartesian': tracks per (example, objective) pair
skip_perfect_score
bool
default:"True"
Whether to skip updating the candidate if it achieves a perfect score on the minibatch.
batch_sampler
BatchSampler | Literal['epoch_shuffled']
default:"'epoch_shuffled'"
Strategy for selecting training examples. Can be a BatchSampler instance or a string for a predefined strategy.
reflection_minibatch_size
int | None
default:"None"
The number of examples to use for reflection in each proposal step. Defaults to 3. Only valid when batch_sampler='epoch_shuffled' (default).
perfect_score
float
default:"1.0"
The score that counts as perfect; used together with skip_perfect_score to decide when a minibatch result cannot be improved further.
reflection_prompt_template
str | dict[str, str] | None
default:"None"
The prompt template to use for reflection. Can be either a string (applied to all components) or a dict mapping component names to their specific templates. Must contain <curr_param> and <side_info> placeholders.
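The template's content is up to you; only the two placeholders are required. A minimal sketch (the wording of the template itself is illustrative, not a library default):

```python
# A custom reflection template (sketch). The <curr_param> and <side_info>
# placeholders are required; GEPA substitutes the current component text
# and the collected execution feedback into them.
MATH_TEMPLATE = """You are refining an instruction for a math-tutoring assistant.

Current instruction:
<curr_param>

Execution feedback from recent rollouts:
<side_info>

Write an improved instruction that fixes the failures shown above.
Return only the new instruction text."""

# Applied to every component:
#   gepa.optimize(..., reflection_prompt_template=MATH_TEMPLATE)
# Or per component, as a dict from component name to template:
per_component = {"instruction": MATH_TEMPLATE}
```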
custom_candidate_proposer
ProposalFn | None
default:"None"
Optional custom function for proposing new candidates. If provided, this will be used instead of the default LLM-based reflection approach. Signature: (candidate, reflective_dataset, components_to_update) -> dict[str, str].
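As a toy illustration of the documented signature, the proposer below skips LLM reflection entirely and appends a fixed hint to each selected component; a real proposer would inspect reflective_dataset to decide what to change:

```python
# Sketch of a custom proposer matching the documented signature:
# (candidate, reflective_dataset, components_to_update) -> dict[str, str].
def append_hint_proposer(candidate, reflective_dataset, components_to_update):
    updated = dict(candidate)  # leave unselected components untouched
    for name in components_to_update:
        # A real proposer would use reflective_dataset here; this toy
        # version just appends a fixed hint to the component text.
        updated[name] = candidate[name].rstrip() + "\nThink step by step."
    return updated
```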

Component Selection

module_selector
ReflectionComponentSelector | str
default:"'round_robin'"
Component selection strategy. Can be a ReflectionComponentSelector instance or a string ('round_robin', 'all'). The 'round_robin' strategy cycles through components in order. The 'all' strategy selects all components for modification in every GEPA iteration.

Merge Configuration

use_merge
bool
default:"False"
Whether to use the merge strategy.
max_merge_invocations
int
default:"5"
The maximum number of merge invocations to perform.
merge_val_overlap_floor
int
default:"5"
Minimum number of shared validation ids required between parents before attempting a merge subsample. Only relevant when using val_evaluation_policy other than full_eval.

Budget & Stopping

max_metric_calls
int | None
default:"None"
Optional maximum number of metric calls to perform. If not provided, stop_callbacks must be provided.
stop_callbacks
StopperProtocol | Sequence[StopperProtocol] | None
default:"None"
Optional stopper(s) that return True when optimization should stop. Examples: FileStopper, TimeoutStopCondition, SignalStopper, NoImprovementStopper, or custom stopping logic. If not provided, max_metric_calls must be provided.
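Custom stopping logic can be a small callable object. The exact StopperProtocol call signature is not shown here, so the sketch below accepts any arguments (an assumption, not the library API) and returns True once a wall-clock budget is exhausted:

```python
import time

# Sketch of a custom stopper: stop once a wall-clock budget is spent.
# The handler signature is deliberately permissive (*args, **kwargs)
# because the exact StopperProtocol signature is assumed, not documented here.
class WallClockStopper:
    def __init__(self, max_seconds: float):
        self.deadline = time.monotonic() + max_seconds

    def __call__(self, *args, **kwargs) -> bool:
        # True means "stop the optimization now".
        return time.monotonic() >= self.deadline
```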

Logging & Tracking

logger
LoggerProtocol | None
default:"None"
A LoggerProtocol instance that is used to log the progress of the optimization.
run_dir
str | None
default:"None"
The directory to save the results to. Optimization state and results will be saved to this directory. If the directory already exists, GEPA will read the state from this directory and resume the optimization from the last saved state. If provided, a FileStopper is automatically created which checks for the presence of "gepa.stop" in this directory.
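Because of the auto-created FileStopper, you can halt a running optimization from outside the process by creating the stop file in the run directory (the directory name below is hypothetical):

```python
import pathlib
import tempfile

# Hypothetical run directory; in practice this is the run_dir you passed
# to gepa.optimize(). Creating "gepa.stop" inside it asks the auto-created
# FileStopper to halt the optimization gracefully.
run_dir = pathlib.Path(tempfile.mkdtemp()) / "gepa_demo"
run_dir.mkdir(parents=True, exist_ok=True)
(run_dir / "gepa.stop").touch()
```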
callbacks
list[GEPACallback] | None
default:"None"
Optional list of callback objects for observing optimization progress. Callbacks receive events like on_optimization_start, on_iteration_start, on_candidate_accepted, etc.
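A callback is an object with handlers for the events above. The event names come from this page; the handler signatures are an assumption, so each one accepts arbitrary keyword arguments in this sketch:

```python
# Sketch of a GEPA callback that counts accepted candidates. Event names
# (on_optimization_start, on_iteration_start, on_candidate_accepted) are
# from the docs; the **kwargs signatures are an assumption.
class AcceptanceCounter:
    def __init__(self):
        self.accepted = 0

    def on_optimization_start(self, **kwargs):
        pass

    def on_iteration_start(self, **kwargs):
        pass

    def on_candidate_accepted(self, **kwargs):
        self.accepted += 1
```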
use_wandb
bool
default:"False"
Whether to use Weights and Biases to log the progress of the optimization.
wandb_api_key
str | None
default:"None"
The API key to use for Weights and Biases.
wandb_init_kwargs
dict[str, Any] | None
default:"None"
Additional keyword arguments to pass to the Weights and Biases initialization.
use_mlflow
bool
default:"False"
Whether to use MLflow to log the progress of the optimization. Both wandb and mlflow can be used simultaneously.
mlflow_tracking_uri
str | None
default:"None"
The tracking URI to use for MLflow.
mlflow_experiment_name
str | None
default:"None"
The experiment name to use for MLflow.
track_best_outputs
bool
default:"False"
Whether to track the best outputs on the validation set. If True, GEPAResult will contain the best outputs obtained for each task in the validation set.
display_progress_bar
bool
default:"False"
Show a tqdm progress bar over metric calls when enabled.
use_cloudpickle
bool
default:"False"
Use cloudpickle instead of pickle. This can be helpful when the serialized state contains dynamically generated DSPy signatures.

Evaluation Caching

cache_evaluation
bool
default:"False"
Whether to cache the (score, output, objective_scores) of (candidate, example) pairs. If True and a cache entry exists, GEPA will skip the fitness evaluation and use the cached results.

Reproducibility

seed
int
default:"0"
The seed to use for the random number generator.
raise_on_exception
bool
default:"True"
Whether to propagate proposer/evaluator exceptions instead of stopping gracefully.
val_evaluation_policy
EvaluationPolicy[DataId, DataInst] | Literal['full_eval'] | None
default:"None"
Strategy controlling which validation ids are scored each iteration and how the current best candidate is determined. The only supported string is "full_eval" (evaluate every id each time); passing None defaults to "full_eval".

Returns

result
GEPAResult[RolloutOutput, DataId]
A GEPAResult object containing the optimization results, including the best candidate, all explored candidates, validation scores, and Pareto frontier information.

Key Concepts

System & Candidate

  • System: A harness that uses text components to perform a task; each text component to be optimized is identified by a name.
  • Candidate: A mapping from component names to component text. A concrete instantiation of the system is realized by setting the text of each system component to the text provided by the candidate mapping.
  • DataInst: An (uninterpreted) data type over which the system operates.
  • RolloutOutput: The output of the system on a DataInst.

Optimization Strategies

At each iteration, GEPA proposes a new candidate using one of the following strategies:
  1. Reflective mutation: GEPA proposes a new candidate by mutating the current candidate, leveraging rich textual feedback.
  2. Merge: GEPA proposes a new candidate by merging two candidates that are on the Pareto frontier.
GEPA also tracks the Pareto frontier of performance achieved by different candidates on the validation set. This way, it can leverage candidates that work well on a subset of inputs to improve the system’s performance on the entire validation set.

Example Usage

import gepa

# Define your training data
train_data = [
    {"input": "What is 2+2?", "answer": "4"},
    {"input": "What is the capital of France?", "answer": "Paris"},
]

# Define initial candidate
seed = {
    "instruction": "Answer the question accurately."
}

# Run optimization
result = gepa.optimize(
    seed_candidate=seed,
    trainset=train_data,
    task_lm="gpt-3.5-turbo",
    reflection_lm="gpt-4",
    max_metric_calls=100,
)

print(f"Best candidate: {result.best_candidate}")
print(f"Best score: {result.val_aggregate_scores[result.best_idx]}")
