Overview

gepa.optimize() is GEPA’s core optimization function for evolving text components (prompts, code, instructions) toward a given metric. It uses evolutionary search with LLM-based reflection and Pareto-efficient tracking.

Function Signature

def optimize(
    seed_candidate: dict[str, str],
    trainset: list[DataInst] | DataLoader[DataId, DataInst],
    valset: list[DataInst] | DataLoader[DataId, DataInst] | None = None,
    adapter: GEPAAdapter[DataInst, Trajectory, RolloutOutput] | None = None,
    task_lm: str | ChatCompletionCallable | None = None,
    evaluator: Evaluator | None = None,
    # Reflection-based configuration
    reflection_lm: LanguageModel | str | None = None,
    candidate_selection_strategy: CandidateSelector | Literal["pareto", "current_best", "epsilon_greedy"] = "pareto",
    frontier_type: FrontierType = "instance",
    skip_perfect_score: bool = True,
    batch_sampler: BatchSampler | Literal["epoch_shuffled"] = "epoch_shuffled",
    reflection_minibatch_size: int | None = None,
    perfect_score: float = 1.0,
    reflection_prompt_template: str | dict[str, str] | None = None,
    custom_candidate_proposer: ProposalFn | None = None,
    # Component selection configuration
    module_selector: ReflectionComponentSelector | str = "round_robin",
    # Merge-based configuration
    use_merge: bool = False,
    max_merge_invocations: int = 5,
    merge_val_overlap_floor: int = 5,
    # Budget and Stop Condition
    max_metric_calls: int | None = None,
    stop_callbacks: StopperProtocol | Sequence[StopperProtocol] | None = None,
    # Logging and Callbacks
    logger: LoggerProtocol | None = None,
    run_dir: str | None = None,
    callbacks: list[GEPACallback] | None = None,
    use_wandb: bool = False,
    wandb_api_key: str | None = None,
    wandb_init_kwargs: dict[str, Any] | None = None,
    use_mlflow: bool = False,
    mlflow_tracking_uri: str | None = None,
    mlflow_experiment_name: str | None = None,
    track_best_outputs: bool = False,
    display_progress_bar: bool = False,
    use_cloudpickle: bool = False,
    # Evaluation caching
    cache_evaluation: bool = False,
    # Reproducibility
    seed: int = 0,
    raise_on_exception: bool = True,
    val_evaluation_policy: EvaluationPolicy[DataId, DataInst] | Literal["full_eval"] | None = None,
) -> GEPAResult[RolloutOutput, DataId]

Required Parameters

seed_candidate
dict[str, str]
required
The initial candidate: a mapping from component names to component text.
trainset
list[DataInst] | DataLoader[DataId, DataInst]
required
Training data supplied as an in-memory sequence or a DataLoader yielding batches for reflective updates.

Optional Parameters

Data & Evaluation

valset
list[DataInst] | DataLoader[DataId, DataInst] | None
default:"None"
Validation data source (sequence or DataLoader) used for tracking Pareto scores. If not provided, GEPA reuses the trainset.
adapter
GEPAAdapter[DataInst, Trajectory, RolloutOutput] | None
default:"None"
A GEPAAdapter instance that implements the adapter interface. This allows GEPA to plug into your system’s environment. If not provided, GEPA will use the default adapter with the model defined by task_lm.
task_lm
str | ChatCompletionCallable | None
default:"None"
The model to use for the task. Only used when adapter is not provided; it is used to initialize the default adapter.
evaluator
Evaluator | None
default:"None"
A custom evaluator to use for evaluating the candidate program. Only used if adapter is not provided.

Reflection Configuration

reflection_lm
LanguageModel | str | None
default:"None"
A LanguageModel instance or model name string that is used to reflect on the performance of the candidate program.
candidate_selection_strategy
CandidateSelector | Literal['pareto', 'current_best', 'epsilon_greedy']
default:"'pareto'"
The strategy to use for selecting the candidate to update. Supported strategies: 'pareto', 'current_best', 'epsilon_greedy'.
frontier_type
FrontierType
default:"'instance'"
Strategy for tracking Pareto frontiers:
  • 'instance': tracks per validation example
  • 'objective': tracks per objective metric
  • 'hybrid': combines both
  • 'cartesian': tracks per (example, objective) pair
skip_perfect_score
bool
default:"True"
Whether to skip updating the candidate if it achieves a perfect score on the minibatch.
batch_sampler
BatchSampler | Literal['epoch_shuffled']
default:"'epoch_shuffled'"
Strategy for selecting training examples. Can be a BatchSampler instance or a string for a predefined strategy.
reflection_minibatch_size
int | None
default:"None"
The number of examples to use for reflection in each proposal step. Defaults to 3. Only valid when batch_sampler='epoch_shuffled' (default).
perfect_score
float
default:"1.0"
The score that counts as perfect; used together with skip_perfect_score to decide when a minibatch result cannot be improved further.
reflection_prompt_template
str | dict[str, str] | None
default:"None"
The prompt template to use for reflection. Can be either a string (applied to all components) or a dict mapping component names to their specific templates. Must contain <curr_param> and <side_info> placeholders.
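The template's content is up to you; only the two placeholders are required. A minimal sketch (the wording of the template itself is illustrative, not a library default):

```python
# A custom reflection template (sketch). The <curr_param> and <side_info>
# placeholders are required; GEPA substitutes the current component text
# and the collected execution feedback into them.
MATH_TEMPLATE = """You are refining an instruction for a math-tutoring assistant.

Current instruction:
<curr_param>

Execution feedback from recent rollouts:
<side_info>

Write an improved instruction that fixes the failures shown above.
Return only the new instruction text."""

# Applied to every component:
#   gepa.optimize(..., reflection_prompt_template=MATH_TEMPLATE)
# Or per component, as a dict from component name to template:
per_component = {"instruction": MATH_TEMPLATE}
```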
custom_candidate_proposer
ProposalFn | None
default:"None"
Optional custom function for proposing new candidates. If provided, this will be used instead of the default LLM-based reflection approach. Signature: (candidate, reflective_dataset, components_to_update) -> dict[str, str].
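As a toy illustration of the documented signature, the proposer below skips LLM reflection entirely and appends a fixed hint to each selected component; a real proposer would inspect reflective_dataset to decide what to change:

```python
# Sketch of a custom proposer matching the documented signature:
# (candidate, reflective_dataset, components_to_update) -> dict[str, str].
def append_hint_proposer(candidate, reflective_dataset, components_to_update):
    updated = dict(candidate)  # leave unselected components untouched
    for name in components_to_update:
        # A real proposer would use reflective_dataset here; this toy
        # version just appends a fixed hint to the component text.
        updated[name] = candidate[name].rstrip() + "\nThink step by step."
    return updated
```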

Component Selection

module_selector
ReflectionComponentSelector | str
default:"'round_robin'"
Component selection strategy. Can be a ReflectionComponentSelector instance or a string ('round_robin', 'all'). The 'round_robin' strategy cycles through components in order. The 'all' strategy selects all components for modification in every GEPA iteration.

Merge Configuration

use_merge
bool
default:"False"
Whether to use the merge strategy.
max_merge_invocations
int
default:"5"
The maximum number of merge invocations to perform.
merge_val_overlap_floor
int
default:"5"
Minimum number of shared validation ids required between parents before attempting a merge subsample. Only relevant when using val_evaluation_policy other than full_eval.

Budget & Stopping

max_metric_calls
int | None
default:"None"
Optional maximum number of metric calls to perform. If not provided, stop_callbacks must be provided.
stop_callbacks
StopperProtocol | Sequence[StopperProtocol] | None
default:"None"
Optional stopper(s) that return True when optimization should stop. Examples: FileStopper, TimeoutStopCondition, SignalStopper, NoImprovementStopper, or custom stopping logic. If not provided, max_metric_calls must be provided.
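Custom stopping logic can be a small callable object. The exact StopperProtocol call signature is not shown here, so the sketch below accepts any arguments (an assumption, not the library API) and returns True once a wall-clock budget is exhausted:

```python
import time

# Sketch of a custom stopper: stop once a wall-clock budget is spent.
# The handler signature is deliberately permissive (*args, **kwargs)
# because the exact StopperProtocol signature is assumed, not documented here.
class WallClockStopper:
    def __init__(self, max_seconds: float):
        self.deadline = time.monotonic() + max_seconds

    def __call__(self, *args, **kwargs) -> bool:
        # True means "stop the optimization now".
        return time.monotonic() >= self.deadline
```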

Logging & Tracking

logger
LoggerProtocol | None
default:"None"
A LoggerProtocol instance that is used to log the progress of the optimization.
run_dir
str | None
default:"None"
The directory to save the results to. Optimization state and results will be saved to this directory. If the directory already exists, GEPA will read the state from this directory and resume the optimization from the last saved state. If provided, a FileStopper is automatically created which checks for the presence of "gepa.stop" in this directory.
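Because of the auto-created FileStopper, you can halt a running optimization from outside the process by creating the stop file in the run directory (the directory name below is hypothetical):

```python
import pathlib
import tempfile

# Hypothetical run directory; in practice this is the run_dir you passed
# to gepa.optimize(). Creating "gepa.stop" inside it asks the auto-created
# FileStopper to halt the optimization gracefully.
run_dir = pathlib.Path(tempfile.mkdtemp()) / "gepa_demo"
run_dir.mkdir(parents=True, exist_ok=True)
(run_dir / "gepa.stop").touch()
```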
callbacks
list[GEPACallback] | None
default:"None"
Optional list of callback objects for observing optimization progress. Callbacks receive events like on_optimization_start, on_iteration_start, on_candidate_accepted, etc.
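A callback is an object with handlers for the events above. The event names come from this page; the handler signatures are an assumption, so each one accepts arbitrary keyword arguments in this sketch:

```python
# Sketch of a GEPA callback that counts accepted candidates. Event names
# (on_optimization_start, on_iteration_start, on_candidate_accepted) are
# from the docs; the **kwargs signatures are an assumption.
class AcceptanceCounter:
    def __init__(self):
        self.accepted = 0

    def on_optimization_start(self, **kwargs):
        pass

    def on_iteration_start(self, **kwargs):
        pass

    def on_candidate_accepted(self, **kwargs):
        self.accepted += 1
```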
use_wandb
bool
default:"False"
Whether to use Weights and Biases to log the progress of the optimization.
wandb_api_key
str | None
default:"None"
The API key to use for Weights and Biases.
wandb_init_kwargs
dict[str, Any] | None
default:"None"
Additional keyword arguments to pass to the Weights and Biases initialization.
use_mlflow
bool
default:"False"
Whether to use MLflow to log the progress of the optimization. Both wandb and mlflow can be used simultaneously.
mlflow_tracking_uri
str | None
default:"None"
The tracking URI to use for MLflow.
mlflow_experiment_name
str | None
default:"None"
The experiment name to use for MLflow.
track_best_outputs
bool
default:"False"
Whether to track the best outputs on the validation set. If True, GEPAResult will contain the best outputs obtained for each task in the validation set.
display_progress_bar
bool
default:"False"
Show a tqdm progress bar over metric calls when enabled.
use_cloudpickle
bool
default:"False"
Use cloudpickle instead of pickle. This can be helpful when the serialized state contains dynamically generated DSPy signatures.

Evaluation Caching

cache_evaluation
bool
default:"False"
Whether to cache the (score, output, objective_scores) of (candidate, example) pairs. If True and a cache entry exists, GEPA will skip the fitness evaluation and use the cached results.

Reproducibility

seed
int
default:"0"
The seed to use for the random number generator.
raise_on_exception
bool
default:"True"
Whether to propagate proposer/evaluator exceptions instead of stopping gracefully.
val_evaluation_policy
EvaluationPolicy[DataId, DataInst] | Literal['full_eval'] | None
default:"None"
Strategy controlling which validation ids are scored each iteration and how the current best candidate is determined. The only supported string is "full_eval" (evaluate every id each time); passing None defaults to "full_eval".

Returns

result
GEPAResult[RolloutOutput, DataId]
A GEPAResult object containing the optimization results, including the best candidate, all explored candidates, validation scores, and Pareto frontier information.

Key Concepts

System & Candidate

  • System: A harness that uses text components to perform a task; each text component to be optimized is identified by a name.
  • Candidate: A mapping from component names to component text. A concrete instantiation of the system is realized by setting the text of each system component to the text provided by the candidate mapping.
  • DataInst: An (uninterpreted) data type over which the system operates.
  • RolloutOutput: The output of the system on a DataInst.

Optimization Strategies

At each iteration, GEPA proposes a new candidate using one of the following strategies:
  1. Reflective mutation: GEPA proposes a new candidate by mutating the current candidate, leveraging rich textual feedback.
  2. Merge: GEPA proposes a new candidate by merging two candidates that are on the Pareto frontier.
GEPA also tracks the Pareto frontier of performance achieved by different candidates on the validation set. This way, it can leverage candidates that work well on a subset of inputs to improve the system’s performance on the entire validation set.

Example Usage

import gepa

# Define your training data
train_data = [
    {"input": "What is 2+2?", "answer": "4"},
    {"input": "What is the capital of France?", "answer": "Paris"},
]

# Define initial candidate
seed = {
    "instruction": "Answer the question accurately."
}

# Run optimization
result = gepa.optimize(
    seed_candidate=seed,
    trainset=train_data,
    task_lm="gpt-3.5-turbo",
    reflection_lm="gpt-4",
    max_metric_calls=100,
)

print(f"Best candidate: {result.best_candidate}")
print(f"Best score: {result.val_aggregate_scores[result.best_idx]}")
