Overview
optimize_anything() is GEPA’s universal API for optimizing any text-representable artifact: code, prompts, agent architectures, configurations, policies, SVG graphics, and more. You declare what to optimize (the artifact) and how to measure it (the evaluator); optimize_anything() handles the rest: prompt construction, LLM reflection, candidate selection, and Pareto-efficient search.
The key insight is that a wide range of problems can be formulated as optimizing a text artifact. If it can be serialized to a string and its quality measured, an LLM can reason about it and propose improvements.
Function Signature
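The signature listing on this page appears to have been stripped; the sketch below reconstructs it from the parameter descriptions further down and is illustrative only, not the authoritative definition.

```python
# Illustrative signature, reconstructed from the parameter list below;
# the actual gepa API may differ in names, defaults, and type hints.
from typing import Callable, Optional, Union

def optimize_anything(
    seed_candidate: Union[str, dict, None] = None,  # artifact to optimize (None = seedless mode)
    evaluate: Optional[Callable] = None,            # scoring function; signature depends on mode
    dataset: Optional[list] = None,                 # examples (multi-task / generalization)
    valset: Optional[list] = None,                  # held-out validation set (generalization)
    objective: Optional[str] = None,                # natural-language goal for the reflection LLM
    background: Optional[str] = None,               # domain knowledge / constraints
    config=None,                                    # GEPAConfig; defaults used if omitted
):
    """Sketch only: the real function runs GEPA's reflective search loop."""
    ...
```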
Three Optimization Modes
The function supports three optimization paradigms, determined by whether you provide `dataset` and `valset`:
1. Single-Task Search (dataset=None, valset=None)
Solve one hard problem. The candidate is the solution. The evaluator is called without an example argument.
Example use cases: Circle packing, blackbox mathematical optimization
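In single-task mode the evaluator receives only the candidate. A minimal sketch, with a made-up scoring rule (rewarding a candidate arithmetic expression for evaluating close to a target):

```python
# Single-task mode: evaluate(candidate) -> float | (float, dict).
import math

TARGET = 42.0

def evaluate(candidate: str) -> tuple[float, dict]:
    try:
        # Toy "execution" of the candidate: evaluate it as an arithmetic expression.
        value = float(eval(candidate, {"__builtins__": {}}, {"math": math}))
    except Exception as exc:
        return 0.0, {"error": repr(exc)}
    score = 1.0 / (1.0 + abs(value - TARGET))  # higher is better
    return score, {"value": value}
```

A perfect candidate such as `"6 * 7"` scores 1.0; the side-info dict gives the reflection LLM something concrete to reason about.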
2. Multi-Task Search (dataset=<list>, valset=None)
Solve a batch of related problems with cross-task transfer. Insights from solving one help solve the others. Evaluator is called per-example.
Example use cases: CUDA kernel generation for multiple PyTorch operations, multi-aspect SVG optimization
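In multi-task mode the evaluator receives `(candidate, example)` and is called once per example. A toy sketch (the task and metric are made up for illustration):

```python
# Multi-task mode: one candidate is scored against every example in the dataset,
# so improvements discovered on one task can transfer to the others.
def evaluate(candidate: str, example: dict) -> tuple[float, dict]:
    topic = example["topic"]
    hit = topic in candidate  # toy metric: does the prompt mention the task's topic?
    return (1.0 if hit else 0.0), {"topic": topic, "mentioned": hit}

dataset = [{"topic": "sorting"}, {"topic": "hashing"}, {"topic": "graphs"}]
```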
3. Generalization (dataset=<list>, valset=<list>)
Build a skill that transfers to unseen problems. Evaluator is called per-example; candidates must generalize to valset.
Example use cases: Prompt optimization for AIME math, agent architecture evolution for ARC-AGI, cloud scheduling policy discovery
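In generalization mode, candidates are searched on `dataset` but selected on the held-out `valset`. A minimal sketch of preparing such a split (the example records are placeholders):

```python
# Hold out a validation slice so the optimized artifact must generalize
# rather than overfit the search examples.
import random

examples = [{"id": i} for i in range(20)]  # placeholder examples

rng = random.Random(0)          # fixed seed for a reproducible split
rng.shuffle(examples)
dataset, valset = examples[:15], examples[15:]

# optimize_anything(..., dataset=dataset, valset=valset) would then explore
# candidates on `dataset` and pick the best performer on `valset`.
```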
Parameters
`seed_candidate` (str | dict[str, str] | None): Starting point for optimization.
- str: a single text parameter (the evaluator receives a str)
- dict[str, str]: named parameters (the evaluator receives the dict)
- None: seedless mode; the reflection LLM generates the initial candidate from `objective`. Requires `objective`.

`evaluate`: Scoring function. Returns `(score, side_info)` or just `score`; higher scores are better. The signature varies by mode:
- Single-task: `def evaluate(candidate) -> float | tuple[float, dict]`
- Multi-task / generalization: `def evaluate(candidate, example) -> float | tuple[float, dict]`
It may also accept an optional `opt_state: OptimizationState` parameter to receive historical best evaluations.

`dataset`: Examples for multi-task or generalization modes. None = single-task search mode.

`valset`: Held-out validation set for generalization mode. None = defaults to `dataset` (multi-task search).

`objective`: Natural-language goal for the reflection LLM. Example: "Generate prompts that solve competition math problems." Required when `seed_candidate=None`.

`background`: Domain knowledge, constraints, or strategies for the reflection LLM.

`config`: Full configuration object (`GEPAConfig`). If not provided, default settings are used.
Returns
Optimization result object (`GEPAResult`). Access `result.best_candidate` for the optimized parameter(s); the object also records the full optimization history. Key attributes:
- `best_candidate`: the optimized parameter(s), str or dict[str, str]
- `best_idx`: index of the best candidate
- `val_aggregate_scores`: per-candidate average validation score
- `candidates`: all candidates explored during optimization
- `per_val_instance_best_candidates`: Pareto frontier per validation example
Configuration
The `GEPAConfig` dataclass groups all settings:
EngineConfig
Controls the optimization run loop:
- `max_metric_calls` (int | None): evaluation budget
- `parallel` (bool): enable concurrent evaluation
- `max_workers` (int | None): number of parallel workers
- `capture_stdio` (bool): auto-capture print() output as ASI
- `cache_evaluation` (bool): cache (candidate, example) scores
- `run_dir` (str | None): directory to save optimization state
- `display_progress_bar` (bool): show tqdm progress bar
ReflectionConfig
Controls LLM-based candidate proposal:
- `reflection_lm` (LanguageModel | str | None): model for reflection (default: "openai/gpt-5.1")
- `reflection_minibatch_size` (int | None): examples per reflection step
- `reflection_prompt_template` (str | dict | None): custom reflection prompt
- `module_selector` (str): component selection strategy ('round_robin', 'all')
TrackingConfig
Experiment tracking:
- `use_wandb` (bool): enable Weights & Biases
- `use_mlflow` (bool): enable MLflow
- `logger` (LoggerProtocol | None): custom logger
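The pieces above can be assembled roughly as follows. This is a hedged sketch: the option names come from the lists above, but the sub-config field names (`engine`, `reflection`, `tracking`) are assumptions about how `GEPAConfig` composes them.

```python
# Hypothetical configuration sketch; verify field names against the real API.
# from gepa import GEPAConfig, EngineConfig, ReflectionConfig, TrackingConfig

config = GEPAConfig(
    engine=EngineConfig(
        max_metric_calls=500,    # evaluation budget
        parallel=True,
        max_workers=8,
        cache_evaluation=True,   # cache (candidate, example) scores
    ),
    reflection=ReflectionConfig(
        reflection_lm="openai/gpt-5.1",
        reflection_minibatch_size=4,
    ),
    tracking=TrackingConfig(use_wandb=False),
)
```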
Seedless Mode
When you don’t have a starting artifact, pass `seed_candidate=None` and provide `objective` (and optionally `background`). The reflection LM bootstraps the first candidate from the description, then iterates as usual.
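The seedless call shape might look like the following; the keyword names are taken from this page, and the example strings are placeholders:

```python
# Seedless mode: no seed_candidate, objective required, background optional.
seedless_kwargs = dict(
    seed_candidate=None,  # reflection LM bootstraps the first candidate
    objective="Generate prompts that solve competition math problems.",
    background="Prefer concise, step-by-step prompting styles.",
)
# optimize_anything(evaluate=evaluate, **seedless_kwargs)
```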
Actionable Side Information (ASI)
ASI is the text-optimization analogue of the gradient. Where gradients tell a numerical optimizer which direction to move, ASI tells the LLM proposer why a candidate failed and how to fix it. You can provide ASI in two ways:
- Return `(score, side_info_dict)` from your evaluator.
- Call `oa.log()` inside your evaluator (output is captured under the `"log"` key).
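A sketch of both options. The first is fully illustrative (toy task, made-up side-info keys); the second is shown as comments because this page does not show the import path for `oa`, and `oa.log()` accepting a message string is an assumption.

```python
# Way 1: return (score, side_info) so the reflection LLM sees why the
# candidate failed and how to fix it.
def evaluate(candidate: str, example: dict) -> tuple[float, dict]:
    expected = example["expected"]
    output = candidate.strip()          # toy "execution" of the candidate
    score = 1.0 if output == expected else 0.0
    side_info = {
        "expected": expected,
        "got": output,
        "hint": "Outputs must match exactly, including casing.",
    }
    return score, side_info

# Way 2 (sketch): log free-form diagnostics from inside the evaluator;
# the text is captured as ASI under the "log" key.
# def evaluate(candidate, example):
#     oa.log(f"candidate produced {len(candidate)} chars")
#     ...
```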
Multi-Objective Scores
Include a `"scores"` dict in `side_info` for Pareto tracking (all values must be "higher is better").
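For example (the metric names and scoring rules below are made up; only the `"scores"` key and the higher-is-better convention come from this page):

```python
# Emit per-metric scores so each metric is tracked on the Pareto frontier.
def evaluate(candidate: str, example: dict) -> tuple[float, dict]:
    correctness = 1.0 if example["answer"] in candidate else 0.0
    brevity = 1.0 / (1.0 + len(candidate) / 100)   # shorter text scores higher
    overall = 0.8 * correctness + 0.2 * brevity    # scalar score for the run
    side_info = {"scores": {"correctness": correctness, "brevity": brevity}}
    return overall, side_info
```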
Visual Feedback
Use `oa.Image` for visual feedback with VLM reflection.
Key Features
Pareto-Efficient Search
Scores are tracked per-task and per-metric individually. Any candidate that is the best at something survives on the frontier, enabling focused improvements that are preserved rather than averaged away.

Evaluation Caching

Enable caching (the `cache_evaluation` engine option) to avoid redundant evaluations of the same (candidate, example) pair.

Parallel Evaluation

Speed up optimization with parallel evaluation (the `parallel` and `max_workers` engine options).

Complete Example
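A hedged end-to-end sketch in multi-task mode. The evaluator and dataset are toy stand-ins, and the commented call assumes the package exposes `optimize_anything` at the top level as described above:

```python
# import gepa

def evaluate(candidate: str, example: dict) -> tuple[float, dict]:
    """Score a prompt candidate on one example (toy keyword metric)."""
    hit = example["keyword"] in candidate
    return (1.0 if hit else 0.0), {"keyword": example["keyword"], "hit": hit}

dataset = [{"keyword": "step"}, {"keyword": "verify"}]

# result = gepa.optimize_anything(
#     seed_candidate="Solve the problem.",
#     evaluate=evaluate,
#     dataset=dataset,
# )
# print(result.best_candidate)
```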
See Also
- gepa.optimize() - Lower-level API with full adapter control
- GEPAResult - Result object returned by optimize_anything()
- Configuration Guide - Detailed configuration options
- Evaluation Metrics Guide - How to write effective evaluators