
Overview

optimize_anything() is GEPA’s universal API for optimizing any text-representable artifact: code, prompts, agent architectures, configurations, policies, SVG graphics, and more. You declare what to optimize (the artifact) and how to measure it (the evaluator); optimize_anything handles the rest: prompt construction, LLM reflection, candidate selection, and Pareto-efficient search. The key insight is that a wide range of problems can be formulated as optimizing a text artifact. If it can be serialized to a string and its quality measured, an LLM can reason about it and propose improvements.

Function Signature

def optimize_anything(
    seed_candidate: str | Candidate | None = None,
    *,
    evaluator: Callable[..., Any],
    dataset: list[DataInst] | None = None,
    valset: list[DataInst] | None = None,
    objective: str | None = None,
    background: str | None = None,
    config: GEPAConfig | None = None,
) -> GEPAResult

Three Optimization Modes

The function supports three optimization paradigms, determined by whether you provide dataset and valset:

1. Single-Task Search (dataset=None, valset=None)

Solve one hard problem: the candidate itself is the solution. The evaluator is called with the candidate alone, without an example argument. Example use cases: circle packing, black-box mathematical optimization
import gepa.optimize_anything as oa

def evaluate(candidate: str) -> float:
    result = run_code(candidate)
    oa.log(f"Score: {result.score}")
    return result.score

result = oa.optimize_anything(
    seed_candidate="def pack_circles(): ...",
    evaluator=evaluate,
    objective="Maximize the sum of radii for n circles in a unit square.",
    config=oa.GEPAConfig(engine=oa.EngineConfig(max_metric_calls=500)),
)

2. Multi-Task Search (dataset=<list>, valset=None)

Solve a batch of related problems with cross-task transfer: insights from solving one problem help solve the others. The evaluator is called once per example. Example use cases: CUDA kernel generation for multiple PyTorch operations, multi-aspect SVG optimization
result = oa.optimize_anything(
    seed_candidate={"prompt": "Write an optimized CUDA kernel."},
    evaluator=kernel_evaluator,
    dataset=kernel_problems,  # batch of related problems
    objective="Generate prompts that produce fast, correct CUDA kernels.",
    config=oa.GEPAConfig(engine=oa.EngineConfig(max_metric_calls=300)),
)

3. Generalization (dataset=<list>, valset=<list>)

Build a skill that transfers to unseen problems. The evaluator is called once per example, and candidates must generalize to the held-out valset. Example use cases: prompt optimization for AIME math, agent architecture evolution for ARC-AGI, cloud scheduling policy discovery
result = oa.optimize_anything(
    seed_candidate={"prompt": "Solve this math problem step by step:"},
    evaluator=math_evaluator,
    dataset=train_problems,  # train on these
    valset=val_problems,     # must generalize to these
    objective="Generate system prompts that improve math reasoning.",
    config=oa.GEPAConfig(engine=oa.EngineConfig(max_metric_calls=200)),
)

Parameters

seed_candidate
str | dict[str, str] | None
default:"None"
Starting point for optimization:
  • str: single text parameter (evaluator receives str)
  • dict[str, str]: named parameters (evaluator receives the dict)
  • None: seedless mode; the reflection LLM generates the initial candidate from objective, which is then required.
evaluator
Callable[..., Any]
required
Scoring function. Returns (score, side_info) or just score; higher scores are better. Signature varies by mode:
  • Single-task: def evaluate(candidate) -> float | tuple[float, dict]
  • Multi-task/Generalization: def evaluate(candidate, example) -> float | tuple[float, dict]
Can optionally declare an opt_state: OptimizationState parameter to receive the history of best evaluations.
dataset
list[DataInst] | None
default:"None"
Examples for multi-task or generalization modes. None = single-task search mode.
valset
list[DataInst] | None
default:"None"
Held-out validation set for generalization mode. None = defaults to dataset (multi-task search).
objective
str | None
default:"None"
Natural-language goal for the reflection LLM. Example: “Generate prompts that solve competition math problems.” Required when seed_candidate=None.
background
str | None
default:"None"
Domain knowledge, constraints, or strategies for the reflection LLM.
config
GEPAConfig | None
default:"None"
Full configuration object. If not provided, uses default settings.
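The two evaluator signatures can be sketched as plain functions. The sketch below is illustrative, not GEPA code: run_candidate is a hypothetical stand-in for your own system, and the optional opt_state parameter is shown but treated opaquely, since its attributes are defined by the library.

```python
from typing import Any


def run_candidate(candidate: str, input_text: str) -> str:
    # Hypothetical stand-in: a real system would call an LLM or execute code.
    return input_text.upper()


# Single-task signature: the candidate alone.
def evaluate_single(candidate: str) -> float:
    return float(len(candidate) > 0)


# Multi-task / generalization signature: candidate plus one example.
# Declaring opt_state is optional; GEPA fills it with historical best
# evaluations when present.
def evaluate_per_example(
    candidate: str, example: dict, opt_state: Any = None
) -> tuple[float, dict]:
    pred = run_candidate(candidate, example["input"])
    score = 1.0 if pred == example["expected"] else 0.0
    return score, {"Output": pred, "Expected": example["expected"]}
```

Either function can be passed directly as evaluator; GEPA picks the calling convention based on whether dataset is provided.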

Returns

result
GEPAResult
Optimization result object. Access result.best_candidate for the optimized parameter(s); the object also carries the full optimization history. Key attributes:
  • best_candidate: The optimized parameter(s) - str or dict[str, str]
  • best_idx: Index of the best candidate
  • val_aggregate_scores: Per-candidate average validation score
  • candidates: All candidates explored during optimization
  • per_val_instance_best_candidates: Pareto frontier per validation example

Configuration

The GEPAConfig dataclass groups all settings:
from gepa.optimize_anything import GEPAConfig, EngineConfig, ReflectionConfig

config = GEPAConfig(
    engine=EngineConfig(
        max_metric_calls=200,
        parallel=True,
        max_workers=16,
        capture_stdio=True,  # Auto-capture print() output
    ),
    reflection=ReflectionConfig(
        reflection_lm="openai/gpt-4",
        reflection_minibatch_size=3,
    ),
)

EngineConfig

Controls the optimization run loop:
  • max_metric_calls (int | None): Evaluation budget
  • parallel (bool): Enable concurrent evaluation
  • max_workers (int | None): Number of parallel workers
  • capture_stdio (bool): Auto-capture print() output as ASI
  • cache_evaluation (bool): Cache (candidate, example) scores
  • run_dir (str | None): Directory to save optimization state
  • display_progress_bar (bool): Show tqdm progress bar

ReflectionConfig

Controls LLM-based candidate proposal:
  • reflection_lm (LanguageModel | str | None): Model for reflection (default: "openai/gpt-5.1")
  • reflection_minibatch_size (int | None): Examples per reflection step
  • reflection_prompt_template (str | dict | None): Custom reflection prompt
  • module_selector (str): Component selection strategy ('round_robin', 'all')
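For example, a sketch of a configuration that mutates every component of a multi-part candidate on each reflection step (field values here are illustrative, not recommendations):

```python
from gepa.optimize_anything import GEPAConfig, ReflectionConfig

config = GEPAConfig(
    reflection=ReflectionConfig(
        reflection_lm="openai/gpt-5.1",  # the documented default model
        reflection_minibatch_size=5,     # examples shown per reflection step
        module_selector="all",           # propose edits to every component at once
    ),
)
```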

TrackingConfig

Experiment tracking:
  • use_wandb (bool): Enable Weights & Biases
  • use_mlflow (bool): Enable MLflow
  • logger (LoggerProtocol | None): Custom logger
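A sketch of enabling experiment tracking; this assumes GEPAConfig exposes a tracking field for this dataclass, mirroring the engine and reflection fields shown above.

```python
from gepa.optimize_anything import GEPAConfig, TrackingConfig

# Assumption: GEPAConfig accepts tracking= alongside engine= and reflection=.
config = GEPAConfig(
    tracking=TrackingConfig(
        use_wandb=True,    # log runs to Weights & Biases
        use_mlflow=False,  # MLflow disabled in this sketch
    ),
)
```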

Seedless Mode

When you don’t have a starting artifact, pass seed_candidate=None and provide objective (and optionally background). The reflection LM bootstraps the first candidate from the description, then iterates as usual.
result = oa.optimize_anything(
    seed_candidate=None,  # LLM writes the first draft
    evaluator=evaluate_3d_render,
    dataset=visual_aspects,
    objective="Optimize a Python program to generate a 3D unicorn.",
    background="Use build123d for CSG geometry, export to STL.",
)

Actionable Side Information (ASI)

ASI is the text-optimization analogue of the gradient. Where gradients tell a numerical optimizer which direction to move, ASI tells the LLM proposer why a candidate failed and how to fix it. You can provide ASI in two ways:
  1. Return (score, side_info_dict) from your evaluator:
def evaluate(candidate, example):
    pred = run(candidate, example["input"])
    score = 1.0 if pred == example["expected"] else 0.0
    side_info = {
        "Input": example["input"],
        "Output": pred,
        "Expected": example["expected"],
    }
    return score, side_info
  2. Call oa.log() inside your evaluator (captured under "log" key):
import gepa.optimize_anything as oa

def evaluate(candidate):
    result = run_code(candidate)
    oa.log(f"Score: {result.score}")
    oa.log(f"Overlaps: {result.overlaps}")
    return result.score

Multi-Objective Scores

Include a "scores" dict in side_info for Pareto tracking (all values must be “higher is better”):
side_info = {
    "scores": {"accuracy": 0.85, "latency_inv": 12.5},
    "Error": error_message,
}
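Putting this together, a full evaluator returning multi-objective scores might look like the following sketch; run_candidate is a hypothetical stand-in, and latency is reported as its inverse so every entry under "scores" is higher-is-better.

```python
import time


def run_candidate(candidate: str, input_text: str) -> str:
    # Hypothetical stand-in for your own system.
    return input_text[::-1]


def evaluate(candidate: str, example: dict) -> tuple[float, dict]:
    start = time.perf_counter()
    pred = run_candidate(candidate, example["input"])
    elapsed = time.perf_counter() - start
    accuracy = 1.0 if pred == example["expected"] else 0.0
    side_info = {
        "Output": pred,
        "Expected": example["expected"],
        # All values under "scores" must be higher-is-better,
        # so latency is inverted.
        "scores": {"accuracy": accuracy, "latency_inv": 1.0 / max(elapsed, 1e-9)},
    }
    return accuracy, side_info
```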

Visual Feedback

Use oa.Image for visual feedback with VLM reflection:
from gepa.optimize_anything import Image

side_info = {
    "rendered_output": Image(path="output.png"),
}

Key Features

Pareto-Frontier Candidate Selection

Scores are tracked per task and per metric individually. Any candidate that is the best at something survives on the frontier, so focused improvements are preserved rather than averaged away.
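A toy illustration of this selection rule (not GEPA's internal code): given per-task scores, any candidate that achieves the best score on at least one task stays on the frontier.

```python
def pareto_survivors(per_task_scores: dict[str, list[float]]) -> set[str]:
    # per_task_scores maps candidate name -> its score on each task.
    # A candidate survives if it is best (ties included) on some task.
    n_tasks = len(next(iter(per_task_scores.values())))
    survivors: set[str] = set()
    for t in range(n_tasks):
        best = max(scores[t] for scores in per_task_scores.values())
        for name, scores in per_task_scores.items():
            if scores[t] == best:
                survivors.add(name)
    return survivors


scores = {
    "A": [0.9, 0.2],  # best on task 0
    "B": [0.5, 0.8],  # best on task 1
    "C": [0.6, 0.6],  # never best on any task: pruned
}
```

Note that A has the lowest average score (0.55) yet survives because it wins task 0, while C's higher average (0.6) does not save it; this is what "preserved rather than averaged away" means.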

Evaluation Caching

Enable caching to avoid redundant evaluations:
config = GEPAConfig(
    engine=EngineConfig(
        cache_evaluation=True,
        cache_evaluation_storage="disk",  # or "memory", "auto"
        run_dir="./gepa_run",
    )
)

Parallel Evaluation

Speed up optimization with parallel evaluation:
config = GEPAConfig(
    engine=EngineConfig(
        parallel=True,
        max_workers=16,
    )
)

Complete Example

import gepa.optimize_anything as oa

# Define evaluation function
def evaluate(candidate: str, example: dict) -> tuple[float, dict]:
    # Run your candidate on the example
    result = run_system(candidate, example["input"])
    
    # Calculate score (higher is better)
    score = compute_similarity(result, example["expected"])
    
    # Provide diagnostic information
    side_info = {
        "Input": example["input"],
        "Output": result,
        "Expected": example["expected"],
        "scores": {"accuracy": score, "speed": result.latency_inv},
    }
    
    return score, side_info

# Define dataset
train_data = [
    {"input": "...", "expected": "..."},
    # ... more examples
]

val_data = [
    {"input": "...", "expected": "..."},
    # ... more examples
]

# Run optimization
result = oa.optimize_anything(
    seed_candidate="Initial prompt or code",
    evaluator=evaluate,
    dataset=train_data,
    valset=val_data,
    objective="Optimize the system to maximize accuracy and speed.",
    background="System operates in a constrained environment with limited resources.",
    config=oa.GEPAConfig(
        engine=oa.EngineConfig(
            max_metric_calls=300,
            parallel=True,
            max_workers=8,
            cache_evaluation=True,
            run_dir="./optimization_run",
        ),
        reflection=oa.ReflectionConfig(
            reflection_lm="openai/gpt-4",
            reflection_minibatch_size=3,
        ),
    ),
)

# Access results
print(f"Best candidate: {result.best_candidate}")
print(f"Best score: {result.val_aggregate_scores[result.best_idx]}")
print(f"Total candidates explored: {result.num_candidates}")
