
Overview

optimize_anything() is GEPA’s universal API for optimizing any text-representable artifact: code, prompts, agent architectures, configurations, policies, SVG graphics, and more. You declare what to optimize (the artifact) and how to measure it (the evaluator); optimize_anything handles the rest: prompt construction, LLM reflection, candidate selection, and Pareto-efficient search. The key insight is that a wide range of problems can be formulated as optimizing a text artifact. If it can be serialized to a string and its quality measured, an LLM can reason about it and propose improvements.

Function Signature

def optimize_anything(
    seed_candidate: str | Candidate | None = None,
    *,
    evaluator: Callable[..., Any],
    dataset: list[DataInst] | None = None,
    valset: list[DataInst] | None = None,
    objective: str | None = None,
    background: str | None = None,
    config: GEPAConfig | None = None,
) -> GEPAResult

Three Optimization Modes

The function supports three optimization paradigms, determined by whether you provide dataset and valset:

1. Single-Task Search (dataset=None, valset=None)

Solve one hard problem: the candidate itself is the solution. The evaluator is called with the candidate alone, without an example argument. Example use cases: circle packing, black-box mathematical optimization
import gepa.optimize_anything as oa

def evaluate(candidate: str) -> float:
    result = run_code(candidate)
    oa.log(f"Score: {result.score}")
    return result.score

result = oa.optimize_anything(
    seed_candidate="def pack_circles(): ...",
    evaluator=evaluate,
    objective="Maximize the sum of radii for n circles in a unit square.",
    config=oa.GEPAConfig(engine=oa.EngineConfig(max_metric_calls=500)),
)

2. Multi-Task Search (dataset=<list>, valset=None)

Solve a batch of related problems with cross-task transfer: insights from solving one problem help solve the others. The evaluator is called once per example. Example use cases: CUDA kernel generation for multiple PyTorch operations, multi-aspect SVG optimization
result = oa.optimize_anything(
    seed_candidate={"prompt": "Write an optimized CUDA kernel."},
    evaluator=kernel_evaluator,
    dataset=kernel_problems,  # batch of related problems
    objective="Generate prompts that produce fast, correct CUDA kernels.",
    config=oa.GEPAConfig(engine=oa.EngineConfig(max_metric_calls=300)),
)

3. Generalization (dataset=<list>, valset=<list>)

Build a skill that transfers to unseen problems. The evaluator is called once per example, and candidates must generalize to the held-out valset. Example use cases: prompt optimization for AIME math, agent architecture evolution for ARC-AGI, cloud scheduling policy discovery
result = oa.optimize_anything(
    seed_candidate={"prompt": "Solve this math problem step by step:"},
    evaluator=math_evaluator,
    dataset=train_problems,  # train on these
    valset=val_problems,     # must generalize to these
    objective="Generate system prompts that improve math reasoning.",
    config=oa.GEPAConfig(engine=oa.EngineConfig(max_metric_calls=200)),
)

Parameters

seed_candidate
str | dict[str, str] | None
default:"None"
Starting point for optimization:
  • str: single text parameter (evaluator receives str)
  • dict[str, str]: named parameters (evaluator receives the dict)
  • None: seedless mode; the reflection LLM generates the initial candidate from objective, which is then required.
evaluator
Callable[..., Any]
required
Scoring function. Returns (score, side_info) or just score; higher scores are better. Signature varies by mode:
  • Single-task: def evaluate(candidate) -> float | tuple[float, dict]
  • Multi-task/Generalization: def evaluate(candidate, example) -> float | tuple[float, dict]
Can optionally declare an opt_state: OptimizationState parameter to receive the history of best evaluations.
dataset
list[DataInst] | None
default:"None"
Examples for multi-task or generalization modes. None = single-task search mode.
valset
list[DataInst] | None
default:"None"
Held-out validation set for generalization mode. None = defaults to dataset (multi-task search).
objective
str | None
default:"None"
Natural-language goal for the reflection LLM. Example: “Generate prompts that solve competition math problems.” Required when seed_candidate=None.
background
str | None
default:"None"
Domain knowledge, constraints, or strategies for the reflection LLM.
config
GEPAConfig | None
default:"None"
Full configuration object. If not provided, uses default settings.
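The two evaluator signatures can be sketched as plain functions. The sketch below is illustrative, not GEPA code: run_candidate is a hypothetical stand-in for your own system, and the optional opt_state parameter is shown but treated opaquely, since its attributes are defined by the library.

```python
from typing import Any


def run_candidate(candidate: str, input_text: str) -> str:
    # Hypothetical stand-in: a real system would call an LLM or execute code.
    return input_text.upper()


# Single-task signature: the candidate alone.
def evaluate_single(candidate: str) -> float:
    return float(len(candidate) > 0)


# Multi-task / generalization signature: candidate plus one example.
# Declaring opt_state is optional; GEPA fills it with historical best
# evaluations when present.
def evaluate_per_example(
    candidate: str, example: dict, opt_state: Any = None
) -> tuple[float, dict]:
    pred = run_candidate(candidate, example["input"])
    score = 1.0 if pred == example["expected"] else 0.0
    return score, {"Output": pred, "Expected": example["expected"]}
```

Either function can be passed directly as evaluator; GEPA picks the calling convention based on whether dataset is provided.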

Returns

result
GEPAResult
Optimization result object. Access result.best_candidate for the optimized parameter(s); the object also carries the full optimization history. Key attributes:
  • best_candidate: The optimized parameter(s) - str or dict[str, str]
  • best_idx: Index of the best candidate
  • val_aggregate_scores: Per-candidate average validation score
  • candidates: All candidates explored during optimization
  • per_val_instance_best_candidates: Pareto frontier per validation example

Configuration

The GEPAConfig dataclass groups all settings:
from gepa.optimize_anything import GEPAConfig, EngineConfig, ReflectionConfig

config = GEPAConfig(
    engine=EngineConfig(
        max_metric_calls=200,
        parallel=True,
        max_workers=16,
        capture_stdio=True,  # Auto-capture print() output
    ),
    reflection=ReflectionConfig(
        reflection_lm="openai/gpt-4",
        reflection_minibatch_size=3,
    ),
)

EngineConfig

Controls the optimization run loop:
  • max_metric_calls (int | None): Evaluation budget
  • parallel (bool): Enable concurrent evaluation
  • max_workers (int | None): Number of parallel workers
  • capture_stdio (bool): Auto-capture print() output as ASI
  • cache_evaluation (bool): Cache (candidate, example) scores
  • run_dir (str | None): Directory to save optimization state
  • display_progress_bar (bool): Show tqdm progress bar

ReflectionConfig

Controls LLM-based candidate proposal:
  • reflection_lm (LanguageModel | str | None): Model for reflection (default: "openai/gpt-5.1")
  • reflection_minibatch_size (int | None): Examples per reflection step
  • reflection_prompt_template (str | dict | None): Custom reflection prompt
  • module_selector (str): Component selection strategy ('round_robin', 'all')
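For example, a sketch of a configuration that mutates every component of a multi-part candidate on each reflection step (field values here are illustrative, not recommendations):

```python
from gepa.optimize_anything import GEPAConfig, ReflectionConfig

config = GEPAConfig(
    reflection=ReflectionConfig(
        reflection_lm="openai/gpt-5.1",  # the documented default model
        reflection_minibatch_size=5,     # examples shown per reflection step
        module_selector="all",           # propose edits to every component at once
    ),
)
```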

TrackingConfig

Experiment tracking:
  • use_wandb (bool): Enable Weights & Biases
  • use_mlflow (bool): Enable MLflow
  • logger (LoggerProtocol | None): Custom logger
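A sketch of enabling experiment tracking; this assumes GEPAConfig exposes a tracking field for this dataclass, mirroring the engine and reflection fields shown above.

```python
from gepa.optimize_anything import GEPAConfig, TrackingConfig

# Assumption: GEPAConfig accepts tracking= alongside engine= and reflection=.
config = GEPAConfig(
    tracking=TrackingConfig(
        use_wandb=True,    # log runs to Weights & Biases
        use_mlflow=False,  # MLflow disabled in this sketch
    ),
)
```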

Seedless Mode

When you don’t have a starting artifact, pass seed_candidate=None and provide objective (and optionally background). The reflection LM bootstraps the first candidate from the description, then iterates as usual.
result = oa.optimize_anything(
    seed_candidate=None,  # LLM writes the first draft
    evaluator=evaluate_3d_render,
    dataset=visual_aspects,
    objective="Optimize a Python program to generate a 3D unicorn.",
    background="Use build123d for CSG geometry, export to STL.",
)

Actionable Side Information (ASI)

ASI is the text-optimization analogue of the gradient. Where gradients tell a numerical optimizer which direction to move, ASI tells the LLM proposer why a candidate failed and how to fix it. You can provide ASI in two ways:
  1. Return (score, side_info_dict) from your evaluator:
def evaluate(candidate, example):
    pred = run(candidate, example["input"])
    score = 1.0 if pred == example["expected"] else 0.0
    side_info = {
        "Input": example["input"],
        "Output": pred,
        "Expected": example["expected"],
    }
    return score, side_info
  2. Call oa.log() inside your evaluator (captured under "log" key):
import gepa.optimize_anything as oa

def evaluate(candidate):
    result = run_code(candidate)
    oa.log(f"Score: {result.score}")
    oa.log(f"Overlaps: {result.overlaps}")
    return result.score

Multi-Objective Scores

Include a "scores" dict in side_info for Pareto tracking (all values must be “higher is better”):
side_info = {
    "scores": {"accuracy": 0.85, "latency_inv": 12.5},
    "Error": error_message,
}
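Putting this together, a full evaluator returning multi-objective scores might look like the following sketch; run_candidate is a hypothetical stand-in, and latency is reported as its inverse so every entry under "scores" is higher-is-better.

```python
import time


def run_candidate(candidate: str, input_text: str) -> str:
    # Hypothetical stand-in for your own system.
    return input_text[::-1]


def evaluate(candidate: str, example: dict) -> tuple[float, dict]:
    start = time.perf_counter()
    pred = run_candidate(candidate, example["input"])
    elapsed = time.perf_counter() - start
    accuracy = 1.0 if pred == example["expected"] else 0.0
    side_info = {
        "Output": pred,
        "Expected": example["expected"],
        # All values under "scores" must be higher-is-better,
        # so latency is inverted.
        "scores": {"accuracy": accuracy, "latency_inv": 1.0 / max(elapsed, 1e-9)},
    }
    return accuracy, side_info
```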

Visual Feedback

Use oa.Image for visual feedback with VLM reflection:
from gepa.optimize_anything import Image

side_info = {
    "rendered_output": Image(path="output.png"),
}

Key Features

Pareto-Frontier Candidate Selection

Scores are tracked per task and per metric individually. Any candidate that is the best at something survives on the frontier, so focused improvements are preserved rather than averaged away.
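A toy illustration of this selection rule (not GEPA's internal code): given per-task scores, any candidate that achieves the best score on at least one task stays on the frontier.

```python
def pareto_survivors(per_task_scores: dict[str, list[float]]) -> set[str]:
    # per_task_scores maps candidate name -> its score on each task.
    # A candidate survives if it is best (ties included) on some task.
    n_tasks = len(next(iter(per_task_scores.values())))
    survivors: set[str] = set()
    for t in range(n_tasks):
        best = max(scores[t] for scores in per_task_scores.values())
        for name, scores in per_task_scores.items():
            if scores[t] == best:
                survivors.add(name)
    return survivors


scores = {
    "A": [0.9, 0.2],  # best on task 0
    "B": [0.5, 0.8],  # best on task 1
    "C": [0.6, 0.6],  # never best on any task: pruned
}
```

Note that A has the lowest average score (0.55) yet survives because it wins task 0, while C's higher average (0.6) does not save it; this is what "preserved rather than averaged away" means.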

Evaluation Caching

Enable caching to avoid redundant evaluations:
config = GEPAConfig(
    engine=EngineConfig(
        cache_evaluation=True,
        cache_evaluation_storage="disk",  # or "memory", "auto"
        run_dir="./gepa_run",
    )
)

Parallel Evaluation

Speed up optimization with parallel evaluation:
config = GEPAConfig(
    engine=EngineConfig(
        parallel=True,
        max_workers=16,
    )
)

Complete Example

import gepa.optimize_anything as oa

# Define evaluation function
def evaluate(candidate: str, example: dict) -> tuple[float, dict]:
    # Run your candidate on the example
    result = run_system(candidate, example["input"])
    
    # Calculate score (higher is better)
    score = compute_similarity(result, example["expected"])
    
    # Provide diagnostic information
    side_info = {
        "Input": example["input"],
        "Output": result,
        "Expected": example["expected"],
        "scores": {"accuracy": score, "speed": result.latency_inv},
    }
    
    return score, side_info

# Define dataset
train_data = [
    {"input": "...", "expected": "..."},
    # ... more examples
]

val_data = [
    {"input": "...", "expected": "..."},
    # ... more examples
]

# Run optimization
result = oa.optimize_anything(
    seed_candidate="Initial prompt or code",
    evaluator=evaluate,
    dataset=train_data,
    valset=val_data,
    objective="Optimize the system to maximize accuracy and speed.",
    background="System operates in a constrained environment with limited resources.",
    config=oa.GEPAConfig(
        engine=oa.EngineConfig(
            max_metric_calls=300,
            parallel=True,
            max_workers=8,
            cache_evaluation=True,
            run_dir="./optimization_run",
        ),
        reflection=oa.ReflectionConfig(
            reflection_lm="openai/gpt-4",
            reflection_minibatch_size=3,
        ),
    ),
)

# Access results
print(f"Best candidate: {result.best_candidate}")
print(f"Best score: {result.val_aggregate_scores[result.best_idx]}")
print(f"Total candidates explored: {result.num_candidates}")
