The optimize_anything API is GEPA’s primary entry point for optimizing any text-representable artifact using LLM-guided search. Whether you’re optimizing prompts, code, configurations, or agent architectures, this unified API handles the entire optimization workflow.

Core Concept

The key insight behind optimize_anything is that a wide range of problems can be formulated as optimizing a text artifact:
  • Speeding up a CUDA kernel
  • Tuning a scheduling policy
  • Refining a prompt template
  • Redesigning an agent architecture
If it can be serialized to a string and its quality measured, an LLM can reason about it and propose improvements.
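At its simplest, the contract is a function from text to a score. A toy sketch of that idea (the scoring rule here is an illustrative stand-in, not part of the GEPA API):

```python
def measure_quality(artifact: str) -> float:
    # Toy scoring rule: reward artifacts that mention "sorted",
    # and prefer shorter ones. Real domains would run a benchmark,
    # a test suite, or a simulator here.
    return (1.0 if "sorted" in artifact else 0.0) / max(len(artifact), 1)

def evaluate(candidate: str) -> float:
    """Any text-representable artifact can be scored this way."""
    return measure_quality(candidate)
```

Whatever the domain, as long as `evaluate` maps a string to a number, the search loop below can operate on it.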

Optimization Workflow

seed_candidate → evaluate → reflect on ASI → propose → repeat
                    ↑                            |
                    └────────────────────────────┘
Actionable Side Information (ASI) is the text-optimization analogue of the gradient. Where gradients tell a numerical optimizer which direction to move, ASI tells the LLM proposer why a candidate failed and how to fix it.
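Concretely, an evaluator can return just a score, or a (score, side_info) pair whose fields explain the failure. A hedged sketch (the specific failure check and suggestion text are illustrative):

```python
def evaluate_with_asi(candidate: str) -> tuple[float, dict]:
    # Hypothetical check: the candidate must guard against empty input.
    handles_empty = "if not items" in candidate
    score = 1.0 if handles_empty else 0.0
    # ASI records not just *that* the candidate failed,
    # but *why* it failed and *how* to fix it.
    side_info = {
        "Error": None if handles_empty else "crashes on empty input",
        "Suggestion": None if handles_empty
        else "guard against empty `items` before indexing",
    }
    return score, side_info
```

A bare 0.0 tells the proposer nothing; the side_info fields give it a direction to move in, the way a gradient would.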

Three Optimization Modes

optimize_anything automatically selects one of three modes based on whether you provide dataset and valset.

1. Single-Task Mode

When: dataset=None, valset=None
Use case: Solve one hard problem where the candidate itself is the solution.
Example: Circle packing, black-box mathematical optimization
import gepa.optimize_anything as oa
from gepa.optimize_anything import optimize_anything, GEPAConfig, EngineConfig

def evaluate(candidate: str) -> float:
    result = run_code(candidate)
    oa.log(f"Score: {result.score}, Overlaps: {result.overlaps}")
    return result.score

result = optimize_anything(
    seed_candidate="def pack_circles(): ...",
    evaluator=evaluate,
    objective="Maximize the sum of radii for n circles in a unit square.",
    config=GEPAConfig(engine=EngineConfig(max_metric_calls=500)),
)

print(result.best_candidate)
2. Multi-Task Mode

When: dataset=<list>, valset=None
Use case: Solve a batch of related problems with cross-task transfer; insights from solving one help solve the others.
Example: CUDA kernel generation for multiple PyTorch operations
result = optimize_anything(
    seed_candidate={"prompt": "Write an optimized CUDA kernel."},
    evaluator=kernel_evaluator,
    dataset=kernel_problems,       # batch of related problems
    objective="Generate prompts that produce fast, correct CUDA kernels.",
    config=GEPAConfig(engine=EngineConfig(max_metric_calls=300)),
)

3. Generalization Mode

When: dataset=<list>, valset=<list>
Use case: Build a skill that transfers to unseen problems.
Example: Prompt optimization for AIME math, agent architecture evolution for ARC-AGI
result = optimize_anything(
    seed_candidate={"prompt": "Solve this math problem step by step:"},
    evaluator=math_evaluator,
    dataset=train_problems,        # train on these
    valset=val_problems,           # must generalize to these
    objective="Generate system prompts that improve math reasoning.",
    config=GEPAConfig(engine=EngineConfig(max_metric_calls=200)),
)

API Reference

Main Function

optimize_anything(
    seed_candidate: str | Candidate | None = None,
    *,
    evaluator: Callable[..., Any],
    dataset: list[DataInst] | None = None,
    valset: list[DataInst] | None = None,
    objective: str | None = None,
    background: str | None = None,
    config: GEPAConfig | None = None,
) -> GEPAResult

Parameters

1. seed_candidate

Starting point for optimization:
  • str — Single text parameter (evaluator receives str)
  • dict[str, str] — Named parameters (evaluator receives the dict)
  • None — Seedless mode: LLM generates initial candidate from objective
# String candidate
seed_candidate="def optimize_function(): ..."

# Dictionary candidate
seed_candidate={
    "system_prompt": "You are a helpful assistant.",
    "user_template": "Answer this question: {question}"
}

# Seedless mode
seed_candidate=None  # Requires objective
2. evaluator

Scoring function that returns (score, side_info) or just a score. Higher scores are better.
def evaluator(candidate: str, example: dict) -> tuple[float, dict]:
    result = run_test(candidate, example)
    side_info = {
        "Input": example["input"],
        "Output": result.output,
        "Expected": example["expected"],
        "Error": result.error if result.error else None,
    }
    return result.score, side_info
See Evaluation Metrics for detailed examples.
3. dataset & valset

  • dataset: Training examples for multi-task or generalization modes
  • valset: Held-out validation set (defaults to dataset if not provided)
dataset = [
    {"input": "What is ML?", "expected": "Machine Learning..."},
    {"input": "Explain AI", "expected": "Artificial Intelligence..."},
]
valset = [...held-out examples...]
4. objective & background

  • objective: Natural-language goal for the reflection LLM
  • background: Domain knowledge, constraints, or strategies
objective="Generate prompts that solve competition math problems."
background="""
The solution should:
- Show step-by-step reasoning
- Use clear mathematical notation
- Verify the final answer
"""
5. config

Full configuration object. See Configuration for details.
config = GEPAConfig(
    engine=EngineConfig(
        max_metric_calls=300,
        parallel=True,
        max_workers=8,
        capture_stdio=True,
    ),
    reflection=ReflectionConfig(
        reflection_lm="openai/gpt-5.1",
        reflection_minibatch_size=3,
    ),
)

Seedless Mode

When you don’t have a starting artifact, pass seed_candidate=None and provide objective. The reflection LM bootstraps the first candidate from the description.
result = optimize_anything(
    seed_candidate=None,           # LLM writes the first draft
    evaluator=evaluate_3d_render,
    dataset=visual_aspects,
    objective="Optimize a Python program to generate a 3D unicorn.",
    background="Use build123d for CSG geometry, export to STL, render with pyrender.",
)
Seedless mode requires objective to be set. The reflection LLM needs the objective to generate an initial candidate.

Logging and Diagnostics

Using oa.log()

import gepa.optimize_anything as oa

def evaluate(candidate: str) -> float:
    result = run_code(candidate)
    oa.log("Execution time:", result.time_ms, "ms")
    oa.log("Memory usage:", result.memory_mb, "MB")
    if result.error:
        oa.log("ERROR:", result.error)
    return result.score
All oa.log() output is automatically captured and included in side_info under the "log" key.

Auto-Capturing stdout/stderr

config = GEPAConfig(
    engine=EngineConfig(
        capture_stdio=True,  # Capture print() output automatically
    ),
)

def evaluate(candidate: str) -> float:
    print("Testing candidate...")  # Captured automatically
    result = run_code(candidate)
    return result.score

Multi-Threading Support

import threading
import gepa.optimize_anything as oa

def my_evaluator(candidate):
    ctx = oa.get_log_context()

    def worker():
        oa.set_log_context(ctx)  # Propagate to child thread
        oa.log("from child thread")

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    oa.log("from main evaluator thread")
    return compute_score(candidate)  # your scoring logic

Side Information (ASI)

ASI is the text-optimization analogue of the gradient. More informative SideInfo → better optimization.

Structure

side_info = {
    # Multi-objective metrics (all "higher is better")
    "scores": {
        "accuracy": 0.85,
        "latency_inv": 12.5,  # Inverse latency for "higher is better"
    },
    
    # Contextual fields
    "Input": "Translate 'Hello world' to French",
    "Output": "Salut monde",
    "Expected": "Bonjour le monde",
    "Feedback": "Translation is too informal for the context",
    
    # Parameter-specific info
    "system_prompt_specific_info": {
        "scores": {"tone": 0.3},
        "Analysis": "System prompt led to overly casual translation",
    },
}
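One way to keep this structure consistent across evaluations is a small builder helper. This is a sketch, not part of the GEPA API:

```python
def build_side_info(scores: dict, **context) -> dict:
    """Assemble a side_info dict with a consistent shape.

    `scores` holds multi-objective metrics (all higher-is-better);
    extra keyword arguments become contextual fields such as
    Input, Output, Expected, or Error. Illustrative helper only.
    """
    side_info = {"scores": dict(scores)}
    # Drop empty contextual fields so the reflection LLM sees signal, not noise.
    side_info.update({k: v for k, v in context.items() if v is not None})
    return side_info

info = build_side_info(
    {"accuracy": 0.85},
    Input="Translate 'Hello world' to French",
    Output="Salut monde",
    Expected="Bonjour le monde",
    Error=None,  # omitted from the result
)
```

Routing every evaluation through one helper like this keeps field names stable across iterations, which makes it easier for the reflection LLM to compare candidates.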

Including Images

Use gepa.Image for visual feedback (requires VLM as reflection_lm):
from gepa import Image

def evaluate(candidate: str) -> tuple[float, dict]:
    svg_output = render_candidate(candidate)
    score = compute_score(svg_output)
    
    side_info = {
        "rendered_image": Image.from_path("output.png"),
        "score": score,
    }
    return score, side_info

Result Object

result = optimize_anything(...)

# Best candidate
print(result.best_candidate)

# Best score
print(result.val_aggregate_scores[result.best_idx])

# All candidates tried
for idx, candidate in enumerate(result.prog_candidates):
    score = result.val_aggregate_scores[idx]
    print(f"Candidate {idx}: {score}")

# Optimization history
print(f"Total evaluations: {result.total_metric_calls}")
print(f"Number of candidates: {len(result.prog_candidates)}")
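If you want more than the single best candidate, you can rank the whole pool by validation score. A sketch against the fields shown above, assuming `prog_candidates` and `val_aggregate_scores` are parallel lists:

```python
def top_candidates(prog_candidates, val_aggregate_scores, k=3):
    """Return the k best (score, candidate) pairs, best first."""
    ranked = sorted(
        zip(val_aggregate_scores, prog_candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return ranked[:k]

# e.g. top_candidates(result.prog_candidates, result.val_aggregate_scores)
```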

Best Practices

Always provide informative side_info. Include error messages, expected vs actual output, and diagnostic information. The more context you provide, the better the LLM can propose improvements.
  1. Start with a small evaluation budget (e.g., max_metric_calls=50) to test your setup
  2. Use oa.log() liberally to capture diagnostic information
  3. Structure your side_info consistently across evaluations
  4. For multi-objective optimization, include "scores" dict with all metrics
  5. Enable parallelization for faster optimization when evaluations are independent

Common Patterns

Evaluator with OptimizationState

Warm-start from previous best solutions:
from gepa.optimize_anything import OptimizationState

def evaluator(candidate, example, opt_state: OptimizationState):
    # Access historical best evaluations
    if opt_state.best_example_evals:
        prev_best = opt_state.best_example_evals[0]["side_info"]
        # Use prev_best to warm-start current evaluation
    
    result = run_test(candidate, example)
    return result.score

Caching Evaluations

config = GEPAConfig(
    engine=EngineConfig(
        cache_evaluation=True,
        cache_evaluation_storage="disk",  # or "memory"
        run_dir="./optimization_run",
    ),
)

Perfect Score Early Stopping

config = GEPAConfig(
    reflection=ReflectionConfig(
        skip_perfect_score=True,
        perfect_score=1.0,
    ),
)

Next Steps

DSPy Integration

Learn how to optimize DSPy programs

Custom Adapters

Create adapters for your own systems

Configuration

Explore all configuration options

Evaluation Metrics

Design effective evaluation functions
