The optimize_anything API is GEPA’s primary entry point for optimizing any text-representable artifact using LLM-guided search. Whether you’re optimizing prompts, code, configurations, or agent architectures, this unified API handles the entire optimization workflow.

Core Concept

The key insight behind optimize_anything is that a wide range of problems can be formulated as optimizing a text artifact:
  • Speeding up a CUDA kernel
  • Tuning a scheduling policy
  • Refining a prompt template
  • Redesigning an agent architecture
If it can be serialized to a string and its quality measured, an LLM can reason about it and propose improvements.
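At its simplest, the contract is a function from text to a score. A toy sketch of that idea (the scoring rule here is an illustrative stand-in, not part of the GEPA API):

```python
def measure_quality(artifact: str) -> float:
    # Toy scoring rule: reward artifacts that mention "sorted",
    # and prefer shorter ones. Real domains would run a benchmark,
    # a test suite, or a simulator here.
    return (1.0 if "sorted" in artifact else 0.0) / max(len(artifact), 1)

def evaluate(candidate: str) -> float:
    """Any text-representable artifact can be scored this way."""
    return measure_quality(candidate)
```

Whatever the domain, as long as `evaluate` maps a string to a number, the search loop below can operate on it.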

Optimization Workflow

seed_candidate → evaluate → reflect on ASI → propose → repeat
                    ↑                            |
                    └────────────────────────────┘
Actionable Side Information (ASI) is the text-optimization analogue of the gradient. Where gradients tell a numerical optimizer which direction to move, ASI tells the LLM proposer why a candidate failed and how to fix it.
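Concretely, an evaluator can return just a score, or a (score, side_info) pair whose fields explain the failure. A hedged sketch (the specific failure check and suggestion text are illustrative):

```python
def evaluate_with_asi(candidate: str) -> tuple[float, dict]:
    # Hypothetical check: the candidate must guard against empty input.
    handles_empty = "if not items" in candidate
    score = 1.0 if handles_empty else 0.0
    # ASI records not just *that* the candidate failed,
    # but *why* it failed and *how* to fix it.
    side_info = {
        "Error": None if handles_empty else "crashes on empty input",
        "Suggestion": None if handles_empty
        else "guard against empty `items` before indexing",
    }
    return score, side_info
```

A bare 0.0 tells the proposer nothing; the side_info fields give it a direction to move in, the way a gradient would.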

Three Optimization Modes

optimize_anything automatically selects one of three modes based on whether you provide dataset and valset.

1. Single-Task Mode

When: dataset=None, valset=None
Use case: Solve one hard problem where the candidate itself is the solution.
Example: Circle packing, black-box mathematical optimization
import gepa.optimize_anything as oa
from gepa.optimize_anything import optimize_anything, GEPAConfig, EngineConfig

def evaluate(candidate: str) -> float:
    result = run_code(candidate)
    oa.log(f"Score: {result.score}, Overlaps: {result.overlaps}")
    return result.score

result = optimize_anything(
    seed_candidate="def pack_circles(): ...",
    evaluator=evaluate,
    objective="Maximize the sum of radii for n circles in a unit square.",
    config=GEPAConfig(engine=EngineConfig(max_metric_calls=500)),
)

print(result.best_candidate)
2. Multi-Task Mode

When: dataset=<list>, valset=None
Use case: Solve a batch of related problems with cross-task transfer; insights from solving one help solve the others.
Example: CUDA kernel generation for multiple PyTorch operations
result = optimize_anything(
    seed_candidate={"prompt": "Write an optimized CUDA kernel."},
    evaluator=kernel_evaluator,
    dataset=kernel_problems,       # batch of related problems
    objective="Generate prompts that produce fast, correct CUDA kernels.",
    config=GEPAConfig(engine=EngineConfig(max_metric_calls=300)),
)

3. Generalization Mode

When: dataset=<list>, valset=<list>
Use case: Build a skill that transfers to unseen problems.
Example: Prompt optimization for AIME math, agent architecture evolution for ARC-AGI
result = optimize_anything(
    seed_candidate={"prompt": "Solve this math problem step by step:"},
    evaluator=math_evaluator,
    dataset=train_problems,        # train on these
    valset=val_problems,           # must generalize to these
    objective="Generate system prompts that improve math reasoning.",
    config=GEPAConfig(engine=EngineConfig(max_metric_calls=200)),
)

API Reference

Main Function

optimize_anything(
    seed_candidate: str | Candidate | None = None,
    *,
    evaluator: Callable[..., Any],
    dataset: list[DataInst] | None = None,
    valset: list[DataInst] | None = None,
    objective: str | None = None,
    background: str | None = None,
    config: GEPAConfig | None = None,
) -> GEPAResult

Parameters

1. seed_candidate

Starting point for optimization:
  • str — Single text parameter (evaluator receives str)
  • dict[str, str] — Named parameters (evaluator receives the dict)
  • None — Seedless mode: LLM generates initial candidate from objective
# String candidate
seed_candidate="def optimize_function(): ..."

# Dictionary candidate
seed_candidate={
    "system_prompt": "You are a helpful assistant.",
    "user_template": "Answer this question: {question}"
}

# Seedless mode
seed_candidate=None  # Requires objective
2. evaluator

Scoring function that returns (score, side_info) or just a score. Higher scores are better.
def evaluator(candidate: str, example: dict) -> tuple[float, dict]:
    result = run_test(candidate, example)
    side_info = {
        "Input": example["input"],
        "Output": result.output,
        "Expected": example["expected"],
        "Error": result.error if result.error else None,
    }
    return result.score, side_info
See Evaluation Metrics for detailed examples.
3. dataset & valset

  • dataset: Training examples for multi-task or generalization modes
  • valset: Held-out validation set (defaults to dataset if not provided)
dataset = [
    {"input": "What is ML?", "expected": "Machine Learning..."},
    {"input": "Explain AI", "expected": "Artificial Intelligence..."},
]
valset = [...held-out examples...]
4. objective & background

  • objective: Natural-language goal for the reflection LLM
  • background: Domain knowledge, constraints, or strategies
objective="Generate prompts that solve competition math problems."
background="""
The solution should:
- Show step-by-step reasoning
- Use clear mathematical notation
- Verify the final answer
"""
5. config

Full configuration object. See Configuration for details.
config = GEPAConfig(
    engine=EngineConfig(
        max_metric_calls=300,
        parallel=True,
        max_workers=8,
        capture_stdio=True,
    ),
    reflection=ReflectionConfig(
        reflection_lm="openai/gpt-5.1",
        reflection_minibatch_size=3,
    ),
)

Seedless Mode

When you don’t have a starting artifact, pass seed_candidate=None and provide objective. The reflection LM bootstraps the first candidate from the description.
result = optimize_anything(
    seed_candidate=None,           # LLM writes the first draft
    evaluator=evaluate_3d_render,
    dataset=visual_aspects,
    objective="Optimize a Python program to generate a 3D unicorn.",
    background="Use build123d for CSG geometry, export to STL, render with pyrender.",
)
Seedless mode requires objective to be set. The reflection LLM needs the objective to generate an initial candidate.

Logging and Diagnostics

Using oa.log()

import gepa.optimize_anything as oa

def evaluate(candidate: str) -> float:
    result = run_code(candidate)
    oa.log("Execution time:", result.time_ms, "ms")
    oa.log("Memory usage:", result.memory_mb, "MB")
    if result.error:
        oa.log("ERROR:", result.error)
    return result.score
All oa.log() output is automatically captured and included in side_info under the "log" key.

Auto-Capturing stdout/stderr

config = GEPAConfig(
    engine=EngineConfig(
        capture_stdio=True,  # Capture print() output automatically
    ),
)

def evaluate(candidate: str) -> float:
    print("Testing candidate...")  # Captured automatically
    result = run_code(candidate)
    return result.score

Multi-Threading Support

import threading
import gepa.optimize_anything as oa

def my_evaluator(candidate):
    ctx = oa.get_log_context()

    def worker():
        oa.set_log_context(ctx)  # Propagate to child thread
        oa.log("from child thread")

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    oa.log("from main evaluator thread")
    return compute_score(candidate)  # your scoring logic

Side Information (ASI)

ASI is the text-optimization analogue of the gradient. More informative SideInfo → better optimization.

Structure

side_info = {
    # Multi-objective metrics (all "higher is better")
    "scores": {
        "accuracy": 0.85,
        "latency_inv": 12.5,  # Inverse latency for "higher is better"
    },
    
    # Contextual fields
    "Input": "Translate 'Hello world' to French",
    "Output": "Salut monde",
    "Expected": "Bonjour le monde",
    "Feedback": "Translation is too informal for the context",
    
    # Parameter-specific info
    "system_prompt_specific_info": {
        "scores": {"tone": 0.3},
        "Analysis": "System prompt led to overly casual translation",
    },
}
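One way to keep this structure consistent across evaluations is a small builder helper. This is a sketch, not part of the GEPA API:

```python
def build_side_info(scores: dict, **context) -> dict:
    """Assemble a side_info dict with a consistent shape.

    `scores` holds multi-objective metrics (all higher-is-better);
    extra keyword arguments become contextual fields such as
    Input, Output, Expected, or Error. Illustrative helper only.
    """
    side_info = {"scores": dict(scores)}
    # Drop empty contextual fields so the reflection LLM sees signal, not noise.
    side_info.update({k: v for k, v in context.items() if v is not None})
    return side_info

info = build_side_info(
    {"accuracy": 0.85},
    Input="Translate 'Hello world' to French",
    Output="Salut monde",
    Expected="Bonjour le monde",
    Error=None,  # omitted from the result
)
```

Routing every evaluation through one helper like this keeps field names stable across iterations, which makes it easier for the reflection LLM to compare candidates.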

Including Images

Use gepa.Image for visual feedback (requires VLM as reflection_lm):
from gepa import Image

def evaluate(candidate: str) -> tuple[float, dict]:
    svg_output = render_candidate(candidate)
    score = compute_score(svg_output)
    
    side_info = {
        "rendered_image": Image.from_path("output.png"),
        "score": score,
    }
    return score, side_info

Result Object

result = optimize_anything(...)

# Best candidate
print(result.best_candidate)

# Best score
print(result.val_aggregate_scores[result.best_idx])

# All candidates tried
for idx, candidate in enumerate(result.prog_candidates):
    score = result.val_aggregate_scores[idx]
    print(f"Candidate {idx}: {score}")

# Optimization history
print(f"Total evaluations: {result.total_metric_calls}")
print(f"Number of candidates: {len(result.prog_candidates)}")
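If you want more than the single best candidate, you can rank the whole pool by validation score. A sketch against the fields shown above, assuming `prog_candidates` and `val_aggregate_scores` are parallel lists:

```python
def top_candidates(prog_candidates, val_aggregate_scores, k=3):
    """Return the k best (score, candidate) pairs, best first."""
    ranked = sorted(
        zip(val_aggregate_scores, prog_candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return ranked[:k]

# e.g. top_candidates(result.prog_candidates, result.val_aggregate_scores)
```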

Best Practices

Always provide informative side_info. Include error messages, expected vs actual output, and diagnostic information. The more context you provide, the better the LLM can propose improvements.
  1. Start with a small evaluation budget (e.g., max_metric_calls=50) to test your setup
  2. Use oa.log() liberally to capture diagnostic information
  3. Structure your side_info consistently across evaluations
  4. For multi-objective optimization, include "scores" dict with all metrics
  5. Enable parallelization for faster optimization when evaluations are independent

Common Patterns

Evaluator with OptimizationState

Warm-start from previous best solutions:
from gepa.optimize_anything import OptimizationState

def evaluator(candidate, example, opt_state: OptimizationState):
    # Access historical best evaluations
    if opt_state.best_example_evals:
        prev_best = opt_state.best_example_evals[0]["side_info"]
        # Use prev_best to warm-start current evaluation
    
    result = run_test(candidate, example)
    return result.score

Caching Evaluations

config = GEPAConfig(
    engine=EngineConfig(
        cache_evaluation=True,
        cache_evaluation_storage="disk",  # or "memory"
        run_dir="./optimization_run",
    ),
)

Perfect Score Early Stopping

config = GEPAConfig(
    reflection=ReflectionConfig(
        skip_perfect_score=True,
        perfect_score=1.0,
    ),
)

Next Steps

DSPy Integration

Learn how to optimize DSPy programs

Custom Adapters

Create adapters for your own systems

Configuration

Explore all configuration options

Evaluation Metrics

Design effective evaluation functions
