Overview

GEPA provides flexible logging capabilities for tracking optimization runs, including:
  • File-based logging with the Logger class
  • Experiment tracking with W&B and MLflow via ExperimentTracker
  • Detailed metrics logging utilities

File Logging

Logger Class

The Logger class captures stdout and stderr to log files during optimization.
from gepa.logging.logger import Logger

with Logger("run_log.txt") as logger:
    logger.log("Starting optimization...")
    # Your optimization code here
Parameters:
  • filename (str, required): Path to the log file. A separate stderr log is created automatically.
  • mode (str, default "a"): File open mode ('a' for append, 'w' for write).

Logger Methods

log

Logs a message to the file and optionally to stdout.
logger.log("Iteration 1 complete", "Score:", 0.85)
The log method accepts the same arguments as Python’s print() function.

Context Manager

The Logger class works as a context manager, automatically redirecting stdout and stderr:
with Logger("optimization.log") as logger:
    print("This will be logged")  # Goes to both console and file
    logger.log("Explicit log message")
When used as a context manager:
  • stdout is redirected to the log file you passed in (e.g. run_log.txt)
  • stderr is redirected to a companion file (e.g. run_log_stderr.txt)
  • Both streams remain visible in the terminal via the Tee class
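Under the hood, this kind of redirection relies on a tee-style stream that forwards every write to two destinations. A minimal sketch of the idea (simplified, not GEPA's actual Tee class):

```python
import sys


class Tee:
    """Write-through stream: forwards every write to a file and a terminal stream."""

    def __init__(self, file_handle, terminal_stream):
        self.file = file_handle
        self.terminal = terminal_stream

    def write(self, text):
        self.file.write(text)
        self.terminal.write(text)

    def flush(self):
        self.file.flush()
        self.terminal.flush()


# Swap sys.stdout for a Tee so prints land in both places.
with open("run_log.txt", "a") as f:
    original_stdout = sys.stdout
    sys.stdout = Tee(f, original_stdout)
    try:
        print("visible in the terminal and in run_log.txt")
    finally:
        sys.stdout = original_stdout
```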

StdOutLogger

For simple console-only logging:
from gepa.logging.logger import StdOutLogger

logger = StdOutLogger()
logger.log("Message to stdout")
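Because both loggers expose a print()-style log method, any object with a compatible method can stand in for them, e.g. to route GEPA output into an existing logging setup. A hypothetical in-memory logger:

```python
class ListLogger:
    """Hypothetical logger: collects messages in memory via a print()-style log()."""

    def __init__(self):
        self.messages = []

    def log(self, *args):
        # Mirror print()'s behavior of joining arguments with spaces.
        self.messages.append(" ".join(str(a) for a in args))


logger = ListLogger()
logger.log("Iteration 1 complete", "Score:", 0.85)
# logger.messages == ["Iteration 1 complete Score: 0.85"]
```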

Experiment Tracking

ExperimentTracker Class

The ExperimentTracker provides unified experiment tracking supporting both W&B and MLflow.
from gepa.logging.experiment_tracker import ExperimentTracker

tracker = ExperimentTracker(
    use_wandb=True,
    wandb_api_key="your-api-key",
    wandb_init_kwargs={"project": "my-project", "name": "run-1"},
)

with tracker:
    tracker.log_metrics({"score": 0.95, "iteration": 10}, step=10)

Constructor Parameters

  • use_wandb (bool, default False): Enable Weights & Biases tracking.
  • wandb_api_key (str | None, default None): W&B API key; if not set, the key is read from the environment or W&B prompts for login.
  • wandb_init_kwargs (dict[str, Any] | None, default None): Additional arguments passed to wandb.init().
  • use_mlflow (bool, default False): Enable MLflow tracking.
  • mlflow_tracking_uri (str | None, default None): MLflow tracking server URI.
  • mlflow_experiment_name (str | None, default None): MLflow experiment name.

Methods

initialize

Initializes the logging backends.
tracker.initialize()
Called automatically when the tracker is used as a context manager.

start_run

Starts a new tracking run.
tracker.start_run()

log_metrics

Logs metrics to the active backends.
tracker.log_metrics(
    {"train_score": 0.85, "val_score": 0.82},
    step=5
)
  • metrics (dict[str, Any], required): Dictionary of metric names and values.
  • step (int | None, default None): Optional step number for the metrics.
For MLflow, only numeric values (int/float) are logged. Non-numeric values are filtered out automatically.
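That filtering behaves roughly like the snippet below (a sketch of the described behavior with a hypothetical helper name, not the library's actual code):

```python
def filter_numeric_metrics(metrics):
    """Keep only values MLflow can record as metrics (ints and floats)."""
    return {k: v for k, v in metrics.items() if isinstance(v, (int, float))}


filter_numeric_metrics({"score": 0.95, "iteration": 10, "status": "running"})
# -> {'score': 0.95, 'iteration': 10}
```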

end_run

Ends the current tracking run.
tracker.end_run()
Called automatically when the context manager exits.

is_active

Checks if any backend has an active run.
if tracker.is_active():
    tracker.log_metrics({"status": "running"})

Using with optimize()

GEPA’s optimize() function has built-in experiment tracking support:
from gepa import optimize

result = optimize(
    seed_candidate={"instructions": "..."},
    trainset=train_data,
    valset=val_data,
    # W&B tracking
    use_wandb=True,
    wandb_api_key="your-key",
    wandb_init_kwargs={
        "project": "gepa-optimization",
        "name": "experiment-1",
        "tags": ["math", "prompt-opt"],
    },
    # MLflow tracking
    use_mlflow=True,
    mlflow_tracking_uri="http://localhost:5000",
    mlflow_experiment_name="prompt-optimization",
)

create_experiment_tracker

Factory function for creating experiment trackers:
from gepa.logging.experiment_tracker import create_experiment_tracker

tracker = create_experiment_tracker(
    use_wandb=True,
    wandb_init_kwargs={"project": "my-project"},
    use_mlflow=True,
    mlflow_tracking_uri="sqlite:///mlflow.db",
)

Detailed Metrics Logging

log_detailed_metrics_after_discovering_new_program

Utility function for logging comprehensive metrics when a new candidate is discovered.
from gepa.logging.utils import log_detailed_metrics_after_discovering_new_program

log_detailed_metrics_after_discovering_new_program(
    logger=logger,
    gepa_state=state,
    new_program_idx=5,
    valset_evaluation=val_eval,
    objective_scores=scores,
    experiment_tracker=tracker,
    linear_pareto_front_program_idx=3,
    valset_size=100,
    val_evaluation_policy=eval_policy,
    log_individual_valset_scores_and_programs=True,
)
This function logs:
  • Validation set scores and coverage
  • Pareto front information
  • Best program metrics
  • Individual validation scores (if enabled)
  • Multi-objective scores (if available)
  • logger (LoggerProtocol, required): Logger instance for output.
  • gepa_state (GEPAState, required): Current optimization state.
  • new_program_idx (int, required): Index of the newly discovered program.
  • valset_evaluation (ValsetEvaluation, required): Validation set evaluation results.
  • objective_scores (dict, required): Objective scores for the program.
  • experiment_tracker (ExperimentTracker, required): Experiment tracker for metrics logging.
  • linear_pareto_front_program_idx (int, required): Index of the linear Pareto front program.
  • valset_size (int, required): Total validation set size.
  • val_evaluation_policy (EvaluationPolicy, required): Validation evaluation policy.
  • log_individual_valset_scores_and_programs (bool, default False): Whether to log individual scores per validation example.

Logged Metrics

The function logs the following metrics to the experiment tracker:
  • iteration: Current iteration number
  • new_program_idx: Index of new program
  • valset_pareto_front_agg: Pareto front aggregate score
  • valset_pareto_front_programs: Programs on Pareto front
  • best_valset_agg_score: Best aggregate validation score
  • linear_pareto_front_program_idx: Linear Pareto front index
  • best_program_as_per_agg_score_valset: Best program index
  • best_score_on_valset: Best validation score
  • val_evaluated_count_new_program: Validation examples evaluated
  • val_total_count: Total validation set size
  • val_program_average: Average validation score
  • total_metric_calls: Total metric evaluations
  • objective_scores_new_program: Multi-objective scores (if available)
  • objective_pareto_front_scores: Objective Pareto front (if available)

Example: Complete Logging Setup

from gepa import optimize
from gepa.logging.logger import Logger

with Logger("optimization_run.log") as logger:
    logger.log("Starting GEPA optimization")
    
    result = optimize(
        seed_candidate={"instructions": "Solve the problem step by step."},
        trainset=train_data,
        valset=val_data,
        logger=logger,
        run_dir="./runs/experiment-1",
        # W&B tracking
        use_wandb=True,
        wandb_init_kwargs={
            "project": "gepa-math",
            "name": "gsm8k-optimization",
            "config": {
                "dataset": "gsm8k",
                "train_size": 100,
                "val_size": 50,
            },
        },
        # MLflow tracking
        use_mlflow=True,
        mlflow_experiment_name="math-optimization",
    )
    
    logger.log(f"Best score: {result.best_score}")
    logger.log(f"Total iterations: {result.total_iterations}")

Multi-Backend Tracking

You can use both W&B and MLflow simultaneously:
tracker = ExperimentTracker(
    use_wandb=True,
    wandb_init_kwargs={"project": "my-project"},
    use_mlflow=True,
    mlflow_tracking_uri="http://localhost:5000",
    mlflow_experiment_name="my-experiment",
)

with tracker:
    # Metrics will be logged to both W&B and MLflow
    tracker.log_metrics({"score": 0.95}, step=10)

Source Reference

The logging system is implemented in the gepa.logging module: gepa.logging.logger (Logger, StdOutLogger), gepa.logging.experiment_tracker (ExperimentTracker, create_experiment_tracker), and gepa.logging.utils (log_detailed_metrics_after_discovering_new_program).
