Overview
GEPA provides flexible logging capabilities for tracking optimization runs, including:
- File-based logging with the Logger class
- Experiment tracking with W&B and MLflow via ExperimentTracker
- Detailed metrics logging utilities
File Logging
Logger Class
The Logger class captures stdout and stderr to log files during optimization.
Path to the log file. A separate stderr log will be created automatically.
File open mode (‘a’ for append, ‘w’ for write)
Logger Methods
log
Logs a message to the file and optionally to stdout. The log method accepts the same arguments as Python's print() function.
Context Manager
The Logger class works as a context manager, automatically redirecting stdout and stderr:
- stdout is redirected to run_log.txt
- stderr is redirected to run_log_stderr.txt
- Both remain visible in the terminal via the Tee class
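The redirection described above follows the classic "tee" pattern. The sketch below is a hypothetical stand-in, not GEPA's actual Tee or Logger implementation, showing how writes can be duplicated to a file while staying visible in the terminal:

```python
import sys

class Tee:
    """Duplicate writes to a log file and the original stream (illustrative)."""
    def __init__(self, file, stream):
        self.file = file
        self.stream = stream

    def write(self, text):
        self.file.write(text)
        self.stream.write(text)

    def flush(self):
        self.file.flush()
        self.stream.flush()

class FileLogger:
    """Context manager that tees stdout to a log file, mimicking the pattern."""
    def __init__(self, path):
        self.path = path

    def __enter__(self):
        self.file = open(self.path, "a")
        self._stdout = sys.stdout
        sys.stdout = Tee(self.file, self._stdout)
        return self

    def __exit__(self, *exc):
        # Restore the original stream and flush the log to disk.
        sys.stdout = self._stdout
        self.file.close()

with FileLogger("run_log.txt") as logger:
    print("iteration 1: score=0.82")  # appears in the terminal AND run_log.txt
```

A real implementation would apply the same treatment to sys.stderr with a second file, as GEPA's Logger does.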
StdOutLogger
For simple console-only logging, use StdOutLogger.

Experiment Tracking
ExperimentTracker Class
The ExperimentTracker class provides unified experiment tracking, supporting both W&B and MLflow.
Constructor Parameters
Enable Weights & Biases tracking
W&B API key (if not set, uses environment or prompts login)
Additional arguments passed to wandb.init()
Enable MLflow tracking
MLflow tracking server URI
MLflow experiment name
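The unified-backend idea can be illustrated with a minimal sketch. The class, flag names, and in-memory history below are hypothetical stand-ins for GEPA's ExperimentTracker and its real W&B/MLflow calls:

```python
class SimpleTracker:
    """Sketch of a tracker that dispatches to optional backends (illustrative)."""
    def __init__(self, use_wandb=False, use_mlflow=False):
        self.use_wandb = use_wandb
        self.use_mlflow = use_mlflow
        self.active = False
        self.history = []  # stands in for real wandb/mlflow logging calls

    def start_run(self):
        self.active = True

    def log_metrics(self, metrics, step=None):
        if not self.active:
            raise RuntimeError("start_run() must be called before logging")
        # A real tracker would forward to each enabled backend here.
        self.history.append((step, dict(metrics)))

    def end_run(self):
        self.active = False

    def is_active(self):
        return self.active

tracker = SimpleTracker(use_mlflow=True)
tracker.start_run()
tracker.log_metrics({"best_score_on_valset": 0.91}, step=3)
tracker.end_run()
```

The same start/log/end lifecycle maps onto the methods documented below.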
Methods
initialize
Initializes the logging backends.
start_run
Starts a new tracking run.
log_metrics
Logs metrics to the active backends.
Dictionary of metric names and values
Optional step number for the metrics
For MLflow, only numeric values (int/float) are logged. Non-numeric values are filtered out automatically.
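The filtering step described above can be sketched as follows; this helper is a stand-in for GEPA's internal code, based only on the behavior stated here:

```python
def numeric_only(metrics):
    """Keep only int/float metric values, since MLflow accepts only numeric metrics."""
    return {k: v for k, v in metrics.items() if isinstance(v, (int, float))}

metrics = {
    "best_score_on_valset": 0.91,
    "iteration": 7,
    "note": "new pareto entry",  # non-numeric: dropped for MLflow
}
filtered = numeric_only(metrics)
```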
end_run
Ends the current tracking run.
is_active
Checks if any backend has an active run.
Using with optimize()
GEPA's optimize() function has built-in experiment tracking support.
create_experiment_tracker
Factory function for creating experiment trackers.

Detailed Metrics Logging
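The factory pattern can be sketched as below; the function name, signature, and returned object are illustrative assumptions, not GEPA's actual create_experiment_tracker API:

```python
def create_tracker(use_wandb=False, use_mlflow=False, **backend_kwargs):
    """Sketch of a factory: collect the enabled backends and their config."""
    backends = []
    if use_wandb:
        backends.append("wandb")
    if use_mlflow:
        backends.append("mlflow")
    # A real factory would construct and return a configured tracker object.
    return {"backends": backends, "config": backend_kwargs}

tracker = create_tracker(use_mlflow=True, experiment_name="gepa-run")
```

A factory like this keeps backend selection in one place, so callers never branch on which tracking service is enabled.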
log_detailed_metrics_after_discovering_new_program
Utility function for logging comprehensive metrics when a new candidate program is discovered. It logs:
- Validation set scores and coverage
- Pareto front information
- Best program metrics
- Individual validation scores (if enabled)
- Multi-objective scores (if available)
Logger instance for output
Current optimization state
Index of the newly discovered program
Validation set evaluation results
Objective scores for the program
Experiment tracker for metrics logging
Index of the linear Pareto front program
Total validation set size
Validation evaluation policy
Whether to log individual scores per validation example
Logged Metrics
The function logs the following metrics to the experiment tracker:
- iteration: Current iteration number
- new_program_idx: Index of new program
- valset_pareto_front_agg: Pareto front aggregate score
- valset_pareto_front_programs: Programs on Pareto front
- best_valset_agg_score: Best aggregate validation score
- linear_pareto_front_program_idx: Linear Pareto front index
- best_program_as_per_agg_score_valset: Best program index
- best_score_on_valset: Best validation score
- val_evaluated_count_new_program: Validation examples evaluated
- val_total_count: Total validation set size
- val_program_average: Average validation score
- total_metric_calls: Total metric evaluations
- objective_scores_new_program: Multi-objective scores (if available)
- objective_pareto_front_scores: Objective Pareto front (if available)
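A few of these aggregates can be illustrated with a small sketch. The helper below is hypothetical, assuming per-example validation scores are available for each program; it is not GEPA's internal code:

```python
def valset_metrics(per_program_scores, new_idx, total_metric_calls):
    """Compute a few of the aggregates above from per-example validation scores.

    per_program_scores[i][j] is program i's score on validation example j.
    """
    averages = [sum(s) / len(s) for s in per_program_scores]
    # Per-example Pareto front: best score any program achieves on each example.
    pareto_per_example = [max(col) for col in zip(*per_program_scores)]
    return {
        "new_program_idx": new_idx,
        "val_program_average": averages[new_idx],
        "best_valset_agg_score": max(averages),
        "valset_pareto_front_agg": sum(pareto_per_example) / len(pareto_per_example),
        "total_metric_calls": total_metric_calls,
    }

# Two programs scored on three validation examples.
scores = [[0.5, 0.9, 0.4], [0.7, 0.6, 0.8]]
m = valset_metrics(scores, new_idx=1, total_metric_calls=6)
```

Note how the Pareto aggregate (0.8 here) can exceed every individual program's average, since it takes the best score per example across all programs.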