Overview
The Experiment Manager provides a structured way to track experiments, log metrics, manage checkpoints, and maintain experiment history with automatic versioning.
ExperimentManager
Class for managing experiment lifecycle, metrics, and checkpoints.
Constructor
log_dir (str): Directory where experiment logs are stored. Defaults to experiments/logs.
Methods
start_experiment
Initiates a new experiment and creates an experiment record.
Parameters:
- Name of the configuration used for this experiment.
- Dictionary of hyperparameters for the experiment (e.g., learning rate, batch size).
- Additional metadata about the experiment. The following fields are automatically added if not provided:
  - precision: Precision mode
  - model_size: Model size description
  - dataset_version: Dataset version identifier
  - hardware_constraint_mode: Hardware constraint settings
- Custom experiment ID. If not provided, derived from config_name.
log_metrics
Logs metrics for the active experiment.
Parameters:
- metrics (dict): Dictionary mapping metric names to lists of values (e.g., per-epoch losses).
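As an illustration of the expected shape, the sketch below builds a metrics dictionary and coerces it into JSON-compatible types. The helper to_jsonable is hypothetical and not part of ExperimentManager; it only shows one plausible way such a conversion could work.

```python
import json
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Recursively coerce a value into something json.dumps accepts.

    Hypothetical helper for illustration; not the library's own conversion.
    """
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if isinstance(value, (str, int, float, bool)) or value is None:
        return value
    # NumPy scalars and single-element tensors expose .item(),
    # which returns a plain Python number.
    if hasattr(value, "item"):
        return value.item()
    # Last resort: store the string form rather than fail.
    return str(value)


# Metric names map to lists of values, e.g. one entry per epoch.
metrics = {
    "train_loss": [0.92, 0.61, 0.47],
    "val_accuracy": (0.71, 0.78, 0.81),  # a tuple becomes a list
}

payload = to_jsonable(metrics)
encoded = json.dumps(payload)
```

A dictionary in this shape is what log_metrics is documented to accept; whether repeated calls append to or replace earlier values is not stated here.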
add_checkpoint
Records a checkpoint path for the active experiment.
Parameters:
- checkpoint_path (str): File path to the saved checkpoint.
read_history
Reads the complete history for an experiment ID.
Parameters:
- experiment_id (str): Experiment identifier.
ExperimentRecord
Dataclass representing a single experiment run.
Fields
- Unique identifier for the experiment series.
- Version number, auto-incremented for each run of the same experiment.
- ISO 8601 timestamp of when the experiment was created.
- Name of the configuration used.
- Hyperparameters used in the experiment.
- Additional metadata about the experiment.
- Dictionary mapping metric names to lists of values. Defaults to empty dict.
- List of checkpoint file paths. Defaults to empty list.
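To make the field descriptions concrete, here is a minimal sketch of a dataclass with the same shape. The class name ExperimentRecordSketch and every field name in it are assumptions chosen for illustration; only the descriptions and the empty-dict and empty-list defaults come from the text above.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class ExperimentRecordSketch:
    # All field names below are assumed for illustration.
    experiment_id: str                 # identifier of the experiment series
    version: int                       # incremented for each run of the same ID
    created_at: str                    # ISO 8601 creation timestamp
    config_name: str                   # name of the configuration used
    hyperparameters: Dict[str, Any]
    metadata: Dict[str, Any]
    # Mutable defaults need default_factory so each record gets its own container.
    metrics: Dict[str, List[float]] = field(default_factory=dict)
    checkpoints: List[str] = field(default_factory=list)


record = ExperimentRecordSketch(
    experiment_id="baseline",
    version=1,
    created_at=datetime.now(timezone.utc).isoformat(),
    config_name="baseline",
    hyperparameters={"learning_rate": 3e-4, "batch_size": 32},
    metadata={"precision": "fp16"},
)

# asdict() yields a plain dictionary that can be written out as JSON.
record_dict = asdict(record)
```

Using default_factory rather than a shared literal is what keeps the metrics and checkpoint containers independent between records.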
Complete Workflow Example
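The sketch below walks through starting an experiment, logging metrics, recording a checkpoint, and reading the history back. Because the package's import path is not shown here, it uses a small stand-in class, SketchExperimentManager, that mimics the documented behaviour. The keyword names (config_name, hyperparameters, metadata, experiment_id), the one-file-per-ID JSON layout, the "unknown" placeholder defaults, the append semantics of log_metrics, and the return values are all assumptions, not the real implementation.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

AUTO_METADATA = ("precision", "model_size", "dataset_version", "hardware_constraint_mode")


class SketchExperimentManager:
    """Stand-in that mimics the documented behaviour; not the real class."""

    def __init__(self, log_dir="experiments/logs"):
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        self.active = None  # only one active experiment per instance

    def _path(self, experiment_id):
        return self.log_dir / f"{experiment_id}.json"  # assumed file layout

    def read_history(self, experiment_id):
        path = self._path(experiment_id)
        return json.loads(path.read_text()) if path.exists() else []

    def _persist(self):
        # Rewrite the history file with the active record replaced or appended.
        history = self.read_history(self.active["experiment_id"])
        history = [r for r in history if r["version"] != self.active["version"]]
        history.append(self.active)
        self._path(self.active["experiment_id"]).write_text(json.dumps(history, indent=2))

    def start_experiment(self, config_name, hyperparameters, metadata=None, experiment_id=None):
        experiment_id = experiment_id or config_name  # ID derived from config_name
        meta = dict(metadata or {})
        for key in AUTO_METADATA:
            meta.setdefault(key, "unknown")  # placeholder; real defaults are not documented
        self.active = {
            "experiment_id": experiment_id,
            "version": len(self.read_history(experiment_id)) + 1,  # auto-increment
            "created_at": datetime.now(timezone.utc).isoformat(),
            "config_name": config_name,
            "hyperparameters": dict(hyperparameters),
            "metadata": meta,
            "metrics": {},
            "checkpoints": [],
        }
        self._persist()
        return self.active

    def log_metrics(self, metrics):
        for name, values in metrics.items():
            # Assumed semantics: new values are appended to earlier ones.
            self.active["metrics"].setdefault(name, []).extend(float(v) for v in values)
        self._persist()

    def add_checkpoint(self, checkpoint_path):
        self.active["checkpoints"].append(str(checkpoint_path))
        self._persist()


with tempfile.TemporaryDirectory() as tmp:
    manager = SketchExperimentManager(log_dir=tmp)
    manager.start_experiment(
        "baseline",
        {"learning_rate": 3e-4, "batch_size": 32},
        metadata={"precision": "fp16"},
    )
    manager.log_metrics({"train_loss": [0.92, 0.61, 0.47]})
    manager.add_checkpoint("checkpoints/baseline_epoch3.pt")

    # Starting again with the same ID creates the next version.
    manager.start_experiment("baseline", {"learning_rate": 1e-4, "batch_size": 32})

    history = manager.read_history("baseline")
    versions = [r["version"] for r in history]
```

With the real class, the same sequence of calls should apply, with the argument names adjusted to match its actual signature.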
Notes
- Each experiment ID maintains its own history file in JSON format
- Versions are automatically incremented when starting a new experiment with the same ID
- All metrics are converted to JSON-compatible types automatically
- The active experiment is persisted after every operation (start_experiment, log_metrics, add_checkpoint)
- Only one experiment can be active at a time per ExperimentManager instance