Experiment Tracking

Experiment tracking is essential for understanding what works, reproducing results, and collaborating effectively. This guide covers configuration management, experiment logging, and model registries.

Why Track Experiments?

Reproducibility

Record exact configurations, data versions, and code commits

Comparison

Compare metrics across different hyperparameters and architectures

Collaboration

Share results and insights with team members

Debugging

Diagnose training issues with detailed logs and visualizations

Configuration Management

JSON Configuration Files

The reference implementations use JSON for configuration:
conf/example.json
{
  "model_name_or_path": "google/mobilebert-uncased",
  "train_file": "./data/train.csv",
  "validation_file": "./data/val.csv",
  "output_dir": "results",
  
  "max_seq_length": 128,
  "per_device_train_batch_size": 32,
  "per_device_eval_batch_size": 32,
  "learning_rate": 5e-05,
  "num_train_epochs": 5,
  
  "eval_strategy": "steps",
  "eval_steps": 250,
  "logging_steps": 250,
  "save_steps": 250,
  
  "load_best_model_at_end": true,
  "metric_for_best_model": "eval_f1",
  "report_to": ["wandb"]
}

Loading Configuration

Use HuggingFace’s HfArgumentParser for type-safe config loading:
from pathlib import Path

from transformers import HfArgumentParser, TrainingArguments
from classic_example.config import ModelArguments, DataTrainingArguments

def get_config(config_path: Path):
    parser = HfArgumentParser(
        (ModelArguments, DataTrainingArguments, TrainingArguments)
    )
    model_args, data_args, training_args = parser.parse_json_file(str(config_path))
    return model_args, data_args, training_args

# Load from JSON
model_args, data_args, training_args = get_config(Path("conf/example.json"))

Hydra Configuration (Alternative)

For more complex projects, use Hydra for hierarchical configuration:
config.yaml
model:
  name: bert-base-uncased
  num_labels: 2

data:
  train_file: data/train.csv
  val_file: data/val.csv
  max_length: 128

trainer:
  batch_size: 32
  learning_rate: 5e-5
  num_epochs: 5
Hydra enables config composition, command-line overrides, and multi-run sweeps for hyperparameter search.

Weights & Biases Integration

Setup

Configure W&B in your training environment:
# Install W&B
pip install wandb

# Login with API key
export WANDB_API_KEY=your_api_key_here
export WANDB_PROJECT=ml-in-production-practice

# Disable in testing
export WANDB_MODE=disabled  # or "offline"
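The same toggles can be set programmatically, which is convenient in test suites where no shell profile is loaded. A small sketch (the `configure_wandb` helper is illustrative, not part of the reference code; it must run before `wandb.init()` is called):

```python
import os

def configure_wandb(testing: bool = False) -> None:
    """Set W&B environment variables before wandb.init() is called."""
    os.environ["WANDB_PROJECT"] = "ml-in-production-practice"
    # "disabled" turns every wandb call into a no-op; "offline" logs locally
    os.environ["WANDB_MODE"] = "disabled" if testing else "online"
```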

Automatic Logging

HuggingFace Trainer integrates with W&B automatically:
from transformers import Trainer, TrainingArguments

# Enable W&B reporting
training_args = TrainingArguments(
    output_dir="results",
    report_to=["wandb"],  # Enable W&B logging
    logging_steps=100,
    eval_steps=100,
    run_name="bert-sst2-experiment",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
)

# Logs automatically sent to W&B
train_result = trainer.train()
This automatically logs:
  • Training and evaluation metrics
  • Learning rate schedule
  • Gradient norms
  • System metrics (GPU, CPU, memory)
  • Model checkpoints

Custom Logging

Add custom metrics and artifacts:
import wandb

# Log custom metrics
wandb.log({
    "custom_metric": 0.95,
    "epoch": epoch,
    "learning_rate": lr,
})

# Log images
wandb.log({"confusion_matrix": wandb.Image(cm_plot)})

# Log tables
table = wandb.Table(columns=["text", "prediction", "label"])
for text, pred, label in zip(texts, predictions, labels):
    table.add_data(text, pred, label)
wandb.log({"predictions": table})

Model Registry

Use W&B Artifacts to version and share models:
import wandb
from pathlib import Path

def upload_to_registry(model_name: str, model_path: Path):
    """Upload model artifacts to W&B registry."""
    with wandb.init() as run:
        art = wandb.Artifact(model_name, type="model")
        art.add_file(model_path / "config.json")
        art.add_file(model_path / "model.safetensors")
        art.add_file(model_path / "tokenizer.json")
        art.add_file(model_path / "tokenizer_config.json")
        art.add_file(model_path / "special_tokens_map.json")
        art.add_file(model_path / "README.md")
        run.log_artifact(art)

def load_from_registry(model_name: str, model_path: Path):
    """Download model from W&B registry."""
    with wandb.init() as run:
        artifact = run.use_artifact(model_name, type="model")
        artifact_dir = artifact.download(root=model_path)
        return artifact_dir
Usage:
# Upload model
python classic_example/cli.py upload-to-registry \
    example_model ./results

# Download model
python classic_example/cli.py load-from-registry \
    example_model:latest ./downloaded_model

Experiment Tracking Tools

Weights & Biases

Best for: teams, visualization, and collaboration

Features:
  • Rich visualizations and dashboards
  • Experiment comparison
  • Model registry and versioning
  • Hyperparameter sweeps
  • Reports and documentation

W&B Sweeps

Define a sweep configuration:
sweep.yaml
program: classic_example/cli.py
method: bayes
metric:
  name: eval_f1
  goal: maximize
parameters:
  learning_rate:
    min: 0.00001
    max: 0.0001
  per_device_train_batch_size:
    values: [16, 32, 64]
  num_train_epochs:
    values: [3, 5, 7]
Run the sweep:
# Initialize sweep
wandb sweep sweep.yaml

# Run agents
wandb agent your-entity/your-project/sweep-id

NNI (Neural Network Intelligence)

Microsoft’s AutoML toolkit:
import nni

def main():
    # Receive the next hyperparameter set from the NNI tuner
    params = nni.get_next_parameter()

    # train_model is project code; here it returns the model and its accuracy
    model, accuracy = train_model(**params)

    # Report the final metric back to the tuner
    nni.report_final_result(accuracy)

if __name__ == "__main__":
    main()
See NNI documentation for distributed hyperparameter optimization.

Best Practices

Log all relevant information:
  • Hyperparameters and config
  • Training/validation metrics
  • Model checkpoints
  • Code version (git commit)
  • Data version
  • System info (GPU, CUDA version)
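The code version can be captured automatically at run start rather than recorded by hand. A small sketch (the `get_git_commit` helper is illustrative and assumes the script runs inside a git checkout):

```python
import subprocess

def get_git_commit() -> str:
    """Return the current git commit hash, or 'unknown' outside a repo."""
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "unknown"

# Attach it to the active run, e.g.:
# wandb.config.update({"git_commit": get_git_commit()})
```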
Use consistent naming and tagging:
  • Project names: ml-in-production-practice
  • Run names: bert-sst2-lr5e5-batch32
  • Tags: baseline, production, experiment
  • Groups: by model architecture or dataset
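Run names like the one above can be generated from the hyperparameters themselves so they never drift out of sync with the config. A sketch (the `make_run_name` helper is hypothetical; the learning-rate formatting assumes typical magnitudes like 1e-3 to 1e-5):

```python
def make_run_name(model: str, task: str, lr: float, batch_size: int) -> str:
    """Build a run name like 'bert-sst2-lr5e5-batch32' from hyperparameters."""
    # 5e-05 -> "5e5": drop the exponent's sign and leading zero for brevity
    lr_tag = f"{lr:.0e}".replace("e-0", "e").replace("-", "")
    return f"{model}-{task}-lr{lr_tag}-batch{batch_size}"
```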
When comparing experiments:
  • Use the same data splits
  • Fix random seeds for reproducibility
  • Use consistent evaluation metrics
  • Document any changes in setup
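Seed fixing can be wrapped in one helper called at the top of every run. A minimal sketch that seeds whichever libraries are installed (transformers also ships its own `set_seed` utility):

```python
import random

def set_seed(seed: int = 42) -> None:
    """Seed every RNG in use so repeated runs are comparable."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # no-op without a GPU
    except ImportError:
        pass
```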
Remove or tag failed experiments:
  • Delete early test runs
  • Tag debugging experiments
  • Keep only successful runs in comparisons

Example Workflow

# 1. Set up environment
export WANDB_PROJECT=ml-in-production-practice
export WANDB_API_KEY=your_key

# 2. Prepare data
python classic_example/cli.py load-sst2-data ./data

# 3. Run training with experiment tracking
python classic_example/cli.py train ./conf/example.json

# 4. Upload model to registry
python classic_example/cli.py upload-to-registry \
    bert-sst2-v1 ./results

# 5. View results
open https://wandb.ai/your-username/ml-in-production-practice

Resources

W&B Documentation

Complete guide to Weights & Biases

15 Best Experiment Tracking Tools

Comprehensive comparison of tracking platforms

Data Science Lifecycle

Process for managing the ML lifecycle

Hydra Configuration

Framework for complex configuration management

Next Steps

Model Cards

Learn how to document models with standardized model cards