
Overview

The Training Configuration page (/training) provides a comprehensive form for setting up training hyperparameters, optimizers, learning rate schedules, and callbacks. Configurations are saved to a Training Library for reuse across experiments.

Page Layout

The page uses a sidebar + main form layout:
  • Left Sidebar (25%): List of saved training configurations
  • Main Form (75%): Hyperparameter configuration interface

Training Library Sidebar

Saved Configs List

Displays all saved training configurations as a vertical list. Each entry shows:
  • Name: User-defined configuration name
  • Selection indicator: ● (filled) if selected, ○ (empty) if not
  • Delete button: 🗑️ icon to permanently remove
Actions:
  • Click name: Load configuration into form
  • Click delete: Remove from library (confirmation not required)

New Config Button

+ New Config (bottom of sidebar)
  • Clears all form fields
  • Resets to default values
  • Deselects current configuration
The sidebar processes actions BEFORE form widgets render, ensuring loaded values appear correctly in the form.
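The "actions before widgets" pattern above can be sketched in plain Python. This is an illustrative sketch only: a plain dict stands in for Streamlit's `st.session_state`, and the function and key names are hypothetical.

```python
# Illustrative sketch: apply sidebar actions (load / new) to shared state
# BEFORE the form widgets are built, so loaded values become widget defaults.
# A plain dict stands in for Streamlit's st.session_state.

DEFAULTS = {"config_name": "", "optimizer": "Adam", "learning_rate": 0.001}

def handle_sidebar_action(state, action, library):
    """Apply a sidebar action before any widget reads state."""
    if action["type"] == "load":
        # Copy the saved config into state so widgets render with its values.
        state.update(library[action["name"]])
        state["selected"] = action["name"]
    elif action["type"] == "new":
        # Reset to defaults and deselect the current configuration.
        state.update(DEFAULTS)
        state["selected"] = None
    return state

def render_form(state):
    """Widgets read their defaults from state *after* actions are applied."""
    return {k: state.get(k, v) for k, v in DEFAULTS.items()}
```

Because the action mutates state first, `render_form` always sees the loaded configuration rather than stale defaults.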

Training Configuration Form

Config Name Input

Enter a descriptive name for this training configuration:
  • Example: “Adam_Default”, “SGD_LR0.01”, “FineTune_Light”
  • Required before saving
  • Used in library list and experiment selection

Optimizer Configuration

Optimizer Selection

Choose the optimization algorithm from the dropdown. The default is Adam (Adaptive Moment Estimation):
  • Adaptive learning rates per parameter
  • Combines momentum and RMSprop
  • Fast convergence, with moderate memory overhead (two moment buffers per parameter)
  • Best for: Most scenarios, especially with limited tuning
Learning Rate: 0.001 (default)

Learning Rate

Slider: 0.0001 - 0.01 (4 decimal precision)
Default values by optimizer:
  • Adam/AdamW/RMSprop: 0.001
  • SGD: 0.01 (SGD typically needs a higher LR)
Effects of a poor choice:
  • Too high: Training becomes unstable, loss diverges
  • Too low: Training is very slow, may get stuck in local minima
Start with the defaults, then adjust based on training curves.
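The per-optimizer defaults and the slider's range can be captured in a small helper. This is a hypothetical sketch; the names `default_learning_rate` and `clamp_lr` are not part of the page's code.

```python
# Default learning rates mirroring the table above: adaptive optimizers
# start at 0.001, while SGD starts higher at 0.01.
DEFAULT_LR = {"Adam": 0.001, "AdamW": 0.001, "RMSprop": 0.001, "SGD": 0.01}

def default_learning_rate(optimizer: str) -> float:
    # Fall back to 0.001 for any unlisted optimizer.
    return DEFAULT_LR.get(optimizer, 0.001)

def clamp_lr(lr: float) -> float:
    # Keep the value inside the slider's 0.0001 - 0.01 range,
    # rounded to its 4-decimal precision.
    return round(min(max(lr, 0.0001), 0.01), 4)
```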

Learning Rate Schedule

Adjust learning rate during training to improve convergence.

Strategy Selection

Constant: Fixed learning rate throughout training.
  • No adjustments
  • Simplest approach
  • Best for: Initial experiments, short training runs
Learning rate schedules are applied automatically during training. No additional configuration needed.
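The reduce-on-plateau idea behind the "ReduceLROnPlateau" strategy (used in the example configurations below) can be illustrated framework-agnostically. This is a minimal sketch of the mechanism, not the page's implementation; in practice PyTorch's built-in `torch.optim.lr_scheduler.ReduceLROnPlateau` does this work.

```python
# Minimal sketch of reduce-on-plateau: halve the learning rate when
# validation loss stops improving for `patience` consecutive epochs.

class ReduceOnPlateau:
    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor          # multiply LR by this on plateau
        self.patience = patience      # epochs without improvement tolerated
        self.min_lr = min_lr          # never reduce below this floor
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns current LR."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr
```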

Training Parameters

Max Epochs

Slider: 10 - 200
  • Default: 100
  • Maximum number of training epochs
  • Training may stop earlier with Early Stopping
Start with 100 epochs. Use Early Stopping to prevent unnecessary training if convergence occurs sooner.

Batch Size

Dropdown: 16, 32, 64, 128
  • Default: 32
  • Number of samples per gradient update
Tradeoffs:
  • Smaller (16): More gradient updates per epoch, slower wall-clock training, often better generalization, less GPU memory
  • Larger (128): Fewer updates per epoch, faster wall-clock training, may generalize worse, more GPU memory
If you encounter GPU out-of-memory errors, reduce batch size to 16. If training is slow, increase to 64 or 128.

Shuffle Data

Checkbox: Shuffle training data each epoch
  • Default: Enabled
  • Randomizes sample order to prevent order bias
  • Always leave enabled unless you have specific reasons
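How batch size and per-epoch shuffling interact can be sketched with plain Python; the function name here is hypothetical, and a real DataLoader would do this internally.

```python
import random

# Sketch of one epoch's batching: shuffle the sample order, then slice it
# into batches. The number of batches is the number of gradient updates
# per epoch, so smaller batch sizes mean more updates.

def epoch_batches(n_samples, batch_size, shuffle=True, seed=None):
    indices = list(range(n_samples))
    if shuffle:
        # A fresh random order each epoch prevents order bias.
        random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]
```

With 1,000 samples, batch size 16 gives 63 updates per epoch while batch size 128 gives only 8, which is the update-count side of the tradeoff above.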

Regularization

Techniques to prevent overfitting by constraining model complexity.

L2 Weight Decay

Checkbox: Enable L2 regularization
  • Adds penalty for large weights to loss function
  • Encourages smaller, more distributed weights
  • Recommended for most scenarios
Lambda Slider: 0.0001 - 0.01 (4 decimal precision)
  • Default: 0.0001
  • Regularization strength
  • Higher = stronger penalty
L2 weight decay is integrated into optimizers (AdamW uses decoupled weight decay for better performance).
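The penalty itself is simple enough to show directly. This is an illustrative sketch of the math only; as noted above, real optimizers apply it internally (and AdamW decouples the decay from the gradient update).

```python
# Minimal sketch of L2 regularization: the loss gains a penalty of
# lambda * sum(w^2), which pushes weights toward smaller values.

def l2_penalized_loss(base_loss, weights, lam=0.0001):
    penalty = lam * sum(w * w for w in weights)
    return base_loss + penalty
```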

Class Imbalance Handling

Select method to handle imbalanced classes during training.
If the Dataset Configuration page used “Auto Class Weights” for imbalance handling, select the same method here for consistency.
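A common way to compute automatic class weights is inverse-frequency weighting, shown below as a hedged sketch; whether the page uses exactly this formula (the `n_samples / (n_classes * count)` form popularized by scikit-learn's "balanced" mode) is an assumption.

```python
from collections import Counter

# Sketch of "Auto Class Weights": weight each class inversely to its
# frequency so minority classes contribute more to the loss.

def auto_class_weights(labels):
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    # Balanced classes get weight 1.0; rarer classes get weight > 1.0.
    return {cls: n / (k * c) for cls, c in counts.items()}
```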

Callbacks

Automated actions during training.

Early Stopping

Checkbox: Enable early stopping
  • Default: Enabled
  • Stops training when validation metric stops improving
  • Prevents overfitting and wasted compute
Patience Slider: 5 - 30 epochs
  • Default: 10
  • Number of epochs with no improvement before stopping
  • Higher patience = more tolerance for plateaus
How it works: Monitors validation loss. If loss doesn’t improve for patience epochs, training stops and the best model weights are restored.
Patience Guidelines:
  • Fast-converging models (Transfer Learning): 5-10 epochs
  • CNNs from scratch: 10-20 epochs
  • Transformers: 15-30 epochs
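The monitoring rule described above reduces to a small amount of bookkeeping. This framework-agnostic sketch tracks the best epoch so its weights could be restored; it illustrates the rule, not the page's actual callback.

```python
# Sketch of early stopping: stop after `patience` epochs without a new
# best validation loss, remembering which epoch was best so that model's
# weights can be restored afterwards.

class EarlyStopping:
    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = 0
        self.bad_epochs = 0

    def update(self, epoch, val_loss):
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss, self.best_epoch = val_loss, epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```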

Model Checkpointing

Checkbox: Enable model checkpointing
  • Default: Enabled
  • Saves best model weights during training
  • Allows recovery of best model even if training continues past optimal point
Save Best By: Radio selection
  • Val Loss (default): Save when validation loss decreases
  • Val Accuracy: Save when validation accuracy increases
Checkpoints are saved to repo/models/checkpoints/{experiment_id}/best_model.pt
Use Val Loss for most cases. Use Val Accuracy if you prioritize accuracy over loss minimization.
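The "save best by" logic can be sketched as follows. A dict stands in for the weights that the page actually writes to `repo/models/checkpoints/{experiment_id}/best_model.pt` (e.g. via `torch.save`); the class name is hypothetical.

```python
# Sketch of best-model checkpointing: keep only the checkpoint whose
# monitored metric is the best so far. Val Loss is minimized; Val Accuracy
# is maximized.

class BestCheckpoint:
    def __init__(self, metric="Val Loss"):
        self.metric = metric
        # Lower is better for loss, higher is better for accuracy.
        self.best = float("inf") if metric == "Val Loss" else float("-inf")
        self.state = None

    def maybe_save(self, value, model_state):
        """Return True if this epoch's model replaced the stored best."""
        if self.metric == "Val Loss":
            improved = value < self.best
        else:
            improved = value > self.best
        if improved:
            self.best = value
            self.state = dict(model_state)  # stand-in for writing best_model.pt
        return improved
```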

Configuration Summary

The form builds a configuration dictionary:
{
  "optimizer": "Adam",
  "learning_rate": 0.001,
  "lr_strategy": "ReduceLROnPlateau",
  "epochs": 100,
  "batch_size": 32,
  "shuffle": true,
  "l2_decay": true,
  "l2_lambda": 0.0001,
  "class_weights": "Auto Class Weights",
  "early_stopping": true,
  "es_patience": 10,
  "checkpointing": true,
  "checkpoint_metric": "Val Loss"
}
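Persisting an entry like this to the Training Library is a simple serialization round-trip. The field names below match the summary above; the JSON wrapper and function names are assumptions for illustration.

```python
import json

# Sketch of saving/loading a named library entry as JSON.

config = {
    "optimizer": "Adam",
    "learning_rate": 0.001,
    "lr_strategy": "ReduceLROnPlateau",
    "epochs": 100,
    "batch_size": 32,
    "shuffle": True,
    "early_stopping": True,
    "es_patience": 10,
}

def save_config(name, cfg):
    """Serialize a named config entry for the library."""
    return json.dumps({"name": name, "config": cfg})

def load_config(payload):
    """Restore (name, config) from a serialized entry."""
    entry = json.loads(payload)
    return entry["name"], entry["config"]
```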

Saving Configurations

Save Buttons

Two buttons appear at the bottom of the form.
Primary button (when not editing an existing config):
  • Saves the form as a new entry in the library
  • Generates a unique ID
  • Clears the form after save
  • Shows a success toast
Validation:
  • Config name must not be empty
  • All fields must have valid values
On Success:
  • ✅ Success message: “Config '<name>' saved!”
  • Library updates with the new or updated entry
  • Form remains for further edits, or clears after Save as New
Create multiple configurations (e.g., “Fast”, “Balanced”, “Thorough”) with different epoch counts and patience values for quick experiment setup.

Example Configurations

Name: "Quick_Experiment"
Optimizer: Adam
Learning Rate: 0.001
LR Strategy: Constant
Epochs: 50
Batch Size: 64
L2 Decay: No
Class Weights: Auto
Early Stopping: Yes, Patience 5
Checkpointing: Yes, Val Loss
Use case: Fast iteration during development

Name: "Adam_Default"
Optimizer: Adam
Learning Rate: 0.001
LR Strategy: ReduceLROnPlateau
Epochs: 100
Batch Size: 32
L2 Decay: Yes (0.0001)
Class Weights: Auto
Early Stopping: Yes, Patience 10
Checkpointing: Yes, Val Loss
Use case: Most scenarios, good starting point

Name: "FineTune_ResNet"
Optimizer: SGD with Momentum
Learning Rate: 0.01
LR Strategy: ReduceLROnPlateau
Epochs: 50
Batch Size: 32
L2 Decay: Yes (0.0001)
Class Weights: Auto
Early Stopping: Yes, Patience 8
Checkpointing: Yes, Val Loss
Use case: Fine-tuning pretrained models

Name: "Transformer_Adam"
Optimizer: AdamW
Learning Rate: 0.0001
LR Strategy: Cosine Annealing
Epochs: 200
Batch Size: 16
L2 Decay: Yes (0.01)
Class Weights: Auto
Early Stopping: Yes, Patience 20
Checkpointing: Yes, Val Loss
Use case: Training transformers from scratch

Name: "FocalLoss_Imbalanced"
Optimizer: Adam
Learning Rate: 0.001
LR Strategy: ReduceLROnPlateau
Epochs: 100
Batch Size: 32
L2 Decay: Yes (0.0001)
Class Weights: Focal Loss
Early Stopping: Yes, Patience 15
Checkpointing: Yes, Val Loss
Use case: Highly imbalanced datasets (>10:1 ratio)

Tips & Best Practices

Start with Defaults: The default configuration (Adam, 0.001 LR, ReduceLROnPlateau) works well for most scenarios.
Early Stopping is Essential: Always enable early stopping to prevent overfitting and wasted compute.
Batch Size vs GPU Memory: If you get OOM errors, reduce batch size. If training is slow, increase batch size.
Don’t set learning rate too high (>0.01 for Adam). This causes training instability and divergence.
ReduceLROnPlateau for Most Cases: It automatically adapts LR based on validation performance without manual tuning.

Next Steps

After saving your training configuration:

Training Monitor

Compose experiments and start training with real-time monitoring
