
Overview

The Training Configuration page (/training) provides a comprehensive form for setting up training hyperparameters, optimizers, learning rate schedules, and callbacks. Configurations are saved to a Training Library for reuse across experiments.

Page Layout

The page uses a sidebar + main form layout:
  • Left Sidebar (25%): List of saved training configurations
  • Main Form (75%): Hyperparameter configuration interface

Training Library Sidebar

Saved Configs List

Displays all saved training configurations as a vertical list. Each entry shows:
  • Name: User-defined configuration name
  • Selection indicator: ● (filled) if selected, ○ (empty) if not
  • Delete button: 🗑️ icon to permanently remove
Actions:
  • Click name: Load configuration into form
  • Click delete: Remove from library (confirmation not required)

New Config Button

+ New Config (bottom of sidebar)
  • Clears all form fields
  • Resets to default values
  • Deselects current configuration
The sidebar processes actions BEFORE form widgets render, ensuring loaded values appear correctly in the form.
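The "actions before widgets" pattern above can be sketched in plain Python. This is an illustrative sketch only: a plain dict stands in for Streamlit's `st.session_state`, and the function and key names are hypothetical.

```python
# Illustrative sketch: apply sidebar actions (load / new) to shared state
# BEFORE the form widgets are built, so loaded values become widget defaults.
# A plain dict stands in for Streamlit's st.session_state.

DEFAULTS = {"config_name": "", "optimizer": "Adam", "learning_rate": 0.001}

def handle_sidebar_action(state, action, library):
    """Apply a sidebar action before any widget reads state."""
    if action["type"] == "load":
        # Copy the saved config into state so widgets render with its values.
        state.update(library[action["name"]])
        state["selected"] = action["name"]
    elif action["type"] == "new":
        # Reset to defaults and deselect the current configuration.
        state.update(DEFAULTS)
        state["selected"] = None
    return state

def render_form(state):
    """Widgets read their defaults from state *after* actions are applied."""
    return {k: state.get(k, v) for k, v in DEFAULTS.items()}
```

Because the action mutates state first, `render_form` always sees the loaded configuration rather than stale defaults.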

Training Configuration Form

Config Name Input

Enter a descriptive name for this training configuration:
  • Example: “Adam_Default”, “SGD_LR0.01”, “FineTune_Light”
  • Required before saving
  • Used in library list and experiment selection

Optimizer Configuration

Optimizer Selection

Choose the optimization algorithm from the dropdown. The default is Adam (Adaptive Moment Estimation):
  • Adaptive learning rates per parameter
  • Combines momentum and RMSprop
  • Fast convergence, with moderate memory overhead (two moment buffers per parameter)
  • Best for: Most scenarios, especially with limited tuning
Learning Rate: 0.001 (default)

Learning Rate

Slider: 0.0001 - 0.01 (4 decimal precision)
Default values by optimizer:
  • Adam/AdamW/RMSprop: 0.001
  • SGD: 0.01 (SGD typically needs a higher LR)
Effects of a poor choice:
  • Too high: Training becomes unstable, loss diverges
  • Too low: Training is very slow, may get stuck in local minima
Start with the defaults, then adjust based on training curves.
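The per-optimizer defaults and the slider's range can be captured in a small helper. This is a hypothetical sketch; the names `default_learning_rate` and `clamp_lr` are not part of the page's code.

```python
# Default learning rates mirroring the table above: adaptive optimizers
# start at 0.001, while SGD starts higher at 0.01.
DEFAULT_LR = {"Adam": 0.001, "AdamW": 0.001, "RMSprop": 0.001, "SGD": 0.01}

def default_learning_rate(optimizer: str) -> float:
    # Fall back to 0.001 for any unlisted optimizer.
    return DEFAULT_LR.get(optimizer, 0.001)

def clamp_lr(lr: float) -> float:
    # Keep the value inside the slider's 0.0001 - 0.01 range,
    # rounded to its 4-decimal precision.
    return round(min(max(lr, 0.0001), 0.01), 4)
```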

Learning Rate Schedule

Adjust learning rate during training to improve convergence.

Strategy Selection

Constant: Fixed learning rate throughout training.
  • No adjustments
  • Simplest approach
  • Best for: Initial experiments, short training runs
Learning rate schedules are applied automatically during training. No additional configuration needed.
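The reduce-on-plateau idea behind the "ReduceLROnPlateau" strategy (used in the example configurations below) can be illustrated framework-agnostically. This is a minimal sketch of the mechanism, not the page's implementation; in practice PyTorch's built-in `torch.optim.lr_scheduler.ReduceLROnPlateau` does this work.

```python
# Minimal sketch of reduce-on-plateau: halve the learning rate when
# validation loss stops improving for `patience` consecutive epochs.

class ReduceOnPlateau:
    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor          # multiply LR by this on plateau
        self.patience = patience      # epochs without improvement tolerated
        self.min_lr = min_lr          # never reduce below this floor
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns current LR."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr
```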

Training Parameters

Max Epochs

Slider: 10 - 200
  • Default: 100
  • Maximum number of training epochs
  • Training may stop earlier with Early Stopping
Start with 100 epochs. Use Early Stopping to prevent unnecessary training if convergence occurs sooner.

Batch Size

Dropdown: 16, 32, 64, 128
  • Default: 32
  • Number of samples per gradient update
Tradeoffs:
  • Smaller (16): More gradient updates per epoch, slower wall-clock training, often better generalization, less GPU memory
  • Larger (128): Fewer updates per epoch, faster wall-clock training, may generalize worse, more GPU memory
If you encounter GPU out-of-memory errors, reduce batch size to 16. If training is slow, increase to 64 or 128.

Shuffle Data

Checkbox: Shuffle training data each epoch
  • Default: Enabled
  • Randomizes sample order to prevent order bias
  • Always leave enabled unless you have specific reasons
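How batch size and per-epoch shuffling interact can be sketched with plain Python; the function name here is hypothetical, and a real DataLoader would do this internally.

```python
import random

# Sketch of one epoch's batching: shuffle the sample order, then slice it
# into batches. The number of batches is the number of gradient updates
# per epoch, so smaller batch sizes mean more updates.

def epoch_batches(n_samples, batch_size, shuffle=True, seed=None):
    indices = list(range(n_samples))
    if shuffle:
        # A fresh random order each epoch prevents order bias.
        random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]
```

With 1,000 samples, batch size 16 gives 63 updates per epoch while batch size 128 gives only 8, which is the update-count side of the tradeoff above.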

Regularization

Techniques to prevent overfitting by constraining model complexity.

L2 Weight Decay

Checkbox: Enable L2 regularization
  • Adds penalty for large weights to loss function
  • Encourages smaller, more distributed weights
  • Recommended for most scenarios
Lambda Slider: 0.0001 - 0.01 (4 decimal precision)
  • Default: 0.0001
  • Regularization strength
  • Higher = stronger penalty
L2 weight decay is integrated into optimizers (AdamW uses decoupled weight decay for better performance).
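The penalty itself is simple enough to show directly. This is an illustrative sketch of the math only; as noted above, real optimizers apply it internally (and AdamW decouples the decay from the gradient update).

```python
# Minimal sketch of L2 regularization: the loss gains a penalty of
# lambda * sum(w^2), which pushes weights toward smaller values.

def l2_penalized_loss(base_loss, weights, lam=0.0001):
    penalty = lam * sum(w * w for w in weights)
    return base_loss + penalty
```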

Class Imbalance Handling

Select method to handle imbalanced classes during training.
If the Dataset Configuration page used “Auto Class Weights” for imbalance handling, select the same method here for consistency.
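A common way to compute automatic class weights is inverse-frequency weighting, shown below as a hedged sketch; whether the page uses exactly this formula (the `n_samples / (n_classes * count)` form popularized by scikit-learn's "balanced" mode) is an assumption.

```python
from collections import Counter

# Sketch of "Auto Class Weights": weight each class inversely to its
# frequency so minority classes contribute more to the loss.

def auto_class_weights(labels):
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    # Balanced classes get weight 1.0; rarer classes get weight > 1.0.
    return {cls: n / (k * c) for cls, c in counts.items()}
```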

Callbacks

Automated actions during training.

Early Stopping

Checkbox: Enable early stopping
  • Default: Enabled
  • Stops training when validation metric stops improving
  • Prevents overfitting and wasted compute
Patience Slider: 5 - 30 epochs
  • Default: 10
  • Number of epochs with no improvement before stopping
  • Higher patience = more tolerance for plateaus
How it works: Monitors validation loss. If loss doesn’t improve for patience epochs, training stops and the best model weights are restored.
Patience Guidelines:
  • Fast-converging models (Transfer Learning): 5-10 epochs
  • CNNs from scratch: 10-20 epochs
  • Transformers: 15-30 epochs
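The monitoring rule described above reduces to a small amount of bookkeeping. This framework-agnostic sketch tracks the best epoch so its weights could be restored; it illustrates the rule, not the page's actual callback.

```python
# Sketch of early stopping: stop after `patience` epochs without a new
# best validation loss, remembering which epoch was best so that model's
# weights can be restored afterwards.

class EarlyStopping:
    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = 0
        self.bad_epochs = 0

    def update(self, epoch, val_loss):
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss, self.best_epoch = val_loss, epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```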

Model Checkpointing

Checkbox: Enable model checkpointing
  • Default: Enabled
  • Saves best model weights during training
  • Allows recovery of best model even if training continues past optimal point
Save Best By: Radio selection
  • Val Loss (default): Save when validation loss decreases
  • Val Accuracy: Save when validation accuracy increases
Checkpoints are saved to repo/models/checkpoints/{experiment_id}/best_model.pt
Use Val Loss for most cases. Use Val Accuracy if you prioritize accuracy over loss minimization.
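The "save best by" logic can be sketched as follows. A dict stands in for the weights that the page actually writes to `repo/models/checkpoints/{experiment_id}/best_model.pt` (e.g. via `torch.save`); the class name is hypothetical.

```python
# Sketch of best-model checkpointing: keep only the checkpoint whose
# monitored metric is the best so far. Val Loss is minimized; Val Accuracy
# is maximized.

class BestCheckpoint:
    def __init__(self, metric="Val Loss"):
        self.metric = metric
        # Lower is better for loss, higher is better for accuracy.
        self.best = float("inf") if metric == "Val Loss" else float("-inf")
        self.state = None

    def maybe_save(self, value, model_state):
        """Return True if this epoch's model replaced the stored best."""
        if self.metric == "Val Loss":
            improved = value < self.best
        else:
            improved = value > self.best
        if improved:
            self.best = value
            self.state = dict(model_state)  # stand-in for writing best_model.pt
        return improved
```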

Configuration Summary

The form builds a configuration dictionary:
{
  "optimizer": "Adam",
  "learning_rate": 0.001,
  "lr_strategy": "ReduceLROnPlateau",
  "epochs": 100,
  "batch_size": 32,
  "shuffle": true,
  "l2_decay": true,
  "l2_lambda": 0.0001,
  "class_weights": "Auto Class Weights",
  "early_stopping": true,
  "es_patience": 10,
  "checkpointing": true,
  "checkpoint_metric": "Val Loss"
}
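Persisting an entry like this to the Training Library is a simple serialization round-trip. The field names below match the summary above; the JSON wrapper and function names are assumptions for illustration.

```python
import json

# Sketch of saving/loading a named library entry as JSON.

config = {
    "optimizer": "Adam",
    "learning_rate": 0.001,
    "lr_strategy": "ReduceLROnPlateau",
    "epochs": 100,
    "batch_size": 32,
    "shuffle": True,
    "early_stopping": True,
    "es_patience": 10,
}

def save_config(name, cfg):
    """Serialize a named config entry for the library."""
    return json.dumps({"name": name, "config": cfg})

def load_config(payload):
    """Restore (name, config) from a serialized entry."""
    entry = json.loads(payload)
    return entry["name"], entry["config"]
```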

Saving Configurations

Save Buttons

Two buttons appear at the bottom of the form.
Primary button (when not editing an existing config):
  • Saves the form as a new entry in the library
  • Generates a unique ID
  • Clears the form after save
  • Shows a success toast
Validation:
  • Config name must not be empty
  • All fields must have valid values
On Success:
  • ✅ Success message: “Config '<name>' saved!”
  • Library updates with the new or updated entry
  • Form remains for further edits, or clears after Save as New
Create multiple configurations (e.g., “Fast”, “Balanced”, “Thorough”) with different epoch counts and patience values for quick experiment setup.

Example Configurations

Name: "Quick_Experiment"
Optimizer: Adam
Learning Rate: 0.001
LR Strategy: Constant
Epochs: 50
Batch Size: 64
L2 Decay: No
Class Weights: Auto
Early Stopping: Yes, Patience 5
Checkpointing: Yes, Val Loss
Use case: Fast iteration during development

Name: "Adam_Default"
Optimizer: Adam
Learning Rate: 0.001
LR Strategy: ReduceLROnPlateau
Epochs: 100
Batch Size: 32
L2 Decay: Yes (0.0001)
Class Weights: Auto
Early Stopping: Yes, Patience 10
Checkpointing: Yes, Val Loss
Use case: Most scenarios, good starting point

Name: "FineTune_ResNet"
Optimizer: SGD with Momentum
Learning Rate: 0.01
LR Strategy: ReduceLROnPlateau
Epochs: 50
Batch Size: 32
L2 Decay: Yes (0.0001)
Class Weights: Auto
Early Stopping: Yes, Patience 8
Checkpointing: Yes, Val Loss
Use case: Fine-tuning pretrained models

Name: "Transformer_Adam"
Optimizer: AdamW
Learning Rate: 0.0001
LR Strategy: Cosine Annealing
Epochs: 200
Batch Size: 16
L2 Decay: Yes (0.01)
Class Weights: Auto
Early Stopping: Yes, Patience 20
Checkpointing: Yes, Val Loss
Use case: Training transformers from scratch

Name: "FocalLoss_Imbalanced"
Optimizer: Adam
Learning Rate: 0.001
LR Strategy: ReduceLROnPlateau
Epochs: 100
Batch Size: 32
L2 Decay: Yes (0.0001)
Class Weights: Focal Loss
Early Stopping: Yes, Patience 15
Checkpointing: Yes, Val Loss
Use case: Highly imbalanced datasets (>10:1 ratio)

Tips & Best Practices

Start with Defaults: The default configuration (Adam, 0.001 LR, ReduceLROnPlateau) works well for most scenarios.
Early Stopping is Essential: Always enable early stopping to prevent overfitting and wasted compute.
Batch Size vs GPU Memory: If you get OOM errors, reduce batch size. If training is slow, increase batch size.
Don’t set learning rate too high (>0.01 for Adam). This causes training instability and divergence.
ReduceLROnPlateau for Most Cases: It automatically adapts LR based on validation performance without manual tuning.

Next Steps

After saving your training configuration:

Training Monitor

Compose experiments and start training with real-time monitoring
