Learning rate schedulers adjust the learning rate during training according to predefined schedules. This helps improve convergence and prevent overshooting optimal solutions.

Base Scheduler

All schedulers (except ReduceLROnPlateau) extend the LRScheduler base class, which provides:
  • step() - Advance to the next epoch and update learning rates
  • getLr() - Compute the learning rates for the current epoch
  • getLastLr() - Get the current learning rates for all parameter groups
  • epoch - The current epoch number

StepLR

Decays the learning rate by gamma every stepSize epochs. Formula: lr = baseLr * gamma^floor(epoch / stepSize)
import { SGD, StepLR } from 'deepbox/optim';

const optimizer = new SGD(model.parameters(), { lr: 0.1 });
const scheduler = new StepLR(optimizer, { stepSize: 30, gamma: 0.1 });

for (let epoch = 0; epoch < 100; epoch++) {
  train();
  scheduler.step();
}

Constructor

  • optimizer (Optimizer, required) - The optimizer whose learning rate will be scheduled
  • options (object, required) - Scheduler configuration

Example Schedule

const scheduler = new StepLR(optimizer, { stepSize: 30, gamma: 0.1 });
// Epochs 0-29:  lr = 0.1
// Epochs 30-59: lr = 0.01
// Epochs 60-89: lr = 0.001
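The step decay above can be reproduced with a plain function. This is a sketch of the formula, not the library's implementation:

```typescript
// Sketch of the StepLR formula: lr = baseLr * gamma^floor(epoch / stepSize).
// Not part of deepbox -- just a standalone illustration.
function stepDecay(baseLr: number, gamma: number, stepSize: number, epoch: number): number {
  return baseLr * Math.pow(gamma, Math.floor(epoch / stepSize));
}

console.log(stepDecay(0.1, 0.1, 30, 15)); // ~0.1   (epochs 0-29)
console.log(stepDecay(0.1, 0.1, 30, 45)); // ~0.01  (epochs 30-59)
console.log(stepDecay(0.1, 0.1, 30, 75)); // ~0.001 (epochs 60-89)
```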

MultiStepLR

Decays the learning rate by gamma when the epoch reaches one of the milestones.
import { MultiStepLR } from 'deepbox/optim';

const scheduler = new MultiStepLR(optimizer, {
  milestones: [30, 80],
  gamma: 0.1
});

Constructor

  • optimizer (Optimizer, required) - The optimizer to schedule
  • options (object, required) - Scheduler configuration

Example Schedule

const scheduler = new MultiStepLR(optimizer, {
  milestones: [30, 80],
  gamma: 0.1
});
// Assuming a base learning rate of 0.1:
// Epochs 0-29:  lr = 0.1
// Epochs 30-79: lr = 0.01
// Epochs 80+:   lr = 0.001
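Equivalently, the learning rate at any epoch is the base rate decayed once per milestone already passed. A formula sketch, not the library code:

```typescript
// Sketch: decay by gamma once for each milestone the epoch has reached.
function multiStepDecay(baseLr: number, gamma: number, milestones: number[], epoch: number): number {
  const decays = milestones.filter((m) => epoch >= m).length;
  return baseLr * Math.pow(gamma, decays);
}

console.log(multiStepDecay(0.1, 0.1, [30, 80], 10)); // ~0.1
console.log(multiStepDecay(0.1, 0.1, [30, 80], 50)); // ~0.01
console.log(multiStepDecay(0.1, 0.1, [30, 80], 90)); // ~0.001
```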

ExponentialLR

Decays the learning rate exponentially every epoch. Formula: lr = baseLr * gamma^epoch
import { ExponentialLR } from 'deepbox/optim';

const scheduler = new ExponentialLR(optimizer, { gamma: 0.95 });
// lr *= 0.95 each epoch
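The closed form is equally simple. A sketch of the formula, not the library's implementation:

```typescript
// Sketch of the ExponentialLR formula: lr = baseLr * gamma^epoch.
function exponentialDecay(baseLr: number, gamma: number, epoch: number): number {
  return baseLr * Math.pow(gamma, epoch);
}

console.log(exponentialDecay(0.1, 0.95, 0));  // 0.1
console.log(exponentialDecay(0.1, 0.95, 10)); // ~0.0599
```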

Constructor

  • optimizer (Optimizer, required) - The optimizer to schedule
  • options (object, required) - Scheduler configuration

CosineAnnealingLR

Sets the learning rate using a cosine annealing schedule. The learning rate oscillates between the base learning rate and etaMin following a cosine curve. Formula: lr = etaMin + (baseLr - etaMin) * (1 + cos(π * epoch / T_max)) / 2
import { CosineAnnealingLR } from 'deepbox/optim';

const scheduler = new CosineAnnealingLR(optimizer, {
  T_max: 100,
  etaMin: 0.001
});

Constructor

  • optimizer (Optimizer, required) - The optimizer to schedule
  • options (object, required) - Scheduler configuration

Training Example

const scheduler = new CosineAnnealingLR(optimizer, {
  T_max: 100,
  etaMin: 0.001
});

for (let epoch = 0; epoch < 100; epoch++) {
  train();
  scheduler.step();
}
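The formula above traces a half cosine from the base rate down to etaMin. A standalone sketch of the math, assuming a base learning rate of 0.1 (in the library, the base rate comes from the optimizer):

```typescript
// Sketch of the cosine annealing formula (not the library implementation):
// lr = etaMin + (baseLr - etaMin) * (1 + cos(pi * epoch / T_max)) / 2
function cosineAnneal(baseLr: number, etaMin: number, tMax: number, epoch: number): number {
  return etaMin + ((baseLr - etaMin) * (1 + Math.cos((Math.PI * epoch) / tMax))) / 2;
}

console.log(cosineAnneal(0.1, 0.001, 100, 0));   // 0.1     (starts at baseLr)
console.log(cosineAnneal(0.1, 0.001, 100, 50));  // ~0.0505 (midpoint)
console.log(cosineAnneal(0.1, 0.001, 100, 100)); // 0.001   (ends at etaMin)
```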

OneCycleLR

Implements the 1cycle learning rate policy. The learning rate starts at maxLr / divFactor, rises to maxLr over the first pctStart fraction of training, then decays to maxLr / finalDivFactor.
import { OneCycleLR } from 'deepbox/optim';

const scheduler = new OneCycleLR(optimizer, {
  maxLr: 0.1,
  totalSteps: 1000,
  pctStart: 0.3
});

Constructor

  • optimizer (Optimizer, required) - The optimizer to schedule
  • options (object, required) - Scheduler configuration

Training Example

const totalSteps = numEpochs * stepsPerEpoch;
const scheduler = new OneCycleLR(optimizer, {
  maxLr: 0.1,
  totalSteps: totalSteps,
  pctStart: 0.3
});

for (let epoch = 0; epoch < numEpochs; epoch++) {
  for (const batch of dataLoader) {
    train(batch);
    scheduler.step();  // Step per batch, not per epoch
  }
}
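The shape of the cycle can be sketched as two ramps. This sketch assumes linear interpolation in both phases and uses hypothetical divFactor / finalDivFactor parameter names; the library's defaults and annealing strategy (e.g. cosine) may differ:

```typescript
// Sketch of the 1cycle shape with linear ramps -- illustrative only.
// divFactor and finalDivFactor are hypothetical parameter names here.
function oneCycle(
  maxLr: number,
  totalSteps: number,
  pctStart: number,
  divFactor: number,
  finalDivFactor: number,
  step: number
): number {
  const upSteps = Math.round(pctStart * totalSteps);
  const startLr = maxLr / divFactor;
  const finalLr = maxLr / finalDivFactor;
  if (step <= upSteps) {
    // Phase 1: ramp from maxLr / divFactor up to maxLr.
    return startLr + (maxLr - startLr) * (step / upSteps);
  }
  // Phase 2: ramp from maxLr down to maxLr / finalDivFactor.
  return maxLr + (finalLr - maxLr) * ((step - upSteps) / (totalSteps - upSteps));
}

console.log(oneCycle(0.1, 1000, 0.3, 25, 1e4, 0));    // 0.004  (maxLr / divFactor)
console.log(oneCycle(0.1, 1000, 0.3, 25, 1e4, 300));  // 0.1    (peak at pctStart)
console.log(oneCycle(0.1, 1000, 0.3, 25, 1e4, 1000)); // ~1e-5  (maxLr / finalDivFactor)
```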

ReduceLROnPlateau

Reduces learning rate when a metric has stopped improving. Unlike other schedulers, this one requires a metric value to be passed to step().
import { ReduceLROnPlateau } from 'deepbox/optim';

const scheduler = new ReduceLROnPlateau(optimizer, {
  mode: 'min',
  factor: 0.1,
  patience: 10
});

for (let epoch = 0; epoch < 100; epoch++) {
  train();
  const valLoss = validate();
  scheduler.step(valLoss);
}

Constructor

  • optimizer (Optimizer, required) - The optimizer to schedule
  • options (object) - Scheduler configuration

Methods

  • step(metric: number): void - Update learning rates based on the metric value; reduces the learning rate if there is no improvement for patience epochs
  • getLastLr(): number[] - Get current learning rates for all parameter groups

Training Example

const scheduler = new ReduceLROnPlateau(optimizer, {
  mode: 'min',
  factor: 0.1,
  patience: 10,
  threshold: 1e-4
});

for (let epoch = 0; epoch < numEpochs; epoch++) {
  const trainLoss = train();
  const valLoss = validate();
  
  // Reduce LR if validation loss plateaus
  scheduler.step(valLoss);
  
  console.log(`Epoch ${epoch}: LR = ${scheduler.getLastLr()[0]}`);
}
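The plateau logic itself is easy to picture: track the best metric seen so far and cut the rate once patience is exhausted. The sketch below is not the deepbox source and assumes mode: 'min' semantics:

```typescript
// Sketch of plateau-based decay for mode 'min' -- illustrative, not the library.
class PlateauSketch {
  private best = Infinity;
  private badEpochs = 0;
  lr: number;

  constructor(
    lr: number,
    private factor: number,
    private patience: number,
    private threshold = 1e-4,
    private minLr = 0
  ) {
    this.lr = lr;
  }

  step(metric: number): void {
    if (metric < this.best - this.threshold) {
      // Improvement beyond the threshold: reset the counter.
      this.best = metric;
      this.badEpochs = 0;
    } else if (++this.badEpochs > this.patience) {
      // No improvement for more than `patience` epochs: decay the rate.
      this.lr = Math.max(this.lr * this.factor, this.minLr);
      this.badEpochs = 0;
    }
  }
}

const p = new PlateauSketch(0.1, 0.1, 2);
[1.0, 1.0, 1.0, 1.0].forEach((loss) => p.step(loss));
// After enough stalled epochs, p.lr has dropped from 0.1 to ~0.01.
```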

WarmupLR

Linearly increases the learning rate from 0 to the base learning rate over warmupEpochs, then delegates to a wrapped scheduler.
import { WarmupLR, CosineAnnealingLR } from 'deepbox/optim';

const baseScheduler = new CosineAnnealingLR(optimizer, { T_max: 100 });
const scheduler = new WarmupLR(optimizer, baseScheduler, {
  warmupEpochs: 5
});
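During warmup the learning rate is just a linear ramp. A sketch of one common convention (the library's exact handling of the first epoch may differ):

```typescript
// Sketch of linear warmup: ramp from 0 to baseLr over warmupEpochs, then hold
// (or hand off to the wrapped scheduler). Not the library implementation.
function warmupRamp(baseLr: number, warmupEpochs: number, epoch: number): number {
  if (epoch >= warmupEpochs) return baseLr;
  return baseLr * (epoch / warmupEpochs);
}

console.log(warmupRamp(0.1, 5, 0)); // 0     (start of warmup)
console.log(warmupRamp(0.1, 5, 3)); // ~0.06
console.log(warmupRamp(0.1, 5, 5)); // 0.1   (warmup complete)
```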

Constructor

  • optimizer (Optimizer, required) - The optimizer to schedule
  • afterScheduler (LRScheduler | null, required) - Scheduler to use after the warmup period completes; pass null to maintain the base learning rate after warmup
  • options (object, required) - Scheduler configuration

Training Example

// Warmup for 5 epochs, then cosine annealing
const baseScheduler = new CosineAnnealingLR(optimizer, { T_max: 95 });
const scheduler = new WarmupLR(optimizer, baseScheduler, {
  warmupEpochs: 5
});

for (let epoch = 0; epoch < 100; epoch++) {
  train();
  scheduler.step();
}

Warmup without Base Scheduler

// Just warmup, no scheduler after
const scheduler = new WarmupLR(optimizer, null, { warmupEpochs: 5 });

Usage Patterns

Basic Usage

import { SGD, StepLR } from 'deepbox/optim';

const optimizer = new SGD(model.parameters(), { lr: 0.1 });
const scheduler = new StepLR(optimizer, { stepSize: 10, gamma: 0.1 });

for (let epoch = 0; epoch < 100; epoch++) {
  // Training loop
  for (const batch of dataLoader) {
    optimizer.zeroGrad();
    const loss = computeLoss(batch);
    loss.backward();
    optimizer.step();
  }
  
  // Step scheduler after each epoch
  scheduler.step();
}

Combining Schedulers with Warmup

import { AdamW, WarmupLR, CosineAnnealingLR } from 'deepbox/optim';

const optimizer = new AdamW(model.parameters(), { lr: 0.001 });
const cosineScheduler = new CosineAnnealingLR(optimizer, {
  T_max: 95,
  etaMin: 1e-6
});
const scheduler = new WarmupLR(optimizer, cosineScheduler, {
  warmupEpochs: 5
});

for (let epoch = 0; epoch < 100; epoch++) {
  train();
  scheduler.step();
}

Metric-Based Scheduling

import { Adam, ReduceLROnPlateau } from 'deepbox/optim';

const optimizer = new Adam(model.parameters(), { lr: 0.001 });
const scheduler = new ReduceLROnPlateau(optimizer, {
  mode: 'min',
  factor: 0.5,
  patience: 5,
  minLr: 1e-6
});

for (let epoch = 0; epoch < 100; epoch++) {
  const trainLoss = train();
  const valLoss = validate();
  
  // Scheduler monitors validation loss
  scheduler.step(valLoss);
  
  const currentLr = scheduler.getLastLr()[0];
  console.log(`Epoch ${epoch}: val_loss=${valLoss}, lr=${currentLr}`);
}

Resuming Training

const optimizer = new SGD(model.parameters(), { lr: 0.1 });
const scheduler = new StepLR(optimizer, {
  stepSize: 10,
  gamma: 0.1,
  lastEpoch: 49  // Resume from epoch 50
});

for (let epoch = 50; epoch < 100; epoch++) {
  train();
  scheduler.step();
}

Choosing a Scheduler

  • StepLR - Simple and effective, good for initial experiments
  • MultiStepLR - More control over when to decay, common in computer vision
  • ExponentialLR - Smooth exponential decay
  • CosineAnnealingLR - Popular for transformers and modern architectures
  • OneCycleLR - Fast convergence with super-convergence phenomenon
  • ReduceLROnPlateau - Adaptive based on validation metrics
  • WarmupLR - Essential for training large models (transformers, vision transformers)
