Optimization algorithms for training neural networks
Optimizers update model parameters based on computed gradients during training. Deepbox provides implementations of popular optimization algorithms including SGD, Adam, AdamW, and more.
All optimizers extend the abstract Optimizer base class, which provides common functionality for parameter management, gradient zeroing, and state persistence.
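To make the base-class contract concrete, here is a minimal, self-contained sketch of the pattern described above. The names (`step`, `zeroGrad`, the `Parameter` shape) are illustrative assumptions, not deepbox's actual API, and state persistence is omitted for brevity:

```typescript
// Illustrative sketch of the Optimizer base-class pattern.
// NOTE: names and shapes here are assumptions, not deepbox internals.
interface Parameter {
  data: number[];
  grad: number[];
}

abstract class Optimizer {
  constructor(protected params: Parameter[]) {}

  // Each concrete optimizer implements its own update rule.
  abstract step(): void;

  // Reset accumulated gradients before the next backward pass.
  zeroGrad(): void {
    for (const p of this.params) p.grad.fill(0);
  }
}

// A minimal concrete optimizer: plain gradient descent.
class SimpleSGD extends Optimizer {
  constructor(params: Parameter[], private lr: number) {
    super(params);
  }

  step(): void {
    for (const p of this.params) {
      for (let i = 0; i < p.data.length; i++) {
        p.data[i] -= this.lr * p.grad[i];
      }
    }
  }
}
```

Concrete optimizers only override `step`; shared bookkeeping such as gradient zeroing lives in the base class.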
Adam (Adaptive Moment Estimation) computes adaptive learning rates for each parameter using running averages of the gradients and their squared values.
```typescript
import { Adam } from 'deepbox/optim';

const optimizer = new Adam(model.parameters(), {
  lr: 0.001,
  beta1: 0.9,
  beta2: 0.999
});
```
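The running averages and bias correction can be sketched for a single scalar parameter as follows. This follows the standard Adam algorithm (Kingma & Ba); it is an illustration, not deepbox's internal implementation:

```typescript
// One Adam update for a scalar parameter (standard algorithm sketch;
// not deepbox internals).
function adamStep(
  param: number,
  grad: number,
  state: { m: number; v: number; t: number },
  lr = 0.001,
  beta1 = 0.9,
  beta2 = 0.999,
  eps = 1e-8
): number {
  state.t += 1;
  // Running averages of the gradient and its square.
  state.m = beta1 * state.m + (1 - beta1) * grad;
  state.v = beta2 * state.v + (1 - beta2) * grad * grad;
  // Bias correction compensates for m and v starting at zero.
  const mHat = state.m / (1 - Math.pow(beta1, state.t));
  const vHat = state.v / (1 - Math.pow(beta2, state.t));
  return param - (lr * mHat) / (Math.sqrt(vHat) + eps);
}
```

On the very first step the bias correction makes `mHat` equal the raw gradient, so the update size is close to `lr` regardless of the betas.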
AdamW: Adam with decoupled weight decay. Fixes the weight decay implementation in Adam by applying it directly to the parameters rather than folding it into the gradient-based update. This leads to better generalization and is the recommended variant for most applications.
```typescript
import { AdamW } from 'deepbox/optim';

const optimizer = new AdamW(model.parameters(), {
  lr: 0.001,
  weightDecay: 0.01 // typical value for AdamW
});
```
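The difference between coupled and decoupled decay can be sketched for one scalar parameter. The function names are made up for illustration; neither is deepbox code:

```typescript
// Coupled (Adam + L2): decay is folded into the gradient, so it also
// passes through the adaptive scaling of the update.
function coupledDecayGrad(grad: number, param: number, wd: number): number {
  return grad + wd * param;
}

// Decoupled (AdamW): decay is applied directly to the parameter,
// independently of the adaptively scaled gradient update.
function decoupledDecayStep(
  param: number,
  lr: number,
  wd: number,
  adamUpdate: number
): number {
  return param - lr * wd * param + adamUpdate;
}
```

Because the decoupled term bypasses the adaptive denominator, every weight shrinks at the same relative rate, which is the property that improves generalization.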
Nesterov-accelerated Adam optimizer. Combines Adam’s adaptive learning rates with Nesterov momentum for potentially faster convergence by applying “look-ahead” gradients.
```typescript
import { Nadam } from 'deepbox/optim';

const optimizer = new Nadam(model.parameters(), {
  lr: 0.002,
  beta1: 0.9,
  beta2: 0.999
});
```
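The "look-ahead" can be sketched for a scalar parameter: the bias-corrected momentum is blended with the current gradient as if one more momentum step had already been taken (following Dozat's formulation). This is an illustration, not deepbox internals:

```typescript
// One Nadam update for a scalar parameter (sketch of Dozat's
// Nesterov-accelerated Adam; not deepbox internals).
function nadamStep(
  param: number,
  grad: number,
  state: { m: number; v: number; t: number },
  lr = 0.002,
  beta1 = 0.9,
  beta2 = 0.999,
  eps = 1e-8
): number {
  state.t += 1;
  state.m = beta1 * state.m + (1 - beta1) * grad;
  state.v = beta2 * state.v + (1 - beta2) * grad * grad;
  // Look-ahead momentum: blend the bias-corrected momentum (advanced
  // one step) with the bias-corrected current gradient.
  const mBar =
    (beta1 * state.m) / (1 - Math.pow(beta1, state.t + 1)) +
    ((1 - beta1) * grad) / (1 - Math.pow(beta1, state.t));
  const vHat = state.v / (1 - Math.pow(beta2, state.t));
  return param - (lr * mBar) / (Math.sqrt(vHat) + eps);
}
```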
Root Mean Square Propagation optimizer. Adapts the learning rate for each parameter by dividing by a running average of recent gradient magnitudes. Particularly effective for RNNs and non-stationary objectives.
```typescript
import { RMSprop } from 'deepbox/optim';

const optimizer = new RMSprop(model.parameters(), {
  lr: 0.01,
  alpha: 0.99,
  momentum: 0.9,
  centered: true
});
```
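The core update (momentum omitted for brevity) can be sketched for a scalar parameter, including the `centered` variant, which normalizes by an estimate of the gradient variance rather than the raw second moment. Illustrative only, not deepbox internals:

```typescript
// One RMSprop update for a scalar parameter, without momentum
// (illustrative sketch; not deepbox internals).
function rmspropStep(
  param: number,
  grad: number,
  state: { sqAvg: number; gradAvg: number },
  lr = 0.01,
  alpha = 0.99,
  eps = 1e-8,
  centered = false
): number {
  // Running average of squared gradients.
  state.sqAvg = alpha * state.sqAvg + (1 - alpha) * grad * grad;
  let denom = state.sqAvg;
  if (centered) {
    // Centered variant: subtract the squared mean gradient, so the
    // denominator estimates the gradient's variance.
    state.gradAvg = alpha * state.gradAvg + (1 - alpha) * grad;
    denom -= state.gradAvg * state.gradAvg;
  }
  return param - (lr * grad) / (Math.sqrt(denom) + eps);
}
```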
Adaptive Gradient Algorithm. Adapts the learning rate for each parameter based on the historical sum of squared gradients. Parameters with larger gradients receive smaller effective learning rates.
```typescript
import { Adagrad } from 'deepbox/optim';

const optimizer = new Adagrad(model.parameters(), {
  lr: 0.01,
  eps: 1e-10
});
```
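The per-parameter scaling can be sketched for a scalar parameter. Because the accumulator only grows, the effective step size only shrinks, which is the behavior AdaDelta (below) was designed to temper. Illustrative only, not deepbox internals:

```typescript
// One Adagrad update for a scalar parameter (illustrative sketch;
// not deepbox internals).
function adagradStep(
  param: number,
  grad: number,
  state: { sumSq: number },
  lr = 0.01,
  eps = 1e-10
): number {
  // Accumulate squared gradients over the entire history.
  state.sumSq += grad * grad;
  // Larger historical gradients => smaller effective learning rate.
  return param - (lr * grad) / (Math.sqrt(state.sumSq) + eps);
}
```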
AdaDelta is an adaptive learning rate method that seeks to temper Adagrad's aggressive, monotonically decreasing learning rate. It uses a moving window of gradient updates rather than accumulating all past gradients.
```typescript
import { AdaDelta } from 'deepbox/optim';

const optimizer = new AdaDelta(model.parameters(), {
  lr: 1.0,
  rho: 0.9,
  eps: 1e-6
});
```
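The update can be sketched for a scalar parameter. The "moving window" is an exponentially decayed average controlled by `rho`, and the step is scaled by the RMS of past updates over the RMS of gradients, which is why `lr` can stay at 1.0. Illustrative only, not deepbox internals:

```typescript
// One AdaDelta update for a scalar parameter (illustrative sketch;
// not deepbox internals).
function adadeltaStep(
  param: number,
  grad: number,
  state: { sqGrad: number; sqDelta: number },
  lr = 1.0,
  rho = 0.9,
  eps = 1e-6
): number {
  // Decayed (windowed) average of squared gradients.
  state.sqGrad = rho * state.sqGrad + (1 - rho) * grad * grad;
  // Scale by RMS of past updates / RMS of gradients; this unit
  // correction removes the need to tune a base learning rate.
  const delta =
    (Math.sqrt(state.sqDelta + eps) / Math.sqrt(state.sqGrad + eps)) * grad;
  // Decayed average of squared updates for the next step.
  state.sqDelta = rho * state.sqDelta + (1 - rho) * delta * delta;
  return param - lr * delta;
}
```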