## mseLoss

Mean Squared Error loss function.

### Signature

- `predictions` - Predicted values
- `targets` - True target values
- `reduction` - How to reduce the loss: `'mean'`, `'sum'`, or `'none'` (default: `'mean'`)
### Formula
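With predictions $\hat{y}_i$ and targets $y_i$ over $n$ elements:

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2
$$

With `reduction: 'sum'` the $\frac{1}{n}$ factor is dropped; with `'none'` the per-element squared errors are returned unreduced.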
### Properties

- Always non-negative
- Penalizes large errors heavily (quadratic)
- Differentiable everywhere
- Common for regression tasks

### Use Cases

- Regression tasks
- Continuous value prediction
- Measuring distance between predictions and targets
### Examples

#### Basic Usage
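Since the library's exact call syntax isn't shown here, this sketch computes the loss from scratch on plain arrays; `meanSquaredError` is an illustrative stand-in for `mseLoss`, not its actual implementation.

```typescript
// Sketch: MSE computed from scratch with the default 'mean' reduction.
function meanSquaredError(predictions: number[], targets: number[]): number {
  let sum = 0;
  for (let i = 0; i < predictions.length; i++) {
    const diff = predictions[i] - targets[i];
    sum += diff * diff; // squared error per element
  }
  return sum / predictions.length; // 'mean' reduction
}

const loss = meanSquaredError([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]);
// (0.25 + 0.25 + 0) / 3 ≈ 0.1667
```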
#### With Reduction Options
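A sketch of the three reduction modes described in the signature above (hypothetical helper, not the library's code):

```typescript
type Reduction = "mean" | "sum" | "none";

function mseWithReduction(
  predictions: number[],
  targets: number[],
  reduction: Reduction = "mean"
): number | number[] {
  const perElement = predictions.map((p, i) => {
    const d = p - targets[i];
    return d * d;
  });
  if (reduction === "none") return perElement; // element-wise losses
  const total = perElement.reduce((a, b) => a + b, 0);
  return reduction === "sum" ? total : total / perElement.length;
}

mseWithReduction([1, 2], [0, 0], "none"); // [1, 4]
mseWithReduction([1, 2], [0, 0], "sum");  // 5
mseWithReduction([1, 2], [0, 0]);         // 2.5
```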
#### Training Loop
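To show where MSE sits in a training loop without assuming the library's autograd API, this sketch fits $y \approx w \cdot x$ by plain gradient descent, using the hand-derived gradient $\partial\,\text{MSE}/\partial w = \frac{2}{n}\sum_i (w x_i - y_i)\,x_i$. All names are illustrative.

```typescript
const xs = [1, 2, 3, 4];
const ys = xs.map((x) => 2 * x); // ground truth: w = 2

let w = 0;
const lr = 0.05;
for (let epoch = 0; epoch < 200; epoch++) {
  let grad = 0;
  for (let i = 0; i < xs.length; i++) {
    // dMSE/dw for a linear model y_hat = w * x
    grad += (2 / xs.length) * (w * xs[i] - ys[i]) * xs[i];
  }
  w -= lr * grad; // gradient step
}
// w converges toward 2
```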
## maeLoss

Mean Absolute Error (L1) loss function.

### Signature

- `predictions` - Predicted values
- `targets` - True target values
- `reduction` - How to reduce the loss (default: `'mean'`)
### Formula
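Same setup as MSE, but with an absolute rather than squared difference:

$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|\hat{y}_i - y_i\right|
$$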
### Properties

- Always non-negative
- Linear penalty for errors
- Less sensitive to outliers than MSE
- More robust for noisy data

### Use Cases

- Regression with outliers
- When outliers should have less influence
- Robust regression
### Example
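A from-scratch sketch of MAE with the `'mean'` reduction (`meanAbsoluteError` is a stand-in for `maeLoss`, not the library's implementation):

```typescript
function meanAbsoluteError(predictions: number[], targets: number[]): number {
  let sum = 0;
  for (let i = 0; i < predictions.length; i++) {
    sum += Math.abs(predictions[i] - targets[i]); // linear penalty
  }
  return sum / predictions.length;
}

// An outlier of size 10 contributes 10 (linear), not 100 (quadratic as in MSE):
meanAbsoluteError([0, 0], [10, 0]); // 5
```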
## crossEntropyLoss

Cross Entropy Loss for multi-class classification.

### Signature
- `input` - Predicted logits of shape `(n_samples, n_classes)`
- `target` - True labels, either:
  - Class indices of shape `(n_samples,)` - integers from 0 to n_classes-1
  - Probabilities/one-hot of shape `(n_samples, n_classes)`

Returns:

- Scalar loss value (number for Tensor input)
- GradTensor for differentiable computation
### Formula
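For logits $x_{i,c}$ and hard labels $y_i$ (class indices), the loss is the mean negative log-softmax of the true class:

$$
\mathcal{L} = -\frac{1}{n} \sum_{i=1}^{n} \log \frac{e^{x_{i,y_i}}}{\sum_{c=1}^{C} e^{x_{i,c}}}
$$

For soft labels $p_{i,c}$ (probabilities or one-hot rows) this generalizes to $-\frac{1}{n} \sum_i \sum_c p_{i,c} \log \operatorname{softmax}(x_i)_c$.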
### Properties

- Used for multi-class classification
- Automatically applies log_softmax internally
- Supports both hard labels (class indices) and soft labels (probabilities)
- Numerically stable implementation

### Use Cases

- Multi-class classification (mutually exclusive classes)
- Image classification
- Text classification
- Any classification with > 2 classes
### Examples

#### With Class Indices
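To make the hard-label path concrete without assuming the library's call syntax, this sketch computes cross entropy from logits and class indices, using the usual max-subtraction trick for a stable log-sum-exp (`crossEntropyFromIndices` is illustrative, not the library's `crossEntropyLoss`):

```typescript
function crossEntropyFromIndices(logits: number[][], targets: number[]): number {
  let total = 0;
  for (let i = 0; i < logits.length; i++) {
    const row = logits[i];
    const max = Math.max(...row);
    // log Σ exp(x_c) computed stably as max + log Σ exp(x_c − max)
    const logSumExp =
      max + Math.log(row.reduce((acc, v) => acc + Math.exp(v - max), 0));
    total += logSumExp - row[targets[i]]; // −log softmax(true class)
  }
  return total / logits.length;
}

// Uniform logits over 2 classes → loss = ln 2 ≈ 0.693
crossEntropyFromIndices([[0, 0]], [0]);
```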
#### With One-Hot Encoded Labels
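The soft-label variant weights each class's log-softmax by the target probability; with one-hot rows it reproduces the class-index result. Again a from-scratch sketch, not the library's code:

```typescript
function crossEntropyFromProbs(logits: number[][], targets: number[][]): number {
  let total = 0;
  for (let i = 0; i < logits.length; i++) {
    const row = logits[i];
    const max = Math.max(...row);
    const logSumExp =
      max + Math.log(row.reduce((acc, v) => acc + Math.exp(v - max), 0));
    for (let c = 0; c < row.length; c++) {
      // −p_c · log softmax(x)_c, where log softmax = x_c − logSumExp
      total -= targets[i][c] * (row[c] - logSumExp);
    }
  }
  return total / logits.length;
}

// One-hot targets match the class-index form: ln 2 for uniform logits
crossEntropyFromProbs([[0, 0]], [[1, 0]]);
```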
#### Classification Model
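As a minimal stand-in for a full model, this sketch trains a single set of logits directly, exploiting the fact that the cross-entropy gradient with respect to the logits is simply `softmax(x) − onehot(y)`. A real model would use the library's layers and autograd; everything here is illustrative.

```typescript
function softmax(row: number[]): number[] {
  const max = Math.max(...row);
  const exps = row.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

let logits = [1.0, -1.0, 0.5]; // scores for 3 classes
const target = 2;              // true class index
const stepSize = 0.1;

for (let step = 0; step < 100; step++) {
  const probs = softmax(logits);
  // gradient of cross entropy w.r.t. logits: probs − onehot(target)
  logits = logits.map((v, c) => v - stepSize * (probs[c] - (c === target ? 1 : 0)));
}
// softmax(logits)[target] grows toward 1
```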
## binaryCrossEntropyLoss

Binary Cross Entropy loss for binary classification with probability inputs.

### Signature

- `predictions` - Predicted probabilities (0 to 1) after sigmoid
- `targets` - True binary labels (0 or 1)
- `reduction` - How to reduce the loss (default: `'mean'`)
### Formula
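For predicted probabilities $p_i$ and binary targets $y_i \in \{0, 1\}$:

$$
\text{BCE} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
$$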
### Properties

- Requires predictions in range (0, 1) - use sigmoid activation
- Targets should be 0 or 1
- Numerically stable with epsilon clamping
- For binary classification only

### Use Cases

- Binary classification
- Multi-label classification (independent binary decisions)
### Example
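A from-scratch sketch with the epsilon clamping the Properties list mentions; `binaryCrossEntropy` and its `eps` default are illustrative, not the library's API:

```typescript
function binaryCrossEntropy(
  predictions: number[],
  targets: number[],
  eps = 1e-7
): number {
  let sum = 0;
  for (let i = 0; i < predictions.length; i++) {
    // clamp to (eps, 1 − eps) so log never sees 0
    const p = Math.min(Math.max(predictions[i], eps), 1 - eps);
    sum -= targets[i] * Math.log(p) + (1 - targets[i]) * Math.log(1 - p);
  }
  return sum / predictions.length;
}

binaryCrossEntropy([0.5], [1]); // −ln 0.5 = ln 2 ≈ 0.693
binaryCrossEntropy([0.0], [0]); // ≈ 0, kept finite by the clamping
```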
## binaryCrossEntropyWithLogitsLoss

Binary Cross Entropy with logits. Combines sigmoid and BCE for numerical stability.

### Signature

- `input` - Predicted logits (before sigmoid)
- `target` - True binary labels (0 or 1)
### Formula
x is input and z is target. This is numerically stable.
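In that notation, the per-element loss is:

$$
\ell(x, z) = \max(x, 0) - x z + \log\left(1 + e^{-|x|}\right)
$$

This is algebraically equal to $-\left[z \log \sigma(x) + (1 - z) \log(1 - \sigma(x))\right]$, but it never exponentiates a large positive number, so it cannot overflow.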
### Properties

- More numerically stable than sigmoid + BCE
- Input should be logits (not probabilities)
- Preferred over `binaryCrossEntropyLoss` for training
### Example
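A sketch of the stable formulation `max(x, 0) − x·z + log(1 + e^(−|x|))` on plain arrays (`bceWithLogits` is a stand-in, not the library's implementation):

```typescript
function bceWithLogits(logits: number[], targets: number[]): number {
  let sum = 0;
  for (let i = 0; i < logits.length; i++) {
    const x = logits[i]; // raw logit, before sigmoid
    const z = targets[i];
    sum += Math.max(x, 0) - x * z + Math.log(1 + Math.exp(-Math.abs(x)));
  }
  return sum / logits.length;
}

bceWithLogits([0], [1]);    // ln 2, since sigmoid(0) = 0.5
bceWithLogits([1000], [1]); // ≈ 0, with no overflow from exp(1000)
```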
## rmseLoss

Root Mean Squared Error loss function.

### Signature

- `predictions` - Predicted values
- `targets` - True target values
### Formula
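The square root of the MSE:

$$
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}
$$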
### Properties

- Square root of MSE
- Error in same units as target
- More interpretable than MSE
- Common metric for regression
### Example
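A from-scratch sketch (`rootMeanSquaredError` is illustrative, not `rmseLoss` itself) showing how the result comes back in the target's own units:

```typescript
function rootMeanSquaredError(predictions: number[], targets: number[]): number {
  let sum = 0;
  for (let i = 0; i < predictions.length; i++) {
    const d = predictions[i] - targets[i];
    sum += d * d;
  }
  return Math.sqrt(sum / predictions.length); // sqrt of the MSE
}

// Errors of 2 → RMSE of 2 (MSE would report 4, in squared units)
rootMeanSquaredError([3, 5], [1, 3]);
```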
## huberLoss

Huber loss - combines MSE and MAE for robust regression.

### Signature

- `predictions` - Predicted values
- `targets` - True target values
- `delta` - Threshold where the loss transitions from quadratic to linear (default: 1.0)
- `reduction` - How to reduce the loss (default: `'mean'`)
### Formula
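With per-element error $d = \hat{y} - y$ and threshold $\delta$:

$$
\ell_\delta(d) =
\begin{cases}
\frac{1}{2} d^2 & \text{if } |d| \le \delta \\[4pt]
\delta \left( |d| - \frac{1}{2} \delta \right) & \text{otherwise}
\end{cases}
$$

The two branches meet with matching value and slope at $|d| = \delta$, so the loss remains differentiable at the transition.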
### Properties

- Quadratic for small errors (like MSE)
- Linear for large errors (like MAE)
- Robust to outliers
- Controlled by delta parameter

### Use Cases

- Regression with outliers
- When you want MSE benefits but MAE robustness
- Robotics and control systems
### Example
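A sketch of the piecewise definition with the default `delta` of 1.0 (`huber` is an illustrative helper, not the library's `huberLoss`):

```typescript
function huber(
  predictions: number[],
  targets: number[],
  delta = 1.0
): number {
  let sum = 0;
  for (let i = 0; i < predictions.length; i++) {
    const d = Math.abs(predictions[i] - targets[i]);
    // quadratic inside |d| ≤ delta, linear outside
    sum += d <= delta ? 0.5 * d * d : delta * (d - 0.5 * delta);
  }
  return sum / predictions.length;
}

huber([0.5], [0]); // small error: 0.5 · 0.25 = 0.125
huber([10], [0]);  // outlier: 1 · (10 − 0.5) = 9.5, not the quadratic 50
```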
## Choosing a Loss Function

For Regression:

- MSE - Default choice; penalizes large errors heavily
- MAE - When you have outliers
- Huber - Best of both worlds
- RMSE - When you want interpretable error in target units

For Classification:

- Cross Entropy - Multi-class classification
- Binary Cross Entropy - Binary classification (probabilities)
- BCE With Logits - Binary classification (more stable)
## Complete Training Example
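As an end-to-end illustration that avoids assuming the library's layer and optimizer APIs, this sketch fits $y = w x + b$ with MSE and hand-written gradients. In practice you would build the model from the library's layers and let autograd supply the gradients.

```typescript
// Training data generated from y = 2x + 1
const samples: Array<[number, number]> = [[0, 1], [1, 3], [2, 5], [3, 7]];

let weight = 0;
let bias = 0;
const learningRate = 0.05;

for (let epoch = 0; epoch < 2000; epoch++) {
  let gradW = 0;
  let gradB = 0;
  for (const [x, y] of samples) {
    const err = weight * x + bias - y;       // prediction error
    gradW += (2 / samples.length) * err * x; // dMSE/dw
    gradB += (2 / samples.length) * err;     // dMSE/db
  }
  weight -= learningRate * gradW;
  bias -= learningRate * gradB;
}
// weight ≈ 2, bias ≈ 1
```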
## Custom Loss Functions
You can create custom loss functions by combining operations:
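For instance, a weighted MSE that lets some samples count more than others can be composed from the same element-wise operations used above. This is a hypothetical sketch, not a library API:

```typescript
function weightedMse(
  predictions: number[],
  targets: number[],
  weights: number[]
): number {
  let sum = 0;
  let weightTotal = 0;
  for (let i = 0; i < predictions.length; i++) {
    const d = predictions[i] - targets[i];
    sum += weights[i] * d * d; // per-sample weight on the squared error
    weightTotal += weights[i];
  }
  return sum / weightTotal; // weighted mean of squared errors
}

// Second sample weighted 3×: (1·4 + 3·1) / 4 = 1.75
weightedMse([2, 1], [0, 0], [1, 3]);
```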
## Loss Function Summary

| Loss | Task | Properties | Best For |
|---|---|---|---|
| MSE | Regression | Quadratic, penalizes outliers | Standard regression |
| MAE | Regression | Linear, robust | Noisy data |
| Huber | Regression | Hybrid MSE/MAE | Robust regression |
| RMSE | Regression | Interpretable units | Metrics, evaluation |
| Cross Entropy | Multi-class | Combines softmax + NLL | Classification |
| BCE | Binary | Requires probabilities | Binary classification |
| BCE With Logits | Binary | Numerically stable | Binary classification |
## See Also
- Module - Building neural networks
- Optimizers - Training algorithms
- Activation Functions - Softmax for classification
- Linear Layer - Output layers