Overview
Platt scaling is a post-hoc calibration method that maps raw model probabilities to calibrated probabilities via a learned sigmoid transformation:

p_cal = σ(A · logit(p_raw) + B)

where A and B are fitted by minimizing log loss over collected prediction/outcome pairs.

Why Calibration?
Well-Calibrated vs. Accurate
A model can be accurate (predicts the correct outcome often) but poorly calibrated (probabilities don’t match empirical frequencies). Example: A model predicts 0.70 probability for UP on 100 intervals. If only 55 finish UP, the model is overconfident (miscalibrated).

Reliability Diagram
Calibration is visualized via a reliability diagram: predictions are bucketed by stated probability and each bucket’s empirical success frequency is plotted against it, so perfect calibration falls on the diagonal. Platt scaling shifts and stretches the reliability curve to align with the diagonal.
Mathematical Formulation
Sigmoid Transform
The calibration mapping is:

p_cal = σ(A · logit(p_raw) + B)

where σ(z) = 1 / (1 + e^(−z)) is the sigmoid function and logit(p) = ln(p / (1 − p)) is its inverse.

Parameters:
- A: Slope (scaling factor). A < 1 → compress confidence. A > 1 → stretch confidence.
- B: Intercept (bias correction). B > 0 → shift probabilities upward. B < 0 → shift downward.
Parameter Fitting
A and B are fitted by minimizing log loss over the calibration dataset:

L(A, B) = −(1/N) · Σ_i [ y_i · ln(p_cal,i) + (1 − y_i) · ln(1 − p_cal,i) ]

Where:
- y_i ∈ {0, 1} = actual outcome (1 = prediction correct, 0 = incorrect)
- p_cal,i = σ(A · logit(p_raw,i) + B)
Gradient Descent: The implementation uses vanilla gradient descent with learning rate 0.01 and 1000 iterations. This is sufficient for the smooth, convex log loss surface.
Implementation
Constructor
calibration.js
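A minimal sketch of such a constructor; only the _minSamples = 200 threshold and the 2000-sample buffer cap come from this document, and all other field names are assumptions:

```javascript
// Sketch of the scaler's constructor (field names are illustrative).
class PlattScaler {
  constructor() {
    this.A = 1;              // slope: identity mapping until fitted
    this.B = 0;              // intercept: no bias correction yet
    this._minSamples = 200;  // activation threshold (from the text)
    this._maxSamples = 2000; // rolling-window cap (from the text)
    this._samples = [];      // collected { pRaw, outcome } pairs
  }
}
```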
Data Collection
calibration.js
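A sketch of sample collection with the rolling 2000-sample window described under Limitations; the method name and sample shape are assumptions:

```javascript
// Sketch of sample collection with a rolling window (names illustrative).
class PlattScaler {
  constructor(maxSamples = 2000) {
    this._samples = [];
    this._maxSamples = maxSamples;
  }

  // outcome: 1 if the prediction was correct, 0 otherwise.
  addSample(pRaw, outcome) {
    this._samples.push({ pRaw, outcome });
    // Drop the oldest pair once the buffer is full.
    if (this._samples.length > this._maxSamples) this._samples.shift();
  }
}
```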
Fitting Algorithm
calibration.js
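The fitting step can be sketched as follows. It mirrors the described procedure (vanilla gradient descent on the log loss, learning rate 0.01, 1000 iterations), though the function and variable names here are assumptions:

```javascript
// Sketch of the fitting step: minimize the log loss
// L = -(1/N) Σ [ y·ln(pCal) + (1 - y)·ln(1 - pCal) ],  pCal = σ(A·logit(pRaw) + B)
const sigmoid = (z) => 1 / (1 + Math.exp(-z));
const logit = (p) => Math.log(p / (1 - p));
const clamp = (p, eps = 1e-6) => Math.min(1 - eps, Math.max(eps, p));

function fitPlatt(samples, lr = 0.01, iters = 1000) {
  let A = 1, B = 0; // start from the identity mapping
  for (let it = 0; it < iters; it++) {
    let gradA = 0, gradB = 0;
    for (const { pRaw, outcome } of samples) {
      const z = logit(clamp(pRaw));
      const pCal = sigmoid(A * z + B);
      // For sigmoid + log loss, d(loss)/d(A·z + B) = pCal - y.
      const err = pCal - outcome;
      gradA += err * z;
      gradB += err;
    }
    A -= (lr * gradA) / samples.length;
    B -= (lr * gradB) / samples.length;
  }
  return { A, B };
}
```

On a synthetic overconfident dataset (raw probability 0.9, empirically correct 70% of the time), the fitted A comes out below 1, compressing confidence exactly as the parameter table below describes.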
Calibration Function
calibration.js
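The calibration call itself applies the sigmoid transform from the Mathematical Formulation section; the function shape below is an assumption:

```javascript
// Sketch of the calibration call: p_cal = σ(A · logit(p_raw) + B).
const sigmoid = (z) => 1 / (1 + Math.exp(-z));
const logit = (p) => Math.log(p / (1 - p));
const clamp = (p, eps = 1e-6) => Math.min(1 - eps, Math.max(eps, p));

function calibrate(pRaw, A, B) {
  return sigmoid(A * logit(clamp(pRaw)) + B);
}
```

With A = 1, B = 0 this is the identity; A < 1 pulls probabilities toward 0.5, as described under Interpretation of Parameters.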
Helper Functions
calibration.js
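The helpers likely amount to a numerically guarded trio; this is a sketch, and the actual file may differ:

```javascript
// Keep probabilities strictly inside (0, 1) so logit stays finite.
const clamp = (p, eps = 1e-6) => Math.min(1 - eps, Math.max(eps, p));

// logit is only defined on (0, 1), so clamp first.
const logit = (p) => {
  const q = clamp(p);
  return Math.log(q / (1 - q));
};

// Branch on sign so Math.exp never overflows for large |z|.
const sigmoid = (z) =>
  z >= 0 ? 1 / (1 + Math.exp(-z)) : Math.exp(z) / (1 + Math.exp(z));
```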
Activation Logic
The scaler auto-activates when ≥200 samples have been collected:

predictor.js
Minimum Sample Size: 200 samples ensures stable parameter estimates. Below this threshold, the scaler returns raw probabilities unchanged.
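The gate described above can be sketched as follows; the threshold comes from the text, while the surrounding function is hypothetical:

```javascript
// Sketch of the activation gate (surrounding names are illustrative).
const MIN_SAMPLES = 200;

function applyCalibration(pRaw, sampleCount, calibrateFn) {
  // Below the threshold, pass raw probabilities through unchanged.
  if (sampleCount < MIN_SAMPLES) return pRaw;
  return calibrateFn(pRaw);
}
```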
Usage Example
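An end-to-end sketch under the interface pieced together above; class and method names are assumptions, not the module's actual API. Collect outcome pairs, fit, then calibrate a fresh probability:

```javascript
// Self-contained sketch of the described workflow (names illustrative).
class PlattScaler {
  constructor() {
    this.A = 1;
    this.B = 0;
    this._minSamples = 200;
    this._samples = [];
  }
  _clamp(p) { return Math.min(1 - 1e-6, Math.max(1e-6, p)); }
  _logit(p) { const q = this._clamp(p); return Math.log(q / (1 - q)); }
  _sigmoid(z) { return 1 / (1 + Math.exp(-z)); }

  addSample(pRaw, outcome) { this._samples.push({ pRaw, outcome }); }

  // Vanilla gradient descent on the log loss (lr = 0.01, 1000 iterations).
  fit(lr = 0.01, iters = 1000) {
    for (let it = 0; it < iters; it++) {
      let gA = 0, gB = 0;
      for (const { pRaw, outcome } of this._samples) {
        const z = this._logit(pRaw);
        const err = this._sigmoid(this.A * z + this.B) - outcome;
        gA += err * z;
        gB += err;
      }
      this.A -= (lr * gA) / this._samples.length;
      this.B -= (lr * gB) / this._samples.length;
    }
  }

  calibrate(pRaw) {
    if (this._samples.length < this._minSamples) return pRaw; // not active yet
    return this._sigmoid(this.A * this._logit(pRaw) + this.B);
  }
}

// Feed 300 synthetic pairs from an overconfident model (it says 0.8 but
// is right only 60% of the time), fit, then calibrate a new probability.
const scaler = new PlattScaler();
for (let i = 0; i < 300; i++) scaler.addSample(0.8, i % 10 < 6 ? 1 : 0);
scaler.fit();
const pCal = scaler.calibrate(0.8); // pulled down toward the empirical 0.6
```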
Interpretation of Parameters
A (Slope)
| A Value | Meaning |
|---|---|
| A = 1 | No scaling (identity) |
| A < 1 | Overconfident model — compress probabilities toward 0.5 |
| A > 1 | Underconfident model — stretch probabilities away from 0.5 |
| A ≈ 0 | Model has no discriminative power (all predictions → 0.5) |
B (Intercept)
| B Value | Meaning |
|---|---|
| B = 0 | No bias |
| B > 0 | Pessimistic model — shift probabilities upward (predicts UP more often) |
| B < 0 | Optimistic model — shift probabilities downward (predicts DOWN more often) |
Typical Values: A well-calibrated model has A ≈ 1, B ≈ 0. Systematic deviations indicate model deficiencies (e.g., A = 0.6 suggests overconfidence).
Integration with Engine
Data Collection
The engine feeds outcome data to the scaler after each trade:

predictor.js
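A hedged sketch of what that hook might look like; the predictor.js internals are not shown here, so every name below is an assumption:

```javascript
// Hypothetical post-trade hook: record whether the prediction was correct
// together with the raw probability that was emitted for it.
function onTradeResolved(scaler, prediction, actualDirection) {
  const correct = prediction.direction === actualDirection ? 1 : 0;
  scaler.addSample(prediction.pRaw, correct);
}
```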
Prediction Pipeline
Calibration is the final step before returning a prediction:

predictor.js
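A sketch of that final step; the model and result shapes are assumptions:

```javascript
// Hypothetical final pipeline step: the model's raw probability is
// calibrated last, just before the result is returned.
function predict(model, scaler, features) {
  const pRaw = model.predict(features); // raw probability of UP
  const pCal = scaler.calibrate(pRaw);  // Platt-calibrated output
  return { direction: pCal >= 0.5 ? "UP" : "DOWN", probability: pCal, pRaw };
}
```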
Advantages
- Simple: Only 2 parameters (A, B)
- Fast: O(1) calibration call, O(N) fitting (N = sample size)
- Monotonic: Preserves ranking (if p₁ > p₂, then p_cal₁ > p_cal₂)
- Bounded: Output always ∈ (0, 1)
Limitations
1. Assumes Sigmoid Shape
Platt scaling assumes the calibration curve is sigmoid-shaped. If the true reliability curve is non-monotonic or highly irregular, Platt scaling will fail. Alternative: Isotonic regression (non-parametric, but requires more data).

2. Requires Representative Data
The calibration dataset must be representative of future predictions. If market conditions shift (e.g., volatility regime change), the scaler needs re-fitting. Mitigation: Rolling window (2000 samples) + automatic re-fit on buffer overflow.

3. Small Sample Instability
With <200 samples, parameter estimates are noisy. The implementation sets _minSamples = 200 to prevent premature activation.
Diagnostics
Stats Query
calibration.js
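A hypothetical shape for the stats query; the field names are assumptions, not the module's actual API:

```javascript
// Sketch of a diagnostics query over a fitted scaler.
function getStats(scaler) {
  return {
    A: scaler.A,                          // fitted slope
    B: scaler.B,                          // fitted intercept
    samples: scaler.samples.length,       // buffered outcome pairs
    active: scaler.samples.length >= 200, // past the activation threshold?
  };
}
```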
Logging
The engine logs calibration events:

- A = 0.82 → Model is overconfident (compress probabilities)
- B = -0.16 → Model is optimistic (shift probabilities downward)
Advanced Topics
Isotonic Regression
For non-parametric calibration, use isotonic regression. This fits a piecewise-constant function that preserves order. Requires more data (500+ samples) but handles arbitrary reliability curves.

Beta Calibration
An extension of Platt scaling with 3 parameters:

p_cal = σ(A · ln(p_raw) − C · ln(1 − p_raw) + B)

This reduces to Platt scaling when C = A; the independent C term captures non-linear distortions. Useful for models with extreme probability bias.

Temperature Scaling
A simpler variant used in neural networks:

p_cal = σ(logit(p_raw) / T)

where T is a learned temperature (T > 1 → soften, T < 1 → sharpen). This is equivalent to Platt scaling with A = 1/T, B = 0.

Performance Characteristics
- Calibration: O(1) per call
- Fitting: O(N × I), where N = sample size, I = iterations (default: 1000)
- Typical Fit Time: ~10ms for 200 samples, ~50ms for 2000 samples
- Memory: O(N) for data buffer (capped at 2000)
References
- Platt, J. C. (1999). “Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods.” Advances in Large Margin Classifiers, 61-74.
- Niculescu-Mizil, A., & Caruana, R. (2005). “Predicting Good Probabilities with Supervised Learning.” ICML, 625-632.
- Guo, C., et al. (2017). “On Calibration of Modern Neural Networks.” ICML, 1321-1330.
Next Steps
Prediction Engine
Complete integration of all components including calibration
Metrics
Evaluate calibration quality using Brier score and reliability diagrams