
Overview

Platt scaling is a post-hoc calibration method that maps raw model probabilities to calibrated probabilities via a learned sigmoid transformation:

p_{\text{cal}} = \sigma\left(A \cdot \text{logit}(p_{\text{raw}}) + B\right)

where A and B are fitted by minimizing log loss over collected prediction/outcome pairs.

Why Calibration?

Well-Calibrated vs. Accurate

A model can be accurate (predicts the correct outcome often) but poorly calibrated (probabilities don’t match empirical frequencies). Example: A model predicts 0.70 probability for UP on 100 intervals. If only 55 finish UP, the model is overconfident (miscalibrated).
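The gap in this example can be quantified directly. The snippet below is purely illustrative (not part of the library):

```javascript
// Illustrative check of the example above: 100 intervals predicted at 0.70,
// of which only 55 finish UP.
const meanPredicted = 0.70
const observedFrequency = 55 / 100
const calibrationGap = meanPredicted - observedFrequency
// A positive gap (here ≈ 0.15) indicates overconfidence;
// a negative gap would indicate underconfidence.
```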

Reliability Diagram

Calibration is visualized via a reliability diagram:
Predicted Probability (x-axis) vs. Observed Frequency (y-axis)

1.0 ┤                              ●
    │                           ●
0.8 ┤                        ●
    │                     ●
0.6 ┤                  ●
    │               ●
0.4 ┤            ●
    │         ●
0.2 ┤      ●
    │   ●
0.0 ┤●─────────────────────────────
    0.0  0.2  0.4  0.6  0.8  1.0

Perfect calibration = diagonal line (predicted = observed)
Platt scaling shifts and stretches the reliability curve to align with the diagonal.
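The points of a reliability diagram can be computed by bucketing prediction/outcome pairs. This is a hedged sketch (the function `reliabilityBins` is not part of the library):

```javascript
// Sketch: bin prediction/outcome pairs into equal-width buckets and compute
// mean predicted probability (x-axis) vs. observed frequency (y-axis).
function reliabilityBins(pairs, numBins = 10) {
  const bins = Array.from({ length: numBins }, () => ({ sumPred: 0, sumOutcome: 0, count: 0 }))
  for (const { predicted, outcome } of pairs) {
    // Map p ∈ [0, 1] to a bin index; p = 1 falls into the last bin.
    const idx = Math.min(numBins - 1, Math.floor(predicted * numBins))
    bins[idx].sumPred += predicted
    bins[idx].sumOutcome += outcome
    bins[idx].count += 1
  }
  return bins
    .filter(b => b.count > 0)
    .map(b => ({
      meanPredicted: b.sumPred / b.count,
      observedFrequency: b.sumOutcome / b.count,
      count: b.count,
    }))
}
```

Plotting `meanPredicted` against `observedFrequency` for each bin reproduces the diagram above; points off the diagonal are what Platt scaling corrects.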

Mathematical Formulation

Sigmoid Transform

The calibration mapping is:

p_{\text{cal}} = \frac{1}{1 + e^{-(A \cdot z + B)}}

where

z = \text{logit}(p_{\text{raw}}) = \log\left(\frac{p_{\text{raw}}}{1 - p_{\text{raw}}}\right)

Parameters:
  • A: Slope (scaling factor). A < 1 → compress confidence. A > 1 → stretch confidence.
  • B: Intercept (bias correction). B > 0 → shift probabilities upward. B < 0 → shift downward.

Parameter Fitting

A and B are fitted by minimizing log loss over the calibration dataset:

\mathcal{L}(A, B) = -\frac{1}{N} \sum_{i=1}^N \left[ y_i \log(p_{\text{cal},i}) + (1 - y_i) \log(1 - p_{\text{cal},i}) \right]

Where:
  • y_i = actual outcome (1 = prediction correct, 0 = incorrect)
  • p_cal,i = σ(A · logit(p_raw,i) + B)
Gradient Descent: The implementation uses vanilla gradient descent with learning rate 0.01 and 1000 iterations. This is sufficient because the log loss is smooth and convex in A and B, so plain gradient descent converges reliably without momentum or line search.

Implementation

Constructor

calibration.js
export class PlattScaler {
  constructor() {
    this._data = []       // Array of { predicted, outcome }
    this._fitted = false
    this._A = 1           // Identity mapping before fitting
    this._B = 0
    this._minSamples = 200
  }
}
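The `canFit()` guard referenced throughout this page is not shown here; a plausible sketch, assuming it simply compares the buffer length against `_minSamples`, is:

```javascript
// Hypothetical sketch of the canFit() guard used by fit() and the
// activation logic; the actual method body is not shown in these docs.
class PlattScalerSketch {
  constructor() {
    this._data = []
    this._minSamples = 200
  }
  canFit() {
    // Fitting is allowed only once the minimum sample count is reached.
    return this._data.length >= this._minSamples
  }
}
```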

Data Collection

calibration.js
collect(predictedProb, actualOutcome) {
  this._data.push({ predicted: predictedProb, outcome: actualOutcome })
  if (this._data.length > 2000) {
    this._data = this._data.slice(-2000)  // Keep last 2000 samples
    this._fitted = false  // Re-fit required
  }
}
Rolling Window: The scaler retains only the last 2000 samples to prevent unbounded memory growth and to adapt to non-stationary market conditions. When the buffer fills, _fitted is reset, triggering a re-fit on next prediction.

Fitting Algorithm

calibration.js
fit() {
  if (!this.canFit()) return

  let A = 1
  let B = 0
  const lr = 0.01
  const iterations = 1000
  const n = this._data.length

  for (let iter = 0; iter < iterations; iter++) {
    let gradA = 0
    let gradB = 0

    for (const { predicted, outcome } of this._data) {
      const z = A * logit(predicted) + B
      const pCal = sigmoid(z)
      // Gradient of log loss: d/dA = (pCal - outcome) * logit(predicted) / n
      const err = pCal - outcome
      gradA += err * logit(predicted)
      gradB += err
    }

    gradA /= n
    gradB /= n

    A -= lr * gradA
    B -= lr * gradB
  }

  this._A = A
  this._B = B
  this._fitted = true
  logger.info('PlattScaler fitted', { A: A.toFixed(4), B: B.toFixed(4), samples: n })
}

Calibration Function

calibration.js
calibrate(p) {
  if (!this._fitted) return p
  return sigmoid(this._A * logit(p) + this._B)
}

Helper Functions

calibration.js
function logit(p) {
  const safe = clamp(p, 1e-7, 1 - 1e-7)
  return Math.log(safe / (1 - safe))
}

function sigmoid(z) {
  return 1 / (1 + Math.exp(-z))
}

function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value))
}
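These helpers can be sanity-checked: logit and sigmoid are inverses on (0, 1), and the clamp keeps logit finite at the boundaries (the arrow-function forms below are restatements for a self-contained check, not library code):

```javascript
// Self-contained restatement of the helpers above, for a quick sanity check.
const clamp = (value, min, max) => Math.min(max, Math.max(min, value))
const logit = p => {
  const safe = clamp(p, 1e-7, 1 - 1e-7)
  return Math.log(safe / (1 - safe))
}
const sigmoid = z => 1 / (1 + Math.exp(-z))

const roundTrip = sigmoid(logit(0.73))  // recovers 0.73
const boundary = logit(1)               // finite (≈ 16.1) thanks to the clamp
```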

Activation Logic

The scaler auto-activates when ≥200 samples have been collected:
predictor.js
if (this._scaler.canFit()) {
  if (!this._scaler.getStats().fitted) {
    this._scaler.fit()
  }
  finalProb = this._scaler.calibrate(finalProb)
  finalProb = clamp(finalProb, 0.01, 0.99)  // Safety clamp
  calibrated = true
}
Minimum Sample Size: 200 samples ensures stable parameter estimates. Below this threshold, the scaler returns raw probabilities unchanged.

Usage Example

import { PlattScaler } from './calibration.js'

const scaler = new PlattScaler()

// Collect prediction/outcome pairs during live trading
for (const trade of historicalTrades) {
  const predicted = trade.probability
  const outcome = trade.correct ? 1 : 0
  scaler.collect(predicted, outcome)
}

// Once enough data is collected, fit the scaler
if (scaler.canFit()) {
  scaler.fit()
  const { A, B, sampleSize } = scaler.getStats()
  console.log(`Fitted Platt scaler: A=${A}, B=${B} (n=${sampleSize})`)
}

// Calibrate new predictions
const rawProb = 0.73
const calProb = scaler.calibrate(rawProb)
console.log(`Raw: ${rawProb}, Calibrated: ${calProb}`)

Interpretation of Parameters

A (Slope)

  • A = 1 → no scaling (identity)
  • A < 1 → overconfident model; compress probabilities toward 0.5
  • A > 1 → underconfident model; stretch probabilities away from 0.5
  • A ≈ 0 → model has no discriminative power (all predictions → 0.5)
Example: If A = 0.7, a raw probability of 0.80 is compressed: p_{\text{cal}} = \sigma(0.7 \cdot \text{logit}(0.80) + B) < 0.80
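The compression can be verified numerically. A quick check (taking B = 0 purely for illustration):

```javascript
// Numeric check of the A = 0.7 example, with B assumed 0 for illustration.
const logit = p => Math.log(p / (1 - p))
const sigmoid = z => 1 / (1 + Math.exp(-z))

const raw = 0.80
const cal = sigmoid(0.7 * logit(raw))
// cal ≈ 0.725: the 0.80 forecast is pulled toward 0.5
```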

B (Intercept)

  • B = 0 → no bias
  • B > 0 → pessimistic model; shift probabilities upward (predicts UP more often)
  • B < 0 → optimistic model; shift probabilities downward (predicts DOWN more often)
Example: If B = 0.5, all probabilities are biased upward: p_{\text{cal}} = \sigma(A \cdot z + 0.5) > \sigma(A \cdot z)
Typical Values: A well-calibrated model has A ≈ 1, B ≈ 0. Systematic deviations indicate model deficiencies (e.g., A = 0.6 suggests overconfidence).

Integration with Engine

Data Collection

The engine feeds outcome data to the scaler after each trade:
predictor.js
recordOutcome(correct, predictedProb) {
  // Cold-streak tracking
  this._recentOutcomes.push(correct)
  // ...

  // Feed Platt scaler
  const prob = predictedProb ?? this._lastPrediction
  if (prob != null) {
    this._scaler.collect(prob, correct ? 1 : 0)
  }
}

Prediction Pipeline

Calibration is the final step before returning a prediction:
predictor.js
// Step 1: Black-Scholes base probability
const baseProb = binaryCallProbability({...})

// Step 2: Logit-space momentum/reversion adjustments
const logitAdj = logit(baseProb) + ...
let finalProb = sigmoid(logitAdj)

// Step 3: Platt calibration (if active)
if (this._scaler.canFit()) {
  if (!this._scaler.getStats().fitted) {
    this._scaler.fit()
  }
  finalProb = this._scaler.calibrate(finalProb)
}

finalProb = clamp(finalProb, 0.01, 0.99)

Advantages

  1. Simple: Only 2 parameters (A, B)
  2. Fast: O(1) calibration call, O(N × I) fitting (N = sample size, I = iterations)
  3. Monotonic: Preserves ranking whenever the fitted A > 0 (if p₁ > p₂, then p_cal,₁ > p_cal,₂)
  4. Bounded: Output always ∈ (0, 1)

Limitations

1. Assumes Sigmoid Shape

Platt scaling assumes the calibration curve is sigmoid-shaped. If the true reliability curve is non-monotonic or highly irregular, Platt scaling will fail. Alternative: Isotonic regression (non-parametric, but requires more data).

2. Requires Representative Data

The calibration dataset must be representative of future predictions. If market conditions shift (e.g., volatility regime change), the scaler needs re-fitting. Mitigation: Rolling window (2000 samples) + automatic re-fit on buffer overflow.

3. Small Sample Instability

With <200 samples, parameter estimates are noisy. The implementation sets _minSamples = 200 to prevent premature activation.
Bootstrap Recommendation: For critical applications, estimate confidence intervals for A and B via bootstrap resampling. If intervals are wide, delay calibration activation.
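The bootstrap recommendation can be sketched as follows. This is a hedged illustration: `fitPlatt` stands in for any function returning `{ A, B }` for a dataset (e.g. a wrapper around `fit()`), and is not part of the library:

```javascript
// Sketch of bootstrap resampling for the slope parameter A.
// fitPlatt is a hypothetical stand-in: any (data) => ({ A, B }) fitter.
function bootstrapSlopeInterval(data, fitPlatt, numResamples = 200) {
  const slopes = []
  for (let i = 0; i < numResamples; i++) {
    // Resample the calibration buffer with replacement.
    const resample = Array.from({ length: data.length },
      () => data[Math.floor(Math.random() * data.length)])
    slopes.push(fitPlatt(resample).A)
  }
  slopes.sort((a, b) => a - b)
  // Approximate 95% percentile interval for A.
  return {
    aLow: slopes[Math.floor(0.025 * slopes.length)],
    aHigh: slopes[Math.floor(0.975 * slopes.length)],
  }
}
```

If `aHigh - aLow` is wide (e.g. the interval straddles both A < 1 and A > 1), the calibration data is too noisy and activation should be delayed.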

Diagnostics

Stats Query

calibration.js
getStats() {
  return {
    sampleSize: this._data.length,
    fitted: this._fitted,
    A: this._A,
    B: this._B,
  }
}

Logging

The engine logs calibration events:
[INFO] PlattScaler fitted { A: 0.8234, B: -0.1567, samples: 412 }
Interpretation:
  • A = 0.82 → Model is overconfident (compress probabilities)
  • B = -0.16 → Model is optimistic (shift probabilities downward)

Advanced Topics

Isotonic Regression

For non-parametric calibration, use isotonic regression:

p_{\text{cal}} = \text{argmin}_f \sum_i (f(p_i) - y_i)^2 \quad \text{subject to } f \text{ is monotonic}

This fits a piecewise-constant function that preserves order. Requires more data (500+ samples) but handles arbitrary reliability curves.
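A minimal pool-adjacent-violators (PAV) sketch, not part of this library, shows the shape of such a fitter:

```javascript
// Minimal pool-adjacent-violators sketch: fits a monotone step function to
// (predicted, outcome) pairs and returns it as a calibration function.
function isotonicFit(pairs) {
  const sorted = [...pairs].sort((a, b) => a.predicted - b.predicted)
  const stack = []  // blocks of pooled outcomes, each with a running mean
  for (const { predicted, outcome } of sorted) {
    stack.push({ minP: predicted, sum: outcome, count: 1 })
    // Merge adjacent blocks while they violate monotonicity.
    while (stack.length > 1) {
      const top = stack[stack.length - 1]
      const prev = stack[stack.length - 2]
      if (prev.sum / prev.count <= top.sum / top.count) break
      prev.sum += top.sum
      prev.count += top.count
      stack.pop()
    }
  }
  // Calibrated value for p: mean of the last block whose minP does not exceed p.
  return p => {
    let value = stack[0].sum / stack[0].count
    for (const b of stack) {
      if (b.minP <= p) value = b.sum / b.count
      else break
    }
    return value
  }
}
```

Unlike Platt scaling's two global parameters, the step function adapts locally, which is why it needs substantially more data to be stable.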

Beta Calibration

An extension of Platt scaling with 3 parameters: pcal=σ(Alogit(p)+B+Clog(p(1p)))p_{\text{cal}} = \sigma(A \cdot \text{logit}(p) + B + C \cdot \log(p(1-p))) The C term captures non-linear distortions. Useful for models with extreme probability bias.

Temperature Scaling

A simpler variant used in neural networks:

p_{\text{cal}} = \sigma(z / T)

where T is a learned temperature (T > 1 → soften, T < 1 → sharpen). This is equivalent to Platt scaling with A = 1/T, B = 0.
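The stated equivalence is easy to confirm numerically:

```javascript
// Confirming the equivalence: temperature scaling with T equals
// Platt scaling with A = 1/T and B = 0.
const sigmoid = z => 1 / (1 + Math.exp(-z))

const z = 1.2
const T = 2
const temperatureScaled = sigmoid(z / T)
const plattEquivalent = sigmoid((1 / T) * z + 0)  // A = 1/T, B = 0
```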

Performance Characteristics

  • Calibration: O(1) per call
  • Fitting: O(N × I), where N = sample size, I = iterations (default: 1000)
  • Typical Fit Time: ~10ms for 200 samples, ~50ms for 2000 samples
  • Memory: O(N) for data buffer (capped at 2000)

References

  • Platt, J. C. (1999). “Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods.” Advances in Large Margin Classifiers, 61-74.
  • Niculescu-Mizil, A., & Caruana, R. (2005). “Predicting Good Probabilities with Supervised Learning.” ICML, 625-632.
  • Guo, C., et al. (2017). “On Calibration of Modern Neural Networks.” ICML, 1321-1330.

Next Steps

Prediction Engine

Complete integration of all components including calibration

Metrics

Evaluate calibration quality using Brier score and reliability diagrams
