
Overview

Platt scaling is a post-hoc calibration method that maps raw model probabilities to calibrated probabilities via a learned sigmoid transformation:

p_{\text{cal}} = \sigma\left(A \cdot \text{logit}(p_{\text{raw}}) + B\right)

where A and B are fitted by minimizing log loss over collected prediction/outcome pairs.

Why Calibration?

Well-Calibrated vs. Accurate

A model can be accurate (predicts the correct outcome often) but poorly calibrated (probabilities don’t match empirical frequencies). Example: A model predicts 0.70 probability for UP on 100 intervals. If only 55 finish UP, the model is overconfident (miscalibrated).
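The gap in this example can be quantified directly. The snippet below is purely illustrative (not part of the library):

```javascript
// Illustrative check of the example above: 100 intervals predicted at 0.70,
// of which only 55 finish UP.
const meanPredicted = 0.70
const observedFrequency = 55 / 100
const calibrationGap = meanPredicted - observedFrequency
// A positive gap (here ≈ 0.15) indicates overconfidence;
// a negative gap would indicate underconfidence.
```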

Reliability Diagram

Calibration is visualized via a reliability diagram:
Predicted Probability (x-axis) vs. Observed Frequency (y-axis)

1.0 ┤                              ●
    │                           ●
0.8 ┤                        ●
    │                     ●
0.6 ┤                  ●
    │               ●
0.4 ┤            ●
    │         ●
0.2 ┤      ●
    │   ●
0.0 ┤●─────────────────────────────
    0.0  0.2  0.4  0.6  0.8  1.0

Perfect calibration = diagonal line (predicted = observed)
Platt scaling shifts and stretches the reliability curve to align with the diagonal.
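The points of a reliability diagram can be computed by bucketing prediction/outcome pairs. This is a hedged sketch (the function `reliabilityBins` is not part of the library):

```javascript
// Sketch: bin prediction/outcome pairs into equal-width buckets and compute
// mean predicted probability (x-axis) vs. observed frequency (y-axis).
function reliabilityBins(pairs, numBins = 10) {
  const bins = Array.from({ length: numBins }, () => ({ sumPred: 0, sumOutcome: 0, count: 0 }))
  for (const { predicted, outcome } of pairs) {
    // Map p ∈ [0, 1] to a bin index; p = 1 falls into the last bin.
    const idx = Math.min(numBins - 1, Math.floor(predicted * numBins))
    bins[idx].sumPred += predicted
    bins[idx].sumOutcome += outcome
    bins[idx].count += 1
  }
  return bins
    .filter(b => b.count > 0)
    .map(b => ({
      meanPredicted: b.sumPred / b.count,
      observedFrequency: b.sumOutcome / b.count,
      count: b.count,
    }))
}
```

Plotting `meanPredicted` against `observedFrequency` for each bin reproduces the diagram above; points off the diagonal are what Platt scaling corrects.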

Mathematical Formulation

Sigmoid Transform

The calibration mapping is:

p_{\text{cal}} = \frac{1}{1 + e^{-(A \cdot z + B)}}

where

z = \text{logit}(p_{\text{raw}}) = \log\left(\frac{p_{\text{raw}}}{1 - p_{\text{raw}}}\right)

Parameters:
  • A: Slope (scaling factor). A < 1 → compress confidence. A > 1 → stretch confidence.
  • B: Intercept (bias correction). B > 0 → shift probabilities upward. B < 0 → shift downward.

Parameter Fitting

A and B are fitted by minimizing log loss over the calibration dataset:

\mathcal{L}(A, B) = -\frac{1}{N} \sum_{i=1}^N \left[ y_i \log(p_{\text{cal},i}) + (1 - y_i) \log(1 - p_{\text{cal},i}) \right]

Where:
  • y_i = actual outcome (1 = prediction correct, 0 = incorrect)
  • p_cal,i = σ(A · logit(p_raw,i) + B)
Gradient Descent: The implementation uses vanilla gradient descent with learning rate 0.01 and 1000 iterations. This is sufficient because the log loss is smooth and convex in A and B, so plain gradient descent converges reliably without momentum or line search.

Implementation

Constructor

calibration.js
export class PlattScaler {
  constructor() {
    this._data = []       // Array of { predicted, outcome }
    this._fitted = false
    this._A = 1           // Identity mapping before fitting
    this._B = 0
    this._minSamples = 200
  }
}
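The `canFit()` guard referenced throughout this page is not shown here; a plausible sketch, assuming it simply compares the buffer length against `_minSamples`, is:

```javascript
// Hypothetical sketch of the canFit() guard used by fit() and the
// activation logic; the actual method body is not shown in these docs.
class PlattScalerSketch {
  constructor() {
    this._data = []
    this._minSamples = 200
  }
  canFit() {
    // Fitting is allowed only once the minimum sample count is reached.
    return this._data.length >= this._minSamples
  }
}
```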

Data Collection

calibration.js
collect(predictedProb, actualOutcome) {
  this._data.push({ predicted: predictedProb, outcome: actualOutcome })
  if (this._data.length > 2000) {
    this._data = this._data.slice(-2000)  // Keep last 2000 samples
    this._fitted = false  // Re-fit required
  }
}
Rolling Window: The scaler retains only the last 2000 samples to prevent unbounded memory growth and to adapt to non-stationary market conditions. When the buffer fills, _fitted is reset, triggering a re-fit on next prediction.

Fitting Algorithm

calibration.js
fit() {
  if (!this.canFit()) return

  let A = 1
  let B = 0
  const lr = 0.01
  const iterations = 1000
  const n = this._data.length

  for (let iter = 0; iter < iterations; iter++) {
    let gradA = 0
    let gradB = 0

    for (const { predicted, outcome } of this._data) {
      const z = A * logit(predicted) + B
      const pCal = sigmoid(z)
      // Gradient of log loss: d/dA = (pCal - outcome) * logit(predicted) / n
      const err = pCal - outcome
      gradA += err * logit(predicted)
      gradB += err
    }

    gradA /= n
    gradB /= n

    A -= lr * gradA
    B -= lr * gradB
  }

  this._A = A
  this._B = B
  this._fitted = true
  logger.info('PlattScaler fitted', { A: A.toFixed(4), B: B.toFixed(4), samples: n })
}

Calibration Function

calibration.js
calibrate(p) {
  if (!this._fitted) return p
  return sigmoid(this._A * logit(p) + this._B)
}

Helper Functions

calibration.js
function logit(p) {
  const safe = clamp(p, 1e-7, 1 - 1e-7)
  return Math.log(safe / (1 - safe))
}

function sigmoid(z) {
  return 1 / (1 + Math.exp(-z))
}

function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value))
}
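These helpers can be sanity-checked: logit and sigmoid are inverses on (0, 1), and the clamp keeps logit finite at the boundaries (the arrow-function forms below are restatements for a self-contained check, not library code):

```javascript
// Self-contained restatement of the helpers above, for a quick sanity check.
const clamp = (value, min, max) => Math.min(max, Math.max(min, value))
const logit = p => {
  const safe = clamp(p, 1e-7, 1 - 1e-7)
  return Math.log(safe / (1 - safe))
}
const sigmoid = z => 1 / (1 + Math.exp(-z))

const roundTrip = sigmoid(logit(0.73))  // recovers 0.73
const boundary = logit(1)               // finite (≈ 16.1) thanks to the clamp
```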

Activation Logic

The scaler auto-activates when ≥200 samples have been collected:
predictor.js
if (this._scaler.canFit()) {
  if (!this._scaler.getStats().fitted) {
    this._scaler.fit()
  }
  finalProb = this._scaler.calibrate(finalProb)
  finalProb = clamp(finalProb, 0.01, 0.99)  // Safety clamp
  calibrated = true
}
Minimum Sample Size: 200 samples ensures stable parameter estimates. Below this threshold, the scaler returns raw probabilities unchanged.

Usage Example

import { PlattScaler } from './calibration.js'

const scaler = new PlattScaler()

// Collect prediction/outcome pairs during live trading
for (const trade of historicalTrades) {
  const predicted = trade.probability
  const outcome = trade.correct ? 1 : 0
  scaler.collect(predicted, outcome)
}

// Once enough data is collected, fit the scaler
if (scaler.canFit()) {
  scaler.fit()
  const { A, B, sampleSize } = scaler.getStats()
  console.log(`Fitted Platt scaler: A=${A}, B=${B} (n=${sampleSize})`)
}

// Calibrate new predictions
const rawProb = 0.73
const calProb = scaler.calibrate(rawProb)
console.log(`Raw: ${rawProb}, Calibrated: ${calProb}`)

Interpretation of Parameters

A (Slope)

  • A = 1 → no scaling (identity)
  • A < 1 → overconfident model; compress probabilities toward 0.5
  • A > 1 → underconfident model; stretch probabilities away from 0.5
  • A ≈ 0 → model has no discriminative power (all predictions → 0.5)
Example: If A = 0.7, a raw probability of 0.80 is compressed: p_{\text{cal}} = \sigma(0.7 \cdot \text{logit}(0.80) + B) < 0.80
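The compression can be verified numerically. A quick check (taking B = 0 purely for illustration):

```javascript
// Numeric check of the A = 0.7 example, with B assumed 0 for illustration.
const logit = p => Math.log(p / (1 - p))
const sigmoid = z => 1 / (1 + Math.exp(-z))

const raw = 0.80
const cal = sigmoid(0.7 * logit(raw))
// cal ≈ 0.725: the 0.80 forecast is pulled toward 0.5
```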

B (Intercept)

  • B = 0 → no bias
  • B > 0 → pessimistic model; shift probabilities upward (predicts UP more often)
  • B < 0 → optimistic model; shift probabilities downward (predicts DOWN more often)
Example: If B = 0.5, all probabilities are biased upward: p_{\text{cal}} = \sigma(A \cdot z + 0.5) > \sigma(A \cdot z)
Typical Values: A well-calibrated model has A ≈ 1, B ≈ 0. Systematic deviations indicate model deficiencies (e.g., A = 0.6 suggests overconfidence).

Integration with Engine

Data Collection

The engine feeds outcome data to the scaler after each trade:
predictor.js
recordOutcome(correct, predictedProb) {
  // Cold-streak tracking
  this._recentOutcomes.push(correct)
  // ...

  // Feed Platt scaler
  const prob = predictedProb ?? this._lastPrediction
  if (prob != null) {
    this._scaler.collect(prob, correct ? 1 : 0)
  }
}

Prediction Pipeline

Calibration is the final step before returning a prediction:
predictor.js
// Step 1: Black-Scholes base probability
const baseProb = binaryCallProbability({...})

// Step 2: Logit-space momentum/reversion adjustments
const logitAdj = logit(baseProb) + ...
let finalProb = sigmoid(logitAdj)

// Step 3: Platt calibration (if active)
if (this._scaler.canFit()) {
  if (!this._scaler.getStats().fitted) {
    this._scaler.fit()
  }
  finalProb = this._scaler.calibrate(finalProb)
}

finalProb = clamp(finalProb, 0.01, 0.99)

Advantages

  1. Simple: Only 2 parameters (A, B)
  2. Fast: O(1) calibration call, O(N × I) fitting (N = sample size, I = iterations)
  3. Monotonic: Preserves ranking whenever the fitted A > 0 (if p₁ > p₂, then p_cal,₁ > p_cal,₂)
  4. Bounded: Output always ∈ (0, 1)

Limitations

1. Assumes Sigmoid Shape

Platt scaling assumes the calibration curve is sigmoid-shaped. If the true reliability curve is non-monotonic or highly irregular, Platt scaling will fail. Alternative: Isotonic regression (non-parametric, but requires more data).

2. Requires Representative Data

The calibration dataset must be representative of future predictions. If market conditions shift (e.g., volatility regime change), the scaler needs re-fitting. Mitigation: Rolling window (2000 samples) + automatic re-fit on buffer overflow.

3. Small Sample Instability

With <200 samples, parameter estimates are noisy. The implementation sets _minSamples = 200 to prevent premature activation.
Bootstrap Recommendation: For critical applications, estimate confidence intervals for A and B via bootstrap resampling. If intervals are wide, delay calibration activation.
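The bootstrap recommendation can be sketched as follows. This is a hedged illustration: `fitPlatt` stands in for any function returning `{ A, B }` for a dataset (e.g. a wrapper around `fit()`), and is not part of the library:

```javascript
// Sketch of bootstrap resampling for the slope parameter A.
// fitPlatt is a hypothetical stand-in: any (data) => ({ A, B }) fitter.
function bootstrapSlopeInterval(data, fitPlatt, numResamples = 200) {
  const slopes = []
  for (let i = 0; i < numResamples; i++) {
    // Resample the calibration buffer with replacement.
    const resample = Array.from({ length: data.length },
      () => data[Math.floor(Math.random() * data.length)])
    slopes.push(fitPlatt(resample).A)
  }
  slopes.sort((a, b) => a - b)
  // Approximate 95% percentile interval for A.
  return {
    aLow: slopes[Math.floor(0.025 * slopes.length)],
    aHigh: slopes[Math.floor(0.975 * slopes.length)],
  }
}
```

If `aHigh - aLow` is wide (e.g. the interval straddles both A < 1 and A > 1), the calibration data is too noisy and activation should be delayed.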

Diagnostics

Stats Query

calibration.js
getStats() {
  return {
    sampleSize: this._data.length,
    fitted: this._fitted,
    A: this._A,
    B: this._B,
  }
}

Logging

The engine logs calibration events:
[INFO] PlattScaler fitted { A: 0.8234, B: -0.1567, samples: 412 }
Interpretation:
  • A = 0.82 → Model is overconfident (compress probabilities)
  • B = -0.16 → Model is optimistic (shift probabilities downward)

Advanced Topics

Isotonic Regression

For non-parametric calibration, use isotonic regression:

p_{\text{cal}} = \text{argmin}_f \sum_i (f(p_i) - y_i)^2 \quad \text{subject to } f \text{ is monotonic}

This fits a piecewise-constant function that preserves order. Requires more data (500+ samples) but handles arbitrary reliability curves.
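A minimal pool-adjacent-violators (PAV) sketch, not part of this library, shows the shape of such a fitter:

```javascript
// Minimal pool-adjacent-violators sketch: fits a monotone step function to
// (predicted, outcome) pairs and returns it as a calibration function.
function isotonicFit(pairs) {
  const sorted = [...pairs].sort((a, b) => a.predicted - b.predicted)
  const stack = []  // blocks of pooled outcomes, each with a running mean
  for (const { predicted, outcome } of sorted) {
    stack.push({ minP: predicted, sum: outcome, count: 1 })
    // Merge adjacent blocks while they violate monotonicity.
    while (stack.length > 1) {
      const top = stack[stack.length - 1]
      const prev = stack[stack.length - 2]
      if (prev.sum / prev.count <= top.sum / top.count) break
      prev.sum += top.sum
      prev.count += top.count
      stack.pop()
    }
  }
  // Calibrated value for p: mean of the last block whose minP does not exceed p.
  return p => {
    let value = stack[0].sum / stack[0].count
    for (const b of stack) {
      if (b.minP <= p) value = b.sum / b.count
      else break
    }
    return value
  }
}
```

Unlike Platt scaling's two global parameters, the step function adapts locally, which is why it needs substantially more data to be stable.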

Beta Calibration

An extension of Platt scaling with 3 parameters: pcal=σ(Alogit(p)+B+Clog(p(1p)))p_{\text{cal}} = \sigma(A \cdot \text{logit}(p) + B + C \cdot \log(p(1-p))) The C term captures non-linear distortions. Useful for models with extreme probability bias.

Temperature Scaling

A simpler variant used in neural networks:

p_{\text{cal}} = \sigma(z / T)

where T is a learned temperature (T > 1 → soften, T < 1 → sharpen). This is equivalent to Platt scaling with A = 1/T, B = 0.
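The stated equivalence is easy to confirm numerically:

```javascript
// Confirming the equivalence: temperature scaling with T equals
// Platt scaling with A = 1/T and B = 0.
const sigmoid = z => 1 / (1 + Math.exp(-z))

const z = 1.2
const T = 2
const temperatureScaled = sigmoid(z / T)
const plattEquivalent = sigmoid((1 / T) * z + 0)  // A = 1/T, B = 0
```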

Performance Characteristics

  • Calibration: O(1) per call
  • Fitting: O(N × I), where N = sample size, I = iterations (default: 1000)
  • Typical Fit Time: ~10ms for 200 samples, ~50ms for 2000 samples
  • Memory: O(N) for data buffer (capped at 2000)

References

  • Platt, J. C. (1999). “Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods.” Advances in Large Margin Classifiers, 61-74.
  • Niculescu-Mizil, A., & Caruana, R. (2005). “Predicting Good Probabilities with Supervised Learning.” ICML, 625-632.
  • Guo, C., et al. (2017). “On Calibration of Modern Neural Networks.” ICML, 1321-1330.

Next Steps

Prediction Engine

Complete integration of all components including calibration

Metrics

Evaluate calibration quality using Brier score and reliability diagrams
