The Polymarket Bot’s prediction quality and risk-adjusted returns are highly sensitive to a handful of critical parameters. This guide walks through the tuning process, explains parameter interactions, and provides practical recommendations based on market behavior.

Tuning Philosophy

Golden rule: Tune on backtests, validate on live data, never optimize on the same data twice.
The bot uses a multi-model system (Black-Scholes + momentum + reversion) combined in logit space. Each component has tunable parameters, and they interact in non-obvious ways:
  1. EWMA lambda controls volatility estimation smoothness
  2. Logit weights control how momentum and reversion adjust base probability
  3. Abstention thresholds control when the model has enough edge to trade
  4. Risk parameters control position sizing and drawdown protection

EWMA Lambda (Volatility Estimation)

What It Does

The EWMA lambda parameter (engine.ewma.lambda) controls the decay rate for exponentially weighted volatility estimation. Higher lambda = more weight on the prior estimate = smoother volatility that reacts slowly to shocks. Formula: variance = lambda * variance + (1 - lambda) * (r^2 / dt)
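The update above can be sketched in a few lines (a standalone illustration, not the bot's actual implementation):

```javascript
// EWMA variance update: higher lambda keeps more of the prior estimate,
// so the volatility estimate is smoother and reacts more slowly to shocks.
function ewmaUpdate(variance, r, dt, lambda = 0.94) {
  return lambda * variance + (1 - lambda) * (r * r) / dt;
}

// A large return (r^2 above the current variance) moves the fast
// estimate (lambda=0.90) further than the slow one (lambda=0.96):
console.log(ewmaUpdate(0.01, 0.2, 1, 0.90)); // ≈ 0.0130
console.log(ewmaUpdate(0.01, 0.2, 1, 0.96)); // ≈ 0.0112
```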

Default Value

"engine": {
  "ewma": {
    "lambda": 0.94
  }
}

Tuning Range

| Lambda | Behavior | When to Use |
|---|---|---|
| 0.90-0.92 | Fast adaptation, noisy | High-frequency volatility regime (flash crashes, major news) |
| 0.93-0.94 | Balanced (default) | Normal market conditions |
| 0.95-0.96 | Slow adaptation, smooth | Low-frequency trends, stable markets |

Tuning Process

  1. Collect tick data over at least 100 intervals (8+ hours)
  2. Compute realized volatility per interval
  3. Run backtests with lambda in [0.90, 0.92, 0.94, 0.96]
  4. Evaluate:
    • Brier Score (calibration quality)
    • Sharpe Ratio (risk-adjusted returns)
    • Volatility outlier frequency (how often sigma exceeds sigmaMultiplier * meanSigma)

Example

# Backtest with lambda=0.92 (faster adaptation)
node src/backtest.js --lambda 0.92 --data data/ticks-2026-03-04.jsonl

# Compare Brier Score and Sharpe
node src/reporter/daily.js 2026-03-04

Start with 0.94. If you see frequent abstentions due to anomalous_regime during normal market conditions, increase lambda to 0.96. If the model misses rapid volatility shifts, decrease to 0.92.

Logit Weights (Momentum & Reversion)

What They Do

The logit weights control how momentum ROC and mean reversion signals adjust the Black-Scholes base probability in log-odds space. Formula: finalProb = sigmoid(logit(baseProb) + w_momentum * ROC + w_reversion * deviation) See src/engine/predictor.js:128-132.
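A minimal sketch of the combination (helper names are illustrative; see the predictor source for the real code):

```javascript
const logit = (p) => Math.log(p / (1 - p));
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Adjust the Black-Scholes base probability in log-odds space.
function finalProb(baseProb, roc, deviation, wMomentum = 150, wReversion = 80) {
  return sigmoid(logit(baseProb) + wMomentum * roc + wReversion * deviation);
}

// With zero signals the base probability passes through unchanged;
// a small positive ROC (0.001) nudges a 50% base upward:
console.log(finalProb(0.5, 0, 0));     // 0.5
console.log(finalProb(0.5, 0.001, 0)); // ≈ 0.537
```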

Default Values

"engine": {
  "prediction": {
    "logitMomentumWeight": 150,
    "logitReversionWeight": 80
  }
}

Parameter Interaction

These weights are not independent. The ratio between them determines the model’s behavior:
| Ratio | Behavior | Risk |
|---|---|---|
| w_momentum / w_reversion > 2.5 | Trend-following (chases momentum) | Misses reversals, gets whipsawed |
| w_momentum / w_reversion ≈ 2.0 | Balanced (default: 150/80 = 1.875) | Good for mixed markets |
| w_momentum / w_reversion < 1.5 | Mean-reversion (fades moves) | Misses breakouts, fights trends |

Tuning Range

| Parameter | Range | Default |
|---|---|---|
| logitMomentumWeight | 50 - 300 | 150 |
| logitReversionWeight | 40 - 150 | 80 |

Tuning Process

  1. Fix one weight, sweep the other
    # Fix reversion at 80, sweep momentum from 100 to 200 in steps of 20
    for w in 100 120 140 160 180 200; do
      node src/backtest.js --momentum-weight $w --reversion-weight 80
    done
    
  2. Evaluate on held-out intervals:
    • Brier Score (lower is better)
    • Log Loss (lower is better)
    • Early 1m accuracy (higher is better, target >80%)
    • EV per trade (higher is better)
  3. Check calibration:
    node src/reporter/daily.js 2026-03-04 | grep "Murphy"
    
    Look for low resolution loss (MRES) and low calibration loss (MCAL).
A tuned configuration might then look like:
{
  "logitMomentumWeight": 180,
  "logitReversionWeight": 70
}
Do not tune on recent live data. You will overfit. Use data from at least 7 days ago, tune on 70% of intervals, validate on 30%.

Abstention Thresholds

The abstention system prevents trading when the model has no edge. There are 6 configurable conditions:

Critical Thresholds

engine.abstention.deadZone (number, default: 0.10)
Most impactful parameter. Controls the minimum edge before making a prediction.
  • Decrease (0.05-0.08) → Trade more often, accept smaller edges, higher risk of false signals
  • Increase (0.12-0.15) → Trade less often, require stronger edges, miss opportunities
Tuning goal: Maximize (EV per trade) * (trade frequency)
engine.abstention.minEV (number, default: 0.05)
Minimum expected value to place a bet. Works in conjunction with minMargin. Formula: EV = (p / q) - 1
  • Decrease (0.03) → Accept smaller edges, more trades
  • Increase (0.08-0.10) → Only trade high-EV opportunities
engine.abstention.minMargin (number, default: 0.15)
Minimum edge in percentage points (|p - q| >= minMargin).
  • Decrease (0.10-0.12) → Trade on smaller edges
  • Increase (0.18-0.20) → Require wider edges, fewer trades
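Taken together, the three thresholds act as a gate. The sketch below assumes the dead zone is measured against a 50% base probability and that p and q are the model and market probabilities; the real checks live in the bot's abstention module:

```javascript
// Gate a prediction through the three edge checks (sketch; 'insufficient_ev'
// is an illustrative reason name, the others match the bot's history log).
function shouldTrade(p, q, { deadZone = 0.10, minEV = 0.05, minMargin = 0.15 } = {}) {
  if (Math.abs(p - 0.5) < deadZone) return { trade: false, reason: 'dead_zone' };
  if (Math.abs(p - q) < minMargin) return { trade: false, reason: 'insufficient_margin' };
  const ev = p / q - 1; // EV = (p / q) - 1
  if (ev < minEV) return { trade: false, reason: 'insufficient_ev' };
  return { trade: true, ev };
}

console.log(shouldTrade(0.72, 0.55)); // trades: margin 0.17, EV ≈ 0.31
console.log(shouldTrade(0.55, 0.50)); // { trade: false, reason: 'dead_zone' }
```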

Tuning Process

  1. Start with defaults (deadZone=0.10, minEV=0.05, minMargin=0.15)
  2. Collect 200+ intervals of live data
  3. Analyze abstention reasons:
    grep '"abstained":true' data/history.json | jq .reason | sort | uniq -c
    
  4. Compute realized performance by condition:
    • If many insufficient_margin abstentions and high Brier Score → increase minMargin
    • If few trades and model is well-calibrated → decrease deadZone or minMargin
    • If many trades with negative EV → increase minEV

Example Analysis

# Count abstention reasons
$ grep '"abstained":true' data/history.json | jq -r .reason | sort | uniq -c
  12 anomalous_regime
  45 dead_zone
   3 insufficient_data
  18 insufficient_margin
   5 cold_streak

# Interpretation:
# - 45 dead_zone: Base probability near 50%, no directional edge
# - 18 insufficient_margin: Model has edge but not enough to overcome noise
# - 12 anomalous_regime: High volatility periods

# If accuracy on traded intervals is >85%, consider reducing deadZone to 0.08
# If accuracy is <75%, increase minMargin to 0.18
Conservative tuning: Start with deadZone=0.12, minEV=0.06, minMargin=0.18. This will trade less often but with higher conviction. Lower thresholds as your model proves calibration quality.

Risk Parameters

Brier Tiers (Kelly Fraction)

The bot uses dynamic Kelly fractions based on calibration quality:
"risk": {
  "brierTiers": [
    { "maxBrier": null, "minPredictions": 0, "maxPredictions": 100, "alpha": 0 },
    { "maxBrier": 1.0, "minPredictions": 100, "alpha": 0.10 },
    { "maxBrier": 0.26, "minPredictions": 100, "alpha": 0.20 },
    { "maxBrier": 0.22, "minPredictions": 100, "alpha": 0.25 },
    { "maxBrier": 0.18, "minPredictions": 100, "alpha": 0.40 }
  ]
}
Interpretation:
  • Tier 0: No trading until 100 predictions collected
  • Tier 1: Poor calibration (Brier > 0.26) → 10% Kelly (very conservative)
  • Tier 2: Decent calibration (0.22-0.26) → 20% Kelly
  • Tier 3: Good calibration (0.18-0.22) → 25% Kelly
  • Tier 4: Excellent calibration (< 0.18) → 40% Kelly
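The tier lookup can be sketched as follows (assumes later, stricter tiers override earlier ones when both match; illustrative, not the bot's source):

```javascript
// Default tiers from the config above.
const brierTiers = [
  { maxBrier: null, minPredictions: 0, maxPredictions: 100, alpha: 0 },
  { maxBrier: 1.0, minPredictions: 100, alpha: 0.10 },
  { maxBrier: 0.26, minPredictions: 100, alpha: 0.20 },
  { maxBrier: 0.22, minPredictions: 100, alpha: 0.25 },
  { maxBrier: 0.18, minPredictions: 100, alpha: 0.40 },
];

// Return the Kelly fraction for a given Brier score and prediction count.
function kellyAlpha(brier, nPredictions, tiers = brierTiers) {
  let alpha = 0;
  for (const t of tiers) {
    const countOk = nPredictions >= t.minPredictions &&
      (t.maxPredictions === undefined || nPredictions <= t.maxPredictions);
    const brierOk = t.maxBrier === null || brier <= t.maxBrier;
    if (countOk && brierOk) alpha = t.alpha; // stricter tiers override
  }
  return alpha;
}

console.log(kellyAlpha(0.20, 250)); // 0.25 (good-calibration tier)
console.log(kellyAlpha(0.50, 50));  // 0 (still collecting predictions)
```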

Tuning Alpha Values

Do not increase alpha beyond 0.50 (half-Kelly). Full Kelly is aggressive and can lead to large drawdowns. Even 50% Kelly assumes your edge estimate is perfectly accurate.
If your model achieves Brier < 0.18 consistently:
  • Conservative: Keep alpha at 0.40 (default tier 4)
  • Moderate: Increase to 0.45
  • Aggressive: Increase to 0.50 (the half-Kelly ceiling), monitoring drawdown closely

Max Bet Percentage

"risk": {
  "maxBetPct": 0.05
}
Caps individual bets at 5% of bankroll regardless of the Kelly calculation.

Tuning:
  • Conservative: 0.03 (3% max bet)
  • Default: 0.05 (5% max bet)
  • Aggressive: 0.08 (8% max bet, not recommended)
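For a binary market priced at q with win probability p, the full-Kelly fraction is (p - q) / (1 - q). A sketch of sizing with the alpha scaling and the maxBetPct cap (illustrative, not the bot's source):

```javascript
// Scale the full-Kelly fraction by alpha, then cap at maxBetPct.
function betSize(bankroll, p, q, alpha = 0.25, maxBetPct = 0.05) {
  const kelly = Math.max(0, (p - q) / (1 - q)); // full-Kelly fraction
  const fraction = Math.min(alpha * kelly, maxBetPct);
  return bankroll * fraction;
}

// Quarter-Kelly of f* = 0.333 is 8.3%, so the 5% cap binds:
console.log(betSize(1000, 0.70, 0.55)); // 50
// Smaller edge, cap does not bind:
console.log(betSize(1000, 0.60, 0.55)); // ≈ 27.78
```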

Drawdown Thresholds

"risk": {
  "drawdown": {
    "yellowPct": 0.10,
    "redPct": 0.20,
    "criticalPct": 0.30
  }
}
Effects:
  • Yellow (10%): Warning only, no bet sizing impact
  • Red (20%): Alpha multiplier = 0.5 (half-size bets)
  • Critical (30%): Trading suspended
Tuning:
  • If you hit red frequently: Lower alpha tiers or increase abstention thresholds
  • If you never hit yellow: Your bet sizing may be too conservative (missing edge)
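The band logic maps to a simple bet-size multiplier (illustrative sketch of the effects above):

```javascript
// Map current drawdown from the equity peak to an alpha multiplier.
// yellowPct carries no sizing impact (warning only).
function drawdownMultiplier(peak, equity,
    { yellowPct = 0.10, redPct = 0.20, criticalPct = 0.30 } = {}) {
  const dd = (peak - equity) / peak;
  if (dd >= criticalPct) return 0;   // critical: trading suspended
  if (dd >= redPct) return 0.5;      // red: half-size bets
  return 1;                          // green/yellow: full size
}

console.log(drawdownMultiplier(1000, 780)); // 22% drawdown → 0.5
```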

Practical Tuning Workflow

Phase 1: Collect Baseline Data (Days 1-7)

Run the bot with default parameters for at least 7 days.
{
  "engine": {
    "ewma": { "lambda": 0.94 },
    "prediction": {
      "logitMomentumWeight": 150,
      "logitReversionWeight": 80
    },
    "abstention": {
      "deadZone": 0.10,
      "minEV": 0.05,
      "minMargin": 0.15
    }
  }
}
Goal: Establish baseline Brier Score, accuracy, trade frequency, and EV.

Phase 2: Analyze Performance (Day 8)

Generate daily reports and identify weaknesses:
for day in 2026-03-01 2026-03-02 2026-03-03 2026-03-04 2026-03-05 2026-03-06 2026-03-07; do
  node src/reporter/daily.js $day
done
Key metrics:
  1. Brier Score: Should be < 0.22 to justify trading
  2. Early 1m accuracy: Target > 80%
  3. Trade frequency: 40-60% of intervals (not too selective, not too loose)
  4. EV per trade: Target > 0.05 (5%)
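Brier Score (metric 1) is the mean squared error of the predicted probabilities against the 0/1 outcomes; a quick sketch for checking it by hand (illustrative):

```javascript
// Mean squared error between predicted probabilities and 0/1 outcomes.
// Always predicting 50% scores 0.25; lower is better.
function brierScore(predictions) {
  const sum = predictions.reduce((s, { p, outcome }) => s + (p - outcome) ** 2, 0);
  return sum / predictions.length;
}

console.log(brierScore([
  { p: 0.8, outcome: 1 },
  { p: 0.7, outcome: 1 },
  { p: 0.3, outcome: 0 },
  { p: 0.6, outcome: 0 },
])); // ≈ 0.145
```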

Phase 3: Tune One Dimension (Days 8-14)

Pick one parameter group to tune:
Problem: Poor calibration → Tune: Logit weights
  1. Try reducing logitMomentumWeight to 120 (less aggressive)
  2. Try reducing logitReversionWeight to 60 (less mean reversion)
  3. Monitor Brier Score daily

Phase 4: Validate (Days 15-21)

Run with tuned parameters for 7 more days. Do not look at results daily (avoid overfitting). After 7 days, compare:
  • Brier Score (should improve by 0.02-0.05)
  • Sharpe Ratio (should improve)
  • Drawdown (should be shallower)
If metrics degrade, revert to baseline.

Advanced: Multi-Parameter Optimization

Once you have 30+ days of data, consider grid search:
# Grid search over momentum/reversion weights
for momentum in 120 140 160 180; do
  for reversion in 60 80 100; do
    node src/backtest.js \
      --momentum-weight $momentum \
      --reversion-weight $reversion \
      --data data/history-2026-03.json \
      --output results-${momentum}-${reversion}.json
  done
done

# Rank by Sharpe Ratio
jq -s 'sort_by(.sharpe) | reverse | .[0:5]' results-*.json
Overfitting risk is real. Always validate on out-of-sample data. A configuration that performs 2% better on training data but 5% worse on validation data is not an improvement.

Quick Reference

Conservative (High Accuracy, Low Frequency)

{
  "engine": {
    "ewma": { "lambda": 0.96 },
    "prediction": {
      "logitMomentumWeight": 120,
      "logitReversionWeight": 80
    },
    "abstention": {
      "deadZone": 0.12,
      "minEV": 0.07,
      "minMargin": 0.18
    }
  },
  "risk": {
    "maxBetPct": 0.03,
    "brierTiers": [
      { "maxBrier": null, "minPredictions": 0, "maxPredictions": 150, "alpha": 0 },
      { "maxBrier": 0.22, "minPredictions": 150, "alpha": 0.15 },
      { "maxBrier": 0.18, "minPredictions": 150, "alpha": 0.25 }
    ]
  }
}

Aggressive (Higher Frequency, Moderate Accuracy)

{
  "engine": {
    "ewma": { "lambda": 0.92 },
    "prediction": {
      "logitMomentumWeight": 180,
      "logitReversionWeight": 70
    },
    "abstention": {
      "deadZone": 0.08,
      "minEV": 0.04,
      "minMargin": 0.12
    }
  },
  "risk": {
    "maxBetPct": 0.06,
    "brierTiers": [
      { "maxBrier": null, "minPredictions": 0, "maxPredictions": 100, "alpha": 0 },
      { "maxBrier": 0.26, "minPredictions": 100, "alpha": 0.15 },
      { "maxBrier": 0.22, "minPredictions": 100, "alpha": 0.30 },
      { "maxBrier": 0.18, "minPredictions": 100, "alpha": 0.50 }
    ]
  }
}

Next Steps

Config Reference

Full reference for all configuration parameters.

Environment Setup

Set up data directories and logging for production.
