Understanding Daily Reports

Overview

Daily reports provide deep statistical analysis of the bot’s performance. They’re generated as Markdown files with YAML frontmatter, compatible with Obsidian and other knowledge management tools.

Generating Reports

pnpm report

Reports are saved to your configured Obsidian vault at:

{VAULT_ROOT}/{VAULT_DEST}/reporte-{date}.md

Report Structure

Each report contains 12 major sections:

Resumen - Basic statistics and price range
Scoring Metrics - Brier Score, Log Loss, Brier Skill Score
Murphy Decomposition - Calibration quality breakdown
Accuracy de predicciones - Final vs Early 1m performance
Bandas de confianza - Performance by confidence level (5 bands)
Runs Test - Statistical independence test
Rachas - Win/loss streaks
Abstenciones - Abstention analysis by reason
Datos de Mercado - Polymarket market data coverage
Gestion de Riesgo - Trade execution and PnL
Fallos del Early 1m - Detailed breakdown of misses
Observaciones - Key insights and recommendations

Section Breakdown

1. Resumen (Summary)

| Metrica | Valor |
|---------|-------|
| Intervalos analizados | 64 |
| Resultado UP | 26 (41%) |
| Resultado DOWN | 38 (59%) |
| Rango de precio | $64,119.86 - $65,234.12 |

Intervalos analizados

Total number of 5-minute intervals that closed during the day. Each interval represents one complete prediction cycle (5 minutes).

Resultado UP / DOWN

How many intervals closed above the strike (UP) vs below (DOWN).

What it tells you:

Market direction bias for the day
If heavily skewed (70%+ one direction), model may have had easier/harder predictions
Balanced split (45-55%) is typical for binary markets

Rango de precio

Lowest and highest BTC close prices during the day.

Use this to:

Assess volatility (wider range = more volatile)
Correlate with prediction accuracy (extreme volatility often reduces accuracy)

2. Scoring Metrics

| Metrica | Valor | Baseline | Interpretacion |
|---------|-------|----------|----------------|
| **Brier Score** | 0.1842 | 0.2500 | Mejor que random |
| **Log Loss** | 0.5234 | 0.6931 | Mejor que random |
| **BSS** | 0.2632 | 0 | Modelo tiene skill |

Brier Score

Formula: BS = (1/N) * Σ(p - outcome)²

Measures the accuracy of the predicted probabilities as the mean squared error between forecast and outcome. Lower is better.

Interpretation:

< 0.15 - Excellent calibration (elite performance)
0.15-0.20 - Good calibration (strong model)
0.20-0.25 - Acceptable (better than random)
> 0.25 - Poor (no better than coin flip)

Baseline: 0.25 (random guessing)

Why it matters: Brier Score directly affects bet sizing. Lower Brier = higher Kelly alpha = larger bets.

See /home/daytona/workspace/source/src/engine/metrics.js:6 for implementation.
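The Brier formula above can be sketched in a few lines. This is a minimal illustration, assuming predictions arrive as `{ p, outcome }` pairs where `p` is the predicted probability of UP and `outcome` is 1 for UP and 0 for DOWN. The function name and data shape are hypothetical and are not taken from `metrics.js`.

```javascript
// Brier Score: mean squared difference between predicted probability and outcome.
// `predictions` is a hypothetical array of { p, outcome } objects (outcome is 0 or 1).
function brierScore(predictions) {
  if (predictions.length === 0) return NaN;
  const total = predictions.reduce(
    (sum, { p, outcome }) => sum + (p - outcome) ** 2,
    0
  );
  return total / predictions.length;
}

// Always predicting 0.5 reproduces the 0.25 baseline, whatever the outcomes are.
const coinFlip = [
  { p: 0.5, outcome: 1 },
  { p: 0.5, outcome: 0 },
];
console.log(brierScore(coinFlip)); // 0.25
```

Each 50% forecast contributes (0.5)² = 0.25 regardless of the result, which is where the baseline comes from.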

Log Loss

Formula: LL = -(1/N) * Σ[outcome*ln(p) + (1-outcome)*ln(1-p)]

Penalizes confident wrong predictions more heavily than Brier.

Interpretation:

< 0.50 - Excellent
0.50-0.69 - Good (better than baseline)
> 0.69 - Poor (worse than random)

Baseline: 0.6931 (random 50/50 guessing)

Why it matters: Log Loss is more sensitive to overconfident predictions. A model with good Brier but bad Log Loss is overconfident.
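To make the "confident wrong predictions" penalty concrete, here is a small sketch of the Log Loss formula. It assumes the same hypothetical `{ p, outcome }` shape (outcome 1 for UP, 0 for DOWN) and clips probabilities away from 0 and 1 so the logarithm stays finite. The clipping constant is an illustrative choice, not a value from the bot.

```javascript
// Log Loss with clipping so that p = 0 or p = 1 never produces ln(0) = -Infinity.
// EPS is an illustrative choice, not a value from the bot's config.
const EPS = 1e-15;

function logLoss(predictions) {
  if (predictions.length === 0) return NaN;
  const total = predictions.reduce((sum, { p, outcome }) => {
    const q = Math.min(Math.max(p, EPS), 1 - EPS);
    return sum + outcome * Math.log(q) + (1 - outcome) * Math.log(1 - q);
  }, 0);
  return -total / predictions.length;
}

// A confident miss (p = 0.95, outcome = 0) costs far more than a hesitant one (p = 0.55).
console.log(logLoss([{ p: 0.95, outcome: 0 }])); // about 3.0
console.log(logLoss([{ p: 0.55, outcome: 0 }])); // about 0.8
```

A constant 50% forecast gives ln(2) ≈ 0.6931 on every interval, which matches the baseline quoted above.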

BSS (Brier Skill Score)

Formula: BSS = 1 - (BS / BS_baseline)

Measures improvement over random baseline.

Interpretation:

> 0.20 - Excellent skill
0.10-0.20 - Good skill
0-0.10 - Marginal skill
< 0 - Worse than random (model is broken)

Example: A BSS of 0.26 means the model's Brier Score is 26% lower than the random baseline's.
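As a quick check of the sample table, the BSS value follows directly from the Brier Score and the 0.25 baseline. The helper below is a hypothetical one-liner written for illustration.

```javascript
// BSS = 1 - BS / BS_baseline, with 0.25 as the default coin-flip baseline.
function brierSkillScore(bs, baseline = 0.25) {
  return 1 - bs / baseline;
}

console.log(brierSkillScore(0.1842).toFixed(4)); // "0.2632"
```

A Brier Score equal to the baseline gives a BSS of 0, and anything above the baseline turns the BSS negative.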

3. Murphy Decomposition

| Componente | Valor | Nota |
|------------|-------|------|
| Reliability | 0.0234 | Menor = mejor calibrado |
| Resolution | 0.1842 | Mayor = mejor separacion |
| Uncertainty | 0.2500 | Fijo (propiedad del problema) |

Reliability

What it measures: How well predicted probabilities match actual outcomes.

Formula: Reliability = (1/N) * Σ n_k * (p_k - o_k)²

where n_k is the number of predictions in bin k, p_k is the mean forecast probability in bin k, o_k is the mean outcome in bin k, and N is the total number of predictions.

Interpretation:

< 0.03 - Excellent calibration
0.03-0.05 - Good
> 0.05 - Poorly calibrated

What to do if high:

Enable Platt calibration in config
Check if model is overconfident
Review abstention thresholds

Resolution

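What it measures: How well the model separates outcomes (discrimination power).

Formula: Resolution = (1/N) * Σ n_k * (o_k - ō)²

where n_k is the number of predictions in bin k, o_k is the mean outcome in bin k, ō is the overall proportion of UP outcomes, and N is the total number of predictions.

Higher is better: it means the model can distinguish between UP and DOWN outcomes.

Interpretation: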

> 0.15 - Excellent discrimination
0.10-0.15 - Good
< 0.10 - Weak discrimination

What to do if low:

Model may need more features (check momentum, volatility)
May be too conservative (check abstention rate)

Uncertainty

What it measures: Inherent unpredictability of the problem.

Formula: Uncertainty = ō * (1 - ō)

where ō is the base rate (proportion of UP outcomes).

This is fixed based on the data - you can’t change it. It represents the difficulty of the prediction task.

Relationship: Brier Score = Uncertainty - Resolution + Reliability
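The three components can be computed together by binning the forecasts. The sketch below assumes equal-width probability bins and the hypothetical `{ p, outcome }` shape (outcome 1 for UP, 0 for DOWN). The bin count is an illustrative parameter, and the bot's own binning may differ. Reliability and Resolution are divided by the total number of predictions so that the relationship above holds.

```javascript
// Murphy decomposition over equal-width probability bins.
// `numBins` is an illustrative parameter; the bot's binning may differ.
function murphyDecomposition(predictions, numBins = 10) {
  const n = predictions.length;
  if (n === 0) return null;
  const baseRate = predictions.reduce((s, { outcome }) => s + outcome, 0) / n;

  // Accumulate count, sum of forecasts, and sum of outcomes per bin.
  const bins = Array.from({ length: numBins }, () => ({ count: 0, sumP: 0, sumO: 0 }));
  for (const { p, outcome } of predictions) {
    const k = Math.min(Math.floor(p * numBins), numBins - 1); // p = 1 goes in the last bin
    bins[k].count += 1;
    bins[k].sumP += p;
    bins[k].sumO += outcome;
  }

  let reliability = 0;
  let resolution = 0;
  for (const { count, sumP, sumO } of bins) {
    if (count === 0) continue;
    const pK = sumP / count; // mean forecast in bin k
    const oK = sumO / count; // observed UP frequency in bin k
    reliability += count * (pK - oK) ** 2;
    resolution += count * (oK - baseRate) ** 2;
  }

  return {
    reliability: reliability / n,
    resolution: resolution / n,
    uncertainty: baseRate * (1 - baseRate),
  };
}

// Eight forecasts at two probability levels, each level right 3 times out of 4.
const sample = [
  ...[1, 1, 1, 0].map((outcome) => ({ p: 0.8, outcome })),
  ...[0, 0, 0, 1].map((outcome) => ({ p: 0.2, outcome })),
];
const parts = murphyDecomposition(sample);
console.log(parts.uncertainty - parts.resolution + parts.reliability); // about 0.19, the Brier Score of `sample`
```

The identity is exact only when all forecasts inside a bin share the same value, as in this sample. With varied forecasts per bin, the sum is an approximation of the Brier Score.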

4. Accuracy Comparison

| Tipo | Accuracy | Aciertos | Total |
|------|----------|----------|-------|
| **Final (ultimo segundo)** | 88.0% | 44 | 50 |
| **Early 1m (la que importa)** | 86.5% | 45 | 52 |

Why two accuracy measures?

Final (30s) - Captured at 30 seconds before close
- Usually higher accuracy (more data)
- Too late to trade on (not enough time to execute)
- Useful for model validation
Early 1m (60s) - Captured at 60 seconds before close
- This is the real trading metric
- Practical signal you can act on
- Target: 80%+ for profitable trading

What if Early 1m < 80%?

If Early 1m accuracy drops below 80%:

Check data quality - Missing ticks? WebSocket issues?
Review abstention rate - Is the model being too aggressive?
Analyze confidence bands - Is high-confidence band still strong?
Check market conditions - High volatility day?
Run Murphy decomposition - Calibration or discrimination issue?

5. Confidence Bands (5 bands)

| Banda | Rango | Count | Accuracy | Brier | Mean Prob |
|-------|-------|-------|----------|-------|----------|
| 1 | 50-60% | 12 | 58% | 0.2456 | 0.55 |
| 2 | 60-70% | 15 | 67% | 0.2234 | 0.65 |
| 3 | 70-80% | 18 | 78% | 0.1723 | 0.75 |
| 4 | 80-90% | 10 | 90% | 0.1234 | 0.85 |
| 5 | 90-100% | 7 | 100% | 0.0456 | 0.95 |

How bands work

Each prediction is placed into one of 5 confidence bands based on its probability:

Band 1: 50-60% (low confidence)
Band 2: 60-70% (moderate)
Band 3: 70-80% (good)
Band 4: 80-90% (high confidence)
Band 5: 90-100% (very high)

For each band, we calculate:

Count - Number of predictions in this band
Accuracy - What % were correct
Brier - Calibration quality for this band
Mean Prob - Average probability
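The banding and per-band statistics described above could look like the sketch below. It rests on two assumptions that the text does not confirm: confidence is taken as `max(p, 1 - p)`, so a 0.30 UP probability counts as 70% confidence in DOWN, and "Mean Prob" is the average of that confidence. All names are hypothetical, and `outcome` is 1 for UP and 0 for DOWN.

```javascript
// Lower bounds of the 5 bands: 50-60%, 60-70%, 70-80%, 80-90%, 90-100%.
const BAND_LOWER_BOUNDS = [0.5, 0.6, 0.7, 0.8, 0.9];

// Assumption: confidence is max(p, 1 - p), where p is the probability of UP.
function confidenceBand(p) {
  const confidence = Math.max(p, 1 - p);
  // Band number = how many lower bounds the confidence reaches (1..5).
  return BAND_LOWER_BOUNDS.filter((bound) => confidence >= bound).length;
}

// Count, accuracy, Brier Score, and mean confidence for each band.
function summarizeBands(predictions) {
  const bands = BAND_LOWER_BOUNDS.map(() => ({ count: 0, correct: 0, sumSq: 0, sumConf: 0 }));
  for (const { p, outcome } of predictions) {
    const band = bands[confidenceBand(p) - 1];
    const predictedUp = p >= 0.5;
    band.count += 1;
    band.correct += (predictedUp === (outcome === 1)) ? 1 : 0;
    band.sumSq += (p - outcome) ** 2;
    band.sumConf += Math.max(p, 1 - p);
  }
  return bands.map((b, i) => ({
    band: i + 1,
    count: b.count,
    accuracy: b.count ? b.correct / b.count : null,
    brier: b.count ? b.sumSq / b.count : null,
    meanProb: b.count ? b.sumConf / b.count : null,
  }));
}
```

Empty bands report `null` rather than 0 so that a band with no predictions is not mistaken for a band with 0% accuracy.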

What to look for

Ideal pattern:

Higher bands have higher accuracy
Band 4-5 accuracy should be 85%+
Brier Score decreases as band increases

Red flags:

Band 5 accuracy < 85% (overconfident)
Band 1 accuracy > 65% (underconfident, should be more aggressive)
Bands 4-5 have very few predictions (too conservative)

Trading insight: Only trade bands 4-5 (80%+ confidence) for best risk/reward.

6. Runs Test (Independence)

| Metrica | Valor |
|---------|-------|
| Runs observados | 28 |
| Runs esperados | 31.2 |
| Z-score | -1.42 |
| P-value | 0.155 |

No se detecta dependencia serial (p >= 0.05). Errores aparentemente aleatorios.

What is a runs test?

Tests whether prediction errors are randomly distributed or show patterns.

Run = sequence of consecutive successes or failures

Example:

OK OK    MISS MISS    OK OK OK    MISS
^run1^   ^-run2--^    ^-run3-^    ^run4^
= 4 runs

Too few runs = clustering (model has systematic biases)
Too many runs = alternating pattern (model overcorrects)

Interpreting p-value

p < 0.05 - Dependencia serial detectada (serial dependence detected)
- Errors are not random
- Model has systematic biases (e.g., always misses trending markets)
- Action: Review momentum/reversion features
p >= 0.05 - Errores aparentemente aleatorios (errors appear random)
- Good! Errors appear random
- Model isn’t missing obvious patterns
- Continue monitoring
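The table values (observed runs, expected runs, Z-score, P-value) match the shape of a Wald-Wolfowitz runs test with a normal approximation. The sketch below assumes that is the variant in use, which the text does not confirm. The input is a hypothetical array of booleans (true for a hit, false for a miss), and the error function uses a standard polynomial approximation because JavaScript has no built-in `erf`.

```javascript
// Error function via the Abramowitz-Stegun approximation 7.1.26
// (absolute error below 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Two-sided p-value for a standard normal z-score.
function twoSidedP(z) {
  return 1 - erf(Math.abs(z) / Math.SQRT2);
}

// Runs test on a sequence of hits (true) and misses (false).
function runsTest(hits) {
  const n1 = hits.filter(Boolean).length; // number of hits
  const n2 = hits.length - n1;            // number of misses
  const n = n1 + n2;
  let runs = hits.length > 0 ? 1 : 0;
  for (let i = 1; i < hits.length; i++) {
    if (hits[i] !== hits[i - 1]) runs += 1; // a change of value starts a new run
  }
  const expected = (2 * n1 * n2) / n + 1;
  const variance = (2 * n1 * n2 * (2 * n1 * n2 - n)) / (n * n * (n - 1));
  const z = (runs - expected) / Math.sqrt(variance);
  return { runs, expected, z, pValue: twoSidedP(z) };
}

// The 8-prediction example from the text: OK OK MISS MISS OK OK OK MISS.
const result = runsTest([true, true, false, false, true, true, true, false]);
console.log(result.runs, result.expected); // 4 4.75
```

With the sample report's Z-score, `twoSidedP(-1.42)` gives roughly 0.156, in line with the 0.155 shown in the table. If every prediction is a hit (or every one a miss), the variance is zero and the z-score is undefined, so a real implementation would need to handle that case.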

7. Streaks (Rachas)

| Metrica | Valor |
|---------|-------|
| Mejor racha de aciertos (early) | 8 seguidos |
| Peor racha de fallos (early) | 4 seguidos |

Win Streaks

Longest consecutive correct Early 1m predictions.

What it tells you:

Model’s best performance period
Confidence during hot streaks
Potential for compound gains

Loss Streaks

Longest consecutive incorrect Early 1m predictions.

Critical for risk management:

If max loss streak = 4, you need bankroll to survive 4 consecutive losses
Cold streak abstention triggers when accuracy over the rolling window drops below 40%
Drawdown tracking prevents catastrophic loss

Example: With a 4-bet max loss streak and a $5 average bet, you need a $20+ cushion.
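Streak lengths and the matching bankroll cushion are simple to compute. The sketch below assumes a hypothetical array of booleans (true for a correct Early 1m prediction, false for a miss) and flat bet sizes. It illustrates the arithmetic only and does not reproduce the bot's code.

```javascript
// Longest run of consecutive `target` values in a sequence of hits (true) / misses (false).
function longestStreak(hits, target) {
  let best = 0;
  let current = 0;
  for (const hit of hits) {
    current = hit === target ? current + 1 : 0;
    best = Math.max(best, current);
  }
  return best;
}

// Bankroll needed to absorb the worst observed losing streak at a flat bet size.
function lossCushion(maxLossStreak, averageBet) {
  return maxLossStreak * averageBet;
}

const history = [true, true, false, false, false, false, true, true, true];
console.log(longestStreak(history, true));  // 3
console.log(longestStreak(history, false)); // 4
console.log(lossCushion(longestStreak(history, false), 5)); // 20
```

A 4-loss streak at a 5-dollar average bet gives the 20-dollar figure from the example. Past streaks are a lower bound, since a longer streak can always occur later.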

8. Abstenciones

| Metrica | Valor |
|---------|-------|
| Total intervalos | 64 |
| Abstenciones | 12 (18.8%) |
| Accuracy sin abstenciones | 92.3% |

| Razon | Cantidad |
|-------|----------|
| insufficient_margin | 8 |
| dead_zone | 3 |
| insufficient_ev | 1 |

Abstention Analysis

Key metrics:

Abstention Rate - What % of intervals we didn’t trade
Accuracy sin abstenciones - Accuracy on intervals we DID predict

Ideal range: 10-25% abstention rate

Interpretation:

< 10% - Model may be too aggressive, taking marginal bets
10-25% - Healthy selectivity
> 25% - Model may be too conservative, missing opportunities

Abstention Reasons

Most common reasons:

insufficient_margin - Edge < 15pp (most common)
dead_zone - Probability too close to 50%
insufficient_ev - EV < 5%
drawdown_suspension - In RED/CRITICAL drawdown
cold_streak - Accuracy dropped below 40%
insufficient_data - < 50 ticks collected
anomalous_volatility - Volatility > 2x mean

Action items:

If insufficient_margin dominates, consider lowering threshold (but increases risk)
If cold_streak appears often, model may need recalibration
If anomalous_volatility is high, check WebSocket feed quality

9. Market Data (Polymarket)

| Metrica | Valor |
|---------|-------|
| Intervalos con q_market | 58/64 (91%) |
| q_market promedio | 0.52 |
| EV promedio (positivo) | +8.3% |
| Tasa de edge positivo | 64.7% |

Market Coverage

Intervalos con q_market - How many intervals had Polymarket data available.

Target: 90%+

Low coverage (<80%) indicates:

Polymarket API connectivity issues
Market wasn’t active during those intervals
Bot started before market opened

q_market promedio

Average Polymarket UP token price across all intervals.

Interpretation:

~0.50 - Market is balanced, no directional bias
> 0.60 - Market is bullish (expects UP outcomes)
< 0.40 - Market is bearish (expects DOWN outcomes)

EV promedio (positivo)

Average Expected Value on intervals where we found positive EV.

Target: +5% minimum

Higher is better:

+10%+ - Excellent edge finding
+5-10% - Good edges
< +5% - Marginal edges, may not cover fees

Tasa de edge positivo

Percentage of intervals where model found positive EV vs market.

Example: 64.7% = we found +EV on about 65% of available markets

Interpretation:

> 60% - Model is good at finding mispriced markets
40-60% - Moderate edge detection
< 40% - Model may not be adding value over market

10. Risk Management

| Metrica | Valor |
|---------|-------|
| Trades ejecutados | 14 |
| Win rate (trades) | 85.7% |
| Bet size promedio | $3.24 |
| PnL estimado | +$8.50 |

Trades ejecutados

Number of intervals where bet size > 0. Lower than total intervals due to abstentions.

Win rate (trades)

Accuracy on intervals where we actually placed bets.

Important distinction:

Overall accuracy includes all predictions
Trade win rate only counts predictions we bet on

Target: 80%+ (should be higher than overall due to selectivity)

Bet size promedio

Average bet size using fractional Kelly criterion.

Formula: Bet = alpha * Kelly * bankroll * (1 - drawdown_factor)

Typical range: $1-10 on a $100 bankroll

Factors affecting size:

Model accuracy (Brier Score tier)
Edge size (higher edge = bigger bet)
Drawdown level (deeper drawdown = smaller bets)
Bankroll size
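The bet-size formula can be illustrated with a short sketch. The text does not say how the `Kelly` term is computed, so the version below assumes the textbook Kelly fraction for buying a binary token at price `q` with model win probability `p`, which works out to `(p - q) / (1 - q)`. The function names, parameter names, and the alpha value are all hypothetical.

```javascript
// Full-Kelly fraction for a binary token bought at price q that pays 1 on a win.
// Derived from net odds b = (1 - q) / q; negative edges size to zero.
function kellyFraction(p, q) {
  return Math.max(0, (p - q) / (1 - q));
}

// Bet = alpha * Kelly * bankroll * (1 - drawdown_factor), following the formula in the text.
function betSize({ p, q, alpha, bankroll, drawdownFactor }) {
  return alpha * kellyFraction(p, q) * bankroll * (1 - drawdownFactor);
}

// Illustrative inputs only: alpha = 0.1 is a placeholder, not a configured tier.
const bet = betSize({ p: 0.75, q: 0.5, alpha: 0.1, bankroll: 100, drawdownFactor: 0 });
console.log(bet.toFixed(2)); // "5.00"
```

With these inputs the full-Kelly fraction is 0.5, and the fractional alpha scales it down to 5 on a 100 bankroll, which sits inside the typical range quoted above. Raising `drawdownFactor` toward 1 shrinks the bet toward zero.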

PnL estimado

Estimated profit/loss assuming:

Win = +1 unit
Loss = -1 unit
Unit = bet size

Simplified calculation: PnL = Σ(wins * bet_size) - Σ(losses * bet_size)

Note: This is a simulation. Real P&L depends on:

Actual token prices at execution
Slippage and fees
Order execution timing
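The simplified calculation amounts to a signed sum of bet sizes. A minimal sketch, assuming a hypothetical array of `{ won, betSize }` trade records:

```javascript
// Simplified PnL: each win adds the bet size, each loss subtracts it.
// `trades` is a hypothetical array of { won, betSize } objects.
function estimatedPnl(trades) {
  return trades.reduce((pnl, { won, betSize }) => pnl + (won ? betSize : -betSize), 0);
}

console.log(estimatedPnl([
  { won: true, betSize: 3 },
  { won: true, betSize: 2 },
  { won: false, betSize: 4 },
])); // 1
```

Because a win is credited as exactly one bet size, the result ignores the token's purchase price, which is one reason the figure is only an estimate.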

11. Fallos del Early 1m

| # | Strike | Final | Result | Early dijo | Accuracy |
|---|--------|-------|--------|------------|----------|
| 15 | $64,523.45 | $64,489.12 | DOWN | ^ UP 72% | MISS |
| 28 | $64,789.23 | $64,823.56 | UP | v DN 68% | MISS |
| 42 | $64,654.78 | $64,623.45 | DOWN | ^ UP 85% | MISS |

Miss Analysis

Each row shows an Early 1m prediction that was incorrect.

What to look for:

High confidence misses (80%+) - Most concerning, indicates miscalibration
Patterns - Do misses cluster in trending vs ranging markets?
Strike proximity - Are misses when price is very close to strike?

Example insight: “All 3 misses came at 68%+ confidence, and one was a high-confidence (85%) miss. Check if model is overconfident during volatile periods.”

12. Observaciones

Auto-generated insights based on the day’s data:

- Early 1m accuracy: **86.5%** sobre 52 predicciones
- Brier Score: **0.1842** (mejor que random 0.25)
- Banda 5 (90-100%): 100% accuracy, 7 predicciones
- Muestra solida (64 intervalos). Los numeros son confiables.

Using Reports for Optimization

Daily Review

Check Early 1m accuracy and Brier Score. If both are strong (>80% accuracy, BS <0.20), continue current config.

Weekly Trends

Compare 7 days of reports. Look for:

Declining accuracy trends
Increasing abstention rates
Changing market coverage

Optimization Signals

Increase aggression if:

Trade win rate > 85%
Abstention rate > 30%
High confidence bands (4-5) have excellent accuracy

Decrease aggression if:

Trade win rate < 75%
Brier Score > 0.22
High confidence misses increasing
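The numeric signals in the two lists above can be encoded as a small rule check. This sketch makes two assumptions that the text leaves open: any single condition is enough to raise a signal, and a "decrease" signal takes priority when both kinds fire. The input field names are hypothetical, and the qualitative signals (band accuracy, high-confidence misses) are left out because the text gives no thresholds for them.

```javascript
// Thresholds copied from the lists above; the input field names are hypothetical.
function aggressionSignal({ tradeWinRate, abstentionRate, brierScore }) {
  const decrease = [];
  const increase = [];
  if (tradeWinRate < 0.75) decrease.push("trade win rate < 75%");
  if (brierScore > 0.22) decrease.push("Brier Score > 0.22");
  if (tradeWinRate > 0.85) increase.push("trade win rate > 85%");
  if (abstentionRate > 0.30) increase.push("abstention rate > 30%");

  // Design choice for this sketch: warnings take priority over signals to size up.
  if (decrease.length > 0) return { action: "decrease", reasons: decrease };
  if (increase.length > 0) return { action: "increase", reasons: increase };
  return { action: "hold", reasons: [] };
}

console.log(aggressionSignal({ tradeWinRate: 0.857, abstentionRate: 0.188, brierScore: 0.1842 }).action); // "increase"
```

The call uses the sample day's figures (85.7% trade win rate, 18.8% abstention rate, 0.1842 Brier Score), where only the win-rate condition fires.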

Configuration Adjustments

Based on reports, tune:

Abstention thresholds (margin, EV)
Volatility regime detection
Kelly alpha tiers

Report File Format

YAML Frontmatter

---
title: "Reporte Diario Growthly Market Bot - 2026-02-23"
date: "2026-02-23"
updated: "2026-02-23"
project: "growthly-market-bot"
type: "technical-report"
status: "active"
version: "1.0"
tags:
  - polymarket
  - bitcoin
  - prediction
  - daily-report
changelog:
  - version: "1.0"
    date: "2026-02-23"
    changes:
      - "Reporte diario generado automaticamente"
related:
  - "[[polyparse-oportunidades-negocio]]"
  - "[[plan-prediction-engine]]"
---

This frontmatter makes reports searchable and linkable in Obsidian.

Storage Location

Reports are saved to:

/Users/rperaza/Library/Mobile Documents/iCloud~md~obsidian/Documents/joicodev/ideas/polymarket/reports/reporte-{date}.md

Configure in /home/daytona/workspace/source/src/reporter/daily.js:9-10

Next Steps

Reading Output

Learn to interpret the real-time console display

Troubleshooting

Fix common issues and improve performance
