Overview

Daily reports provide deep statistical analysis of the bot’s performance. They’re generated as Markdown files with YAML frontmatter, compatible with Obsidian and other knowledge management tools.

Generating Reports

pnpm report
Reports are saved to your configured Obsidian vault at:
{VAULT_ROOT}/{VAULT_DEST}/reporte-{date}.md
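As a rough illustration of how the path template above resolves, here is a minimal sketch. The function name `reportPath` is hypothetical, and it assumes `{date}` is an ISO calendar date (YYYY-MM-DD) taken in UTC, matching the date style used in the frontmatter; the bot's actual date handling may differ.

```javascript
// Hypothetical helper: resolves {VAULT_ROOT}/{VAULT_DEST}/reporte-{date}.md.
// Assumption: {date} is formatted as YYYY-MM-DD in UTC.
function reportPath(vaultRoot, vaultDest, date) {
  const day = date.toISOString().slice(0, 10);
  // Trim stray slashes so the segments join with exactly one separator.
  const root = vaultRoot.replace(/\/+$/, "");
  const dest = vaultDest.replace(/^\/+|\/+$/g, "");
  return `${root}/${dest}/reporte-${day}.md`;
}

console.log(reportPath("/vault", "polymarket/reports", new Date("2026-02-23T12:00:00Z")));
// /vault/polymarket/reports/reporte-2026-02-23.md
```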

Report Structure

Each report contains 12 major sections:
  1. Resumen - Basic statistics and price range
  2. Scoring Metrics - Brier Score, Log Loss, Brier Skill Score
  3. Murphy Decomposition - Calibration quality breakdown
  4. Accuracy de predicciones - Final vs Early 1m performance
  5. Bandas de confianza - Performance by confidence level (5 bands)
  6. Runs Test - Statistical independence test
  7. Rachas - Win/loss streaks
  8. Abstenciones - Abstention analysis by reason
  9. Datos de Mercado - Polymarket market data coverage
  10. Gestion de Riesgo - Trade execution and PnL
  11. Fallos del Early 1m - Detailed breakdown of misses
  12. Observaciones - Key insights and recommendations

Section Breakdown

1. Resumen (Summary)

| Metrica | Valor |
|---------|-------|
| Intervalos analizados | 64 |
| Resultado UP | 26 (41%) |
| Resultado DOWN | 38 (59%) |
| Rango de precio | $64,119.86 - $65,234.12 |
Intervalos analizados - Total number of 5-minute intervals that closed during the day. Each interval represents one complete prediction cycle.
Resultado UP / DOWN - How many intervals closed above the strike (UP) vs. below it (DOWN).
What it tells you:
  • Market direction bias for the day
  • If heavily skewed (70%+ one direction), model may have had easier/harder predictions
  • Balanced split (45-55%) is typical for binary markets
Rango de precio - Lowest and highest BTC close prices during the day.
Use this to:
  • Assess volatility (wider range = more volatile)
  • Correlate with prediction accuracy (extreme volatility often reduces accuracy)

2. Scoring Metrics

| Metrica | Valor | Baseline | Interpretacion |
|---------|-------|----------|----------------|
| **Brier Score** | 0.1842 | 0.2500 | Mejor que random |
| **Log Loss** | 0.5234 | 0.6931 | Mejor que random |
| **BSS** | 0.2632 | 0 | Modelo tiene skill |
Brier Score
Formula: BS = (1/N) * Σ(p - outcome)²
Measures the accuracy of probabilistic forecasts, combining calibration and discrimination. Lower is better.
Interpretation:
  • < 0.15 - Excellent calibration (elite performance)
  • 0.15-0.20 - Good calibration (strong model)
  • 0.20-0.25 - Acceptable (better than random)
  • > 0.25 - Poor (no better than coin flip)
Baseline: 0.25 (random guessing, i.e. always predicting 50%)
Why it matters: Brier Score directly affects bet sizing. Lower Brier = higher Kelly alpha = larger bets.
See src/engine/metrics.js:6 for implementation.
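The Brier Score formula above can be sketched in a few lines. This is an illustrative version, not the code in `metrics.js`; the function name and the input shape (an array of UP probabilities and an array of 0/1 outcomes) are assumptions.

```javascript
// Mean squared error between forecast probabilities and binary outcomes.
// probs[i]: predicted probability of UP; outcomes[i]: 1 for UP, 0 for DOWN.
function brierScore(probs, outcomes) {
  if (probs.length === 0 || probs.length !== outcomes.length) {
    throw new Error("probs and outcomes must be non-empty and of equal length");
  }
  let sum = 0;
  for (let i = 0; i < probs.length; i++) {
    sum += (probs[i] - outcomes[i]) ** 2;
  }
  return sum / probs.length;
}

// A forecaster that always says 0.5 scores exactly the 0.25 baseline.
console.log(brierScore([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1])); // 0.25
```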
Log Loss
Formula: LL = -(1/N) * Σ[outcome*ln(p) + (1-outcome)*ln(1-p)]
Penalizes confident wrong predictions more heavily than Brier.
Interpretation:
  • < 0.50 - Excellent
  • 0.50-0.69 - Good (better than baseline)
  • > 0.69 - Poor (worse than random)
Baseline: 0.6931 (random 50/50 guessing)
Why it matters: Log Loss is more sensitive to overconfident predictions. A model with good Brier but bad Log Loss is overconfident.
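A minimal sketch of the Log Loss formula follows. The clipping constant `eps` is an assumption added here so that a forecast of exactly 0 or 1 does not produce an infinite loss; whether the bot clips, and at what value, is not stated in the report docs.

```javascript
// Average negative log-likelihood of binary outcomes under the forecasts.
// eps (assumed) clips probabilities away from 0 and 1 to keep ln() finite.
function logLoss(probs, outcomes, eps = 1e-15) {
  if (probs.length === 0 || probs.length !== outcomes.length) {
    throw new Error("probs and outcomes must be non-empty and of equal length");
  }
  let sum = 0;
  for (let i = 0; i < probs.length; i++) {
    const p = Math.min(Math.max(probs[i], eps), 1 - eps);
    sum += outcomes[i] * Math.log(p) + (1 - outcomes[i]) * Math.log(1 - p);
  }
  return -sum / probs.length;
}

// Always predicting 0.5 gives ln(2), the 0.6931 baseline.
console.log(logLoss([0.5, 0.5], [1, 0]).toFixed(4)); // 0.6931
```

A single 99% forecast that misses costs about 4.6, versus about 0.92 for a 60% forecast that misses, which is why overconfidence shows up here before it shows up in Brier.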
Brier Skill Score (BSS)
Formula: BSS = 1 - (BS / BS_baseline)
Measures improvement over random baseline.
Interpretation:
  • > 0.20 - Excellent skill
  • 0.10-0.20 - Good skill
  • 0-0.10 - Marginal skill
  • < 0 - Worse than random (model is broken)
Example: A BSS of 0.26 means the model's Brier Score is 26% lower than the random baseline's.
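The BSS formula is a one-liner; the sketch below (function name assumed) reproduces the value in the sample table from the sample Brier Score and the 0.25 baseline.

```javascript
// Brier Skill Score: relative improvement of a Brier Score over a baseline.
function brierSkillScore(brier, baseline = 0.25) {
  if (baseline <= 0) throw new Error("baseline must be positive");
  return 1 - brier / baseline;
}

// Sample report values: BS = 0.1842 against the 0.25 baseline.
console.log(brierSkillScore(0.1842).toFixed(4)); // 0.2632
```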

3. Murphy Decomposition

| Componente | Valor | Nota |
|------------|-------|------|
| Reliability | 0.0234 | Menor = mejor calibrado |
| Resolution | 0.1842 | Mayor = mejor separacion |
| Uncertainty | 0.2500 | Fijo (propiedad del problema) |
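Reliability
What it measures: How well predicted probabilities match actual outcomes.
Formula: Reliability = (1/N) * Σ n_k * (p_k - o_k)²
where p_k is the mean forecast probability in bin k, o_k is the mean outcome in bin k, n_k is the number of forecasts in bin k, and N is the total number of forecasts.
Interpretation: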
  • < 0.03 - Excellent calibration
  • 0.03-0.05 - Good
  • > 0.05 - Poorly calibrated
What to do if high:
  • Enable Platt calibration in config
  • Check if model is overconfident
  • Review abstention thresholds
Resolution
What it measures: How well the model separates outcomes (discrimination power).
Formula: Resolution = (1/N) * Σ n_k * (o_k - ō)²
Higher is better: it means the model can distinguish between UP and DOWN outcomes.
Interpretation:
  • > 0.15 - Excellent discrimination
  • 0.10-0.15 - Good
  • < 0.10 - Weak discrimination
What to do if low:
  • Model may need more features (check momentum, volatility)
  • May be too conservative (check abstention rate)
Uncertainty
What it measures: Inherent unpredictability of the problem.
Formula: Uncertainty = ō * (1 - ō)
where ō is the base rate (proportion of UP outcomes).
This is fixed based on the data: you can’t change it. It represents the difficulty of the prediction task.
Relationship: Brier Score = Uncertainty - Resolution + Reliability
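The three components can be computed together by binning the forecasts. The sketch below assumes 10 equal-width probability bins and normalizes each sum by the total number of forecasts; the bin count and function name are assumptions, not the bot's confirmed settings. Note that the relationship Brier Score = Uncertainty - Resolution + Reliability holds exactly only when all forecasts in a bin share the same probability; with spread inside a bin it is an approximation.

```javascript
// Murphy decomposition of the Brier Score over equal-width probability bins.
// numBins = 10 is an assumed default.
function murphyDecomposition(probs, outcomes, numBins = 10) {
  const n = probs.length;
  if (n === 0 || n !== outcomes.length) {
    throw new Error("probs and outcomes must be non-empty and of equal length");
  }
  const bins = Array.from({ length: numBins }, () => ({ count: 0, sumP: 0, sumO: 0 }));
  let sumOutcomes = 0;
  for (let i = 0; i < n; i++) {
    // A probability of exactly 1.0 falls into the last bin.
    const k = Math.min(Math.floor(probs[i] * numBins), numBins - 1);
    bins[k].count += 1;
    bins[k].sumP += probs[i];
    bins[k].sumO += outcomes[i];
    sumOutcomes += outcomes[i];
  }
  const baseRate = sumOutcomes / n; // ō
  let reliability = 0;
  let resolution = 0;
  for (const bin of bins) {
    if (bin.count === 0) continue;
    const pK = bin.sumP / bin.count; // mean forecast in bin k
    const oK = bin.sumO / bin.count; // mean outcome in bin k
    reliability += bin.count * (pK - oK) ** 2;
    resolution += bin.count * (oK - baseRate) ** 2;
  }
  return {
    reliability: reliability / n,
    resolution: resolution / n,
    uncertainty: baseRate * (1 - baseRate),
  };
}
```

For example, four forecasts of 0.2 with one UP outcome plus four forecasts of 0.8 with three UP outcomes give reliability 0.0025, resolution 0.0625, and uncertainty 0.25, which recombine to a Brier Score of 0.19.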

4. Accuracy Comparison

| Tipo | Accuracy | Aciertos | Total |
|------|----------|----------|-------|
| **Final (ultimo segundo)** | 88.0% | 44 | 50 |
| **Early 1m (la que importa)** | 86.5% | 45 | 52 |
  • Final (30s) - Captured at 30 seconds before close
    • Usually higher accuracy (more data)
    • Too late to trade on (not enough time to execute)
    • Useful for model validation
  • Early 1m (60s) - Captured at 60 seconds before close
    • This is the real trading metric
    • Practical signal you can act on
    • Target: 80%+ for profitable trading
If Early 1m accuracy drops below 80%:
  1. Check data quality - Missing ticks? WebSocket issues?
  2. Review abstention rate - Is the model being too aggressive?
  3. Analyze confidence bands - Is high-confidence band still strong?
  4. Check market conditions - High volatility day?
  5. Run Murphy decomposition - Calibration or discrimination issue?

5. Confidence Bands (5 bands)

| Banda | Rango | Count | Accuracy | Brier | Mean Prob |
|-------|-------|-------|----------|-------|----------|
| 1 | 50-60% | 12 | 58% | 0.2456 | 0.55 |
| 2 | 60-70% | 15 | 67% | 0.2234 | 0.65 |
| 3 | 70-80% | 18 | 78% | 0.1723 | 0.75 |
| 4 | 80-90% | 10 | 90% | 0.1234 | 0.85 |
| 5 | 90-100% | 7 | 100% | 0.0456 | 0.95 |
Each prediction is placed into one of 5 confidence bands based on its probability:
  • Band 1: 50-60% (low confidence)
  • Band 2: 60-70% (moderate)
  • Band 3: 70-80% (good)
  • Band 4: 80-90% (high confidence)
  • Band 5: 90-100% (very high)
For each band, we calculate:
  • Count - Number of predictions in this band
  • Accuracy - What % were correct
  • Brier - Calibration quality for this band
  • Mean Prob - Average probability
Ideal pattern:
  • Higher bands have higher accuracy
  • Band 4-5 accuracy should be 85%+
  • Brier Score decreases as band increases
Red flags:
  • Band 5 accuracy < 85% (overconfident)
  • Band 1 accuracy > 65% (underconfident, should be more aggressive)
  • Bands 4-5 have very few predictions (too conservative)
Trading insight: Only trade bands 4-5 (80%+ confidence) for best risk/reward.
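The banding described above can be sketched as follows. The input shape is an assumption: each prediction carries a `confidence` (probability assigned to the predicted direction, 0.5 to 1.0) and a `correct` flag. The small epsilon guards against floating-point error at band edges such as 0.6.

```javascript
// Groups predictions into five 10-point confidence bands (50-60% ... 90-100%)
// and reports count, accuracy, Brier, and mean probability per band.
// Input shape { confidence, correct } is assumed for illustration.
function confidenceBands(predictions) {
  const bands = Array.from({ length: 5 }, (_, i) => ({
    band: i + 1, count: 0, hits: 0, sqErr: 0, sumProb: 0,
  }));
  for (const { confidence, correct } of predictions) {
    // 1e-9 keeps 0.6 in band 2 despite (0.6 - 0.5) * 10 evaluating just below 1.
    const raw = Math.floor((confidence - 0.5) * 10 + 1e-9);
    const b = bands[Math.min(Math.max(raw, 0), 4)];
    const hit = correct ? 1 : 0;
    b.count += 1;
    b.hits += hit;
    b.sqErr += (confidence - hit) ** 2;
    b.sumProb += confidence;
  }
  return bands.map(({ band, count, hits, sqErr, sumProb }) => ({
    band,
    count,
    accuracy: count ? hits / count : null,
    brier: count ? sqErr / count : null,
    meanProb: count ? sumProb / count : null,
  }));
}
```

Empty bands return `null` metrics rather than 0 so that a band with no predictions is not mistaken for a band with 0% accuracy.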

6. Runs Test (Independence)

| Metrica | Valor |
|---------|-------|
| Runs observados | 28 |
| Runs esperados | 31.2 |
| Z-score | -1.42 |
| P-value | 0.155 |

No se detecta dependencia serial (p >= 0.05). Errores aparentemente aleatorios.
Tests whether prediction errors are randomly distributed or show patterns.
Run = a sequence of consecutive successes or consecutive failures.
Example:
OK OK | MISS MISS | OK OK OK | MISS
run 1 |   run 2   |  run 3   | run 4
= 4 runs
Too few runs = clustering (model has systematic biases).
Too many runs = alternating pattern (model overcorrects).
  • p < 0.05 - Dependencia serial detectada
    • Errors are not random
    • Model has systematic biases (e.g., always misses trending markets)
    • Action: Review momentum/reversion features
  • p >= 0.05 - Errores aparentemente aleatorios
    • Good! Errors appear random
    • Model isn’t missing obvious patterns
    • Continue monitoring
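The table values can be reproduced with the standard Wald-Wolfowitz runs test under a normal approximation. The sketch below is one way to compute it, not necessarily how the bot does; the function names are assumptions, and since JavaScript has no built-in normal CDF, it uses the Abramowitz-Stegun approximation of erf (absolute error around 1.5e-7).

```javascript
// Standard normal CDF via the Abramowitz-Stegun erf approximation (7.1.26).
function normalCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Wald-Wolfowitz runs test on an array of booleans (true = OK, false = MISS).
function runsTest(results) {
  const n1 = results.filter(Boolean).length; // successes
  const n2 = results.length - n1;            // failures
  let runs = results.length > 0 ? 1 : 0;
  for (let i = 1; i < results.length; i++) {
    if (results[i] !== results[i - 1]) runs += 1;
  }
  const n = n1 + n2;
  const product = 2 * n1 * n2;
  const variance = n > 1 ? (product * (product - n)) / (n * n * (n - 1)) : 0;
  // The test is undefined when only one outcome type occurs or variance is 0.
  if (n1 === 0 || n2 === 0 || variance === 0) {
    return { runs, expected: null, z: null, pValue: null };
  }
  const expected = product / n + 1;
  const z = (runs - expected) / Math.sqrt(variance);
  const pValue = 2 * (1 - normalCdf(Math.abs(z))); // two-sided
  return { runs, expected, z, pValue };
}
```

On the 8-prediction example above this yields 4 runs against 4.75 expected, far from significant. A z-score of -1.42, as in the sample table, corresponds to a two-sided p-value of about 0.156.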

7. Streaks (Rachas)

| Metrica | Valor |
|---------|-------|
| Mejor racha de aciertos (early) | 8 seguidos |
| Peor racha de fallos (early) | 4 seguidos |
Mejor racha de aciertos - Longest consecutive correct Early 1m predictions.
What it tells you:
  • Model’s best performance period
  • Confidence during hot streaks
  • Potential for compound gains
Peor racha de fallos - Longest consecutive incorrect Early 1m predictions.
Critical for risk management:
  • If max loss streak = 4, you need bankroll to survive 4 consecutive losses
  • Cold streak abstention triggers at 40% accuracy over rolling window
  • Drawdown tracking prevents catastrophic loss
Example: With a 4-bet max loss streak and a $5 average bet, you need a $20+ cushion.
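Both streak lengths and the cushion arithmetic can be sketched in a single pass over the results. The function names and the boolean input format are assumptions made for illustration.

```javascript
// Longest runs of consecutive hits and consecutive misses.
// results: array of booleans in chronological order (true = correct).
function longestStreaks(results) {
  let bestWin = 0;
  let worstLoss = 0;
  let current = 0;
  let last = null;
  for (const r of results) {
    current = r === last ? current + 1 : 1;
    last = r;
    if (r) bestWin = Math.max(bestWin, current);
    else worstLoss = Math.max(worstLoss, current);
  }
  return { bestWin, worstLoss };
}

// Minimum bankroll cushion needed to survive the worst loss streak at a flat bet size.
function requiredCushion(worstLoss, averageBet) {
  return worstLoss * averageBet;
}

console.log(requiredCushion(4, 5)); // 20
```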

8. Abstenciones

| Metrica | Valor |
|---------|-------|
| Total intervalos | 64 |
| Abstenciones | 12 (18.8%) |
| Accuracy sin abstenciones | 92.3% |

| Razon | Cantidad |
|-------|----------|
| insufficient_margin | 8 |
| dead_zone | 3 |
| insufficient_ev | 1 |
Key metrics:
  • Abstention Rate - What % of intervals we didn’t trade
  • Accuracy sin abstenciones - Accuracy on intervals we DID predict
Ideal range: 10-25% abstention rate
Interpretation:
  • < 10% - Model may be too aggressive, taking marginal bets
  • 10-25% - Healthy selectivity
  • > 25% - Model may be too conservative, missing opportunities
Most common reasons:
  1. insufficient_margin - Edge < 15pp (most common)
  2. dead_zone - Probability too close to 50%
  3. insufficient_ev - EV < 5%
  4. drawdown_suspension - In RED/CRITICAL drawdown
  5. cold_streak - Accuracy dropped below 40%
  6. insufficient_data - < 50 ticks collected
  7. anomalous_volatility - Volatility > 2x mean
Action items:
  • If insufficient_margin dominates, consider lowering threshold (but increases risk)
  • If cold_streak appears often, model may need recalibration
  • If anomalous_volatility is high, check WebSocket feed quality
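The abstention tables can be derived from a per-interval log. The sketch below assumes each interval record has an `abstained` flag and, when abstained, a `reason` string; the record shape, function name, and verdict labels are all illustrative. The verdict thresholds are the 10% and 25% cutoffs listed above.

```javascript
// Summarizes abstentions: overall rate, counts per reason, and a verdict
// based on the 10-25% healthy range. Record shape is assumed.
function abstentionSummary(intervals) {
  const byReason = {};
  let abstained = 0;
  for (const interval of intervals) {
    if (!interval.abstained) continue;
    abstained += 1;
    byReason[interval.reason] = (byReason[interval.reason] ?? 0) + 1;
  }
  const rate = intervals.length ? abstained / intervals.length : 0;
  let verdict = "healthy";
  if (rate < 0.10) verdict = "too_aggressive";
  else if (rate > 0.25) verdict = "too_conservative";
  return { total: intervals.length, abstained, rate, byReason, verdict };
}
```

With the sample data (12 abstentions out of 64 intervals) the rate is 0.1875, shown as 18.8% in the report, which falls in the healthy range.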

9. Market Data (Polymarket)

| Metrica | Valor |
|---------|-------|
| Intervalos con q_market | 58/64 (91%) |
| q_market promedio | 0.52 |
| EV promedio (positivo) | +8.3% |
| Tasa de edge positivo | 64.7% |
Intervalos con q_market - How many intervals had Polymarket data available.
Target: 90%+
Low coverage (<80%) indicates:
  • Polymarket API connectivity issues
  • Market wasn’t active during those intervals
  • Bot started before market opened
q_market promedio - Average Polymarket UP token price across all intervals.
Interpretation:
  • ~0.50 - Market is balanced, no directional bias
  • > 0.60 - Market is bullish (expects UP outcomes)
  • < 0.40 - Market is bearish (expects DOWN outcomes)
EV promedio (positivo) - Average Expected Value on intervals where we found positive EV.
Target: +5% minimum
Higher is better:
  • +10%+ - Excellent edge finding
  • +5-10% - Good edges
  • < +5% - Marginal edges, may not cover fees
Tasa de edge positivo - Percentage of intervals where the model found positive EV vs. the market.
Example: 64.7% = we found +EV on about 65% of available markets
Interpretation:
  • > 60% - Model is good at finding mispriced markets
  • 40-60% - Moderate edge detection
  • < 40% - Model may not be adding value over market
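The four market metrics can be aggregated as shown below. This sketch takes the per-interval EV as given rather than computing it, because the EV formula is not specified here. The field names `qMarket` and `ev` (with `null` when Polymarket data was missing) are assumptions, as is measuring the positive-edge rate over intervals that had market data.

```javascript
// Aggregates Polymarket coverage and edge statistics.
// Each interval: { qMarket: number|null, ev: number|null } (assumed shape).
function marketSummary(intervals) {
  const covered = intervals.filter((i) => i.qMarket !== null);
  const positive = covered.filter((i) => i.ev !== null && i.ev > 0);
  const mean = (xs) => (xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : null);
  return {
    coverage: intervals.length ? covered.length / intervals.length : 0,
    avgQMarket: mean(covered.map((i) => i.qMarket)),
    avgPositiveEv: mean(positive.map((i) => i.ev)),
    // Share of covered intervals with positive EV.
    positiveEdgeRate: covered.length ? positive.length / covered.length : 0,
  };
}
```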

10. Risk Management

| Metrica | Valor |
|---------|-------|
| Trades ejecutados | 14 |
| Win rate (trades) | 85.7% |
| Bet size promedio | $3.24 |
| PnL estimado | +$8.50 |
Trades ejecutados - Number of intervals where bet size > 0. Lower than total intervals due to abstentions.
Win rate (trades) - Accuracy on intervals where we actually placed bets.
Important distinction:
  • Overall accuracy includes all predictions
  • Trade win rate only counts predictions we bet on
Target: 80%+ (should be higher than overall due to selectivity)
Bet size promedio - Average bet size using fractional Kelly criterion.
Formula: Bet = alpha * Kelly * bankroll * (1 - drawdown_factor)
Typical range: $1-10 on a $100 bankroll
Factors affecting size:
  • Model accuracy (Brier Score tier)
  • Edge size (higher edge = bigger bet)
  • Drawdown level (deeper drawdown = smaller bets)
  • Bankroll size
PnL estimado - Estimated profit/loss assuming:
  • Win = +1 unit
  • Loss = -1 unit
  • Unit = bet size
Simplified calculation: PnL = Σ(wins * bet_size) - Σ(losses * bet_size)
Note: This is a simulation. Real P&L depends on:
  • Actual token prices at execution
  • Slippage and fees
  • Order execution timing
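The bet-size formula and the simplified PnL can be sketched together. One assumption to flag: the `Kelly` term is taken here to be the standard Kelly fraction for a binary token bought at price `q` that pays 1 on a win, which is (p - q) / (1 - q) for model probability `p`. The bot's actual Kelly term, its alpha tiers, and all function names below are not confirmed by the report docs.

```javascript
// Kelly fraction for a binary token bought at price q (pays 1 if correct),
// given model probability p. Assumed form; floored at 0 when there is no edge.
function kellyFraction(p, q) {
  if (q <= 0 || q >= 1) throw new Error("q must be strictly between 0 and 1");
  return Math.max(0, (p - q) / (1 - q));
}

// Bet = alpha * Kelly * bankroll * (1 - drawdown_factor), as in the formula above.
function betSize({ alpha, p, q, bankroll, drawdownFactor = 0 }) {
  return alpha * kellyFraction(p, q) * bankroll * (1 - drawdownFactor);
}

// Simplified PnL: each win adds the bet size, each loss subtracts it.
function simplifiedPnl(trades) {
  return trades.reduce((pnl, t) => pnl + (t.won ? t.betSize : -t.betSize), 0);
}

// Quarter Kelly, 60% model probability vs. a 0.50 market price, $100 bankroll.
console.log(betSize({ alpha: 0.25, p: 0.6, q: 0.5, bankroll: 100 }).toFixed(2)); // 5.00
```

The example lands inside the typical $1-10 range; a non-zero `drawdownFactor` scales the bet down proportionally.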

11. Fallos del Early 1m

| # | Strike | Final | Result | Early dijo | Accuracy |
|---|--------|-------|--------|------------|----------|
| 15 | $64,523.45 | $64,489.12 | DOWN | ^ UP 72% | MISS |
| 28 | $64,789.23 | $64,823.56 | UP | v DN 68% | MISS |
| 42 | $64,654.78 | $64,623.45 | DOWN | ^ UP 85% | MISS |
Each row shows an Early 1m prediction that was incorrect.
What to look for:
  • High confidence misses (80%+) - Most concerning, indicates miscalibration
  • Patterns - Do misses cluster in trending vs ranging markets?
  • Strike proximity - Are misses when price is very close to strike?
Example insight: “All 3 misses were predictions at 68%+ confidence that failed. Check if model is overconfident during volatile periods.”
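Flagging the most concerning misses is a simple filter. The sketch below uses the 80% cutoff mentioned above and an assumed record shape; the three records mirror the rows in the sample table.

```javascript
// Returns the misses whose stated confidence was at or above the threshold.
// Record shape { interval, predicted, confidence } is assumed for illustration.
function highConfidenceMisses(misses, threshold = 0.8) {
  return misses.filter((m) => m.confidence >= threshold);
}

const misses = [
  { interval: 15, predicted: "UP", confidence: 0.72 },
  { interval: 28, predicted: "DOWN", confidence: 0.68 },
  { interval: 42, predicted: "UP", confidence: 0.85 },
];
console.log(highConfidenceMisses(misses).map((m) => m.interval)); // [ 42 ]
```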

12. Observaciones

Auto-generated insights based on the day’s data:
- Early 1m accuracy: **86.5%** sobre 52 predicciones
- Brier Score: **0.1842** (mejor que random 0.25)
- Banda 5 (90-100%): 100% accuracy, 7 predicciones
- Muestra solida (64 intervalos). Los numeros son confiables.

Using Reports for Optimization

1. Daily Review

Check Early 1m accuracy and Brier Score. If both are strong (>80% accuracy, BS <0.20), continue current config.
2. Weekly Trends

Compare 7 days of reports. Look for:
  • Declining accuracy trends
  • Increasing abstention rates
  • Changing market coverage
3. Optimization Signals

Increase aggression if:
  • Trade win rate > 85%
  • Abstention rate > 30%
  • High confidence bands (4-5) have excellent accuracy
Decrease aggression if:
  • Trade win rate < 75%
  • Brier Score > 0.22
  • High confidence misses increasing
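The threshold checks above can be turned into a small helper that lists which signals fired. This sketch makes several assumptions: the input field names, the signal labels, and treating "excellent accuracy" in bands 4-5 as 85%+ (borrowed from the confidence-band guidance). It deliberately returns the triggered signals instead of a decision, since the list does not say whether one signal or all of them are required, and it omits the "high confidence misses increasing" check because that needs multi-day data.

```javascript
// Lists which single-day optimization signals fired.
// Field names and signal labels are hypothetical.
function optimizationSignals({ tradeWinRate, abstentionRate, brierScore, highBandAccuracy }) {
  const increase = [];
  const decrease = [];
  if (tradeWinRate > 0.85) increase.push("trade_win_rate_above_85");
  if (abstentionRate > 0.30) increase.push("abstention_rate_above_30");
  if (highBandAccuracy >= 0.85) increase.push("high_bands_accurate"); // assumed cutoff
  if (tradeWinRate < 0.75) decrease.push("trade_win_rate_below_75");
  if (brierScore > 0.22) decrease.push("brier_above_0_22");
  return { increase, decrease };
}
```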
4. Configuration Adjustments

Based on reports, tune:
  • Abstention thresholds (margin, EV)
  • Volatility regime detection
  • Kelly alpha tiers

Report File Format

YAML Frontmatter

---
title: "Reporte Diario Growthly Market Bot - 2026-02-23"
date: "2026-02-23"
updated: "2026-02-23"
project: "growthly-market-bot"
type: "technical-report"
status: "active"
version: "1.0"
tags:
  - polymarket
  - bitcoin
  - prediction
  - daily-report
changelog:
  - version: "1.0"
    date: "2026-02-23"
    changes:
      - "Reporte diario generado automaticamente"
related:
  - "[[polyparse-oportunidades-negocio]]"
  - "[[plan-prediction-engine]]"
---
This frontmatter makes reports searchable and linkable in Obsidian.

Storage Location

Reports are saved to:
/Users/rperaza/Library/Mobile Documents/iCloud~md~obsidian/Documents/
joicodev/ideas/polymarket/reports/reporte-{date}.md
Configure in src/reporter/daily.js:9-10

Next Steps

Reading Output

Learn to interpret the real-time console display

Troubleshooting

Fix common issues and improve performance
