Overview
SNAG-Bench is the Quality Certifier of the Timepoint Suite: an open-source validation framework that measures Causal Resolution across Flash and Pro renderings. Where Flash renders history and Pro simulates futures, SNAG-Bench answers: “How good is this rendering?”
SNAG-Bench is currently in development. This documentation describes its planned architecture and role in the suite.
What is Causal Resolution?
Causal Resolution = Coverage × Convergence
The fundamental quality metric for temporal renderings:
Coverage
How much of a scenario has been rendered?
- Entity coverage: What % of relevant entities have states?
- Temporal coverage: What % of timepoints are rendered?
- Relationship coverage: What % of entity pairs have relationship data?
- Causal coverage: What % of expected causal edges exist?
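Each coverage dimension above is a "what % of the expected items exist" ratio. A minimal sketch of that ratio, assuming the expected entities and causal edges are available as sets; the function name and the sample data are hypothetical and not part of SNAG-Bench:

```python
def coverage_fraction(rendered: set, expected: set) -> float:
    """Fraction of expected items that appear in the rendering."""
    if not expected:
        return 1.0  # nothing was expected, so nothing is missing
    return len(rendered & expected) / len(expected)


# Hypothetical scenario data, for illustration only.
expected_entities = {"alice", "bob", "carol", "dave"}
rendered_entities = {"alice", "bob", "carol"}

expected_edges = {("alice", "bob"), ("bob", "carol"),
                  ("carol", "dave"), ("alice", "dave")}
rendered_edges = {("alice", "bob"), ("bob", "carol")}

entity_coverage = coverage_fraction(rendered_entities, expected_entities)
causal_coverage = coverage_fraction(rendered_edges, expected_edges)

print(f"entity coverage: {entity_coverage:.2f}")
print(f"causal coverage: {causal_coverage:.2f}")
```

Items that were rendered but not expected are ignored here; whether SNAG-Bench should penalize them is an open design choice this sketch does not address.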
Convergence
How reliably do repeated runs converge on the same causal structure?
- Structural convergence: Jaccard similarity of causal graphs
- Entity convergence: Consistency of entity states across runs
- Dialog convergence: Semantic similarity of generated conversations
- Outcome convergence: Agreement on final states
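Structural convergence is described above as the Jaccard similarity of causal graphs. The sketch below treats each run's graph as a set of directed (cause, effect) edges and averages the Jaccard similarity over all pairs of runs. The pairwise-mean aggregation and all names are assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two edge sets."""
    if not a and not b:
        return 1.0  # two empty graphs are treated as identical
    return len(a & b) / len(a | b)


def structural_convergence(runs: list) -> float:
    """Mean pairwise Jaccard similarity across runs (assumed aggregation)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 1.0  # a single run trivially agrees with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical runs of the same scenario.
run_1 = {("A", "B"), ("B", "C"), ("C", "D")}
run_2 = {("A", "B"), ("B", "C"), ("B", "D")}
run_3 = {("A", "B"), ("B", "C"), ("C", "D")}

score = structural_convergence([run_1, run_2, run_3])
print(f"structural convergence: {score:.3f}")
```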
Why It Matters
High Causal Resolution means:
- Simulations are comprehensive (coverage)
- Simulations are reliable (convergence)
- Training data is high-quality
- Predictions are trustworthy
Low Causal Resolution indicates:
- Missing critical entities or relationships (low coverage)
- Unstable simulation dynamics (low convergence)
- Insufficient grounding context
- Need for more rendering passes
The Validation Framework
SNAG-Bench operates along two axes:
Axis 1: Structural Validation
Evaluate a single rendering’s internal consistency.
- All entities have states at all timepoints
- All knowledge has provenance (M3)
- All causal edges have sources
- No temporal paradoxes (future knowledge in past)
- Relationship consistency over time
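Two of the checks above (provenance, and no future knowledge in the past) can be sketched as a single pass over an entity's knowledge items. The `KnowledgeItem` record and its fields are hypothetical and do not reflect the actual SNAG-Bench or Timepoint data model. The sketch also assumes timepoints are integer indices.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class KnowledgeItem:
    entity: str
    fact: str
    known_at: int           # timepoint index at which the entity holds the fact
    originates_at: int      # timepoint index at which the fact first exists
    source: Optional[str]   # provenance; None means it is missing


def find_violations(items: List[KnowledgeItem]) -> List[Tuple[KnowledgeItem, str]]:
    """Return (item, reason) pairs for every failed structural check."""
    violations = []
    for item in items:
        if item.source is None:
            violations.append((item, "missing provenance"))
        if item.known_at < item.originates_at:
            # The entity knows something before it could have happened.
            violations.append((item, "temporal paradox"))
    return violations


items = [
    KnowledgeItem("alice", "merger announced", known_at=3, originates_at=2, source="press release"),
    KnowledgeItem("bob", "merger announced", known_at=1, originates_at=2, source="alice"),
    KnowledgeItem("carol", "ceo resigned", known_at=5, originates_at=4, source=None),
]

for item, reason in find_violations(items):
    print(f"{item.entity}: {reason} ({item.fact})")
```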
Axis 2: Convergence Validation
Compare multiple renderings of the same scenario.
- Jaccard similarity of causal edges
- Cosine similarity of entity state vectors
- Semantic similarity of dialog (via embeddings)
- Outcome alignment (final states match)
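For the entity-state comparison above, cosine similarity can be computed with the standard library alone. This sketch assumes each entity state has already been encoded as a fixed-length numeric vector. The encoding itself and the sample values are assumptions, since the documentation does not specify them.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical state vectors for the same entity in two runs.
state_run_1 = [0.8, 0.1, 0.3]
state_run_2 = [0.7, 0.2, 0.3]

print(f"entity state similarity: {cosine_similarity(state_run_1, state_run_2):.3f}")
```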
Causal Resolution Score
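A minimal sketch of the score, using the definition Causal Resolution = Coverage × Convergence. The documentation does not say how the four coverage components or the four convergence components are combined; the unweighted mean used here is an assumption, as are the function name and the sample values.

```python
from statistics import mean
from typing import Mapping


def causal_resolution(coverage_parts: Mapping[str, float],
                      convergence_parts: Mapping[str, float]) -> float:
    """Coverage × Convergence, each taken as an unweighted mean (assumed)."""
    for name, value in {**coverage_parts, **convergence_parts}.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    coverage = mean(coverage_parts.values())
    convergence = mean(convergence_parts.values())
    return coverage * convergence


# Hypothetical component scores for one scenario.
coverage_parts = {"entity": 0.9, "temporal": 0.8, "relationship": 0.7, "causal": 0.8}
convergence_parts = {"structural": 0.9, "state": 0.9, "dialog": 0.8, "outcome": 1.0}

score = causal_resolution(coverage_parts, convergence_parts)
print(f"causal resolution: {score:.2f}")
```

Because the two factors are multiplied, a rendering with full coverage but unstable runs (or the reverse) still scores low, which matches the idea that both properties are required.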
Integration with the Suite
Flash → SNAG-Bench
Validate historical renderings.
Questions SNAG-Bench answers:
- Is the historical record complete enough for this rendering?
- Do multiple renderings of the same event converge?
- What’s the confidence level for this Rendered Past?
Pro → SNAG-Bench
Validate simulations and Rendered Futures.
Questions SNAG-Bench answers:
- Is this simulation internally consistent?
- Do repeated runs converge on the same causal structure?
- Is this rendering high-quality enough for training data?
- Should we create prediction markets for this scenario?
Benchmarking Causal Reasoning
SNAG-Bench also serves as a benchmark for causal reasoning models.
Challenge Datasets
SNAG-Bench will include challenge datasets:
| Dataset | Source | Difficulty | Entities | Timepoints | Causal Edges |
|---|---|---|---|---|---|
| Historical Pivots | Flash renderings | Hard | 5-10 | 10-20 | 30-60 |
| Corporate Crises | Pro simulations | Medium | 4-8 | 8-16 | 20-40 |
| Multi-Agent Strategy | Pro PORTAL mode | Hard | 6-12 | 12-24 | 40-80 |
| Counterfactual Branches | Pro BRANCHING mode | Expert | 8-16 | 16-32 | 60-120 |
Evaluation Tasks
- Causal Path Prediction: Given nodes A and C, predict intermediate node B
- Outcome Forecasting: Given initial states, predict final states
- Knowledge Provenance: Given entity knowledge, identify source and timing
- Temporal Consistency: Detect anachronisms and causality violations
- Counterfactual Reasoning: Given a branch point, predict alternate outcomes
Leaderboard
Models will be ranked on:
- Causal accuracy: % of causal edges correctly predicted
- Outcome accuracy: % of final states correctly forecasted
- Provenance accuracy: % of knowledge sources correctly identified
- Consistency score: % of runs without temporal violations
- Composite score: Weighted average across all tasks
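The composite score is described as a weighted average. A minimal sketch follows, with the caveat that the actual weights are not published: the equal weights, metric keys, and sample scores below are placeholders.

```python
from typing import Mapping


def composite_score(scores: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted average of per-metric scores, normalized by total weight."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"no score for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in weights) / total_weight


# Hypothetical model results and placeholder equal weights.
scores = {"causal": 0.6, "outcome": 0.8, "provenance": 0.5, "consistency": 0.9}
weights = {"causal": 1.0, "outcome": 1.0, "provenance": 1.0, "consistency": 1.0}

print(f"composite: {composite_score(scores, weights):.2f}")
```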
Quality Gates for Training Data
SNAG-Bench enables quality filtering:
- No low-quality data polluting fine-tuning
- Quantitative quality metrics for dataset documentation
- Confidence scores for each training example
- Convergence tracking for reliability estimation
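A quality gate of this kind can be sketched as a simple threshold filter. The 0.8 cutoff is the one given under Use Cases in this documentation; the record layout and field names are hypothetical.

```python
from typing import Dict, List

CAUSAL_RESOLUTION_THRESHOLD = 0.8  # cutoff taken from the Use Cases section


def filter_for_training(renderings: List[Dict],
                        threshold: float = CAUSAL_RESOLUTION_THRESHOLD) -> List[Dict]:
    """Keep only renderings whose Causal Resolution exceeds the threshold."""
    return [r for r in renderings if r["causal_resolution"] > threshold]


# Hypothetical scored renderings.
renderings = [
    {"id": "r1", "causal_resolution": 0.91},
    {"id": "r2", "causal_resolution": 0.80},
    {"id": "r3", "causal_resolution": 0.42},
]

kept = filter_for_training(renderings)
print([r["id"] for r in kept])
```

The comparison is strict, so a rendering scoring exactly 0.8 is excluded, following the "> 0.8" wording.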
The Asymptotic Fidelity Curve
The fidelity is asymptotic: we approach near-simulacrum quality on historical dialog because there are very few things a person could have said once the model has perfect context for that moment. SNAG-Bench measures where we are on this curve:
- Steep part of curve (0-60% coverage): Each new rendering adds significant value
- Plateau (60-90%): Diminishing returns, but still improving
- Asymptote (90%+): Near-simulacrum quality, but never perfect
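The three regions above can be expressed as a small lookup. This sketch assumes coverage is given as a fraction in [0, 1] and that the boundary values 60% and 90% belong to the higher region. The documentation does not settle the boundaries, and the function name and labels are hypothetical.

```python
def fidelity_stage(coverage: float) -> str:
    """Map a coverage fraction to its region on the fidelity curve."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be a fraction in [0, 1]")
    if coverage < 0.6:
        return "steep"      # each new rendering adds significant value
    if coverage < 0.9:
        return "plateau"    # diminishing returns, still improving
    return "asymptote"      # near-simulacrum quality, never perfect


for value in (0.35, 0.75, 0.95):
    print(f"{value:.0%} coverage -> {fidelity_stage(value)}")
```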
Proof of Causal Convergence (PoCC)
SNAG-Bench is critical to PoCC: multiple independent renderings that converge (as measured by SNAG-Bench) provide validation without ground truth.
Timepoint Futures Index (TFI)
SNAG-Bench contributes to the TFI calculation.
Implementation Status
SNAG-Bench is in active development. Planned features:
- Axis 1 validators for structural quality
- Axis 2 validators for convergence measurement
- Challenge datasets from Flash and Pro renderings
- Leaderboard for causal reasoning models
- TDF integration for automatic quality tagging
- Clockchain integration for confidence updates
Use Cases
Quality Assurance
Run SNAG-Bench on all Flash and Pro outputs to ensure high-quality renderings before adding to Clockchain.
Training Data Filtering
Only use renderings with Causal Resolution > 0.8 for model fine-tuning, ensuring clean, reliable training data.
Convergence Research
Study how many renderings are needed for convergence across different scenario types and complexity levels.
Model Benchmarking
Evaluate new causal reasoning models on SNAG-Bench challenge datasets and track progress on the leaderboard.
Repository
SNAG-Bench will be open-source, available at github.com/timepoint-ai/timepoint-snag-bench.
Next Steps
Proteus Settlement
See how Proteus validates predictions against reality
Clockchain Storage
Learn how quality scores update Clockchain confidence
Training Data
Explore how SNAG-Bench filters Pro training data
Suite Overview
Return to the full Timepoint Suite overview

