Overview
Timepoint Pro generates high-quality training data for fine-tuning language models. Unlike naive prompt/completion pairs, SNAG-generated data includes:
- Full causal ancestry - Every knowledge item has provenance
- Quantitative state tensors - Emotional valence, arousal, energy at each turn
- Temporal consistency - Portal mode strips anachronistic knowledge
- Counterfactual reasoning - Branching mode shows “what if” alternatives
- Rich context - M3 knowledge provenance, M6 entity state, M7 causal history, M10 atmosphere, M11 dialog context, M13 relationships
This data is suited to training:
- Causal reasoning models
- Multi-agent roleplay models
- Temporal reasoning systems
- Social simulation models
Export Formats
TDF (Timepoint Data Format)
TDF is the canonical interchange format for the Timepoint Suite (Flash, Pro, Clockchain, SNAG-Bench, Proteus). Export it via the API or with the timepoint-tdf package.
JSONL (Training Format)
Prompt/completion pairs for fine-tuning. Enable this export in the template.
See examples/sample_training_data.jsonl for complete examples from Portal mode simulations.
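A minimal sketch of loading such a file for inspection before fine-tuning. It assumes each line is a JSON object with `prompt` and `completion` keys; the field names in the actual export may differ, and `load_pairs` is a hypothetical helper, not part of Timepoint Pro.

```python
import json
from pathlib import Path


def load_pairs(path, required=("prompt", "completion")):
    """Read a JSONL file and return its records as a list of dicts.

    The required key names are an assumption; adjust them to match
    the fields actually present in the exported file.
    """
    pairs = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            missing = [key for key in required if key not in record]
            if missing:
                raise ValueError(f"line {line_number}: missing keys {missing}")
            pairs.append(record)
    return pairs
```

Failing loudly on a malformed record is a deliberate choice here: a silently skipped line is hard to notice once the data has been fed to a training job.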
SQLite Export
Full simulation state in relational format, with the following tables:
- runs - Run metadata
- entities - Entity tensors and metadata
- timepoints - Temporal structure
- dialogs - Conversation turns
- causal_edges - Causal graph structure
- exposure_events - Knowledge propagation (M3)
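As a quick sanity check on an exported database, the sketch below counts rows in each of the tables named above. Only the table names come from this page; the column layout is not documented here, so the code deliberately avoids assuming any columns. `count_rows` is a hypothetical helper.

```python
import sqlite3

# Table names as listed above; columns are not assumed.
EXPECTED_TABLES = (
    "runs",
    "entities",
    "timepoints",
    "dialogs",
    "causal_edges",
    "exposure_events",
)


def count_rows(conn, tables=EXPECTED_TABLES):
    """Return {table: row_count} for each expected table present in the database."""
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    counts = {}
    for table in tables:
        if table in existing:
            # The name comes from the fixed tuple above and was confirmed
            # in sqlite_master, so interpolating it is safe here.
            counts[table] = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    return counts
```

Usage might look like `count_rows(sqlite3.connect("metadata/runs.db"))`, assuming the export lives at the path mentioned under Data Privacy. Tables missing from the file are simply left out of the result.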
Oxen.ai Auto-Upload
Automatic versioned dataset upload. The upload runs when all of the following hold:
- export_ml_dataset=true in template
- OXEN_API_KEY environment variable set
- Run completes successfully
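The upload conditions above amount to a three-way gate. The sketch below illustrates that logic only; it is not Timepoint Pro's implementation. Representing the template as a dict and the name `should_upload` are assumptions made for the example.

```python
import os


def should_upload(template, run_succeeded, env=None):
    """Return True only when every upload condition holds.

    template: dict-like template settings (assumed representation).
    run_succeeded: whether the simulation run completed successfully.
    env: mapping of environment variables; defaults to os.environ.
    """
    env = os.environ if env is None else env
    export_enabled = bool(template.get("export_ml_dataset", False))
    has_api_key = bool(env.get("OXEN_API_KEY"))
    return export_enabled and has_api_key and bool(run_succeeded)
```

Passing `env` explicitly keeps the check testable without touching the real environment.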
Model Licensing
CRITICAL: Not all open-source models allow unrestricted use of their outputs as training data.
License Matrix
| License | Models | Training Data Status |
|---|---|---|
| MIT | DeepSeek Chat, DeepSeek R1 | ✅ Fully unrestricted—outputs can train any model |
| Apache 2.0 | Mistral 7B, Mixtral 8x7B, Mixtral 8x22B | ✅ Fully unrestricted—outputs can train any model |
| Llama | Llama 3.1 8B/70B/405B, Llama 4 Scout | ⚠️ Restricted—Meta’s license prohibits using Llama outputs to train non-Llama models |
| Qwen | Qwen 2.5 7B/72B, QwQ 32B | ✅ Permissive for most uses |
Default Behavior: M18 Filtering
The model selector (M18) automatically filters to training-safe models.
With for_training_data=True:
- Llama models excluded (license restricts training non-Llama models)
- Only MIT and Apache-2.0 licensed models used
- Oxen.ai upload uses this filter automatically
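To make the filtering rule concrete, here is a minimal sketch of a license-based filter. It is an illustration under stated assumptions, not the M18 selector's actual API: `ModelInfo`, `CATALOG`, and `select_models` are hypothetical names, and the catalog holds a few entries from the license matrix.

```python
from dataclasses import dataclass

# Licenses treated as training-safe, per the rule stated above.
TRAINING_SAFE_LICENSES = frozenset({"MIT", "Apache-2.0"})


@dataclass(frozen=True)
class ModelInfo:
    name: str
    license: str


# Illustrative subset of the license matrix.
CATALOG = [
    ModelInfo("DeepSeek R1", "MIT"),
    ModelInfo("Mixtral 8x7B", "Apache-2.0"),
    ModelInfo("Llama 3.1 70B", "Llama"),
    ModelInfo("Qwen 2.5 72B", "Qwen"),
]


def select_models(models, for_training_data=False):
    """Return all models, or only MIT/Apache-2.0 ones when for_training_data is set."""
    if not for_training_data:
        return list(models)
    return [m for m in models if m.license in TRAINING_SAFE_LICENSES]
```

Using an allowlist rather than a blocklist means a model with an unrecognized license is excluded by default, which is the safer failure mode for training data.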
Check Training-Safe Models
Explicitly Use Training-Safe Models
In CLI:
License Implications
If using Llama outputs:
- ✅ Can fine-tune Llama models (same family)
- ❌ Cannot fine-tune Qwen, Mistral, DeepSeek, or custom models
- ❌ Cannot upload to public datasets (e.g., Hugging Face)
If using MIT or Apache 2.0 model outputs:
- ✅ Can fine-tune any model
- ✅ Can upload to public datasets
- ✅ Can use commercially without restrictions
Training Data Quality
Why SNAG Data is Superior
Compared with standard prompt/completion training data, SNAG data adds:
- ✅ Quantitative emotional state
- ✅ Knowledge provenance (who told them, when, confidence)
- ✅ Causal history leading to this moment
- ✅ Relationship dynamics
- ✅ Character arc (past failures influencing tactics)
- ✅ Circadian and atmospheric context
- ✅ Temporal mode constraints (Portal backward reasoning)
Data Diversity
Generate diverse training sets using Branching mode.
Use Cases
Fine-Tuning Causal Reasoning Models
Portal mode data trains models to reason backward from outcomes.
Fine-Tuning Roleplay Models
Dialog with archetype profiles trains character consistency.
Fine-Tuning Multi-Agent Models
Branching mode trains models to predict divergent outcomes.
Diffusion Model Conditioning
Future use case: Train diffusion models conditioned on temporal causal graphs.
Best Practices
Balance Quality and Quantity
High-quality (expensive):
Filter by Mechanism
Generate data targeting specific capabilities.
Validate Data Quality
Run convergence tests to verify data stability.
Version Control with Oxen
Use Oxen.ai to track dataset lineage.
Data Privacy
Local-only by default:
- All data stays in metadata/runs.db
- No external services called unless explicitly configured
Oxen.ai upload is opt-in:
- Requires OXEN_API_KEY set
- Only uploads when export_ml_dataset=true
Next Steps
- Learn about Cost Optimization to balance training data quality and cost
- Read Validation to understand data quality checks
- Explore Templates to configure training data export settings

