The Core Insight
Entities shouldn’t magically know things. Every piece of knowledge should have a traceable origin: who learned what, from whom, when, and with what confidence. Key principle:
`entity.knowledge_state ⊆ {e.information for e in entity.exposure_events}`
An entity cannot know something without a recorded exposure event explaining how they learned it.
M3: Exposure Event Tracking
Knowledge acquisition is logged as exposure events.
Data Structure
schemas.py:277-286
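The actual definition lives at the reference above; the following is a minimal sketch of the shape it likely takes. All field names here are assumptions, not the real schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ExposureType(str, Enum):
    WITNESSED = "witnessed"      # direct observation
    LEARNED = "learned"          # formal instruction
    TOLD = "told"                # communicated by another entity
    EXPERIENCED = "experienced"  # personal involvement

@dataclass
class ExposureEvent:
    entity_id: str                          # who acquired the knowledge
    information: str                        # what was learned
    event_type: ExposureType                # how it was learned
    timepoint_id: str                       # when it was learned
    source_entity_id: Optional[str] = None  # from whom, if anyone
    confidence: float = 1.0                 # how reliable the exposure is

event = ExposureEvent("jefferson", "the treaty was signed",
                      ExposureType.TOLD, "t42",
                      source_entity_id="madison", confidence=0.8)
```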
Event Types
| Type | Description | Example |
|---|---|---|
| witnessed | Direct observation | Seeing a meeting happen |
| learned | Formal instruction | Training session |
| told | Communicated by another entity | Gossip, reports |
| experienced | Personal involvement | Participating in an event |
Validation Constraint
From validation.py:63-94:
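The code at that reference is not reproduced here; a minimal sketch of the subset check, with assumed names:

```python
from collections import namedtuple
from typing import Iterable, Set

def conservation_violations(knowledge_state: Set[str],
                            exposure_events: Iterable) -> Set[str]:
    """Return knowledge items with no exposure event explaining them.

    An empty result means knowledge_state ⊆ exposed information,
    i.e. the conservation law holds.
    """
    exposed = {e.information for e in exposure_events}
    return knowledge_state - exposed

# Illustrative usage with a stand-in event record
Exposure = namedtuple("Exposure", "information")
events = [Exposure("the treaty was signed")]
violations = conservation_violations(
    {"the treaty was signed", "the king abdicated"}, events)
# "the king abdicated" has no recorded exposure event, so it is flagged
```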
Causal Audit Trail
Exposure events form a DAG (Directed Acyclic Graph):
- Nodes: Information items
- Edges: Causal relationships (who learned from whom)
- Validates information accessibility
- Enables counterfactual reasoning (“if Jefferson hadn’t received that letter…”)
- Supports temporal consistency checks
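Counterfactual reasoning falls out of reachability over this DAG: remove an event node and re-run reachability to see which downstream knowledge loses its causal support. A sketch with illustrative edge data:

```python
from collections import defaultdict, deque

def reachable(edges, sources):
    """All nodes reachable from `sources` over directed edges (BFS)."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    seen, queue = set(sources), deque(sources)
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Counterfactual: "if Jefferson hadn't received that letter..."
edges = [("letter", "jefferson_knows"), ("jefferson_knows", "cabinet_knows")]
lost = reachable(edges, {"letter"}) - reachable(edges, set())
# `lost` is everything that causally depended on the letter
```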
M4: Constraint Enforcement
Five validators enforce consistency using conservation-law metaphors.
1. Information Conservation (Shannon Entropy)
Law: Knowledge state cannot exceed exposure history. Implementation shown above in the M3 section.
2. Energy Budget (Thermodynamic)
Entities have bounded cognitive/physical energy per timepoint. From validation.py:98-137:
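The real implementation is at the reference above; the core idea is a per-timepoint threshold check, sketched here with assumed names and an illustrative budget:

```python
from typing import Dict, List

def energy_violations(actions_by_timepoint: Dict[str, List[float]],
                      budget: float = 100.0) -> List[str]:
    """Timepoints where an entity's summed action costs exceed its budget."""
    return [tp for tp, costs in actions_by_timepoint.items()
            if sum(costs) > budget]

# A timepoint with 60 + 50 units of work blows a 100-unit budget
violations = energy_violations({"t1": [60.0, 50.0], "t2": [30.0]})
```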
3. Behavioral Inertia
Personality traits persist; sudden changes require justification. From validation.py:140-160:
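Per the performance notes later in this document, inertia is checked with vector norms. A sketch with assumed names and an illustrative drift threshold:

```python
import math

def inertia_ok(prev_traits, new_traits, max_drift=0.3, justified=False):
    """Accept a trait update if the L2 drift is small or explicitly justified."""
    drift = math.sqrt(sum((a - b) ** 2 for a, b in zip(prev_traits, new_traits)))
    return justified or drift <= max_drift

assert inertia_ok([0.8, 0.2], [0.75, 0.25])                 # small drift: fine
assert not inertia_ok([0.8, 0.2], [0.1, 0.9])               # personality flip: rejected
assert inertia_ok([0.8, 0.2], [0.1, 0.9], justified=True)   # justified change: allowed
```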
4. Biological Constraints
Physical limitations (illness, fatigue, location) constrain behavior. From validation.py:163-189:
5. Network Flow
Information propagation respects relationship topology. Entities can only share knowledge if they have a relationship path; knowledge doesn’t teleport across disconnected subgraphs.
Castaway Colony Example
Constraint enforcement blocks invalid states:
- The engineer can’t repair the beacon without the power coupling from the debris field
- Nobody survives outside during radiation storms
- Fatigue accumulates, limiting physical labor capacity
M19: Knowledge Extraction Agent
The problem: Naive approaches to extracting knowledge from dialog produce garbage.
The Old Problem (Pre-M19)
Pre-M19 extraction produced fragments such as:
- Sentence-initial words
- Contractions
- Common words
- Names without context
The M19 Solution
An LLM-based Knowledge Extraction Agent that understands semantic meaning. From workflows/knowledge_extraction.py:1-22:
Data Structure
schemas.py:455-473
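The real definition is at the reference above; a plausible sketch follows, with all field names assumed. The category set matches the Knowledge Categories table in this section.

```python
from dataclasses import dataclass

KNOWLEDGE_CATEGORIES = {"fact", "decision", "opinion", "plan",
                        "revelation", "question", "agreement"}

@dataclass
class ExtractedKnowledge:
    content: str       # a complete semantic unit, never a fragment
    category: str      # one of KNOWLEDGE_CATEGORIES
    speaker_id: str    # who said it
    confidence: float  # extraction confidence reported by the agent

    def __post_init__(self):
        # Reject categories outside the known set
        if self.category not in KNOWLEDGE_CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")

item = ExtractedKnowledge("The board approved the $2M budget increase",
                          "decision", "sarah", 0.9)
```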
What Gets Extracted
Good extractions (complete semantic units):
- “Michael believes the project deadline is unrealistic”
- “The board approved the $2M budget increase”
- “Sarah revealed that the prototype failed last week”
- “They agreed to postpone the launch until Q3”
Rejected (fragments, not knowledge):
- Greetings: “Hello”, “Thanks”, “Good morning”
- Contractions: “We’ll”, “I’ve”, “That’s”
- Single names without context: “Michael”, “Sarah”
- Filler words: “What”, “Well”, “Actually”
Knowledge Categories
| Category | Description | Example |
|---|---|---|
| fact | Verifiable information | “The meeting is at 3pm” |
| decision | Choice communicated | “We decided to pivot to B2B” |
| opinion | Subjective view | “I think the design needs work” |
| plan | Intended future action | “We’ll launch in March” |
| revelation | New information changing understanding | “The competitor already filed the patent” |
| question | Only if it reveals information itself | “Did you know about the acquisition?” |
| agreement | Consensus reached | “We all agree on the pricing” |
RAG-Aware Prompting
The agent receives causal context from existing exposure events to:
- Avoid redundant extraction: don’t store facts already in the system
- Recognize novel information: new facts worth storing
- Understand relationships: how new knowledge connects to existing knowledge
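A sketch of how such a RAG-aware prompt might be assembled. The function name and prompt wording here are illustrative, not the actual workflow code:

```python
def build_extraction_prompt(dialog_turns, known_facts):
    """Embed already-known facts so the model skips redundant extractions."""
    known = "\n".join(f"- {fact}" for fact in known_facts) or "- (none)"
    dialog = "\n".join(f"{speaker}: {text}" for speaker, text in dialog_turns)
    return (
        "Extract NEW knowledge items from the dialog below.\n"
        f"Already known (do not re-extract):\n{known}\n\n"
        f"Dialog:\n{dialog}\n\n"
        'Return a JSON list of {"content", "category", "speaker"} objects.'
    )

prompt = build_extraction_prompt(
    [("sarah", "The prototype failed last week."),
     ("michael", "Then we postpone the launch to Q3.")],
    known_facts=["The board approved the $2M budget increase"],
)
```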
Integration with Dialog Synthesis (M11)
M19 is called automatically during dialog synthesis. From workflows/dialog_synthesis.py (conceptual flow):
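The conceptual flow can be sketched as follows. Every function below is an illustrative stand-in, not the actual workflows/dialog_synthesis.py API:

```python
# Conceptual flow only; all helpers are hypothetical stubs.

def generate_turns(entities, timepoint):           # M11: produce the dialog
    return [(entities[0], "The prototype failed last week.")]

def load_exposure_events(entities, timepoint):     # M3: causal context for RAG
    return []

def extract_knowledge(turns, context):             # M19: semantic extraction
    return [{"content": turns[0][1], "category": "revelation"}]

def synthesize_dialog(entities, timepoint):
    turns = generate_turns(entities, timepoint)
    context = load_exposure_events(entities, timepoint)
    knowledge = extract_knowledge(turns, context)  # M19 runs automatically here
    return turns, knowledge

turns, knowledge = synthesize_dialog(["sarah", "michael"], "t1")
```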
Model Selection
Knowledge extraction uses M18 model selection with specific requirements.
Extraction Response Structure
workflows/knowledge_extraction.py:61-79
JSON Extraction Robustness
From workflows/knowledge_extraction.py:87-156:
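The actual code is at the reference above; the usual layered fallback for parsing LLM output looks like this (a sketch, not the real implementation):

```python
import json
import re

def extract_json(raw: str):
    """Parse JSON out of an LLM reply that may wrap it in fences or prose."""
    # 1. Happy path: the reply is pure JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # 2. Strip a ```json ... ``` markdown fence if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        try:
            return json.loads(fenced.group(1))
        except json.JSONDecodeError:
            pass
    # 3. Last resort: the outermost bracketed span.
    span = re.search(r"[\[{].*[\]}]", raw, re.DOTALL)
    if span:
        return json.loads(span.group(0))
    raise ValueError("no parseable JSON in model output")

items = extract_json('Sure!\n```json\n[{"content": "x", "category": "fact"}]\n```')
```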
Cleanup Script
For simulations with old garbage exposure events:
Performance Characteristics
Validation Complexity
O(n) for n validators, using:
- Set operations (information conservation)
- Vector norms (behavioral inertia)
- Threshold checks (energy budget, biological constraints)
Exposure Event Storage
SQLite with indexes on:
- `entity_id` (queries by entity)
- `timepoint_id` (queries by timepoint)
- `run_id` (convergence analysis)
- 1000 exposure events: under 10ms query time
- 10,000 exposure events: under 50ms query time
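A sketch of the index setup implied above; the table columns beyond the three indexed ones are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exposure_events (
    id INTEGER PRIMARY KEY,
    entity_id TEXT NOT NULL,
    timepoint_id TEXT NOT NULL,
    run_id TEXT NOT NULL,
    information TEXT NOT NULL
);
CREATE INDEX idx_exposure_entity    ON exposure_events(entity_id);
CREATE INDEX idx_exposure_timepoint ON exposure_events(timepoint_id);
CREATE INDEX idx_exposure_run       ON exposure_events(run_id);
""")
indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='index' AND name LIKE 'idx_%'")]
```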
Knowledge Extraction Cost
M19 agent cost per dialog:
- Input: ~1,500 tokens (dialog turns + causal context)
- Output: ~500 tokens (structured knowledge items)
- Models: Qwen 2.5 72B, Llama 70B, DeepSeek Chat
- Cost: ~$0.005 per dialog
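Those per-dialog numbers are simple token arithmetic. The per-million-token prices below are illustrative placeholders chosen for the example, not the actual rates for the listed models:

```python
def dialog_cost(input_tokens=1_500, output_tokens=500,
                usd_per_m_input=2.0, usd_per_m_output=4.0):
    """Per-dialog cost from token counts (prices are illustrative)."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

cost = dialog_cost()  # roughly $0.005 with these placeholder prices
```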
Next Steps
Entity Simulation
Dialog synthesis, prospection, animism
Infrastructure
M18 model selection and routing

