Timepoint Pro provides built-in evaluation metrics to validate simulation quality and consistency.

Running Evaluation

python cli.py mode=evaluate
Evaluates all entities currently stored in the database.

Core Metrics

Three primary metrics assess simulation quality:

  • Temporal Coherence: Consistency of entities across timepoints
  • Knowledge Consistency: Information conservation compliance
  • Biological Plausibility: Constraint enforcement validation

Temporal Coherence Score

Measures behavioral consistency across timepoints.

Formula

violations = 0
for each consecutive timepoint pair:
    if personality_traits_changed_significantly:
        violations += 1

score = 1.0 - (violations / num_transitions)
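The pseudocode above can be made concrete with a minimal Python sketch. The snapshot format, the 0.3 change threshold, and the `temporal_coherence` function name are illustrative assumptions, not Timepoint Pro's actual API:

```python
def temporal_coherence(timepoints, threshold=0.3):
    """Score 1.0 means no significant trait shift between consecutive timepoints.

    `timepoints` is assumed to be a chronological list of trait snapshots,
    each a dict mapping trait name -> value in [0, 1].
    """
    transitions = list(zip(timepoints, timepoints[1:]))
    if not transitions:
        return 1.0
    violations = 0
    for prev, curr in transitions:
        # A trait change larger than the threshold counts as a violation.
        if any(abs(curr[t] - prev[t]) > threshold for t in prev):
            violations += 1
    return 1.0 - violations / len(transitions)

snapshots = [
    {"caution": 0.90, "openness": 0.40},
    {"caution": 0.85, "openness": 0.45},  # gradual drift: no violation
    {"caution": 0.10, "openness": 0.45},  # cautious -> reckless overnight
]
print(temporal_coherence(snapshots))  # 0.5 (1 violation / 2 transitions)
```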

What It Validates

Personality traits should remain stable over time. Checks:
  • Personality trait consistency
  • Character arc plausibility
  • No sudden personality shifts
Example violations:
  • Cautious character becomes reckless overnight
  • Reserved person suddenly becomes extroverted
  • Core values change without cause
Core characteristics persist unless causally justified. Checks:
  • Trait stability across timepoints
  • Gradual vs. sudden changes
  • Causal explanations for shifts

Score Interpretation

1.0 (Perfect Coherence): No behavioral violations detected. Entities maintain consistent personalities across all timepoints.

Knowledge Consistency Score

Validates information conservation: entities can only know what they’ve been exposed to.

Formula

if knowledge_properly_sourced:
    score = 1.0
else:
    score = 0.0
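As a rough Python sketch of this all-or-nothing check, every knowledge item must trace back to a recorded exposure event. The data shapes and the `knowledge_consistency` function name are assumptions for illustration:

```python
def knowledge_consistency(knowledge_items, exposure_events):
    """Return 1.0 if every knowledge item traces to a recorded exposure event, else 0.0."""
    recorded = {event["event_id"] for event in exposure_events}
    properly_sourced = all(
        item.get("source_event") in recorded for item in knowledge_items
    )
    return 1.0 if properly_sourced else 0.0

events = [{"event_id": "letter_1776"}]
known = [{"fact": "treaty terms", "source_event": "letter_1776"}]
print(knowledge_consistency(known, events))  # 1.0

unsourced = [{"fact": "secret plan"}]  # no exposure event recorded
print(knowledge_consistency(unsourced, events))  # 0.0
```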

What It Validates

Every knowledge item must have a source. Checks:
  • All knowledge has recorded exposure event
  • Source entity or event exists
  • Timestamp is causally valid
Example violations:
  • Entity knows information without witnessing it
  • Knowledge appears without source
  • Anachronistic information (knows future events)
Knowledge spreads through valid paths. Checks:
  • Information flows along relationship edges
  • No spontaneous knowledge generation
  • Social network constraints respected
Example violations:
  • Entity knows secrets without connection to source
  • Information spreads faster than possible
  • Knowledge crosses disconnected graph components
Knowledge can only come from past events. Checks:
  • Exposure timestamp < current timepoint
  • No future information leak
  • Proper causal chain
Example violations:
  • Entity knows outcome before it happens
  • Future information influences past decisions
  • Causal chain broken
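The causality rule above (exposure timestamp < current timepoint) can be sketched as a standalone predicate. The `causally_valid` helper and the integer timepoints are hypothetical:

```python
def causally_valid(knowledge_items, current_timepoint):
    """True only if every exposure strictly predates the current timepoint."""
    return all(item["exposed_at"] < current_timepoint for item in knowledge_items)

known = [
    {"fact": "battle outcome", "exposed_at": 3},
    {"fact": "election result", "exposed_at": 7},  # future leak at timepoint 5
]
print(causally_valid(known, current_timepoint=5))  # False
```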

Score Interpretation

1.0 (Valid): All knowledge properly sourced. No information conservation violations.

Biological Plausibility Score

Measures constraint enforcement and physical/resource realism.

Formula

violations = 0
for each action:
    if violates_constraints:
        violations += 1

score = 1.0 - (violations / num_actions)
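This formula mirrors the temporal coherence score but counts per-action constraint violations. A minimal sketch, where the energy budget, distance limit, and action record fields are all illustrative assumptions:

```python
def biological_plausibility(actions, energy_budget=100, max_distance=50):
    """Score 1.0 means every action respects the energy and movement limits."""
    if not actions:
        return 1.0
    violations = sum(
        1 for action in actions
        if action["energy_cost"] > energy_budget or action["distance"] > max_distance
    )
    return 1.0 - violations / len(actions)

actions = [
    {"name": "walk to market", "energy_cost": 10, "distance": 2},
    {"name": "ride 300 miles overnight", "energy_cost": 40, "distance": 300},
]
print(biological_plausibility(actions))  # 0.5 (1 violation / 2 actions)
```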

What It Validates

Actions respect physical limitations. Checks:
  • Movement speed plausible
  • Energy expenditure realistic
  • Physical capabilities within human range
Example violations:
  • Entity travels impossible distance in timespan
  • Action requires more energy than available
  • Superhuman abilities without justification
Actions consume appropriate resources. Checks:
  • Energy budget tracking
  • Resource availability
  • Consumption rates
Example violations:
  • Entity acts without sufficient energy
  • Resource consumption exceeds supply
  • Negative resource balances
Physical and emotional states influence behavior. Checks:
  • Fatigue affects performance
  • Stress influences decisions
  • Physiological needs matter
Example violations:
  • Exhausted entity performs at peak
  • Emotional state ignored in decision-making
  • Physical needs not reflected in behavior

Score Interpretation

1.0 (Fully Plausible): No constraint violations. All actions respect physical and resource limitations.

Example Output

Evaluating 5 entities:

  george_washington:
    Temporal Coherence:      0.95
    Knowledge Consistency:   1.00
    Biological Plausibility: 0.92

  john_adams:
    Temporal Coherence:      0.88
    Knowledge Consistency:   1.00
    Biological Plausibility: 0.87

  thomas_jefferson:
    Temporal Coherence:      0.91
    Knowledge Consistency:   1.00
    Biological Plausibility: 0.89

  alexander_hamilton:
    Temporal Coherence:      0.93
    Knowledge Consistency:   1.00
    Biological Plausibility: 0.94

  james_madison:
    Temporal Coherence:      0.87
    Knowledge Consistency:   1.00
    Biological Plausibility: 0.85

Resolution Distribution:
  SCENE: 3 entities
  DIALOG: 2 entities

Cost: $0.00 (evaluation uses cached data)
Tokens: 0

Resolution Distribution

Evaluation also reports entity resolution levels:
Minimal detail, compressed representation only: ~200 tokens per entity

Generated Reports

Evaluation generates two report files:
{
  "entities_evaluated": 5,
  "resolution_distribution": {
    "SCENE": 3,
    "DIALOG": 2
  },
  "cost": 0.00,
  "tokens": 0,
  "timestamp": "2024-12-07T12:34:56"
}
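The JSON report above is straightforward to consume programmatically. A sketch with the sample payload inlined, since the report's on-disk filename is not specified here:

```python
import json

# Sample payload matching the report structure shown above.
report = json.loads("""
{
  "entities_evaluated": 5,
  "resolution_distribution": {"SCENE": 3, "DIALOG": 2},
  "cost": 0.0,
  "tokens": 0
}
""")

print(f"Entities evaluated: {report['entities_evaluated']}")
for level, count in report["resolution_distribution"].items():
    print(f"  {level}: {count} entities")
```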

Validation Integration

Evaluation metrics use the same validators as training:
  • validate_behavioral_inertia() - Temporal coherence
  • validate_information_conservation() - Knowledge consistency
  • validate_biological_constraints() - Biological plausibility
See Validation for implementation details.

When to Evaluate

Run evaluation after:
1. Training: After mode=train or mode=temporal_train to validate entity quality
2. Simulation: After running templates with ./run.sh to check consistency
3. Debugging: When investigating unexpected entity behavior
4. Before Export: Before exporting data to ensure quality

Next Steps

  • Interactive Queries: Query your evaluated entities
  • Training: Improve entity quality with better training
  • Validation: Learn about the validation system
  • CLI Overview: Back to the CLI overview
