This page documents the differential testing workflow used to verify behavioral parity between the original Crimsonland binary and the Python reimplementation.

Overview

Differential testing ensures the rewrite matches the original by:
  1. Capturing a gameplay session from the original with Frida
  2. Replaying the same inputs in the rewrite (headless)
  3. Comparing state checkpoints tick-by-tick
  4. Reporting divergences with field-level granularity

Workflow

Step 1: Capture Original Run

Instrument the original game to record inputs and state:
# Attach Frida capture script
frida -n crimsonland.exe -l scripts/frida/gameplay_diff_capture.js
Captures:
  • Input events (keyboard, mouse)
  • RNG seed and call sequence
  • State snapshots every N ticks
  • Final score, kills, time
Output: gameplay_diff_capture.json
Step 2: Replay in Rewrite

Run the rewrite in headless mode with captured inputs:
uv run crimson replay verify gameplay_diff_capture.json
The rewrite:
  • Seeds RNG with captured seed
  • Feeds inputs from capture file
  • Steps simulation tick-by-tick
  • Generates state checkpoints
Step 3: Compare State

The verifier compares each checkpoint field-by-field:
def compare_checkpoint(expected, actual, tick):
    for field in checkpoint_fields:
        exp, act = expected[field], actual[field]
        if isinstance(exp, int):
            diverged = exp != act                        # integers: exact match
        else:
            diverged = abs(exp - act) > FLOAT_TOLERANCE  # floats: 1e-5
        if diverged:
            report_divergence(tick, field, exp, act)
Tolerance: FLOAT_TOLERANCE = 1e-5 for floats; integers must match exactly.
Step 4: Fix Divergences

When a divergence is found:
  1. Identify the first divergent tick
  2. Inspect the divergent field (position, health, RNG count)
  3. Trace back to the function that writes that field
  4. Compare decompiled logic with rewrite implementation
  5. Fix the rewrite and re-verify

Capture Format

The differential capture file contains:
{
  "metadata": {
    "version": "1.0",
    "mode": "survival",
    "seed": 12345,
    "tick_rate": 60,
    "duration_ticks": 3600
  },
  "inputs": [
    {"tick": 0, "keys": ["W"], "mouse": [400, 300]},
    {"tick": 5, "keys": ["W", "LMB"], "mouse": [450, 280]},
    ...
  ],
  "checkpoints": [
    {
      "tick": 100,
      "player": {
        "health": 100.0,
        "pos_x": 432.5,
        "pos_y": 300.0,
        "weapon_id": 1,
        "ammo": 12.0
      },
      "creatures": [
        {"index": 0, "type": 3, "health": 20.0, "pos_x": 500.0},
        {"index": 5, "type": 5, "health": 50.0, "pos_x": 600.0}
      ],
      "projectiles": [
        {"index": 0, "type": 1, "pos_x": 440.0, "life_timer": 0.3}
      ],
      "rng_calls": 234
    },
    ...
  ],
  "final": {
    "score": 15000,
    "kills": 120,
    "time": 180.5
  }
}

Checkpoint Fields

Player State

player_fields = [
    "health",
    "pos_x", "pos_y",
    "weapon_id",
    "ammo",
    "experience",
    "level",
    "fire_bullets_timer",
    "shield_timer"
]

Creature Pool

creature_fields = [
    "active",
    "type_id",
    "health",
    "pos_x", "pos_y",
    "vel_x", "vel_y"
]

Projectile Pool

projectile_fields = [
    "active",
    "type_id",
    "pos_x", "pos_y",
    "life_timer",
    "owner_id"
]

Global Counters

global_fields = [
    "rng_calls",      # Total rand() invocations
    "tick_counter",   # Simulation tick
    "kill_count",
    "score"
]

Divergence Analysis

Example Report

=== DIVERGENCE DETECTED ===
Tick: 347
Field: player[0].pos_x
Expected: 432.500000
Actual:   432.501007
Delta:    0.001007

RNG call count:
  Expected: 1204
  Actual:   1205
  Delta:    +1 (extra call)
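A report in the format above can be produced by a small formatter along these lines (a sketch of the output shape, not the verifier's actual code):

```python
def format_divergence(tick, field, expected, actual, rng_expected, rng_actual):
    """Render a field divergence plus RNG call counts in the report
    format shown above (illustrative)."""
    rng_delta = rng_actual - rng_expected
    rng_note = " (extra call)" if rng_delta > 0 else ""
    return "\n".join([
        "=== DIVERGENCE DETECTED ===",
        f"Tick: {tick}",
        f"Field: {field}",
        f"Expected: {expected:.6f}",
        f"Actual:   {actual:.6f}",
        f"Delta:    {actual - expected:.6f}",
        "",
        "RNG call count:",
        f"  Expected: {rng_expected}",
        f"  Actual:   {rng_actual}",
        f"  Delta:    {rng_delta:+d}{rng_note}",
    ])
```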

Root Cause Process

  1. Identify First Divergence
    • Tick 347, pos_x differs by 0.001
    • RNG call count differs (+1 call)
  2. Trace RNG Call
    • Extra RNG call between tick 346 and 347
    • Search rewrite for rand() calls in player/creature/projectile update
  3. Find the Culprit
    # Rewrite has extra rand() call:
    if random.random() < 0.1:  # WRONG: extra RNG call
        spawn_particle()
    
    # Original uses pre-rolled dice:
    if particle_spawn_dice > 0.9:  # Rolled once per frame
        spawn_particle()
    
  4. Fix and Re-verify
    uv run crimson replay verify capture.json
    # PASS: All 3600 ticks match
    

Test Coverage

Mode Coverage

  • Survival: full parity across 1000+ tick runs
  • Rush: verified spawn timing and wave logic
  • Quests: all 90 quest levels verified
  • Tutorial: scripted sequence matches original

Subsystem Coverage

  • Player movement and combat
  • Creature AI and pathfinding
  • Projectile physics and collision
  • Weapon fire rate and reload
  • Perk effects and stacking
  • Bonus spawn and timers
  • Experience and leveling
  • Score calculation

Automated Tests

The test suite includes differential replay tests:
def test_survival_parity_1000_ticks(capture_fixture):
    """Verify 1000 tick Survival run matches original."""
    result = replay_runner.verify_checkpoints(capture_fixture)
    
    assert result.all_fields_match
    assert result.rng_call_count_match
    assert result.final_score_match

def test_quest_1_1_complete(quest_1_1_capture):
    """Verify Quest 1-1 completion matches original."""
    result = replay_runner.verify_checkpoints(quest_1_1_capture)
    
    assert result.quest_complete
    assert result.time_match
    assert result.kills_match
Run:
uv run pytest tests/parity/

Capture Guidelines

Deterministic Captures

For reproducible verification:
  1. Use fixed seed
    seed = 12345
    random.seed(seed)
    
  2. Record full input state
    • Every key press/release
    • Mouse position every frame
    • Timestamp or tick number
  3. Checkpoint frequently
    • Every 10-100 ticks
    • After major events (level up, weapon pickup)
  4. Capture metadata
    • Game version/build
    • Mode and difficulty
    • Player config (keybinds, resolution)
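Recording `rng_calls` checkpoints requires an RNG that counts its own invocations. A minimal sketch using Python's `random.Random` follows; the actual rewrite presumably mirrors the original game's rand() generator, so treat this as illustrative only (note that only direct `random()` calls are counted here):

```python
import random

class CountingRNG(random.Random):
    """random.Random subclass that counts direct random() calls, so
    checkpoints can record `rng_calls` (illustrative sketch)."""

    def __init__(self, seed=None):
        super().__init__(seed)
        self.calls = 0

    def random(self):
        # Count every draw before delegating to the base generator
        self.calls += 1
        return super().random()
```

Seeding two instances with the same captured seed yields identical sequences, which is exactly the property the replayer relies on.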

Quest-Specific Captures

Quest mode requires per-stage files:
gameplay_diff_capture.quest_1_0.json  # Quest 1-0
gameplay_diff_capture.quest_1_1.json  # Quest 1-1
...
gameplay_diff_capture.quest_9_9.json  # Quest 9-9
Each file contains:
  • Quest-specific spawn scripts
  • Stage completion criteria
  • Expected final stats

Headless Simulation

The rewrite’s headless mode:
def step_headless(world_state, input_frame):
    """Single deterministic simulation step."""
    # Apply inputs
    world_state.player.update_input(input_frame)
    
    # Step subsystems in fixed order
    player_update(world_state, delta_time)
    creature_update(world_state, delta_time)
    projectile_update(world_state, delta_time)
    bonus_update(world_state, delta_time)
    
    # Capture checkpoint
    checkpoint = create_checkpoint(world_state)
    
    return checkpoint
No rendering, audio, or timing jitter - pure deterministic simulation.

Float Precision Handling

Float comparison uses epsilon tolerance:
def float_equal(a, b, epsilon=1e-5):
    return abs(a - b) < epsilon
Why: x87 FPU rounding and Python float64 → float32 conversions introduce tiny errors.
Strict mode: for critical fields (health, ammo), use epsilon=0 (exact match).
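A field-aware comparator combining both policies might look like this; the strict field set here is an assumption based on the note above, not a documented list:

```python
EPSILON = 1e-5
STRICT_FIELDS = {"health", "ammo"}  # assumed critical fields: exact match

def field_matches(name, expected, actual, epsilon=EPSILON):
    """Apply the precision policy: strict fields and integers must match
    exactly; other floats match within epsilon (sketch)."""
    if name in STRICT_FIELDS or isinstance(expected, int):
        return expected == actual
    return abs(expected - actual) < epsilon
```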

CI Integration

Differential tests run on every commit:
# .github/workflows/parity.yml
name: Parity Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run parity tests
        run: |
          uv run pytest tests/parity/ --capture-dir=test_fixtures/captures/
Fast feedback loop - catch regressions immediately.

Related

  • Frida Capture: capturing differential testing inputs
  • Replay System: deterministic replay architecture
  • Float Parity Policy: float32 precision contracts