Overview
Game Grammar uses a hybrid snapshot+delta encoding strategy, inspired by video codecs:- Snapshots (I-frames): Periodic full state captures for context recovery
- Deltas (P-frames): High-frequency event tokens capturing state changes
- Compression: Delta events are compact
- Recovery: Snapshots prevent error accumulation
- Learnability: Transformers can attend to both absolute state and relative changes
From
codec.py:21-23, the encoder takes snapshot_interval=16 and salience_threshold=Salience.TICK as parameters.74-Token Vocabulary
The complete vocabulary fromvocab.py covers all Snake gameplay:
Structural Tokens (4)
BOS: Begin sequence (start of episode)EOS: End sequence (episode terminated)TICK: Time step marker (bundles events)SNAP: Snapshot marker (full state follows)
vocab.py:6.
Entity Tokens (3)
"GHOST", "PELLET", "POWER_PELLET", etc.
From vocab.py:9.
Direction Tokens (4)
vocab.py:12.
Input Tokens (4)
vocab.py:15.
Position Tokens (20)
vocab.py:18-21.
Event Type Tokens (7)
vocab.py:24.
Value Tokens (11)
V10.
From vocab.py:27.
Length Tokens (21)
LEN_LONG for lengths > 20.
From vocab.py:30-31.
Total: 74 Tokens
vocab.py:37. This is significantly larger than microgpt’s ~27 character vocabulary, but much smaller than language models (50K+ tokens).
Encoding Strategy
TheEventCodec class in codec.py implements the hybrid approach.
Snapshot Encoding
A snapshot captures full state at a point in time:codec.py:25-38.
Example snapshot: SNAP PLAYER X5 Y5 DIR_R LEN1 FOOD X8 Y6 SCORE V0
This is 11 tokens encoding:
- Player position (X5, Y5)
- Player direction (DIR_R)
- Snake length (LEN1)
- Food position (X8, Y6)
- Current score (V0)
Delta Event Encoding
Delta events encode state changes:codec.py:40-69.
Examples:
INPUT_R→ 1 tokenMOVE X7 Y8→ 3 tokensEAT→ 1 tokenGROW LEN3→ 2 tokensFOOD_SPAWN X2 Y4→ 3 tokens
Tick Bundling
Events are grouped by tick:codec.py:71-78.
All events with the same tick are encoded after a single TICK token:
Episode Encoding
Complete episodes are wrapped inBOS and EOS:
codec.py:80-107.
Why Keyframe + Delta?
This hybrid approach mirrors video codec design (I-frames + P-frames):Compression
Compression
Delta events are compact.
MOVE X7 Y8 is 3 tokens, while a full state snapshot is 11 tokens.Pure delta encoding would be even more compact, but has problems (see next).Error Recovery
Error Recovery
Pure delta encoding accumulates errors. If the model predicts one wrong move, all future positions are offset.Periodic snapshots reset the state, allowing the model to recover from mistakes.
Long-Range Context
Long-Range Context
Snapshots give the transformer absolute position context. Without them, the model must integrate all deltas from the start of the sequence.With a 64-token context window, the model can only “see” ~4-8 game ticks back. Snapshots provide grounding.
Conditional Rules
Conditional Rules
Some events trigger snapshots immediately. From
codec.py:92-97, any event with salience >= Salience.RULE_EFFECT forces a snapshot.This ensures the model sees the full state after important transitions (eating food, dying, etc.).Snapshot interval is configurable (
snapshot_interval=16 in the current implementation). More snapshots = more context, fewer snapshots = better compression.Real Encoded Sequence Example
From the README, a real Snake game encodes as:- Initial snapshot at tick 0
- Delta events for each move
- Snapshot after eating (rule effect)
- Death event followed by
EOS
Decoding
The codec includes a decoder for validation:codec.py:109-179.
The decoder parses multi-token events back into structured records for the validator to check.
Vocabulary Design Trade-offs
- Current Approach (74 tokens)
- Alternative: Binary Positions
- Alternative: Relative Deltas
Pros:
- Absolute positions as single tokens (
X7,Y8) - Clear event structure
- Human-readable sequences
- Small vocabulary size
- Grid size is baked in (10x10)
- Scaling to larger grids requires more tokens
- Each game needs a custom vocabulary
Hyperparameters
Fromcodec.py:21-23:
snapshot_interval=16: Insert snapshot every 16 ticks (+ on rule effects)salience_threshold=Salience.TICK: Include all events (no filtering)
snapshot_interval = more compression, less context.
Next Steps
Event Streams
See how games produce events
Transformer Architecture
Learn how the model processes tokens
