
Meaning is Use

The meaning of a word is its use in the language. — Wittgenstein, Philosophical Investigations §43
State has no intrinsic meaning. Meaning arises only through events — a coin is the thing that increments your score when you touch it. Entities are defined solely by their behavior under collision. The structure of what can follow what is the game’s grammar. A causal transformer learns this grammar by predicting the next event token, the same way a language model learns syntax. From this single objective it learns physical regularities, rule mappings, temporal dependencies, and long-horizon behavioral patterns.

Why Events vs State

Traditional game representations focus on static state: “the player is at position (5,5), the food is at (8,6), the score is 3.” This approach has a fundamental problem: state has no intrinsic meaning. Game Grammar takes a different approach inspired by Wittgenstein’s philosophy of language:
Collision-defined semantics: Entities are defined by what happens when they interact, not by what they “are”
In Snake:
  • Food is “the thing that triggers EAT→GROW→FOOD_SPAWN when the player touches it”
  • Walls are “the things that trigger DIE_WALL when touched”
  • The player is “the entity that moves, eats, grows, and dies”
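The definitions above amount to a collision table: an entity's identity is nothing more than the events its collisions produce. Here is a minimal sketch of that idea; the table and function names are illustrative, not the repo's actual API:

```python
# Illustrative sketch: entities defined purely by collision behavior.
# The rule table and names here are hypothetical, not taken from the repo.
COLLISION_RULES = {
    ("player", "food"): ["EAT", "GROW", "FOOD_SPAWN"],
    ("player", "wall"): ["DIE_WALL"],
    ("player", "self"): ["DIE_SELF"],
}

def events_for_collision(a: str, b: str) -> list[str]:
    """An entity 'is' whatever this table says happens on contact."""
    return COLLISION_RULES.get((a, b), [])

print(events_for_collision("player", "food"))  # ['EAT', 'GROW', 'FOOD_SPAWN']
```

Nothing in the table says what food looks like or where it is; the meaning of "food" is exhausted by the events it triggers.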

Event-Driven Representation

Gameplay is atomized into discrete events — movements, collisions, collections, deaths — and tokenized into a sequence. Each event captures a state transition, not a snapshot of state.
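A minimal event record in this style might look like the following sketch. The field names mirror the Snake excerpt below, but the repo's actual Event class and Salience levels may differ:

```python
from dataclasses import dataclass, field
from enum import Enum

class Salience(Enum):
    """Assumed salience levels; the real codebase may define more."""
    MOVEMENT = 1
    RULE_EFFECT = 2

@dataclass
class Event:
    """One atomic state transition, not a snapshot of state."""
    type: str                  # e.g. "MOVE", "EAT", "DIE_WALL"
    entity: str                # which entity the event is about
    payload: dict = field(default_factory=dict)
    tick: int = 0
    salience: Salience = Salience.MOVEMENT

e = Event(type="EAT", entity="player", payload={"pos": (5, 5)},
          tick=3, salience=Salience.RULE_EFFECT)
```

The payload carries only what changed; the full board state is never serialized.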

Real Example from Snake

Here’s how a single game step produces multiple events in snake.py:26-128:
def step(self, action: Action) -> tuple[SnakeState, list[Event], bool]:
    s = self._state
    events = []
    tick = s.tick + 1
    
    # Input event
    events.append(Event(
        type=f"INPUT_{action.value[0]}",
        entity="player",
        payload={"action": action.value},
        tick=tick,
        salience=Salience.MOVEMENT,
    ))
    
    # Movement event
    events.append(Event(
        type="MOVE", entity="player",
        payload={"pos": new_head},
        tick=tick, salience=Salience.MOVEMENT,
    ))
    
    # If food was eaten...
    if ate:
        events.append(Event(
            type="EAT", entity="player",
            payload={"pos": new_head},
            tick=tick, salience=Salience.RULE_EFFECT,
        ))
        events.append(Event(
            type="GROW", entity="player",
            payload={"length": len(new_body)},
            tick=tick, salience=Salience.RULE_EFFECT,
        ))
        events.append(Event(
            type="FOOD_SPAWN", entity="food",
            payload={"pos": new_food},
            tick=tick, salience=Salience.RULE_EFFECT,
        ))
A single action produces a causal chain of events: INPUT_R → MOVE → EAT → GROW → FOOD_SPAWN → SCORE.
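To make the chain concrete, here is a self-contained toy that replays one food-eating step and collects the event types. It is a sketch of the logic, not the repo's actual step():

```python
def toy_step(head, food, action):
    """Toy version of one Snake step, emitting the event-type chain."""
    dx, dy = {"R": (1, 0), "L": (-1, 0), "U": (0, -1), "D": (0, 1)}[action]
    new_head = (head[0] + dx, head[1] + dy)
    events = [f"INPUT_{action}", "MOVE"]          # always emitted
    if new_head == food:                          # rule effects fire on contact
        events += ["EAT", "GROW", "FOOD_SPAWN", "SCORE"]
    return new_head, events

head, chain = toy_step((4, 6), food=(5, 6), action="R")
print(" → ".join(chain))  # INPUT_R → MOVE → EAT → GROW → FOOD_SPAWN → SCORE
```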

Grammar as Structure

The grammar of a game is the structure of valid event sequences. Not all sequences are legal:
  • MOVE must be followed by a position change to an adjacent cell
  • EAT must trigger GROW and FOOD_SPAWN
  • DIE_WALL or DIE_SELF must be followed by EOS (end of sequence)
  • Movement can’t jump across the grid or go out of bounds
The transformer doesn’t learn these rules from explicit constraints — it learns them from observing valid gameplay sequences.

Tokenized Event Sequence

From the README example, a real Snake game becomes:
BOS SNAP PLAYER X5 Y5 DIR_R LEN1 FOOD X8 Y6 SCORE V0
  TICK INPUT_L MOVE X4 Y5
  TICK INPUT_D MOVE X4 Y6
  TICK INPUT_D MOVE X4 Y7
  TICK INPUT_R MOVE X5 Y7
  TICK INPUT_R MOVE X6 Y7
  ...
This is not a description of state — it’s a trace of what happened. The model learns:
  1. Physical regularities: Movement is always one cell per tick, positions stay in bounds
  2. Rule mappings: EAT always triggers GROW and FOOD_SPAWN
  3. Temporal dependencies: Death events terminate the sequence
  4. Behavioral patterns: Different agents produce different event patterns
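A codec in this style flattens each event into a handful of tokens. The sketch below follows the token names in the trace above; the real codec's encoding may differ:

```python
def encode_move(action: str, pos: tuple[int, int]) -> list[str]:
    """Flatten one tick's input and movement into grammar tokens.
    Token names mirror the example trace; this is not the repo's codec."""
    x, y = pos
    return ["TICK", f"INPUT_{action}", "MOVE", f"X{x}", f"Y{y}"]

tokens = encode_move("L", (4, 5))
print(" ".join(tokens))  # TICK INPUT_L MOVE X4 Y5
```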

Learning from Event Grammar

A causal transformer trained on next-token prediction learns the rules, physics, and behavioral patterns of the game from the event sequence alone:

Physical regularities
  • Moves are always to adjacent cells, and positions stay within bounds
  • Only legal direction changes occur (no instant reversals)
  • From the README: “The model learned that movement is one cell per tick”

Rule mappings
  • EAT always triggers GROW and FOOD_SPAWN
  • DIE_WALL or DIE_SELF is always followed by EOS
  • Score increments on food collection
  • From the README: “eating triggers growth and food respawn, and that death ends the episode”

Behavioral patterns
  • Player archetypes emerge as statistical regularities: wall followers vs. food chasers vs. space fillers
  • No explicit labels: the patterns emerge from the grammar itself
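As a toy illustration of how rule mappings fall out of next-token statistics alone, even a bigram count over a couple of hand-written traces recovers EAT → GROW as a certainty, while movement stays probabilistic. The traces here are made up for the demonstration:

```python
from collections import Counter, defaultdict

# Hand-written toy traces, not real game data.
traces = [
    ["MOVE", "MOVE", "EAT", "GROW", "FOOD_SPAWN", "MOVE"],
    ["MOVE", "EAT", "GROW", "FOOD_SPAWN", "MOVE", "DIE_WALL", "EOS"],
]

nxt = defaultdict(Counter)
for trace in traces:
    for a, b in zip(trace, trace[1:]):
        nxt[a][b] += 1

def p_next(tok: str, cand: str) -> float:
    """Empirical probability that cand follows tok."""
    total = sum(nxt[tok].values())
    return nxt[tok][cand] / total if total else 0.0

print(p_next("EAT", "GROW"))   # 1.0 — the rule is a statistical certainty
print(p_next("MOVE", "EAT"))   # mixed — movement is behavior, not a rule
```

A transformer does the same thing with far longer contexts, which is what lets it pick up long-horizon behavioral patterns, not just adjacent-pair rules.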

Comparison to Other Approaches

Unlike MuZero (which learns what a player should do), Game Grammar learns what players actually do — and the output is readable by construction. The event stream is:
  • Interpretable: You can read the token sequence and understand what happened
  • Compositional: Events combine to form patterns
  • Generalizable: The same tokenization approach works for any game
  • Debuggable: Invalid sequences can be traced to specific rule violations

Pipeline Overview

Game (any) → Event Stream → Token Sequence → Transformer → Grammar
From the README:
“For now, the game never touches the model. The tokenization layer is where game-agnosticism lives. In the long run we will feed the snake its own tail.”
The separation of concerns is clean:
  1. Game implements the rules and emits events
  2. Codec translates events into tokens
  3. Transformer learns patterns in token sequences
  4. Validator checks that generated sequences obey the grammar
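The separation of concerns can be stated as interfaces. This Protocol sketch is illustrative of the contract between stages, not the repo's actual module layout:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Codec(Protocol):
    """Stage 2: translate events into tokens (interface assumed)."""
    def encode(self, events: list[dict]) -> list[str]: ...

@runtime_checkable
class Validator(Protocol):
    """Stage 4: check generated sequences against the grammar."""
    def check(self, tokens: list[str]) -> bool: ...

class TypeOnlyCodec:
    """Trivial codec: keep only each event's type as its token."""
    def encode(self, events: list[dict]) -> list[str]:
        return [e["type"] for e in events]

codec = TypeOnlyCodec()
tokens = codec.encode([{"type": "INPUT_R"}, {"type": "MOVE"}])
print(tokens)  # ['INPUT_R', 'MOVE']
```

Because only the codec sees both sides, swapping in a new game means writing a new event emitter and codec; the transformer and validator are untouched.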

Next Steps

  • Event Streams: learn how games emit structured events
  • Tokenization: see how events become token sequences
