
Prerequisites

No dependencies required. Game Grammar is built entirely in pure Python with a custom autograd implementation. No PyTorch, TensorFlow, or external frameworks needed. Requirements:
  • Python 3.10 or higher
  • That’s it
The training process runs on CPU and takes approximately 30-40 minutes for 5000 steps. If you’re impatient, you can reduce num_steps in scripts/train.py to 1000 for a quick test.
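You can confirm the interpreter version before starting; a quick sketch (check_python is just an illustrative helper, not part of the repo):

```python
import sys

def check_python(min_version=(3, 10)):
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info >= min_version

if not check_python():
    print("Game Grammar needs Python 3.10+; found", sys.version.split()[0])
```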

Installation

Clone the repository:
git clone https://github.com/asavschaeffer/GG.git
cd GG
No pip install needed — the pure Python implementation has zero dependencies.

Three-Step Workflow

Step 1: Generate gameplay episodes

Run the agent mix to produce tokenized gameplay traces. This generates 200 episodes from a mix of three agent types:
  • Random (40%): Chooses legal moves randomly
  • Greedy (40%): Always moves toward food
  • WallFollower (20%): Follows walls systematically
python scripts/generate.py
What this does:
# From scripts/generate.py
agent_mix = [
    (RandomAgent(seed=1), 0.4),
    (GreedyAgent(seed=2), 0.4),
    (WallFollowerAgent(10, 10, seed=3), 0.2),
]

codec = EventCodec(snapshot_interval=16)
episodes = collect_episodes(
    n=200,
    agent_mix=agent_mix,
    codec=codec,
    seed=42,
)
Expected output:
Generating 200 episodes...
Episodes: 200
Token lengths: min=42, max=512, avg=178

--- Sample episode (first 80 tokens) ---
BOS SNAP PLAYER X5 Y5 DIR_R LEN1 FOOD X8 Y6 SCORE V0 TICK INPUT_L ...

Saved to episodes.json
This creates episodes.json with 200 tokenized gameplay sequences. Each episode is a list of token IDs from the 74-token vocabulary.
The hybrid tokenizer uses periodic snapshots (SNAP) every 16 ticks plus delta events (TICK) for efficient encoding. Think of it like video compression: keyframes + deltas.
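The keyframe-plus-delta idea can be sketched in a few lines (an illustration only, not the real EventCodec; the state layout and helper name are simplified stand-ins):

```python
def encode(states, inputs, snapshot_interval=16):
    """states[i] is the full game state at tick i, inputs[i] the input event."""
    tokens = ["BOS"]
    for tick, (state, inp) in enumerate(zip(states, inputs)):
        if tick % snapshot_interval == 0:
            tokens += ["SNAP"] + state   # keyframe: full state tokens
        tokens += ["TICK", inp]          # delta: just the per-tick event
    tokens.append("EOS")
    return tokens

# With snapshot_interval=2, ticks 0 and 2 get a full SNAP keyframe,
# while tick 1 is encoded as a TICK delta only.
toks = encode(
    states=[["PLAYER", "X5", "Y5", "DIR_R", "LEN1"]] * 3,
    inputs=["INPUT_L", "INPUT_U", "INPUT_R"],
    snapshot_interval=2,
)
```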
Step 2: Train the transformer

Train a 2-layer, 32-dim, 4-head causal transformer on the generated episodes. The model learns game grammar through next-token prediction.
python scripts/train.py
What this does:
# From scripts/train.py
model = GameGPT(
    vocab_size=74,
    n_layer=2,
    n_embd=32,
    block_size=64,
    n_head=4,
    seed=42,
)

num_steps = 5000
for step in range(num_steps):
    # Sample random episode chunk
    ep = rng.choice(episodes)
    if len(ep) > model.block_size + 1:
        start = rng.randint(0, len(ep) - model.block_size - 1)
        tokens = ep[start:start + model.block_size + 1]
    else:
        tokens = ep
    
    loss = model.train_step(tokens, lr=0.01)
Expected output:
Loaded 200 episodes
Model params: 31234
step     1 / 5000 | loss 4.4712
step   100 / 5000 | loss 3.2145
step   200 / 5000 | loss 2.8934
step   300 / 5000 | loss 2.1023
...
step  4900 / 5000 | loss 0.2734
step  5000 / 5000 | loss 0.2512

Weights saved to weights.txt
The loss should decrease from ~4.47 (random baseline: ln(74) ≈ 4.3) to ~0.25. Training takes approximately 30-40 minutes on CPU.
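The baseline figure is easy to verify: a model that assigns uniform probability over the 74-token vocabulary has cross-entropy ln(74), which is why the loss starts near that value with freshly initialized weights:

```python
import math

vocab_size = 74
# Cross-entropy of a uniform guess over the vocabulary.
uniform_loss = math.log(vocab_size)
print(f"{uniform_loss:.3f}")  # ≈ 4.304
```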
The custom autograd implementation is educational but slow. For production use, consider reimplementing in PyTorch or JAX. The architecture and training logic are framework-agnostic.
Step 3: Sample and validate sequences

Generate novel gameplay sequences and validate them against physical and rule constraints. The model should produce valid Snake gameplay it has never seen before.
python scripts/sample.py
What this does:
# From scripts/sample.py
bos_id = VOCAB["BOS"]
eos_id = VOCAB["EOS"]

samples = []
for i in range(20):
    tokens = model.sample(bos_id, eos_id, temperature=0.5)
    samples.append(tokens)
    
    # Check validity tiers
    s_pass = "S" if check_structural(tokens)["structural_pass"] else "-"
    p_pass = "P" if check_physical(tokens)["physical_pass"] else "-"
    r_pass = "R" if check_rules(tokens)["rule_pass"] else "-"
    print(f"[{s_pass}{p_pass}{r_pass}] sample {i+1:2d} ...")
Expected output:
Model loaded.

[-PR] sample  1 ( 64 tok): BOS SNAP PLAYER X5 Y5 DIR_R LEN1 FOOD X8 Y6 ...
[-PR] sample  2 ( 64 tok): BOS SNAP PLAYER X3 Y4 DIR_D LEN1 FOOD X7 Y2 ...
[SPR] sample  3 ( 48 tok): BOS SNAP PLAYER X6 Y6 DIR_L LEN1 FOOD X2 Y3 ...
[-PR] sample  4 ( 64 tok): BOS SNAP PLAYER X4 Y5 DIR_U LEN1 FOOD X9 Y1 ...
...

--- Validity rates ---
  structural  : 45%
  physical    : 95%
  rule        : 100%
The validation system checks three tiers:
  • Structural (S): Complete BOS→EOS episodes
  • Physical (P): Adjacent moves, in-bounds positions
  • Rule (R): EAT→GROW+FOOD_SPAWN, DIE→EOS
Low structural validity (45%) is expected — the model often generates gameplay longer than the 64-token context window, hitting the limit mid-sequence without producing EOS. The gameplay itself is still valid.
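As a rough illustration of the structural tier (a toy sketch, not the repository's check_structural): an episode passes only if it opens with BOS and closes with EOS, so any sample truncated by the context limit fails:

```python
def structural_pass(tokens, bos="BOS", eos="EOS"):
    """Toy structural check: the episode must open with BOS and close with EOS."""
    return len(tokens) >= 2 and tokens[0] == bos and tokens[-1] == eos

structural_pass(["BOS", "SNAP", "TICK", "EOS"])  # complete episode
structural_pass(["BOS", "SNAP", "TICK"])         # truncated at the context limit
```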

Understanding the Results

After running all three scripts, you should see results like the following.

Physical validity: 95% — The model learned that:
  • Movement is one cell per tick (no jumping)
  • Positions must be within the 10×10 grid
  • The snake can only move to adjacent cells
Rule validity: 100% — The model learned that:
  • Eating food triggers growth (EAT → GROW)
  • Eating food triggers food respawn (EAT → FOOD_SPAWN)
  • Death ends the episode (DIE_WALL or DIE_SELF → EOS)
  • Score increments after eating (EAT → SCORE increment)
No explicit rules were programmed. The model discovered these patterns purely from observing token sequences.

What’s in Each File?

After running the workflow, you’ll have:

episodes.json

Tokenized gameplay traces from 200 episodes. Each episode is a list of token IDs:
[
  [0, 3, 4, 18, 28, 8, 32, 5, 21, 29, 42, 45, 2, 12, 38, ...],
  [0, 3, 4, 16, 26, 10, 33, 5, 23, 31, 42, 44, 2, 13, 39, ...],
  ...
]
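If you want to inspect the file yourself, it is plain JSON; a minimal sketch (the inline string stands in for reading episodes.json, and the IDs are illustrative, not real vocabulary entries):

```python
import json

# Stand-in for: episodes = json.load(open("episodes.json"))
raw = '[[0, 3, 4, 18, 2], [0, 3, 4, 16, 26, 2]]'
episodes = json.loads(raw)

lengths = [len(ep) for ep in episodes]
print(f"Episodes: {len(episodes)}")
print(f"Token lengths: min={min(lengths)}, max={max(lengths)}")
```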

weights.txt

31K trained parameters for the transformer. Human-readable text format:
wte
74 32
-0.0234 0.1023 -0.0512 ...
...
wpe
64 32
0.0123 -0.0456 0.0789 ...
...
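The format is simple enough to parse by hand; a hedged sketch assuming the layout shown above (a name line, a "rows cols" shape line, then one line of floats per row; parse_weights is a hypothetical helper, not part of the repo):

```python
def parse_weights(text):
    """Parse a weights.txt-style dump: name line, shape line, then matrix rows."""
    lines = iter(text.strip().splitlines())
    params = {}
    for name in lines:
        rows, cols = map(int, next(lines).split())
        matrix = [[float(v) for v in next(lines).split()] for _ in range(rows)]
        assert all(len(r) == cols for r in matrix)
        params[name] = matrix
    return params

sample = """wte
2 3
-0.02 0.10 -0.05
0.01 -0.04 0.07
"""
params = parse_weights(sample)
```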

Next Steps

Explore the Theory

Learn about the Wittgensteinian foundation

Tokenization Strategies

Deep dive into the five encoding approaches

Transformer Architecture

Understand the model implementation

API Reference

Read the API documentation

Troubleshooting

Training is slow

The pure Python autograd is educational but slow. To speed up training:
  1. Reduce num_steps in scripts/train.py from 5000 to 1000-2000
  2. Reduce the number of episodes in scripts/generate.py from 200 to 50-100
  3. For production use, consider reimplementing in PyTorch or JAX
The architecture is framework-agnostic — you can port the model definition directly.
Import errors

Make sure you’re running the scripts from the repository root:
cd GG  # Repository root
python scripts/generate.py
The scripts add the parent directory to sys.path to import the game_grammar package.
Loss not decreasing

If loss stays around 4.3 (random baseline):
  1. Check that episodes.json exists and contains data
  2. Verify the learning rate (default: 0.01)
  3. Try increasing training steps
  4. Check for numerical instability (NaNs in weights)
The loss should decrease steadily from ~4.47 to ~0.25 over 5000 steps.
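For item 4, you can scan weights.txt for non-finite values with no dependencies; a sketch (find_nans is a hypothetical helper, and it assumes the name/shape/rows layout shown earlier):

```python
import math

def find_nans(text):
    """Return (line_number, token) pairs for NaN/inf values in a weights dump."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for tok in line.split():
            try:
                v = float(tok)
            except ValueError:
                continue  # name lines are not numeric
            if math.isnan(v) or math.isinf(v):
                bad.append((lineno, tok))
    return bad

find_nans("wte\n2 2\n0.1 nan\n0.2 0.3\n")  # → [(3, 'nan')]
```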
Low validity rates

Expected validity rates after 5000 training steps:
  • Physical: 90-95%
  • Rule: 95-100%
  • Structural: 40-50% (limited by context window)
If rates are significantly lower:
  1. Train for more steps
  2. Increase the number of training episodes
  3. Check that the model architecture matches the pretrained weights
