
Prerequisites

No dependencies required. Game Grammar is built entirely in pure Python with a custom autograd implementation. No PyTorch, TensorFlow, or external frameworks needed. Requirements:
  • Python 3.10 or higher
  • That’s it
The training process runs on CPU and takes approximately 30-40 minutes for 5000 steps. If you’re impatient, you can reduce num_steps in scripts/train.py to 1000 for a quick test.
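You can confirm the interpreter version before starting; a quick sketch (check_python is just an illustrative helper, not part of the repo):

```python
import sys

def check_python(min_version=(3, 10)):
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info >= min_version

if not check_python():
    print("Game Grammar needs Python 3.10+; found", sys.version.split()[0])
```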

Installation

Clone the repository:
git clone https://github.com/asavschaeffer/GG.git
cd GG
No pip install needed — the pure Python implementation has zero dependencies.

Three-Step Workflow

Step 1: Generate gameplay episodes

Run the agent mix to produce tokenized gameplay traces. This generates 200 episodes from a mix of three agent types:
  • Random (40%): Chooses legal moves randomly
  • Greedy (40%): Always moves toward food
  • WallFollower (20%): Follows walls systematically
python scripts/generate.py
What this does:
# From scripts/generate.py
agent_mix = [
    (RandomAgent(seed=1), 0.4),
    (GreedyAgent(seed=2), 0.4),
    (WallFollowerAgent(10, 10, seed=3), 0.2),
]

codec = EventCodec(snapshot_interval=16)
episodes = collect_episodes(
    n=200,
    agent_mix=agent_mix,
    codec=codec,
    seed=42,
)
Expected output:
Generating 200 episodes...
Episodes: 200
Token lengths: min=42, max=512, avg=178

--- Sample episode (first 80 tokens) ---
BOS SNAP PLAYER X5 Y5 DIR_R LEN1 FOOD X8 Y6 SCORE V0 TICK INPUT_L ...

Saved to episodes.json
This creates episodes.json with 200 tokenized gameplay sequences. Each episode is a list of token IDs from the 74-token vocabulary.
The hybrid tokenizer uses periodic snapshots (SNAP) every 16 ticks plus delta events (TICK) for efficient encoding. Think of it like video compression: keyframes + deltas.
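The keyframe-plus-delta idea can be sketched in a few lines (an illustration only, not the real EventCodec; the state layout and helper name are simplified stand-ins):

```python
def encode(states, inputs, snapshot_interval=16):
    """states[i] is the full game state at tick i, inputs[i] the input event."""
    tokens = ["BOS"]
    for tick, (state, inp) in enumerate(zip(states, inputs)):
        if tick % snapshot_interval == 0:
            tokens += ["SNAP"] + state   # keyframe: full state tokens
        tokens += ["TICK", inp]          # delta: just the per-tick event
    tokens.append("EOS")
    return tokens

# With snapshot_interval=2, ticks 0 and 2 get a full SNAP keyframe,
# while tick 1 is encoded as a TICK delta only.
toks = encode(
    states=[["PLAYER", "X5", "Y5", "DIR_R", "LEN1"]] * 3,
    inputs=["INPUT_L", "INPUT_U", "INPUT_R"],
    snapshot_interval=2,
)
```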
Step 2: Train the transformer

Train a 2-layer, 32-dim, 4-head causal transformer on the generated episodes. The model learns game grammar through next-token prediction.
python scripts/train.py
What this does:
# From scripts/train.py
model = GameGPT(
    vocab_size=74,
    n_layer=2,
    n_embd=32,
    block_size=64,
    n_head=4,
    seed=42,
)

num_steps = 5000
for step in range(num_steps):
    # Sample random episode chunk
    ep = rng.choice(episodes)
    if len(ep) > model.block_size + 1:
        start = rng.randint(0, len(ep) - model.block_size - 1)
        tokens = ep[start:start + model.block_size + 1]
    else:
        tokens = ep
    
    loss = model.train_step(tokens, lr=0.01)
Expected output:
Loaded 200 episodes
Model params: 31234
step     1 / 5000 | loss 4.4712
step   100 / 5000 | loss 3.2145
step   200 / 5000 | loss 2.8934
step   300 / 5000 | loss 2.1023
...
step  4900 / 5000 | loss 0.2734
step  5000 / 5000 | loss 0.2512

Weights saved to weights.txt
The loss should decrease from ~4.47 (random baseline: ln(74) ≈ 4.3) to ~0.25. Training takes approximately 30-40 minutes on CPU.
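The baseline figure is easy to verify: a model that assigns uniform probability over the 74-token vocabulary has cross-entropy ln(74), which is why the loss starts near that value with freshly initialized weights:

```python
import math

vocab_size = 74
# Cross-entropy of a uniform guess over the vocabulary.
uniform_loss = math.log(vocab_size)
print(f"{uniform_loss:.3f}")  # ≈ 4.304
```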
The custom autograd implementation is educational but slow. For production use, consider reimplementing in PyTorch or JAX. The architecture and training logic are framework-agnostic.
Step 3: Sample and validate sequences

Generate novel gameplay sequences and validate them against physical and rule constraints. The model should produce valid Snake gameplay it has never seen before.
python scripts/sample.py
What this does:
# From scripts/sample.py
bos_id = VOCAB["BOS"]
eos_id = VOCAB["EOS"]

samples = []
for i in range(20):
    tokens = model.sample(bos_id, eos_id, temperature=0.5)
    samples.append(tokens)
    
    # Check validity tiers
    s_pass = "S" if check_structural(tokens)["structural_pass"] else "-"
    p_pass = "P" if check_physical(tokens)["physical_pass"] else "-"
    r_pass = "R" if check_rules(tokens)["rule_pass"] else "-"
    print(f"[{s_pass}{p_pass}{r_pass}] sample {i+1:2d} ...")
Expected output:
Model loaded.

[-PR] sample  1 ( 64 tok): BOS SNAP PLAYER X5 Y5 DIR_R LEN1 FOOD X8 Y6 ...
[-PR] sample  2 ( 64 tok): BOS SNAP PLAYER X3 Y4 DIR_D LEN1 FOOD X7 Y2 ...
[SPR] sample  3 ( 48 tok): BOS SNAP PLAYER X6 Y6 DIR_L LEN1 FOOD X2 Y3 ...
[-PR] sample  4 ( 64 tok): BOS SNAP PLAYER X4 Y5 DIR_U LEN1 FOOD X9 Y1 ...
...

--- Validity rates ---
  structural  : 45%
  physical    : 95%
  rule        : 100%
The validation system checks three tiers:
  • Structural (S): Complete BOS→EOS episodes
  • Physical (P): Adjacent moves, in-bounds positions
  • Rule (R): EAT→GROW+FOOD_SPAWN, DIE→EOS
Low structural validity (45%) is expected — the model often generates gameplay longer than the 64-token context window, hitting the limit mid-sequence without producing EOS. The gameplay itself is still valid.
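As a rough illustration of the structural tier (a toy sketch, not the repository's check_structural): an episode passes only if it opens with BOS and closes with EOS, so any sample truncated by the context limit fails:

```python
def structural_pass(tokens, bos="BOS", eos="EOS"):
    """Toy structural check: the episode must open with BOS and close with EOS."""
    return len(tokens) >= 2 and tokens[0] == bos and tokens[-1] == eos

structural_pass(["BOS", "SNAP", "TICK", "EOS"])  # complete episode
structural_pass(["BOS", "SNAP", "TICK"])         # truncated at the context limit
```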

Understanding the Results

After running all three scripts, you should see results like the following.

Physical validity: 95% — The model learned that:
  • Movement is one cell per tick (no jumping)
  • Positions must be within the 10×10 grid
  • The snake can only move to adjacent cells
Rule validity: 100% — The model learned that:
  • Eating food triggers growth (EAT → GROW)
  • Eating food triggers food respawn (EAT → FOOD_SPAWN)
  • Death ends the episode (DIE_WALL or DIE_SELF → EOS)
  • Score increments after eating (EAT → SCORE increment)
No explicit rules were programmed. The model discovered these patterns purely from observing token sequences.

What’s in Each File?

After running the workflow, you’ll have:

episodes.json

Tokenized gameplay traces from 200 episodes. Each episode is a list of token IDs:
[
  [0, 3, 4, 18, 28, 8, 32, 5, 21, 29, 42, 45, 2, 12, 38, ...],
  [0, 3, 4, 16, 26, 10, 33, 5, 23, 31, 42, 44, 2, 13, 39, ...],
  ...
]
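If you want to inspect the file yourself, it is plain JSON; a minimal sketch (the inline string stands in for reading episodes.json, and the IDs are illustrative, not real vocabulary entries):

```python
import json

# Stand-in for: episodes = json.load(open("episodes.json"))
raw = '[[0, 3, 4, 18, 2], [0, 3, 4, 16, 26, 2]]'
episodes = json.loads(raw)

lengths = [len(ep) for ep in episodes]
print(f"Episodes: {len(episodes)}")
print(f"Token lengths: min={min(lengths)}, max={max(lengths)}")
```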

weights.txt

31K trained parameters for the transformer. Human-readable text format:
wte
74 32
-0.0234 0.1023 -0.0512 ...
...
wpe
64 32
0.0123 -0.0456 0.0789 ...
...
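The format is simple enough to parse by hand; a hedged sketch assuming the layout shown above (a name line, a "rows cols" shape line, then one line of floats per row; parse_weights is a hypothetical helper, not part of the repo):

```python
def parse_weights(text):
    """Parse a weights.txt-style dump: name line, shape line, then matrix rows."""
    lines = iter(text.strip().splitlines())
    params = {}
    for name in lines:
        rows, cols = map(int, next(lines).split())
        matrix = [[float(v) for v in next(lines).split()] for _ in range(rows)]
        assert all(len(r) == cols for r in matrix)
        params[name] = matrix
    return params

sample = """wte
2 3
-0.02 0.10 -0.05
0.01 -0.04 0.07
"""
params = parse_weights(sample)
```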

Next Steps

Explore the Theory

Learn about the Wittgensteinian foundation

Tokenization Strategies

Deep dive into the five encoding approaches

Transformer Architecture

Understand the model implementation

API Reference

Read the API documentation

Troubleshooting

Training is slow

The pure Python autograd is educational but slow. To speed up training:
  1. Reduce num_steps in scripts/train.py from 5000 to 1000-2000
  2. Reduce the number of episodes in scripts/generate.py from 200 to 50-100
  3. For production use, consider reimplementing in PyTorch or JAX
The architecture is framework-agnostic — you can port the model definition directly.
Import errors

Make sure you’re running the scripts from the repository root:
cd GG  # Repository root
python scripts/generate.py
The scripts add the parent directory to sys.path to import the game_grammar package.
Loss not decreasing

If loss stays around 4.3 (random baseline):
  1. Check that episodes.json exists and contains data
  2. Verify the learning rate (default: 0.01)
  3. Try increasing training steps
  4. Check for numerical instability (NaNs in weights)
The loss should decrease steadily from ~4.47 to ~0.25 over 5000 steps.
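For item 4, you can scan weights.txt for non-finite values with no dependencies; a sketch (find_nans is a hypothetical helper, and it assumes the name/shape/rows layout shown earlier):

```python
import math

def find_nans(text):
    """Return (line_number, token) pairs for NaN/inf values in a weights dump."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for tok in line.split():
            try:
                v = float(tok)
            except ValueError:
                continue  # name lines are not numeric
            if math.isnan(v) or math.isinf(v):
                bad.append((lineno, tok))
    return bad

find_nans("wte\n2 2\n0.1 nan\n0.2 0.3\n")  # → [(3, 'nan')]
```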
Low validity rates

Expected validity rates after 5000 training steps:
  • Physical: 90-95%
  • Rule: 95-100%
  • Structural: 40-50% (limited by context window)
If rates are significantly lower:
  1. Train for more steps
  2. Increase the number of training episodes
  3. Check that the model architecture matches the pretrained weights
