generate.py
Generates tokenized Snake episodes from a mix of AI agents and saves them as JSON.

What It Does
The script runs a weighted mix of three agent types to produce diverse gameplay patterns:

- RandomAgent (40%) - Random valid moves
- GreedyAgent (40%) - Moves toward food
- WallFollowerAgent (20%) - Follows walls systematically
The resulting episodes are saved to episodes.json at the project root.
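The 40/40/20 agent mix can be sketched as a seeded weighted draw per episode. This is a minimal sketch, not the script's actual code: the agent classes here are empty stand-ins for the real implementations.

```python
import random

# Stand-in agent classes; the real ones implement the move policies described above.
class RandomAgent: pass
class GreedyAgent: pass
class WallFollowerAgent: pass

AGENTS = [RandomAgent, GreedyAgent, WallFollowerAgent]
WEIGHTS = [40, 40, 20]  # the configured agent_mix ratio

def pick_agent(rng):
    """Draw one agent class per episode according to the configured mix."""
    return rng.choices(AGENTS, weights=WEIGHTS, k=1)[0]

rng = random.Random(42)  # matches the configured seed
counts = {cls.__name__: 0 for cls in AGENTS}
for _ in range(1000):
    counts[pick_agent(rng).__name__] += 1
```

Seeding a single `random.Random(42)` instance keeps the whole generation run reproducible.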
Configuration
| Parameter | Value | Description |
|---|---|---|
| n_episodes | 200 | Number of episodes to generate |
| snapshot_interval | 16 | Ticks between keyframe snapshots |
| agent_mix | 40/40/20 | Random/Greedy/WallFollower ratio |
| seed | 42 | Random seed for reproducibility |
| grid_size | 10×10 | Snake game board dimensions |
Output Format
The script produces episodes.json containing a list of token sequences. Each episode is an array of token IDs from the 74-token vocabulary:
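As an illustration of the file layout only: BOS (0) and EOS (1) match the token IDs listed in the sampling configuration, but the intermediate token IDs below are hypothetical, since the real vocabulary mapping is project-specific.

```python
import json

BOS, EOS = 0, 1  # token IDs from the sampling configuration
# Hypothetical intermediate token IDs -- for format illustration only.
episodes = [
    [BOS, 12, 40, 40, 12, 7, EOS],
    [BOS, 12, 40, 33, 40, 12, EOS],
]
text = json.dumps(episodes)  # episodes.json is a JSON list of such arrays
```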
Usage
Example Output
Episodes vary in length (45-312 tokens) depending on how long the snake survives. The average episode is ~156 tokens.
train.py
Trains the GameGPT transformer on tokenized episodes using next-token prediction.

What It Does
Loads episodes.json and trains a causal transformer to predict the next token in gameplay sequences. The model learns game physics, rules, and behavioral patterns from pure sequence prediction.
Uses Adam optimizer with learning rate decay and samples random subsequences from episodes during training.
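Random-subsequence sampling can be sketched as cropping a window of block_size + 1 tokens, so that inputs and shifted next-token targets both fit the context. This is an assumed implementation, not the script's actual code.

```python
import random

def sample_subsequence(episode, block_size, rng):
    """Crop a random window of block_size + 1 tokens from an episode.
    The extra token lets inputs be tokens [0:block_size] and the
    next-token targets be tokens [1:block_size + 1]."""
    if len(episode) <= block_size + 1:
        return episode
    start = rng.randrange(len(episode) - block_size - 1)
    return episode[start:start + block_size + 1]

rng = random.Random(0)
episode = list(range(200))          # stand-in for a longer-than-average episode
chunk = sample_subsequence(episode, 64, rng)
x, y = chunk[:-1], chunk[1:]        # inputs and next-token targets
```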
Model Configuration
| Parameter | Value | Description |
|---|---|---|
| vocab_size | 74 | Size of game event vocabulary |
| n_layer | 2 | Number of transformer layers |
| n_embd | 32 | Embedding dimension |
| block_size | 64 | Maximum context window (tokens) |
| n_head | 4 | Number of attention heads |
| head_dim | 8 | Dimension per head (n_embd / n_head) |
| Total params | ~31K | Including embeddings and all layers |
Training Parameters
The learning rate decays linearly to zero over the 5000 training steps:

lr_t = lr * (1 - step / 5000)
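The schedule above can be written as a small helper. The base learning rate here is an assumed value; the script's actual base rate may differ.

```python
def lr_at(step, base_lr=1e-2, total_steps=5000):
    """Linear learning-rate decay: full base_lr at step 0, zero at total_steps.
    base_lr is an assumed placeholder, not the script's actual value."""
    return base_lr * (1 - step / total_steps)
```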
Output
Saves trained model weights to weights.txt as plain text. Each line contains:
Usage
The script requires episodes.json to exist. Run generate.py first if you don't have training data.

Example Output
- Initial loss ~4.47 (near random baseline of ln(74) ≈ 4.3)
- Final loss ~0.25 after 5000 steps
- Training takes approximately 36 hours on CPU (pure Python implementation)
sample.py
Samples novel gameplay sequences from the trained model and validates them against game rules.

What It Does
Loads the trained model from weights.txt and generates 20 new gameplay sequences starting from the BOS (beginning-of-sequence) token. Each sequence is validated across three tiers:
- Structural - Valid BOS→EOS structure
- Physical - Moves are adjacent cells, positions in bounds
- Rules - EAT→GROW+FOOD_SPAWN, DIE→EOS mappings
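The physical tier can be sketched as adjacency and bounds checks over decoded head positions on the 10×10 grid. Decoding positions from tokens is project-specific, so this sketch assumes positions are already available as (row, col) pairs.

```python
def is_adjacent(a, b):
    """Physical check: consecutive head positions must differ by exactly one step."""
    (r1, c1), (r2, c2) = a, b
    return abs(r1 - r2) + abs(c1 - c2) == 1

def in_bounds(pos, n=10):
    """Positions must lie on the n-by-n board (10x10 per the configuration)."""
    r, c = pos
    return 0 <= r < n and 0 <= c < n

def physical_pass(positions, n=10):
    """A sequence passes the physical tier if every position is in bounds
    and every consecutive pair of positions is adjacent."""
    return (all(in_bounds(p, n) for p in positions)
            and all(is_adjacent(a, b) for a, b in zip(positions, positions[1:])))
```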
Sampling Configuration
| Parameter | Value | Description |
|---|---|---|
| n_samples | 20 | Number of sequences to generate |
| temperature | 0.5 | Sampling temperature (lower = more conservative) |
| max_len | 64 | Maximum sequence length |
| bos_id | 0 | Beginning-of-sequence token ID |
| eos_id | 1 | End-of-sequence token ID |
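Temperature sampling with these parameters can be sketched as follows. The model's forward pass is abstracted behind a logits function here; the real script computes logits from the loaded weights.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Softmax sampling with temperature; lower temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r, acc = rng.random(), 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(exps) - 1

def generate(logits_fn, bos_id=0, eos_id=1, max_len=64, temperature=0.5, seed=0):
    """Autoregressive sampling from BOS until EOS or the max_len context limit."""
    rng = random.Random(seed)
    seq = [bos_id]
    while len(seq) < max_len:
        tok = sample_token(logits_fn(seq), temperature, rng)
        seq.append(tok)
        if tok == eos_id:
            break
    return seq

# Toy logits function that strongly prefers EOS, for demonstration only.
seq = generate(lambda s: [0.0, 10.0, 0.0])
```

Sequences that hit max_len without sampling EOS are exactly the ones that fail the structural check below.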
Validation Output
Each sample is checked against three validation passes:

- S = Structural pass
- P = Physical pass
- R = Rule pass
A dash (-) in the output indicates a failed pass.
Usage
The script requires weights.txt to exist. Run train.py first if you don't have trained weights.

Example Output
- 100% rule validity - Model perfectly learned EAT→GROW+FOOD_SPAWN and DIE→EOS
- 95% physical validity - Moves are adjacent cells, positions in bounds
- 45% structural validity - Lower because model often hits 64-token context limit mid-game without generating EOS
Structural validity is low because sequences often exceed the 64-token context window. The model generates valid gameplay but doesn’t always complete the episode within the context limit. This is expected behavior, not a bug.
Quick Start Pipeline
Generate Training Data
Run the agent mix to produce 200 tokenized episodes:

Output: episodes.json (~200 episodes, ~156 tokens each)

Train the Model

Train the GameGPT transformer for 5000 steps:

Output: weights.txt (~31K parameters)

Expected time: ~36 hours on CPU

Performance Notes
All scripts are implemented in pure Python with zero dependencies. Training is CPU-only and takes approximately 36 hours for 5000 steps. This is intentional: the project demonstrates that transformers can learn game grammar without frameworks or GPUs. For production use, consider:

- PyTorch/JAX reimplementation for GPU acceleration
- Larger models (more layers, bigger embeddings)
- More training episodes and longer training runs
- Batched training instead of single-episode updates
