Training data comes from gameplay episodes generated by three agent types, each with distinct behavioral patterns. Episodes are played, recorded as event streams, and tokenized into sequences for model training.

Agent Types

Three agent implementations provide diverse gameplay patterns:

RandomAgent

Selects random legal actions with uniform probability.
game_grammar/agents.py
class RandomAgent:
    def __init__(self, seed=None):
        self.rng = _random.Random(seed)

    def act(self, state: SnakeState, legal: list[Action]) -> Action:
        return self.rng.choice(legal)
Mix ratio: 40%

GreedyAgent

Minimizes Manhattan distance to food, choosing moves that bring the snake closer to the target.
game_grammar/agents.py
class GreedyAgent:
    """Minimize Manhattan distance to food."""

    def __init__(self, seed=None):
        self.rng = _random.Random(seed)

    def act(self, state: SnakeState, legal: list[Action]) -> Action:
        fx, fy = state.food
        best_dist = float("inf")
        best_actions: list[Action] = []
        for a in legal:
            dx, dy = DIR_DELTA[a]
            nx, ny = state.head[0] + dx, state.head[1] + dy
            dist = abs(nx - fx) + abs(ny - fy)
            if dist < best_dist:
                best_dist = dist
                best_actions = [a]
            elif dist == best_dist:
                best_actions.append(a)
        return self.rng.choice(best_actions)
Mix ratio: 40%
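The greedy selection logic can be exercised in isolation. The sketch below is a standalone re-implementation for illustration only: DIR_DELTA is a simplified stand-in for the project's direction table, and greedy_act takes raw head/food tuples instead of a SnakeState.

```python
import random

# Simplified stand-in for the project's direction table.
DIR_DELTA = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}

def greedy_act(head, food, legal, rng=None):
    """Pick an action minimizing Manhattan distance to food; break ties randomly."""
    rng = rng or random.Random(0)
    fx, fy = food
    best_dist = float("inf")
    best_actions = []
    for a in legal:
        dx, dy = DIR_DELTA[a]
        nx, ny = head[0] + dx, head[1] + dy
        dist = abs(nx - fx) + abs(ny - fy)
        if dist < best_dist:
            best_dist, best_actions = dist, [a]
        elif dist == best_dist:
            best_actions.append(a)
    return rng.choice(best_actions)

# Head at (2, 2), food at (5, 2): moving right is the unique best move.
print(greedy_act((2, 2), (5, 2), ["U", "D", "L", "R"]))  # R
```

When several moves tie on distance (e.g. food diagonally adjacent), the seeded RNG picks among them, which is what keeps greedy trajectories from being fully deterministic.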

WallFollowerAgent

Prefers moves that keep the snake adjacent to a wall, creating predictable boundary-following patterns.
game_grammar/agents.py
class WallFollowerAgent:
    """Prefer moves that keep a wall adjacent."""

    def __init__(self, width=10, height=10, seed=None):
        self.width = width
        self.height = height
        self.rng = _random.Random(seed)

    def _near_wall(self, x: int, y: int) -> bool:
        return x <= 0 or x >= self.width - 1 or y <= 0 or y >= self.height - 1

    def act(self, state: SnakeState, legal: list[Action]) -> Action:
        wall_moves: list[Action] = []
        safe_moves: list[Action] = []
        for a in legal:
            dx, dy = DIR_DELTA[a]
            nx, ny = state.head[0] + dx, state.head[1] + dy
            # Skip moves that hit walls
            if nx < 0 or nx >= self.width or ny < 0 or ny >= self.height:
                continue
            safe_moves.append(a)
            if self._near_wall(nx, ny):
                wall_moves.append(a)
        choices = wall_moves if wall_moves else (safe_moves if safe_moves else legal)
        return self.rng.choice(choices)
Mix ratio: 20%
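The boundary test driving this agent is small enough to check directly. The sketch below re-implements _near_wall as a free function (a stand-in, assuming the default 10×10 board):

```python
# Stand-in for WallFollowerAgent._near_wall on the default 10x10 board.
def near_wall(x, y, width=10, height=10):
    # True for any cell on the outer ring of the board.
    return x <= 0 or x >= width - 1 or y <= 0 or y >= height - 1

print(near_wall(0, 5))  # True  (left wall)
print(near_wall(9, 5))  # True  (right wall: width - 1)
print(near_wall(5, 5))  # False (interior cell)
```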
The agent mix is weighted: 40% random, 40% greedy, 20% wall follower. This distribution exposes the model to exploratory, goal-directed, and boundary-following behavior patterns.
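The per-episode agent draw can be sketched with the standard library's random.choices, the same call collect_episodes() uses below; over many draws the empirical frequencies approach the 40/40/20 weights.

```python
import random

# The 40/40/20 mix described above, as names with matching weights.
agents = ["random", "greedy", "wall_follower"]
weights = [0.4, 0.4, 0.2]

rng = random.Random(42)
picks = [rng.choices(agents, weights=weights)[0] for _ in range(10_000)]
for name in agents:
    print(f"{name}: {picks.count(name) / len(picks):.3f}")
```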

Episode Collection

play_episode()

Plays a single episode with an agent, recording all events and states indexed by game tick.
game_grammar/data.py
def play_episode(
    game: SnakeGame,
    agent,
    codec: EventCodec | None = None,
    max_ticks: int = 200,
) -> tuple[dict[int, list[Event]], dict[int, SnakeState]]:
    """Play one episode, returning events and states indexed by tick."""
    state = game.reset()
    events_by_tick: dict[int, list[Event]] = {}
    states_by_tick: dict[int, SnakeState] = {0: state}

    for _ in range(max_ticks):
        if not state.alive:
            break
        legal = game.legal_actions(state)
        action = agent.act(state, legal)
        state, events, done = game.step(action)
        events_by_tick[state.tick] = events
        states_by_tick[state.tick] = state
        if done:
            break

    return events_by_tick, states_by_tick
Returns:
  • events_by_tick: Dictionary mapping tick number to list of events that occurred
  • states_by_tick: Dictionary mapping tick number to full game state
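The control flow of play_episode() can be demonstrated with stub objects (hypothetical stand-ins for SnakeGame and an agent). This shows how the returned dictionaries are keyed by tick, with tick 0 holding only the initial state and no events.

```python
class StubState:
    """Hypothetical stand-in for SnakeState with just tick and alive."""
    def __init__(self, tick, alive=True):
        self.tick = tick
        self.alive = alive

class StubGame:
    """Hypothetical stand-in for SnakeGame that ends after 3 ticks."""
    def __init__(self):
        self._tick = 0

    def reset(self):
        self._tick = 0
        return StubState(0)

    def legal_actions(self, state):
        return ["L", "R"]

    def step(self, action):
        self._tick += 1
        done = self._tick >= 3
        return StubState(self._tick, alive=not done), [f"MOVE_{action}"], done

def play_episode(game, agent, max_ticks=200):
    # Same control flow as the documented function, minus the codec parameter.
    state = game.reset()
    events_by_tick = {}
    states_by_tick = {0: state}
    for _ in range(max_ticks):
        if not state.alive:
            break
        action = agent(state, game.legal_actions(state))
        state, events, done = game.step(action)
        events_by_tick[state.tick] = events
        states_by_tick[state.tick] = state
        if done:
            break
    return events_by_tick, states_by_tick

events, states = play_episode(StubGame(), lambda state, legal: legal[0])
print(sorted(events))  # [1, 2, 3] -- no events are recorded for tick 0
print(sorted(states))  # [0, 1, 2, 3]
```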

collect_episodes()

Collects multiple tokenized episodes from a weighted agent mix.
game_grammar/data.py
def collect_episodes(
    n: int,
    agent_mix: list[tuple[object, float]],
    width: int = 10,
    height: int = 10,
    codec: EventCodec | None = None,
    max_ticks: int = 200,
    seed: int = 42,
) -> list[list[int]]:
    """Collect n tokenized episodes from a weighted agent mix.

    agent_mix: list of (agent, weight) pairs.
    Returns list of token sequences.
    """
    if codec is None:
        codec = EventCodec()

    rng = _random.Random(seed)
    agents = [a for a, _ in agent_mix]
    weights = [w for _, w in agent_mix]
    episodes: list[list[int]] = []

    for i in range(n):
        agent = rng.choices(agents, weights=weights)[0]
        game = SnakeGame(width=width, height=height, seed=rng.randint(0, 2**31))
        events_by_tick, states_by_tick = play_episode(game, agent, max_ticks=max_ticks)
        tokens = codec.encode_episode(events_by_tick, states_by_tick)
        episodes.append(tokens)

    return episodes
Parameters:
  • n (int): Number of episodes to collect
  • agent_mix (list[tuple[object, float]]): List of (agent, weight) pairs for weighted random selection
  • codec (EventCodec, optional): Tokenizer instance (defaults to EventCodec with snapshot_interval=16)
  • max_ticks (int, default 200): Maximum ticks per episode before forced termination

Generation Script

The scripts/generate.py script orchestrates the full data collection pipeline:
scripts/generate.py
from game_grammar.agents import RandomAgent, GreedyAgent, WallFollowerAgent
from game_grammar.codec import EventCodec
from game_grammar.data import collect_episodes

n_episodes = 200

agent_mix = [
    (RandomAgent(seed=1), 0.4),
    (GreedyAgent(seed=2), 0.4),
    (WallFollowerAgent(10, 10, seed=3), 0.2),
]

codec = EventCodec(snapshot_interval=16)
episodes = collect_episodes(
    n=n_episodes,
    agent_mix=agent_mix,
    codec=codec,
    seed=42,
)
Output: 200 tokenized episodes saved to episodes.json
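The serialization step is not shown in the excerpt above. A plausible sketch (the actual script may differ) writes the list of token sequences with the standard json module:

```python
import json

# Hypothetical batch: each episode is a list of integer token IDs,
# as returned by collect_episodes().
episodes = [[1, 5, 9, 2], [1, 7, 7, 3, 2]]

with open("episodes.json", "w") as f:
    json.dump(episodes, f)

with open("episodes.json") as f:
    loaded = json.load(f)

print(loaded == episodes)  # True -- integer token IDs round-trip losslessly
```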

Episode Structure

Each episode is a sequence of integer token IDs representing:
  • BOS token (beginning of sequence)
  • SNAP tokens (periodic state snapshots)
  • TICK + event tokens (game state transitions)
  • EOS token (end of sequence)
BOS SNAP PLAYER X5 Y5 DIR_R LEN1 FOOD X8 Y6 SCORE V0
  TICK INPUT_L MOVE X4 Y5
  TICK INPUT_D MOVE X4 Y6
  TICK INPUT_D MOVE X4 Y7
  TICK INPUT_R MOVE X5 Y7
  ...
EOS
Typical episode lengths range from roughly 20 tokens to more than 500, with an average of around 150 tokens per episode.
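Given a batch of tokenized episodes, these length statistics are straightforward to compute; the sketch below uses toy data in place of real episodes.

```python
# Toy batch standing in for real tokenized episodes (lists of token IDs).
episodes = [[0] * n for n in (20, 150, 180, 500)]

lengths = [len(ep) for ep in episodes]
print(min(lengths), max(lengths), sum(lengths) / len(lengths))  # 20 500 212.5
```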
