Architecture Overview

Overall Structure

The RL Cycle Demo is a single-page application written in vanilla JavaScript, HTML5, and CSS3. It has no external dependencies or frameworks; everything runs in the browser. The architecture consists of three layers:

HTML structure: Two parallel grid panels with controls
CSS styling: Custom design system with CSS variables and animations
JavaScript logic: State management, policy execution, environment simulation, and rendering

State Management

The entire application state is managed through a single state object that tracks both panels independently:

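// Keyed by panel id. The spread gives each panel its own copy of START.
const state = {
  // Panel 1 (no cycle detection) counts blocked moves in `repeats`.
  1: { pos: [...START], steps: 0, reward: 0, repeats: 0, history: [], running: false, done: false, interval: null },
  // Panel 2 (cycle detection) counts escapes in `escapes`.
  2: { pos: [...START], steps: 0, reward: 0, escapes: 0, history: [], running: false, done: false, interval: null }
};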

Each panel (1 and 2) maintains:

pos: Current agent position [row, col]
steps: Number of steps taken
reward: Cumulative reward
repeats: (Panel 1) Count of blocked movements
escapes: (Panel 2) Count of cycle detection escapes
history: Array of visited positions
running: Boolean flag for animation state
done: Episode completion flag
interval: Timer reference for continuous execution
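The two panel records differ only in the name of their counter, so they could be produced by one factory. The sketch below is illustrative only: createPanelState is a hypothetical helper that does not appear in the demo script, which writes both records out literally.

```javascript
const START = [0, 0];

// Hypothetical helper (not part of the original script): builds a fresh
// panel record. `counterName` is 'repeats' for panel 1, 'escapes' for panel 2.
function createPanelState(counterName) {
  return {
    pos: [...START],     // copy, so moving the agent never mutates START
    steps: 0,
    reward: 0,
    [counterName]: 0,    // panel-specific counter
    history: [],
    running: false,
    done: false,
    interval: null       // filled in while the panel runs continuously
  };
}

const state = {
  1: createPanelState('repeats'),
  2: createPanelState('escapes')
};
```

Because every record gets its own pos and history arrays, changing one panel cannot leak into the other, which is what "tracks both panels independently" relies on.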

Key Constants

The demo uses several constants defined at the top of the script:

const SIZE = 3;
const WALL = [1, 1];
const GOAL = [2, 2];
const START = [0, 0];
const ACTIONS = { 0: '↑ arriba', 1: '→ derecha', 2: '↓ abajo', 3: '← izquierda' };
const ACTION_ARROWS = { 0: '↑', 1: '→', 2: '↓', 3: '←' };
const MAX_STEPS = 30;
const CYCLE_THRESHOLD = 2;

SIZE: Grid dimensions (3×3)
WALL: Position [row, col] of the wall cell, which the agent cannot enter
GOAL: Target position [row, col]
START: Agent starting position [row, col]
ACTIONS: Action-to-text mapping (the labels are in Spanish: up, right, down, left)
ACTION_ARROWS: Action-to-arrow symbols
MAX_STEPS: Maximum episode length
CYCLE_THRESHOLD: Number of visits to a position that triggers cycle detection
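Positions are stored as [row, col] arrays, and JavaScript compares arrays by reference, so an expression like pos === GOAL is false even when the coordinates match. The helpers below are a minimal sketch of how positions could be compared against these constants. The names samePos, inBounds, and isBlocked are assumptions for illustration and are not taken from the demo script.

```javascript
const SIZE = 3;
const WALL = [1, 1];
const GOAL = [2, 2];

// Hypothetical helpers (names are illustrative, not from the original script).

// Compare two [row, col] pairs by value rather than by reference.
const samePos = (a, b) => a[0] === b[0] && a[1] === b[1];

// True when the position lies inside the SIZE x SIZE grid.
const inBounds = ([r, c]) => r >= 0 && r < SIZE && c >= 0 && c < SIZE;

// True when the agent cannot occupy the position.
const isBlocked = (pos) => !inBounds(pos) || samePos(pos, WALL);
```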

Main Functions

The codebase is organized into focused functions:

Core Logic Functions

badPolicy(pos) - Returns the action for a given position (contains the deliberate bug)
envStep(pos, action) - Simulates environment dynamics and returns {pos, reward, done}
doStep(panelId) - Executes one step of the agent-environment loop
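To make the envStep contract concrete, here is a minimal sketch of a function with the same signature and return shape. Only the {pos, reward, done} shape and the action numbering (0 up, 1 right, 2 down, 3 left) come from this page. The reward values, the assumption that row 0 is the top row, and the choice to leave the agent in place on a blocked move are illustrative guesses, not the demo's actual numbers or rules.

```javascript
const SIZE = 3;
const WALL = [1, 1];
const GOAL = [2, 2];

// Row/column deltas for actions 0-3 (up, right, down, left).
// Assumption: row 0 is the top row, so "up" decreases the row index.
const DELTAS = { 0: [-1, 0], 1: [0, 1], 2: [1, 0], 3: [0, -1] };

// Sketch of an envStep-style function. Rewards (-1 per step, +10 at the
// goal) are placeholder values chosen for this example.
function envStep(pos, action) {
  const [dr, dc] = DELTAS[action];
  const next = [pos[0] + dr, pos[1] + dc];

  const outOfBounds =
    next[0] < 0 || next[0] >= SIZE || next[1] < 0 || next[1] >= SIZE;
  const hitsWall = next[0] === WALL[0] && next[1] === WALL[1];

  // Blocked move: the agent stays where it is and the episode continues.
  if (outOfBounds || hitsWall) {
    return { pos: [...pos], reward: -1, done: false };
  }

  const done = next[0] === GOAL[0] && next[1] === GOAL[1];
  return { pos: next, reward: done ? 10 : -1, done };
}
```

Under these assumptions, a policy that keeps choosing a blocked action from the same cell never changes position, which is the kind of repetition Panel 1 counts in repeats.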

Rendering Functions

renderGrid(panelId) - Renders the 3×3 grid with agent, walls, goal, and trails
renderStats(panelId) - Updates the stats display (steps, reward, repeats/escapes)
addLog(panelId, html) - Appends a log entry to the panel’s log
setStatus(panelId, type, text) - Updates the status badge
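The real renderGrid writes to the DOM, which is hard to show in isolation. As a rough stand-in, the sketch below renders the same information (agent, wall, goal, trail) to a string. The function gridToText, its symbols, and its drawing priority are assumptions made for this example and are not part of the demo.

```javascript
const SIZE = 3;
const WALL = [1, 1];
const GOAL = [2, 2];

// Compare two [row, col] pairs by value.
const same = (a, b) => a[0] === b[0] && a[1] === b[1];

// Hypothetical text renderer: A = agent, # = wall, G = goal,
// * = previously visited cell (trail), . = empty cell.
function gridToText(pos, history) {
  const rows = [];
  for (let r = 0; r < SIZE; r++) {
    let line = '';
    for (let c = 0; c < SIZE; c++) {
      const cell = [r, c];
      if (same(cell, pos)) line += 'A';
      else if (same(cell, WALL)) line += '#';
      else if (same(cell, GOAL)) line += 'G';
      else if (history.some((h) => same(h, cell))) line += '*';
      else line += '.';
    }
    rows.push(line);
  }
  return rows.join('\n');
}

console.log(gridToText([0, 1], [[0, 0]]));
// *A.
// .#.
// ..G
```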

Control Functions

runDemo(panelId) - Starts continuous execution with timing loop
stopDemo(panelId) - Pauses execution
stepDemo(panelId) - Executes a single step (button handler)
resetDemo(panelId) - Resets panel to initial state
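The run and stop handlers can be sketched with setInterval and clearInterval, using the running, done, and interval fields of the panel state. This is an assumed shape, not the demo's code: the doStep here is a simplified stand-in that only counts steps, and the delayMs parameter is hypothetical (the demo takes its speed from a slider).

```javascript
const MAX_STEPS = 30;
const state = { 1: { steps: 0, running: false, done: false, interval: null } };

// Stand-in for the real doStep: advances the step counter and ends the
// episode at MAX_STEPS. The real function also moves the agent and renders.
function doStep(panelId) {
  const s = state[panelId];
  s.steps += 1;
  if (s.steps >= MAX_STEPS) s.done = true;
}

// Pause execution and release the timer.
function stopDemo(panelId) {
  const s = state[panelId];
  clearInterval(s.interval);
  s.interval = null;
  s.running = false;
}

// Start continuous execution. `delayMs` is a hypothetical parameter.
function runDemo(panelId, delayMs = 400) {
  const s = state[panelId];
  if (s.running || s.done) return;   // ignore repeated clicks and finished episodes
  s.running = true;
  s.interval = setInterval(function tick() {
    doStep(panelId);
    if (s.done) stopDemo(panelId);   // stop the timer once the episode ends
  }, delayMs);
}
```

Guarding on running keeps a second click on the run button from starting a second timer for the same panel.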

Dual-Panel Design Pattern

The demo uses a parallel comparison pattern where both panels run the same environment and policy, but with different cycle handling strategies:

Panel 1 (Red): No cycle detection; demonstrates the infinite loop problem
Panel 2 (Green): Cycle detection enabled; shows the solution

Both panels:

Share the same grid environment (SIZE, WALL, GOAL, START)
Use the same badPolicy function
Call the same envStep function
Differ only in the cycle detection logic within doStep

This pattern lets users compare the two behaviors side by side in real time.
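The branch that separates the two panels can be sketched as a small action-selection step: count how often the current position already appears in history and, once the count reaches CYCLE_THRESHOLD, override the policy. The helper names countVisits and chooseAction are hypothetical, and the escape rule (rotate to the next action) is an assumption chosen for brevity; the demo may pick its escape action differently.

```javascript
const CYCLE_THRESHOLD = 2;

// Hypothetical helper: number of times `pos` already appears in `history`.
function countVisits(history, pos) {
  return history.filter(([r, c]) => r === pos[0] && c === pos[1]).length;
}

// Hypothetical helper: returns the action to execute and whether the
// policy was overridden. `detectCycles` is false for panel 1, true for panel 2.
function chooseAction(panel, policy, detectCycles) {
  const proposed = policy(panel.pos);
  const looping = countVisits(panel.history, panel.pos) >= CYCLE_THRESHOLD;
  if (!detectCycles || !looping) {
    return { action: proposed, escaped: false };
  }
  // Assumed escape rule: try the next action (0-3) instead of the proposed one.
  return { action: (proposed + 1) % 4, escaped: true };
}

// Usage: a stuck policy that always proposes action 1 (right).
const stuckPolicy = () => 1;
const panel = { pos: [0, 1], history: [[0, 1], [0, 1]] };
console.log(chooseAction(panel, stuckPolicy, false)); // { action: 1, escaped: false }
console.log(chooseAction(panel, stuckPolicy, true));  // { action: 2, escaped: true }
```

A caller in the style of doStep would increment escapes whenever escaped is true, then pass the chosen action to envStep.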

Code Organization

The JavaScript code follows this structure:

Constants (lines 642-650): Environment configuration
State initialization (lines 652-655): Global state object
Policy definition (lines 657-666): badPolicy function
Environment step (lines 668-685): envStep function
Rendering (lines 687-757): Grid, stats, log, status rendering
Step logic (lines 760-811): doStep with cycle detection
Control handlers (lines 813-867): Run, stop, step, and reset functions
Event listeners (lines 869-875): Speed slider handlers
Initialization (lines 877-881): Initial render calls

You can navigate through the code following the function call chain:

runDemo → tick → doStep → envStep + renderGrid + renderStats

The architecture is intentionally simple to make it easy to understand, modify, and extend. Adding new features (like different policies or larger grids) requires minimal changes.
