Overall Structure
The RL Cycle Demo is built as a single-page application using vanilla JavaScript, HTML5, and CSS3. There are no external dependencies or frameworks; everything runs in the browser. The architecture consists of:
- HTML structure: Two parallel grid panels with controls
- CSS styling: Custom design system with CSS variables and animations
- JavaScript logic: State management, policy execution, environment simulation, and rendering
State Management
The entire application state is managed through a single state object that tracks both panels independently:
- pos: Current agent position [row, col]
- steps: Number of steps taken
- reward: Cumulative reward
- repeats: (Panel 1) Count of blocked movements
- escapes: (Panel 2) Count of cycle detection escapes
- history: Array of visited positions
- running: Boolean flag for animation state
- done: Episode completion flag
- interval: Timer reference for continuous execution
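As a rough illustration of the fields listed above, the state object might look like the following sketch. The helper name createPanelState, the keying by panel id, the START value, and the empty initial history are assumptions made for illustration; only the field names come from the description.

```javascript
// Hypothetical start cell; the demo defines its own START constant.
const START = [0, 0];

// Build a fresh record per panel so the two panels never share arrays.
function createPanelState(counterName) {
  return {
    pos: [...START],   // current agent position [row, col]
    steps: 0,          // number of steps taken
    reward: 0,         // cumulative reward
    [counterName]: 0,  // "repeats" for panel 1, "escapes" for panel 2
    history: [],       // visited positions, oldest first
    running: false,    // true while the timer loop is active
    done: false,       // true once the episode has ended
    interval: null,    // timer handle for continuous execution
  };
}

// Assumed layout: one record per panel, keyed by panel id.
const state = {
  1: createPanelState("repeats"),
  2: createPanelState("escapes"),
};
```

Copying START into each record, rather than sharing one array, is what keeps a move in one panel from changing the other panel's position.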
Key Constants
The demo uses several constants defined at the top of the script:
- SIZE: Grid dimensions (3×3)
- WALL: Immovable wall position
- GOAL: Target position
- START: Agent starting position
- ACTIONS: Action-to-text mapping
- ACTION_ARROWS: Action-to-arrow symbols
- MAX_STEPS: Maximum episode length
- CYCLE_THRESHOLD: Number of visits to trigger cycle detection
Main Functions
The codebase is organized into focused functions:
Core Logic Functions
- badPolicy(pos): Returns the action for a given position (contains the deliberate bug)
- envStep(pos, action): Simulates environment dynamics and returns {pos, reward, done}
- doStep(panelId): Executes one step of the agent-environment loop
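The policy and environment step described above can be sketched as follows. This is not the demo's actual code: the WALL and GOAL coordinates, the numeric action encoding, the reward values, the specific bug, and the helpers DELTAS and samePos are all assumptions chosen to make the example runnable. Only the function names and the {pos, reward, done} return shape come from the description.

```javascript
const SIZE = 3;       // 3x3 grid, as described above
const WALL = [1, 1];  // hypothetical wall cell
const GOAL = [2, 2];  // hypothetical goal cell

// Assumed action encoding: 0 = up, 1 = right, 2 = down, 3 = left.
const DELTAS = [[-1, 0], [0, 1], [1, 0], [0, -1]];

// Helper (not from the demo) for comparing [row, col] pairs.
const samePos = (a, b) => a[0] === b[0] && a[1] === b[1];

// Illustrative buggy policy: at [1, 0] it always pushes right, into the wall.
function badPolicy(pos) {
  if (samePos(pos, [1, 0])) return 1; // the deliberate bug: blocked every time
  return pos[0] < SIZE - 1 ? 2 : 1;   // otherwise move down, then right
}

// One environment transition; blocked moves leave the agent in place.
function envStep(pos, action) {
  const [dr, dc] = DELTAS[action];
  const next = [pos[0] + dr, pos[1] + dc];
  const outOfBounds = next.some((v) => v < 0 || v >= SIZE);
  if (outOfBounds || samePos(next, WALL)) {
    return { pos: [...pos], reward: -1, done: false }; // assumed step penalty
  }
  const done = samePos(next, GOAL);
  return { pos: next, reward: done ? 10 : -1, done };  // assumed goal reward
}
```

With these assumed values, an agent starting at [0, 0] moves down to [1, 0] and then pushes into the wall on every later step, which is the kind of blocked-movement loop the repeats counter is meant to expose.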
Rendering Functions
- renderGrid(panelId): Renders the 3×3 grid with agent, walls, goal, and trails
- renderStats(panelId): Updates the stats display (steps, reward, repeats/escapes)
- addLog(panelId, html): Appends a log entry to the panel's log
- setStatus(panelId, type, text): Updates the status badge
Control Functions
- runDemo(panelId): Starts continuous execution with a timing loop
- stopDemo(panelId): Pauses execution
- stepDemo(panelId): Executes a single step (button handler)
- resetDemo(panelId): Resets panel to initial state
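The run, stop, and step handlers above could be wired together roughly as in this sketch, which uses the standard setInterval and clearInterval timers. The delayMs parameter, the guard conditions, and the stand-in state and doStep are assumptions added so the example runs on its own; the demo's real handlers may differ, and resetDemo is omitted.

```javascript
// Minimal stand-ins so the sketch is self-contained (not the demo's code).
const state = { 1: { running: false, done: false, interval: null, steps: 0 } };
function doStep(panelId) {
  const s = state[panelId];
  s.steps += 1;
  if (s.steps >= 3) s.done = true; // pretend the episode ends after 3 steps
}

// Pause execution and release the timer.
function stopDemo(panelId) {
  const s = state[panelId];
  clearInterval(s.interval);
  s.interval = null;
  s.running = false;
}

// Start continuous execution; delayMs is a hypothetical speed setting.
function runDemo(panelId, delayMs = 300) {
  const s = state[panelId];
  if (s.running || s.done) return; // ignore repeated clicks
  s.running = true;
  s.interval = setInterval(() => {
    doStep(panelId);
    if (s.done) stopDemo(panelId); // stop the loop when the episode ends
  }, delayMs);
}

// Button handler: advance exactly one step while paused.
function stepDemo(panelId) {
  const s = state[panelId];
  if (!s.running && !s.done) doStep(panelId);
}
```

Keeping the timer handle in the panel's own record lets each panel be started and stopped without affecting the other.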
Dual-Panel Design Pattern
The demo uses a parallel comparison pattern where both panels run the same environment and policy, but with different cycle handling strategies:
- Panel 1 (Red): No cycle detection; demonstrates the infinite loop problem
- Panel 2 (Green): Cycle detection enabled; shows the solution
Both panels:
- Share the same grid environment (SIZE, WALL, GOAL, START)
- Use the same badPolicy function
- Call the same envStep function
- Differ only in the cycle detection logic within doStep
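The one point where the two panels diverge can be illustrated with a small sketch. The source does not say how the demo escapes a detected cycle, so the strategy here (pick a random action different from the one the policy proposed) is an assumption, as are the helper names visitCount and chooseAction, the CYCLE_THRESHOLD value, and the four-action encoding.

```javascript
const CYCLE_THRESHOLD = 3; // hypothetical: visits to one cell that count as a cycle
const NUM_ACTIONS = 4;     // assumed encoding: actions are the integers 0..3

// Count how often a position already appears in the panel's history.
function visitCount(history, pos) {
  return history.filter(([r, c]) => r === pos[0] && c === pos[1]).length;
}

// Pick the action for this step. With detectCycles off (panel 1) the policy is
// always obeyed; with it on (panel 2) a repeated cell triggers an override.
function chooseAction(panel, policy, detectCycles, rng = Math.random) {
  const proposed = policy(panel.pos);
  if (!detectCycles) return proposed;
  if (visitCount(panel.history, panel.pos) < CYCLE_THRESHOLD) return proposed;
  panel.escapes += 1; // record the escape for the stats display
  // Assumed escape strategy: any action other than the proposed one.
  const offset = 1 + Math.floor(rng() * (NUM_ACTIONS - 1));
  return (proposed + offset) % NUM_ACTIONS;
}
```

Passing the random source as a parameter is only a convenience for testing; a step function like doStep could call this helper with its panel's record and then hand the chosen action to envStep.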
Code Organization
The JavaScript code follows this structure:
- Constants (lines 642-650): Environment configuration
- State initialization (lines 652-655): Global state object
- Policy definition (lines 657-666): badPolicy function
- Environment step (lines 668-685): envStep function
- Rendering (lines 687-757): Grid, stats, log, status rendering
- Step logic (lines 760-811): doStep with cycle detection
- Control handlers (lines 813-867): Run, stop, reset functions
- Event listeners (lines 869-875): Speed slider handlers
- Initialization (lines 877-881): Initial render calls
The architecture is intentionally simple to make it easy to understand, modify, and extend. Adding new features (like different policies or larger grids) requires minimal changes.