The RL Cycle Demo shows you how a reinforcement learning agent can get trapped in an infinite loop due to a flawed policy, and how cycle detection can rescue it.

Access the live demo

Open the interactive visualization in your browser: https://jhonzacipa.github.io/rl-cycle-demo/

No installation or dependencies are required; the demo runs entirely in your browser.

Understanding the environment

When you open the demo, you’ll see a 3×3 grid world with:

  • Agent: starts at position (0,0), the top-left corner
  • Goal: located at position (2,2), the bottom-right corner
  • Wall: blocks position (1,1), the center of the grid

The agent follows a bad policy that contains a critical bug: at position (1,2) — just one step above the goal — it moves left instead of down, causing it to bounce against the wall forever.
// index.html:658-666
// Actions: 0 = up, 1 = right, 2 = down, 3 = left
function badPolicy(pos) {
  const key = `${pos[0]},${pos[1]}`;
  const policy = {
    '0,0': 1, '0,1': 1, '0,2': 2,
    '1,2': 3, // ← BUG: should be 2 (down)
    '1,0': 0,
  };
  return policy[key] ?? 1; // default action: move right
}
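The endless bounce follows from the environment's transition rule: a move that would leave the grid or land on the wall keeps the agent where it is. Here is a minimal sketch of that rule, written for illustration; the actual implementation in index.html may differ:

```javascript
// A minimal reconstruction of the grid mechanics described above.
// Positions are [row, col]; actions are 0 = up, 1 = right, 2 = down, 3 = left.
const WALL = [1, 1];

function step(pos, action) {
  const delta = [[-1, 0], [0, 1], [1, 0], [0, -1]][action];
  const next = [pos[0] + delta[0], pos[1] + delta[1]];
  const offGrid = next.some(c => c < 0 || c > 2);
  const hitWall = next[0] === WALL[0] && next[1] === WALL[1];
  // Blocked moves leave the agent in place, which is what produces the bounce.
  return offGrid || hitWall ? pos : next;
}

console.log(step([1, 2], 3)); // left bounces off the wall: [ 1, 2 ]
console.log(step([1, 2], 2)); // down reaches the goal:     [ 2, 2 ]
```

Because the buggy policy always returns "left" at (1,2) and the transition rule never moves the agent, the same state repeats forever.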

Running the simulations

The demo presents two scenarios side-by-side:
1. No protection (left panel)

Watch the agent follow the deterministic bad policy with no safeguards. It gets stuck in an infinite loop, bouncing against the wall at position (1,1). Click ▶ Ejecutar (Run) to start the simulation automatically.
2. Cycle detection (right panel)

See the same bad policy enhanced with visit counting and forced exploration. When the agent has visited its current state CYCLE_THRESHOLD times or more, it takes a random alternative action to escape the cycle.
// index.html:768-781
// Count how many times the agent has already visited its current position.
const visits = s.history.filter(h => h[0] === s.pos[0] && h[1] === s.pos[1]).length;
if (visits >= CYCLE_THRESHOLD) {
  // Forced exploration: replace the policy's action with a random alternative.
  const original = badPolicy(s.pos);
  const options = [0, 1, 2, 3].filter(a => a !== original);
  action = options[Math.floor(Math.random() * options.length)];
  escaped = true;
  s.escapes++;
  addLog(panelId, `<span class="cycle">⚠️ Ciclo detected</span>`);
}
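To see the whole mechanism end to end, the bad policy and the visit-count escape can be combined into a standalone script. This is a simplified sketch, not the demo's exact code: CYCLE_THRESHOLD = 3 is an assumed value, while MAX_STEPS = 30 matches the limit the demo defines.

```javascript
// Standalone sketch of the right-panel loop: bad policy plus forced exploration.
// CYCLE_THRESHOLD = 3 is an assumption; MAX_STEPS = 30 comes from index.html.
const CYCLE_THRESHOLD = 3;
const MAX_STEPS = 30;
const WALL = [1, 1], GOAL = [2, 2];

function badPolicy(pos) {
  const policy = { '0,0': 1, '0,1': 1, '0,2': 2, '1,2': 3, '1,0': 0 };
  return policy[`${pos[0]},${pos[1]}`] ?? 1;
}

// Actions: 0 = up, 1 = right, 2 = down, 3 = left; blocked moves stay in place.
function move(pos, action) {
  const d = [[-1, 0], [0, 1], [1, 0], [0, -1]][action];
  const next = [pos[0] + d[0], pos[1] + d[1]];
  const offGrid = next.some(c => c < 0 || c > 2);
  const hitWall = next[0] === WALL[0] && next[1] === WALL[1];
  return offGrid || hitWall ? pos : next;
}

function simulate() {
  let pos = [0, 0];
  const history = [];
  let escapes = 0;
  for (let t = 0; t < MAX_STEPS; t++) {
    history.push([...pos]);
    const visits = history.filter(h => h[0] === pos[0] && h[1] === pos[1]).length;
    let action = badPolicy(pos);
    if (visits >= CYCLE_THRESHOLD) {
      // Forced exploration: pick any action other than the policy's choice.
      const options = [0, 1, 2, 3].filter(a => a !== action);
      action = options[Math.floor(Math.random() * options.length)];
      escapes++;
    }
    pos = move(pos, action);
    if (pos[0] === GOAL[0] && pos[1] === GOAL[1]) {
      return { reached: true, steps: t + 1, escapes };
    }
  }
  return { reached: false, steps: MAX_STEPS, escapes };
}

console.log(simulate()); // e.g. { reached: true, steps: 6, escapes: 1 }; exact values vary
```

The agent deterministically reaches (1,2), bounces until the visit count hits the threshold, then the random alternative eventually includes "down" and it reaches the goal, usually well within the 30-step limit.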
3. Compare the results

The left panel shows the agent trapped in an infinite loop, while the right panel shows successful escape and goal achievement. Watch the stats counters and action logs to understand the difference.

Using the controls

Each panel includes interactive controls:
| Control | Description |
| --- | --- |
| ▶ Ejecutar | Execute the simulation automatically at the selected speed |
| → Paso | Advance one step at a time for detailed observation |
| ↺ Reset | Restart the simulation from the initial state |
| Speed slider | Adjust animation speed between 50 ms and 800 ms per step |
The speed slider updates in real time. Try slowing the animation to 800 ms per step to observe each decision the agent makes.

Monitoring the agent

Watch these key metrics in each panel:
  • Pasos (Steps): Total number of actions taken
  • Recompensa (Reward): Cumulative reward (+10 for reaching the goal, -0.1 per step)
  • Repeticiones (Repeats, left panel): Number of blocked movements
  • Escapes (right panel): Number of times cycle detection triggered
The simulation has a maximum of 30 steps defined by MAX_STEPS (index.html:649). If the agent hasn’t reached the goal by then, it stops automatically.
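The reward bookkeeping is easy to check by hand. A small helper (illustrative only; the demo accumulates the reward incrementally) that rounds to one decimal place to sidestep floating-point noise:

```javascript
// Cumulative reward: -0.1 per step taken, +10 if the goal is reached.
// Rounded to one decimal place to avoid floating-point artifacts.
function totalReward(steps, reachedGoal) {
  const raw = (reachedGoal ? 10 : 0) - 0.1 * steps;
  return Math.round(raw * 10) / 10;
}

// Shortest route around the wall: right, right, down, down (4 steps).
console.log(totalReward(4, true));   // 9.6
// Hitting the 30-step cap without reaching the goal:
console.log(totalReward(30, false)); // -3
```

Comparing the two panels' Recompensa counters against this formula is a quick way to confirm the step penalty and goal bonus.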

Running locally

You can run the demo on your own machine with zero dependencies:
1. Clone the repository

git clone https://github.com/JhonZacipa/rl-cycle-demo.git
cd rl-cycle-demo
2. Open in browser

open index.html   # macOS; use xdg-open on Linux or start on Windows
Or simply double-click index.html in your file explorer. The demo runs entirely with client-side JavaScript — no build step or server required.
The demo uses vanilla JavaScript with no external dependencies, making it perfect for educational purposes and offline exploration.

Next steps

  • Core Concepts: learn about reinforcement learning policies and infinite loops
  • Cycle Detection: understand how cycle detection rescues trapped agents
