The RL Cycle Demo shows you how a reinforcement learning agent can get trapped in an infinite loop due to a flawed policy, and how cycle detection can rescue it.

Access the live demo

Open the interactive visualization in your browser: https://jhonzacipa.github.io/rl-cycle-demo/

No installation or dependencies are required; the demo runs entirely in your browser.

Understanding the environment

When you open the demo, you’ll see a 3×3 grid world with:

  • Agent: starts at position (0,0), the top-left corner
  • Goal: located at position (2,2), the bottom-right corner
  • Wall: blocks position (1,1), the center of the grid

The agent follows a bad policy that contains a critical bug: at position (1,2) — just one step above the goal — it moves left instead of down, causing it to bounce against the wall forever.
// index.html:658-666
// Actions: 0 = up, 1 = right, 2 = down, 3 = left
function badPolicy(pos) {
  const key = `${pos[0]},${pos[1]}`;
  const policy = {
    '0,0': 1, '0,1': 1, '0,2': 2,
    '1,2': 3, // ← BUG: should be 2 (down)
    '1,0': 0,
  };
  return policy[key] ?? 1; // default action: move right
}
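The endless bounce follows from the environment's transition rule: a move that would leave the grid or land on the wall keeps the agent where it is. Here is a minimal sketch of that rule, written for illustration; the actual implementation in index.html may differ:

```javascript
// A minimal reconstruction of the grid mechanics described above.
// Positions are [row, col]; actions are 0 = up, 1 = right, 2 = down, 3 = left.
const WALL = [1, 1];

function step(pos, action) {
  const delta = [[-1, 0], [0, 1], [1, 0], [0, -1]][action];
  const next = [pos[0] + delta[0], pos[1] + delta[1]];
  const offGrid = next.some(c => c < 0 || c > 2);
  const hitWall = next[0] === WALL[0] && next[1] === WALL[1];
  // Blocked moves leave the agent in place, which is what produces the bounce.
  return offGrid || hitWall ? pos : next;
}

console.log(step([1, 2], 3)); // left bounces off the wall: [ 1, 2 ]
console.log(step([1, 2], 2)); // down reaches the goal:     [ 2, 2 ]
```

Because the buggy policy always returns "left" at (1,2) and the transition rule never moves the agent, the same state repeats forever.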

Running the simulations

The demo presents two scenarios side-by-side:
1. No protection (left panel)

Watch the agent follow the deterministic bad policy with no safeguards. It gets stuck in an infinite loop, bouncing against the wall at position (1,1). Click ▶ Ejecutar (Run) to start the simulation automatically.
2. Cycle detection (right panel)

See the same bad policy enhanced with visit counting and forced exploration. When the agent has visited its current state CYCLE_THRESHOLD times or more, it takes a random alternative action to escape the cycle.
// index.html:768-781
// Count how many times the agent has already visited its current position.
const visits = s.history.filter(h => h[0] === s.pos[0] && h[1] === s.pos[1]).length;
if (visits >= CYCLE_THRESHOLD) {
  // Forced exploration: replace the policy's action with a random alternative.
  const original = badPolicy(s.pos);
  const options = [0, 1, 2, 3].filter(a => a !== original);
  action = options[Math.floor(Math.random() * options.length)];
  escaped = true;
  s.escapes++;
  addLog(panelId, `<span class="cycle">⚠️ Ciclo detected</span>`);
}
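To see the whole mechanism end to end, the bad policy and the visit-count escape can be combined into a standalone script. This is a simplified sketch, not the demo's exact code: CYCLE_THRESHOLD = 3 is an assumed value, while MAX_STEPS = 30 matches the limit the demo defines.

```javascript
// Standalone sketch of the right-panel loop: bad policy plus forced exploration.
// CYCLE_THRESHOLD = 3 is an assumption; MAX_STEPS = 30 comes from index.html.
const CYCLE_THRESHOLD = 3;
const MAX_STEPS = 30;
const WALL = [1, 1], GOAL = [2, 2];

function badPolicy(pos) {
  const policy = { '0,0': 1, '0,1': 1, '0,2': 2, '1,2': 3, '1,0': 0 };
  return policy[`${pos[0]},${pos[1]}`] ?? 1;
}

// Actions: 0 = up, 1 = right, 2 = down, 3 = left; blocked moves stay in place.
function move(pos, action) {
  const d = [[-1, 0], [0, 1], [1, 0], [0, -1]][action];
  const next = [pos[0] + d[0], pos[1] + d[1]];
  const offGrid = next.some(c => c < 0 || c > 2);
  const hitWall = next[0] === WALL[0] && next[1] === WALL[1];
  return offGrid || hitWall ? pos : next;
}

function simulate() {
  let pos = [0, 0];
  const history = [];
  let escapes = 0;
  for (let t = 0; t < MAX_STEPS; t++) {
    history.push([...pos]);
    const visits = history.filter(h => h[0] === pos[0] && h[1] === pos[1]).length;
    let action = badPolicy(pos);
    if (visits >= CYCLE_THRESHOLD) {
      // Forced exploration: pick any action other than the policy's choice.
      const options = [0, 1, 2, 3].filter(a => a !== action);
      action = options[Math.floor(Math.random() * options.length)];
      escapes++;
    }
    pos = move(pos, action);
    if (pos[0] === GOAL[0] && pos[1] === GOAL[1]) {
      return { reached: true, steps: t + 1, escapes };
    }
  }
  return { reached: false, steps: MAX_STEPS, escapes };
}

console.log(simulate()); // e.g. { reached: true, steps: 6, escapes: 1 }; exact values vary
```

The agent deterministically reaches (1,2), bounces until the visit count hits the threshold, then the random alternative eventually includes "down" and it reaches the goal, usually well within the 30-step limit.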
3. Compare the results

The left panel shows the agent trapped in an infinite loop, while the right panel shows successful escape and goal achievement. Watch the stats counters and action logs to understand the difference.

Using the controls

Each panel includes interactive controls:
| Control | Description |
| --- | --- |
| ▶ Ejecutar | Execute the simulation automatically at the selected speed |
| → Paso | Advance one step at a time for detailed observation |
| ↺ Reset | Restart the simulation from the initial state |
| Speed slider | Adjust animation speed between 50 ms and 800 ms per step |
The speed slider updates in real time. Try slowing the animation to 800 ms per step to observe each decision the agent makes.

Monitoring the agent

Watch these key metrics in each panel:
  • Pasos (Steps): Total number of actions taken
  • Recompensa (Reward): Cumulative reward (+10 for reaching the goal, -0.1 per step)
  • Repeticiones (Repeats, left panel): Number of blocked movements
  • Escapes (right panel): Number of times cycle detection triggered
The simulation has a maximum of 30 steps defined by MAX_STEPS (index.html:649). If the agent hasn’t reached the goal by then, it stops automatically.
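The reward bookkeeping is easy to check by hand. A small helper (illustrative only; the demo accumulates the reward incrementally) that rounds to one decimal place to sidestep floating-point noise:

```javascript
// Cumulative reward: -0.1 per step taken, +10 if the goal is reached.
// Rounded to one decimal place to avoid floating-point artifacts.
function totalReward(steps, reachedGoal) {
  const raw = (reachedGoal ? 10 : 0) - 0.1 * steps;
  return Math.round(raw * 10) / 10;
}

// Shortest route around the wall: right, right, down, down (4 steps).
console.log(totalReward(4, true));   // 9.6
// Hitting the 30-step cap without reaching the goal:
console.log(totalReward(30, false)); // -3
```

Comparing the two panels' Recompensa counters against this formula is a quick way to confirm the step penalty and goal bonus.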

Running locally

You can run the demo on your own machine with zero dependencies:
1. Clone the repository

git clone https://github.com/JhonZacipa/rl-cycle-demo.git
cd rl-cycle-demo
2. Open in browser

open index.html   # macOS; use xdg-open on Linux or start on Windows
Or simply double-click index.html in your file explorer. The demo runs entirely with client-side JavaScript — no build step or server required.
The demo uses vanilla JavaScript with no external dependencies, making it perfect for educational purposes and offline exploration.

Next steps

  • Core Concepts: learn about reinforcement learning policies and infinite loops
  • Cycle Detection: understand how cycle detection rescues trapped agents
