## Access the live demo
Open the interactive visualization in your browser: https://jhonzacipa.github.io/rl-cycle-demo/

No installation or dependencies required. The demo runs entirely in your browser.

## Understanding the environment
When you open the demo, you'll see a 3×3 grid world with:

- **Agent**: Starts at position (0,0), the top-left corner
- **Goal**: Located at position (2,2), the bottom-right corner
- **Wall**: Blocks position (1,1), the center of the grid

The built-in policy is deliberately flawed: when the agent reaches (1,2) — just one step above the goal — it moves left instead of down, causing it to bounce against the wall forever.
## Running the simulations
The demo presents two scenarios side-by-side:

### No protection (left panel)

Watch the agent follow the deterministic bad policy with no safeguards. It gets stuck in an infinite loop, bouncing against the wall at position (1,1). Click **▶ Ejecutar** (Run) to start the simulation automatically.

### Cycle detection (right panel)
See the same bad policy enhanced with visit counting and forced exploration. When the agent visits a state more times than a set threshold, it takes a random alternative action to escape the cycle.
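That escape mechanism can be sketched in a few lines of vanilla JavaScript. The names and the threshold value here are illustrative assumptions, not taken from the demo's source:

```javascript
// Visit counting with forced exploration: once a state has been seen more
// than VISIT_THRESHOLD times, override the policy with a random other action.
const VISIT_THRESHOLD = 3;
const ACTIONS = ["up", "down", "left", "right"];

function makeGuardedAgent(policy) {
  const visits = new Map();
  let escapes = 0;
  return {
    chooseAction(state) {
      const key = `${state[0]},${state[1]}`;
      const count = (visits.get(key) || 0) + 1;
      visits.set(key, count);
      const planned = policy(state);
      if (count <= VISIT_THRESHOLD) return planned;
      // Cycle suspected: pick any action except the one the policy wanted.
      escapes++;
      const alternatives = ACTIONS.filter((a) => a !== planned);
      return alternatives[Math.floor(Math.random() * alternatives.length)];
    },
    get escapes() { return escapes; },
  };
}

// Example: a policy that always says "left" gets overridden from the 4th visit on.
const agent = makeGuardedAgent(() => "left");
const picks = [1, 2, 3, 4, 5].map(() => agent.chooseAction([1, 2]));
// picks[0..2] are "left"; picks[3] and picks[4] are random non-"left" actions
```

The randomness is the point: any deterministic tie-breaker could itself form a new cycle, while a random alternative eventually knocks the agent onto a path the bad policy can finish from.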
## Using the controls
Each panel includes interactive controls:

| Control | Description |
|---|---|
| ▶ Ejecutar | Execute the simulation automatically at the selected speed |
| → Paso | Advance one step at a time for detailed observation |
| ↺ Reset | Restart the simulation from the initial state |
| Speed Slider | Adjust animation speed between 50ms and 800ms per step |
The speed slider updates in real time. Try slowing the animation to 800ms per step to observe each decision the agent makes.
## Monitoring the agent
Watch these key metrics in each panel:

- **Pasos** (Steps): Total number of actions taken
- **Recompensa** (Reward): Cumulative reward (+10 for reaching the goal, -0.1 per step)
- **Repeticiones** (Repetitions, left panel): Number of blocked movements
- **Escapes** (right panel): Number of times cycle detection triggered
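The reward figure is easy to check by hand: the shortest wall-avoiding path (right, right, down, down) takes 4 steps, so a successful run scores 10 − 0.1 × 4 = 9.6, while a stuck agent only accumulates step penalty. A one-line sketch (the function name is illustrative):

```javascript
// Cumulative reward as shown in the panels: +10 on reaching the goal,
// -0.1 per step taken.
const episodeReward = (steps, reachedGoal) =>
  (reachedGoal ? 10 : 0) - 0.1 * steps;

episodeReward(4, true);    // shortest successful run: 9.6 (up to float rounding)
episodeReward(100, false); // 100 steps stuck in the loop: about -10
```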
## Running locally
You can run the demo on your own machine with zero dependencies:
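One minimal way to do this — assuming the source is hosted at github.com/jhonzacipa/rl-cycle-demo, which is a guess inferred from the demo URL:

```shell
# Clone the repository (repo path inferred from the demo URL; adjust if needed)
git clone https://github.com/jhonzacipa/rl-cycle-demo.git
cd rl-cycle-demo

# No build step: open index.html directly in a browser, or serve it locally
python3 -m http.server 8000   # then visit http://localhost:8000
```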
The demo uses vanilla JavaScript with no external dependencies, making it perfect for educational purposes and offline exploration.
## Next steps

- **Core Concepts**: Learn about reinforcement learning policies and infinite loops
- **Cycle Detection**: Understand how cycle detection rescues trapped agents