The demo compares two scenarios side by side to illustrate the impact of cycle detection on RL agent behavior.

Panel 1: No Protection

Panel 1 demonstrates what happens when an RL agent follows a flawed policy without any safety mechanisms.

Behavior Pattern

function badPolicy(pos) {
  const key = `${pos[0]},${pos[1]}`;
  const policy = {
    '0,0': 1, '0,1': 1, '0,2': 2,
    '1,2': 3, // ← BUG: should be 2 (down)
    '1,0': 0,
  };
  return policy[key] ?? 1;
}
The policy contains a critical bug at position (1,2): it instructs the agent to go left (action 3), but the wall at (1,1) blocks this movement. The agent becomes trapped:
  • Agent reaches (1,2) after 3 successful steps
  • Attempts to move left but collides with the wall
  • Stays at (1,2) indefinitely
  • Repeats the same failed action until MAX_STEPS is reached
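
The trap can be reproduced outside the browser with a minimal sketch. The 3x3 grid size, the `move` helper, and the `runUnprotected` driver are assumptions for illustration; only `badPolicy`, the wall at (1,1), and the 30-step cap come from the demo. Action indices are inferred from the policy table (0 = up, 1 = right, 2 = down, 3 = left).

```javascript
// Assumed layout: 3x3 grid with a single wall at (1,1).
const GRID_SIZE = 3;
const WALLS = new Set(['1,1']);
// Row/column deltas per action, inferred: 0 = up, 1 = right, 2 = down, 3 = left.
const DELTAS = [[-1, 0], [0, 1], [1, 0], [0, -1]];

// Same lookup table as the demo's flawed policy.
function badPolicy(pos) {
  const policy = { '0,0': 1, '0,1': 1, '0,2': 2, '1,2': 3, '1,0': 0 };
  return policy[`${pos[0]},${pos[1]}`] ?? 1;
}

// Hypothetical helper: returns the new position, or the old one if blocked.
function move(pos, action) {
  const next = [pos[0] + DELTAS[action][0], pos[1] + DELTAS[action][1]];
  const outside = next.some(v => v < 0 || v >= GRID_SIZE);
  return outside || WALLS.has(next.join(',')) ? pos : next;
}

// Follow the policy blindly and count the actions that changed nothing.
function runUnprotected(maxSteps = 30) {
  let pos = [0, 0];
  let repeats = 0;
  for (let step = 0; step < maxSteps; step++) {
    const next = move(pos, badPolicy(pos));
    if (next === pos) repeats++; // move() returns the same array when blocked
    pos = next;
  }
  return { pos, repeats };
}

console.log(runUnprotected()); // final position is (1,2)
```

Under these assumptions the agent never leaves (1,2) once it arrives, so every remaining step is counted as a repeat.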

Stats Tracked

state[1] = {
  pos: [...START],
  steps: 0,        // Total actions taken
  reward: 0,       // Cumulative reward (-0.1 per step, +10 for goal)
  repeats: 0,      // Number of times action didn't change position
  history: [],     // Array of visited positions
  running: false,
  done: false,
  interval: null
};
The steps field counts every action the agent takes, regardless of whether it successfully moves, and stops at MAX_STEPS = 30. The repeats field counts only the actions that left the position unchanged.
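
How these fields might change on each step can be sketched as follows. The `recordStep` helper is hypothetical; the -0.1 step penalty and +10 goal bonus come from the comment on `reward`, and applying the penalty on the goal step as well is an assumption.

```javascript
const STEP_PENALTY = -0.1; // per-step reward from the comment on `reward`
const GOAL_BONUS = 10;     // bonus for reaching the goal

// Hypothetical helper: update the stats after one action.
// `next` is the position after the action; `goal` is the goal cell.
function recordStep(s, next, goal) {
  const moved = next[0] !== s.pos[0] || next[1] !== s.pos[1];
  s.steps++;
  s.reward += STEP_PENALTY;
  if (!moved) s.repeats++;
  s.pos = next;
  s.history.push([...next]);
  if (next[0] === goal[0] && next[1] === goal[1]) {
    s.reward += GOAL_BONUS;
    s.done = true;
  }
  return s;
}

// Usage: one blocked action at (1,2) with the goal at (2,2).
const s = { pos: [1, 2], steps: 3, reward: -0.3, repeats: 0, history: [], done: false };
recordStep(s, [1, 2], [2, 2]);
console.log(s.steps, s.repeats, s.reward.toFixed(1)); // 4 1 -0.4
```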

Log Output Format

The log displays each step with structured information:
addLog(panelId, `<span class="step-num">[${s.steps}]</span> <span class="${logClass}">${ACTION_ARROWS[action]} → (${s.pos})</span>${stuckNote}`);
Example log entries:
[1] → → (0,1)
[2] → → (0,2)
[3] ↓ → (1,2)
[4] ← → (1,2) (bloqueado)
[5] ← → (1,2) (bloqueado)
...
[30] ← → (1,2) (bloqueado)
🔴 Límite de 30 pasos alcanzado
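Each entry shows:
  • [N]: Step number in gray
  • Arrow: Action taken (↑↓←→) in blue
  • (r,c): Resulting position
  • (bloqueado): “Blocked” warning shown when the position didn’t change
The closing line, “Límite de 30 pasos alcanzado”, means “30-step limit reached”.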
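
A plain-text version of the same entry format could be built as below. `formatEntry` is a hypothetical helper, the `ACTION_ARROWS` ordering is inferred from the policy (0 = up, 1 = right, 2 = down, 3 = left), and the HTML spans are omitted.

```javascript
// Assumed ordering; the demo's ACTION_ARROWS definition is not shown here.
const ACTION_ARROWS = ['↑', '→', '↓', '←'];

// Hypothetical helper: build one log line without the HTML markup.
function formatEntry(step, action, pos, moved) {
  const stuckNote = moved ? '' : ' (bloqueado)'; // "blocked"
  // An array inside a template literal is joined with commas: [1, 2] -> "1,2".
  return `[${step}] ${ACTION_ARROWS[action]} → (${pos})${stuckNote}`;
}

console.log(formatEntry(3, 2, [1, 2], true));  // [3] ↓ → (1,2)
console.log(formatEntry(4, 3, [1, 2], false)); // [4] ← → (1,2) (bloqueado)
```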

Panel 2: With Cycle Detection

Panel 2 uses the same flawed policy but adds cycle detection to escape infinite loops.

Detection Mechanism

const CYCLE_THRESHOLD = 2;

if (panelId === 2) {
  const visits = s.history.filter(h => h[0] === s.pos[0] && h[1] === s.pos[1]).length;
  if (visits >= CYCLE_THRESHOLD) {
    const original = badPolicy(s.pos);
    const options = [0, 1, 2, 3].filter(a => a !== original);
    action = options[Math.floor(Math.random() * options.length)];
    escaped = true;
    s.escapes++;
    addLog(panelId, `<span class="cycle">⚠️ Ciclo en (${s.pos}) visitado ${visits}x → exploración forzada</span>`);
  }
}
1. Track Visits: The system counts how many times the agent has visited its current state by checking the history array.
2. Detect Cycle: When a state is visited CYCLE_THRESHOLD (2) or more times, a cycle is detected.
3. Force Exploration: Instead of following the bad policy, the agent randomly selects a different action from the remaining options.
4. Track Escapes: The escapes counter increments each time cycle detection overrides the policy.
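
The four steps can be sketched as a standalone function. `chooseAction` and its injectable `rng` parameter are assumptions made so the random choice can be pinned down in a test; the demo itself calls `Math.random()` inline.

```javascript
const CYCLE_THRESHOLD = 2;

// Hypothetical wrapper around the detection logic.
// `policy` maps a position to an action; `rng` returns a number in [0, 1).
function chooseAction(pos, pastPositions, policy, rng = Math.random) {
  // 1. Track visits: count recorded occurrences of the current position.
  const visits = pastPositions.filter(h => h[0] === pos[0] && h[1] === pos[1]).length;
  const original = policy(pos);
  // 2. Detect cycle: below the threshold, follow the policy unchanged.
  if (visits < CYCLE_THRESHOLD) return { action: original, escaped: false };
  // 3. Force exploration: pick uniformly among the other three actions.
  const options = [0, 1, 2, 3].filter(a => a !== original);
  const action = options[Math.floor(rng() * options.length)];
  // 4. The caller increments its escapes counter when `escaped` is true.
  return { action, escaped: true };
}

// Usage: a policy that always says "left" (3), with two recorded visits to (1,2).
const alwaysLeft = () => 3;
const pastPositions = [[0, 1], [0, 2], [1, 2], [1, 2]];
console.log(chooseAction([1, 2], pastPositions, alwaysLeft, () => 0.9)); // { action: 2, escaped: true }
```

Because the original action is filtered out before sampling, the override can never repeat the move that just failed, though it may still pick another blocked direction.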

Stats Tracked

state[2] = {
  pos: [...START],
  steps: 0,
  reward: 0,
  escapes: 0,      // Number of times cycle detection activated
  history: [],
  running: false,
  done: false,
  interval: null
};
Panel 2 tracks Escapes instead of Repeats. This metric shows how many times the cycle detection mechanism saved the agent from repeating a failed action.

Successful Navigation

With cycle detection enabled, Panel 2 typically succeeds:
[1] → → (0,1)
[2] → → (0,2)
[3] ↓ → (1,2)
[4] ← → (1,2) (bloqueado)
[5] ⚠️ Ciclo en (1,2) visitado 2x → exploración forzada
[5] ↓ → (2,2)
🎉 ¡META ALCANZADA en 5 pasos!
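The agent:
  • Detects the cycle at (1,2) once that position has been recorded twice (the ⚠️ entry reads “Cycle at (1,2) visited 2x → forced exploration”)
  • Randomly chooses a new action (in this run, down instead of left)
  • Successfully moves to (2,2) and reaches the goal (the final entry reads “GOAL REACHED in 5 steps!”)
  • Typically completes in ~5-6 steps instead of timing out at 30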
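
A complete Panel 2 episode can be simulated with a minimal sketch. The 3x3 grid, the `move` helper, the `runProtected` driver, and the injectable `rng` are assumptions for illustration; the policy, threshold, wall, goal, and reward values follow the demo. History is assumed to record the resulting position after every step, which matches the “visitado 2x” entry at step 5.

```javascript
const GRID_SIZE = 3;                 // assumed grid size
const WALLS = new Set(['1,1']);      // wall from the demo
const GOAL = [2, 2];
const MAX_STEPS = 30;
const CYCLE_THRESHOLD = 2;
const DELTAS = [[-1, 0], [0, 1], [1, 0], [0, -1]]; // up, right, down, left

function badPolicy(pos) {
  const policy = { '0,0': 1, '0,1': 1, '0,2': 2, '1,2': 3, '1,0': 0 };
  return policy[pos.join(',')] ?? 1;
}

// Hypothetical helper: blocked moves leave the position unchanged.
function move(pos, action) {
  const next = [pos[0] + DELTAS[action][0], pos[1] + DELTAS[action][1]];
  const outside = next.some(v => v < 0 || v >= GRID_SIZE);
  return outside || WALLS.has(next.join(',')) ? pos : next;
}

// Run one episode; `rng` is injectable so the forced choice can be fixed.
function runProtected(rng = Math.random) {
  const s = { pos: [0, 0], steps: 0, reward: 0, escapes: 0, history: [], done: false };
  while (!s.done && s.steps < MAX_STEPS) {
    let action = badPolicy(s.pos);
    const visits = s.history.filter(h => h[0] === s.pos[0] && h[1] === s.pos[1]).length;
    if (visits >= CYCLE_THRESHOLD) {
      const options = [0, 1, 2, 3].filter(a => a !== action);
      action = options[Math.floor(rng() * options.length)];
      s.escapes++;
    }
    s.pos = move(s.pos, action);
    s.history.push([...s.pos]);
    s.steps++;
    s.reward -= 0.1;
    if (s.pos[0] === GOAL[0] && s.pos[1] === GOAL[1]) {
      s.reward += 10;
      s.done = true;
    }
  }
  return s;
}

// With an rng that always selects the last option, the forced action at (1,2) is "down".
const result = runProtected(() => 0.9);
console.log(result.steps, result.escapes, result.reward.toFixed(1)); // 5 1 9.5
```

With the real `Math.random`, the forced action is down only some of the time, so individual runs may need more than one escape before reaching the goal.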

Status Badges

Both panels display a status badge that updates throughout execution:
function setStatus(panelId, type, text) {
  const badge = document.getElementById(`status${panelId}`);
  badge.className = `status-badge ${type}`;
  badge.innerHTML = `<div class="status-dot"></div><span>${text}</span>`;
}

Status Types

.status-badge.idle {
  background: rgba(136, 136, 160, 0.1);
  color: var(--text-dim);
}
The idle badge is displayed when the demo is waiting to start or has been paused. Text: “Esperando” (“Waiting”) or “Pausado” (“Paused”).

Maximum Steps Limit

Both scenarios enforce a step limit to prevent true infinite loops:
const MAX_STEPS = 30;

function doStep(panelId) {
  const s = state[panelId];
  if (s.done || s.steps >= MAX_STEPS) return false;
  
  // ... execute step ...
  
  if (s.steps >= MAX_STEPS) {
    addLog(panelId, `<span class="warning">🔴 Límite de ${MAX_STEPS} pasos alcanzado</span>`);
    setStatus(panelId, 'stuck', 'Ciclo infinito');
  }
}
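The MAX_STEPS constant is set to 30. This limit ensures the demo terminates even in Panel 1, where the agent would otherwise loop forever. When the limit is hit, the log shows “Límite de 30 pasos alcanzado” (“30-step limit reached”) and the badge switches to “Ciclo infinito” (“Infinite loop”). Capping episode length in this way is a common safeguard in RL training.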
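
The same cap can be expressed as a generic driver loop. This is a sketch rather than the demo's code: `runEpisode` and the `stepFn` callback (which returns `true` once the goal is reached) are hypothetical names.

```javascript
const MAX_STEPS = 30;

// Hypothetical driver: call stepFn until it reports success or the cap is hit.
function runEpisode(stepFn, maxSteps = MAX_STEPS) {
  let steps = 0;
  let done = false;
  while (!done && steps < maxSteps) {
    done = stepFn(steps);
    steps++;
  }
  // `truncated` marks episodes that ended only because of the cap.
  return { steps, truncated: !done };
}

console.log(runEpisode(() => false));  // { steps: 30, truncated: true }
console.log(runEpisode(i => i === 4)); // { steps: 5, truncated: false }
```

Returning a separate `truncated` flag keeps a timeout distinguishable from a genuine success, which is the distinction the two panels illustrate.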

Key Differences Summary

| Aspect | Panel 1: No Protection | Panel 2: With Cycle Detection |
| --- | --- | --- |
| Outcome | Gets stuck at (1,2) | Reaches goal at (2,2) |
| Steps | Always 30 (max limit) | Typically 5-6 |
| Tracked Metric | Repeats (~27) | Escapes (1-2) |
| Final Status | “Ciclo infinito” | “¡Meta!” |
| Reward | ~-3.0 | ~+9.5 |
| Learning | Demonstrates the problem | Demonstrates the solution |
