What is Cycle Detection?
Cycle detection is an active safeguard that monitors which states your agent visits. When it detects the agent has returned to the same state too many times, it intervenes by forcing a random exploration action instead of following the policy.
Unlike max steps (which just stops execution), cycle detection actively breaks loops and helps the agent escape stuck states.
Cycle detection is the technique that enables the right panel of the demo to succeed while the left panel gets stuck. It’s the difference between “give up after N steps” and “actively escape the loop.”
How It Works
The algorithm follows four steps:
1. Track State History
Maintain an array of all states the agent has visited:
const history = [];

// After each step
history.push([currentRow, currentCol]);
2. Count Visits to Current State
Before taking an action, count how many times you’ve been here:
const visits = history.filter(h =>
  h[0] === currentState[0] && h[1] === currentState[1]
).length;
3. Compare Against Threshold
If visits exceed the threshold, a cycle is detected:
const CYCLE_THRESHOLD = 2;

if (visits >= CYCLE_THRESHOLD) {
  // Cycle detected! Take action.
}
4. Force Exploration
Instead of following the policy, choose a random alternative action:
const policyAction = policy(currentState);
const alternatives = [0, 1, 2, 3].filter(a => a !== policyAction);
const explorationAction = alternatives[Math.floor(Math.random() * alternatives.length)];
By excluding the policy’s action from the random choice, you ensure the agent tries something different from what got it stuck.
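The four steps can be combined into a single action-selection helper (a minimal sketch; policy is assumed to map a state to an action index 0-3):

```javascript
// Combines steps 1-4: given the visit history, count visits to the
// current state, compare against the threshold, and force exploration
// when a cycle is detected.
const CYCLE_THRESHOLD = 2;

function chooseAction(history, currentState, policy) {
  // Step 2: count prior visits to the current state
  const visits = history.filter(h =>
    h[0] === currentState[0] && h[1] === currentState[1]
  ).length;

  const policyAction = policy(currentState);

  // Step 3: threshold check
  if (visits >= CYCLE_THRESHOLD) {
    // Step 4: pick any action except the one that got us stuck
    const alternatives = [0, 1, 2, 3].filter(a => a !== policyAction);
    return alternatives[Math.floor(Math.random() * alternatives.length)];
  }
  return policyAction;
}
```

With an empty history the helper simply follows the policy; once a state has been visited twice, the returned action is guaranteed to differ from the policy's choice.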
Implementation in the Demo
The RL Cycle Demo implements cycle detection in the right panel (green). Let’s examine the actual source code.
Constants and State Tracking
// From index.html:650
const CYCLE_THRESHOLD = 2;

// From index.html:654
const state = {
  2: {
    pos: [...START],
    history: [],  // Tracks all visited positions
    escapes: 0,   // Counts forced explorations
    // ...
  }
};
The Core Detection Logic
Here’s the complete cycle detection implementation from the doStep function:
// From index.html:767-779
if (panelId === 2) {
  // With cycle detection
  const visits = s.history.filter(h =>
    h[0] === s.pos[0] && h[1] === s.pos[1]
  ).length;
  if (visits >= CYCLE_THRESHOLD) {
    const original = badPolicy(s.pos);
    const options = [0, 1, 2, 3].filter(a => a !== original);
    action = options[Math.floor(Math.random() * options.length)];
    escaped = true;
    s.escapes++;
    addLog(panelId,
      `<span class="cycle">⚠️ Cycle at (${s.pos}) visited ${visits}x → forced exploration</span>`
    );
  } else {
    action = badPolicy(s.pos);
  }
}
History Recording
Before applying any action, the current position is saved to history:
// From index.html:784
s.history.push([...s.pos]);
Always record the state before taking the action. If you record after, you won’t detect cycles correctly because the visit count will be off by one.
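The ordering determines whether the count includes the current visit. A minimal sketch of the resulting off-by-one (illustrative values, not the demo's exact code):

```javascript
// Counting before vs after the current position is recorded: the
// counts differ by exactly one, so the threshold trips one step apart.
function countVisits(history, state) {
  return history.filter(h =>
    h[0] === state[0] && h[1] === state[1]
  ).length;
}

const history = [[0, 0], [0, 1]];  // [0,1] visited once before
const pos = [0, 1];                // agent is back at [0,1]

const countedBefore = countVisits(history, pos);  // prior visits only
history.push([...pos]);                           // record this visit
const countedAfter = countVisits(history, pos);   // includes this visit

console.log(countedBefore, countedAfter);  // 1 2
```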
Why CYCLE_THRESHOLD = 2?
The demo uses a threshold of 2 visits to trigger cycle detection. This means:
First visit : Normal — agent is exploring
Second visit : Warning — might be a cycle
Third visit : Intervention — force exploration
This low threshold catches cycles quickly while still allowing the agent to revisit states occasionally (which can be valid in some navigation scenarios).
// Example state history showing cycle detection. A blocked move
// bounces the agent back, so its position (and the recorded state)
// stays the same.
history = [
  [0, 0],  // Start
  [0, 1],  // Move right
  [0, 1],  // Try down - WALL, bounce back (first revisit)
  [0, 1],  // Try down again - WALL, bounce back (second revisit: visits >= 2, CYCLE DETECTED!)
];
Lower thresholds (1-2) catch cycles faster but may trigger false positives. Higher thresholds (3-5) are more conservative but let the agent waste more steps before intervening.
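To see exactly where the threshold trips, here is a self-contained replay of a bounce loop like the one above, checking the prior-visit count at each step:

```javascript
// A bounce loop: the agent keeps landing back on [0,1]. Detection
// fires on the third occupancy, when two earlier visits are on record.
const CYCLE_THRESHOLD = 2;
const history = [
  [0, 0],  // start
  [0, 1],  // move right
  [0, 1],  // wall bounce (first revisit)
  [0, 1],  // wall bounce again (second revisit)
];

// For each step: count the visits recorded *before* that step and
// compare against the threshold.
const detections = history.map((state, i) => {
  const visits = history.slice(0, i).filter(h =>
    h[0] === state[0] && h[1] === state[1]
  ).length;
  return visits >= CYCLE_THRESHOLD;
});

console.log(detections);  // [false, false, false, true]
```

Raising the threshold to 3 would push the `true` one bounce later, which is the tradeoff described above.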
The Demo’s Cycle Scenario
Let’s trace exactly what happens in the demo:
Without Cycle Detection (Left Panel)
Agent reaches position (1,2) — just above the goal
Bad policy says: move left (action 3)
Agent tries to move into (1,1) — hits the wall, bounces back to (1,2)
Steps 2-3 repeat forever… stuck in an infinite loop
Eventually hits MAX_STEPS and terminates
With Cycle Detection (Right Panel)
Agent reaches position (1,2) for the first time — visits = 0, normal
Bad policy says: move left → hits wall, bounces back to (1,2) — first revisit, visits = 1
Bad policy tries left again → bounces back to (1,2) — second revisit, visits = 2, threshold reached!
Cycle detected! Exclude action 3 (left), choose randomly from actions 0 (up), 1 (right), or 2 (down)
Agent picks action 2 (down) randomly
Moves to (2,2) — goal reached! 🎉
The forced exploration at step 4 gives the agent a chance to “accidentally” discover the correct action (down) that the policy failed to learn.
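The escape is probabilistic: with one correct action among three alternatives, each intervention succeeds with probability 1/3, so the chance of escaping within k interventions is 1 − (2/3)^k. A back-of-envelope sketch, assuming a wrong forced action returns the agent to the loop:

```javascript
// Probability of having escaped after k forced explorations, given
// pCorrect chance of picking the right alternative each time.
function escapeProbability(k, pCorrect = 1 / 3) {
  return 1 - Math.pow(1 - pCorrect, k);
}

for (const k of [1, 3, 5]) {
  console.log(k, escapeProbability(k).toFixed(2));
}
// 1 0.33, 3 0.70, 5 0.87
```

After five interventions the agent has escaped with roughly 87% probability, which is why the right panel reliably reaches the goal well before MAX_STEPS.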
Memory Overhead
Cycle detection requires storing the complete state history, which grows with episode length:
// Memory usage scales with steps
memoryUsed = stateSize × numberOfSteps

// For the demo's 3×3 grid:
// State = [row, col] = 2 numbers = ~16 bytes
// At MAX_STEPS = 30: ~480 bytes per episode

// For complex environments:
// State = 84×84 image = ~7KB
// At 1000 steps: ~7MB per episode
For environments with large state spaces (images, high-dimensional observations), consider using state hashing or position-only tracking instead of storing complete states.
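The hashing suggestion can be sketched with a Map keyed by a serialized state: visit counts become O(1) lookups, and memory grows with the number of unique states rather than with steps.

```javascript
// Visit counting via a hash map: no full history scan per step, and
// memory is bounded by the number of *unique* states visited.
class VisitCounter {
  constructor() {
    this.counts = new Map();
  }
  key(state) {
    return state.join(',');  // e.g. [1, 2] → "1,2"
  }
  record(state) {
    const k = this.key(state);
    this.counts.set(k, (this.counts.get(k) || 0) + 1);
  }
  visits(state) {
    return this.counts.get(this.key(state)) || 0;
  }
}

const counter = new VisitCounter();
counter.record([1, 2]);
counter.record([1, 2]);
console.log(counter.visits([1, 2]));  // 2
console.log(counter.visits([0, 0]));  // 0
```

For image observations you would replace key() with a proper hash of the pixel buffer; the counting logic stays the same.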
Pros and Cons
Pros
Active escape : Breaks loops instead of just terminating
Targeted intervention : Only acts when cycles are detected
Discovery mechanism : Random exploration can find solutions
Configurable sensitivity : Adjust the threshold for your domain
Maintains determinism : Agent follows the policy until stuck
Cons
Memory overhead : Must store the complete state history
State comparison cost : Filtering the history on each step
Discrete states only : Hard to detect cycles in continuous spaces
May not escape : A random action might not find the solution
Threshold tuning : Requires domain knowledge
When to Use Cycle Detection
✅ Good Fit
Discrete state spaces : Grids, graphs, finite state machines
Short episodes : History doesn’t grow too large
Deterministic policies : Cycles are repeatable
Deployment : Need agents to escape bugs in production
❌ Not Ideal
Continuous state spaces : Hard to detect exact revisits
Long episodes : Memory overhead becomes prohibitive
Stochastic environments : Random outcomes may mask cycles
Training : ε-greedy is simpler for exploration
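For continuous state spaces, one common workaround is to discretize states before counting, so that near-identical positions land in the same bin. A sketch (not part of the demo):

```javascript
// Bins a continuous state so near-identical positions count as
// revisits. binSize controls how coarse the match is.
function discretize(state, binSize = 0.5) {
  return state.map(x => Math.round(x / binSize));
}

// Two nearby continuous positions land in the same bin:
console.log(discretize([1.02, 3.48]));  // [2, 7]
console.log(discretize([0.98, 3.52]));  // [2, 7]
```

The bin size trades false positives (too coarse: distinct states collide) against missed cycles (too fine: revisits never match), much like the threshold itself.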
Enhanced Implementation
Here’s a production-ready cycle detection class:
class CycleDetector {
  constructor(threshold = 2, stateComparator = null) {
    this.threshold = threshold;
    this.history = [];
    this.escapes = 0;
    this.compare = stateComparator || this.defaultCompare;
  }

  defaultCompare(state1, state2) {
    return JSON.stringify(state1) === JSON.stringify(state2);
  }

  recordState(state) {
    this.history.push(state);
  }

  detectCycle(currentState) {
    const visits = this.history.filter(s =>
      this.compare(s, currentState)
    ).length;
    return visits >= this.threshold;
  }

  forceExploration(policyAction, allActions) {
    const alternatives = allActions.filter(a => a !== policyAction);
    if (alternatives.length === 0) {
      return policyAction;  // No alternatives available
    }
    this.escapes++;
    const randomIndex = Math.floor(Math.random() * alternatives.length);
    return alternatives[randomIndex];
  }

  reset() {
    this.history = [];
    this.escapes = 0;
  }

  getStats() {
    return {
      totalSteps: this.history.length,
      escapes: this.escapes,
      uniqueStates: new Set(this.history.map(s => JSON.stringify(s))).size
    };
  }
}
// Usage
const detector = new CycleDetector(2);

while (!done) {
  let action;
  if (detector.detectCycle(currentState)) {
    console.log('⚠️ Cycle detected, forcing exploration');
    action = detector.forceExploration(
      policy.getAction(currentState),
      [0, 1, 2, 3]
    );
  } else {
    action = policy.getAction(currentState);
  }

  // Record after detection so the count covers prior visits only,
  // matching the demo's ordering
  detector.recordState(currentState);
  currentState = env.step(action);
}

console.log(detector.getStats());
Combining with Other Techniques
Cycle detection works best when combined with max steps for guaranteed termination. Even if cycle detection fails to find an escape, max steps ensures eventual termination.
const MAX_STEPS = 100;
const detector = new CycleDetector(2);
let steps = 0;

while (!done && steps < MAX_STEPS) {
  // Primary safeguard: cycle detection
  if (detector.detectCycle(state)) {
    action = detector.forceExploration(policy(state), actions);
  } else {
    action = policy(state);
  }

  detector.recordState(state);
  state = env.step(action);
  steps++;
}

// Secondary safeguard: max steps
if (steps >= MAX_STEPS) {
  console.warn('Max steps reached despite cycle detection');
}
Interactive Demo
See Cycle Detection in Action
Watch the right panel (green) detect and escape the cycle at position (1,2) while the left panel stays stuck.
Pay attention to the Escapes counter in the stats — this shows how many times cycle detection intervened.
Key Takeaways
Cycle detection actively breaks loops by forcing exploration
Requires tracking state history to count visits
Threshold of 2-3 visits typically works well
Best for discrete state spaces with deterministic policies
Always combine with max steps as backup protection
Memory overhead grows with episode length