
What is cycle detection?

Cycle detection is a safeguard that tracks how many times an agent has visited each state. When a state's visit count reaches a configurable threshold, the system flags a potential cycle and forces the agent to try an alternative action.
Cycle detection breaks infinite loops by intervening when the agent shows signs of being stuck.

How it works

The cycle detection algorithm follows these steps:
1. Track state history

The agent maintains a history array of all visited states:
const state = {
  pos: [0, 0],
  history: [],
  // ... other fields
};
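Each time the agent takes a step, a copy of its current position is appended to the history (s refers to the state object shown above). Copying with the spread operator stores a snapshot, so later changes to s.pos do not alter past entries: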
s.history.push([...s.pos]);
2. Count state visits

Before taking an action, count how many times the current state appears in the history:
const visits = s.history.filter(
  h => h[0] === s.pos[0] && h[1] === s.pos[1]
).length;
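Filtering the whole history on every step takes time proportional to the episode length. As a minimal sketch of an alternative, assuming positions are two-element arrays as above, visit counts can be kept in a Map keyed by a string form of the position. The names posKey and createVisitCounter are hypothetical and do not come from the demo source.

```javascript
// Hypothetical helpers (not part of the demo source).

// Turn a [row, col] position into a string usable as a Map key.
const posKey = (pos) => `${pos[0]},${pos[1]}`;

// Returns a small counter object that records visits in constant time.
function createVisitCounter() {
  const counts = new Map();
  return {
    // Record one visit and return the updated count for that position.
    record(pos) {
      const key = posKey(pos);
      counts.set(key, (counts.get(key) ?? 0) + 1);
      return counts.get(key);
    },
    // Look up how many times a position has been recorded so far.
    count(pos) {
      return counts.get(posKey(pos)) ?? 0;
    },
  };
}

const counter = createVisitCounter();
counter.record([1, 2]);
counter.record([1, 2]);
counter.record([0, 0]);
console.log(counter.count([1, 2])); // 2
```

The trade-off is that the Map only stores counts, so the order of visits is lost; keep the history array as well if the visit order matters.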
3. Compare against threshold

Check whether the visit count has reached the cycle detection threshold:
const CYCLE_THRESHOLD = 2;

if (visits >= CYCLE_THRESHOLD) {
  // Cycle detected!
}
In this demo, the threshold is 2: once a state has been visited twice, the system assumes a cycle is forming.
4. Force exploration

When a cycle is detected, instead of following the flawed policy, the agent takes a random alternative action:
if (visits >= CYCLE_THRESHOLD) {
  const original = badPolicy(s.pos);
  const options = [0, 1, 2, 3].filter(a => a !== original);
  action = options[Math.floor(Math.random() * options.length)];
  escaped = true;
  s.escapes++;
}
This forces the agent to try a different action, potentially breaking the cycle.
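The four steps can be tied together in one function. The following is a sketch under stated assumptions rather than the demo's actual code: chooseAction, ACTIONS, and stubPolicy are hypothetical names, stubPolicy stands in for the demo's badPolicy, and the random source is passed in as a parameter so the behavior can be checked deterministically.

```javascript
const CYCLE_THRESHOLD = 2;
const ACTIONS = [0, 1, 2, 3];

// Hypothetical stand-in for the demo's badPolicy: always returns action 3.
const stubPolicy = (_pos) => 3;

// Hypothetical helper combining steps 1-4. rng defaults to Math.random.
function chooseAction(s, policy, rng = Math.random) {
  // Step 1: record a snapshot of the current position.
  s.history.push([...s.pos]);

  // Step 2: count how often this position appears in the history.
  const visits = s.history.filter(
    h => h[0] === s.pos[0] && h[1] === s.pos[1]
  ).length;

  // Step 3: below the threshold, follow the policy unchanged.
  const original = policy(s.pos);
  if (visits < CYCLE_THRESHOLD) {
    return { action: original, escaped: false, visits };
  }

  // Step 4: pick uniformly among the actions the policy did not choose.
  const options = ACTIONS.filter(a => a !== original);
  s.escapes++;
  return {
    action: options[Math.floor(rng() * options.length)],
    escaped: true,
    visits,
  };
}

// The agent stays at the same position for two consecutive decisions.
const s = { pos: [1, 2], history: [], escapes: 0 };
const first = chooseAction(s, stubPolicy, () => 0);
const second = chooseAction(s, stubPolicy, () => 0);
console.log(first.escaped, second.escaped); // false true
```

Passing the random source as an argument is a design choice made here for testability; the demo snippet calls Math.random directly.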

The CYCLE_THRESHOLD constant

The cycle detection threshold is configurable:
const CYCLE_THRESHOLD = 2;
Lower threshold (1-2):
  • More aggressive cycle detection
  • Intervenes quickly when repetition occurs
  • May cause false positives (intervening when no real cycle exists)
  • Good for environments where legitimate revisits are rare
Higher threshold (3-5+):
  • More conservative cycle detection
  • Allows the agent to revisit states multiple times before intervening
  • Fewer false positives
  • The agent may waste more steps before a cycle is detected
In this demo, a threshold of 2 works well because the optimal path never legitimately revisits the same state. In more complex environments, you might need a higher threshold.
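To get a feel for the trade-off, the sketch below replays a fixed sequence of positions and counts how many steps would trigger detection at different thresholds. It assumes each position is recorded before its visits are counted, and countInterventions is a hypothetical helper, not part of the demo. In a real run each intervention would change the path, so treat this as an illustration of sensitivity only.

```javascript
// Hypothetical helper: count the steps at which visits >= threshold
// while replaying a recorded path of [row, col] positions.
function countInterventions(path, threshold) {
  const history = [];
  let interventions = 0;
  for (const pos of path) {
    history.push([...pos]);
    const visits = history.filter(
      h => h[0] === pos[0] && h[1] === pos[1]
    ).length;
    if (visits >= threshold) interventions++;
  }
  return interventions;
}

// An agent bouncing between two cells: (0,0) -> (0,1) -> (0,0) -> ...
const bouncingPath = [[0, 0], [0, 1], [0, 0], [0, 1], [0, 0]];

const byThreshold = [2, 3, 4].map(t => countInterventions(bouncingPath, t));
console.log(byThreshold); // [3, 1, 0]
```

On this five-step path, a threshold of 2 would intervene three times, a threshold of 3 once, and a threshold of 4 not at all.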

Forced exploration

When a cycle is detected, the demo uses forced random exploration:
const original = badPolicy(s.pos);  // Get the flawed policy's action
const options = [0, 1, 2, 3].filter(a => a !== original);  // All other actions
action = options[Math.floor(Math.random() * options.length)];  // Pick one randomly
This approach:
  • Excludes the action that caused the cycle (the flawed policy’s choice)
  • Randomly selects from the remaining 3 actions
  • Gives the agent a chance to discover a better path
Forced exploration doesn’t guarantee the agent will find the optimal path immediately. It just gives the agent a chance to escape the cycle. The agent might need multiple escapes to reach the goal.

Visualizing cycle detection

In the live demo, watch the right panel (“Con Detección de Ciclos”, Spanish for “With Cycle Detection”) to see cycle detection in action:
  1. The agent navigates to position (1,2)
  2. The flawed policy tells it to go left (into the wall)
  3. After visiting (1,2) twice, cycle detection triggers
  4. The log shows: ⚠️ Ciclo en (1,2) visitado 2x → exploración forzada (in English: “Cycle at (1,2) visited 2x → forced exploration”)
  5. The agent tries a random alternative action (up, down, or right, each equally likely)
  6. If it goes down, it reaches the goal!
The “Escapes” counter tracks how many times cycle detection intervened.

When to use cycle detection

Cycle detection is most useful when:
  • You have a deterministic policy: the same state always produces the same action
  • Policy errors are possible: the policy might have bugs or edge cases
  • You can’t modify the policy: you’re using a pre-trained or frozen policy
  • Computational cost is acceptable: tracking state history has memory overhead
Cycle detection may be unnecessary or counterproductive when:
  • You already have exploration: if you use ε-greedy or another exploration strategy, cycle detection may be redundant
  • Legitimate revisits are common: in some environments, the optimal path revisits states (e.g., backtracking)

Comparison with other techniques

| Technique | Detects cycles? | Prevents cycles? | Overhead |
| --- | --- | --- | --- |
| Cycle detection | ✅ Yes | ✅ Yes (via forced exploration) | Medium (history tracking) |
| Max steps | ❌ No | ✅ Yes (terminates episode) | Low (simple counter) |
| ε-greedy | ❌ No | ⚠️ Reduces likelihood | Low (random number) |
| Step penalty | ❌ No | ⚠️ Discourages long paths | None (built into rewards) |
Cycle detection is often combined with max steps as a defense-in-depth strategy. Max steps provides a hard limit, while cycle detection tries to fix the problem before the limit is reached.
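A rough sketch of that combination follows. Everything about the environment is an assumption made for illustration: the 3x3 grid, the wall at (1,1), the action encoding, and the names flawedPolicy, move, and runEpisode are hypothetical and are not the demo's actual layout or code.

```javascript
// Hypothetical 3x3 grid; positions are [row, col].
const SIZE = 3;
const GOAL = [2, 2];
const WALL = [1, 1];
// Assumed action encoding: 0 = up, 1 = right, 2 = down, 3 = left.
const MOVES = [[-1, 0], [0, 1], [1, 0], [0, -1]];

// Hypothetical flawed policy: go right, then down, but with a bug at (1,2)
// where it returns "left" and walks into the wall.
function flawedPolicy([row, col]) {
  if (row === 1 && col === 2) return 3;
  return col < SIZE - 1 ? 1 : 2;
}

// Apply an action; moves off the grid or into the wall leave the agent in place.
function move([row, col], action) {
  const [dr, dc] = MOVES[action];
  const next = [row + dr, col + dc];
  const outOfBounds = next.some(v => v < 0 || v >= SIZE);
  const isWall = next[0] === WALL[0] && next[1] === WALL[1];
  return outOfBounds || isWall ? [row, col] : next;
}

const atGoal = (pos) => pos[0] === GOAL[0] && pos[1] === GOAL[1];

function runEpisode({ maxSteps = 50, threshold = 2, rng = Math.random } = {}) {
  const s = { pos: [0, 0], history: [], escapes: 0, steps: 0 };
  // Max steps: hard limit that always terminates the episode.
  while (s.steps < maxSteps && !atGoal(s.pos)) {
    s.history.push([...s.pos]);
    const visits = s.history.filter(
      h => h[0] === s.pos[0] && h[1] === s.pos[1]
    ).length;
    let action = flawedPolicy(s.pos);
    // Cycle detection: try to fix the problem before the limit is reached.
    if (visits >= threshold) {
      const options = [0, 1, 2, 3].filter(a => a !== action);
      action = options[Math.floor(rng() * options.length)];
      s.escapes++;
    }
    s.pos = move(s.pos, action);
    s.steps++;
  }
  return { ...s, outcome: atGoal(s.pos) ? "goal" : "max_steps" };
}

// Cycle detection disabled: only the step limit ends the episode.
const stuck = runEpisode({ maxSteps: 20, threshold: Infinity });
console.log(stuck.outcome, stuck.steps, stuck.escapes); // max_steps 20 0

// Cycle detection enabled, with a fixed rng that selects "down" at (1,2).
const rescued = runEpisode({ maxSteps: 20, threshold: 2, rng: () => 0.99 });
console.log(rescued.outcome, rescued.steps, rescued.escapes); // goal 5 1
```

With the default Math.random the number of steps and escapes varies from run to run, but the step limit still bounds every episode.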

Next steps

See it in action

Compare the scenarios side-by-side in the demo

Implementation details

Explore the source code for cycle detection
