What is cycle detection?
Cycle detection is a safeguard technique that tracks how many times an agent has visited each state. When a state has been visited more than a threshold number of times, the system detects a potential cycle and forces the agent to explore alternative actions.Cycle detection breaks infinite loops by intervening when the agent shows signs of being stuck.
How it works
The cycle detection algorithm follows these steps:Track state history
The agent maintains a history array of all visited states:Each time the agent takes a step, its current position is added to the history:
Count state visits
Before taking an action, count how many times the current state appears in the history:
Compare against threshold
Check if the visit count exceeds the cycle detection threshold:In this demo, the threshold is set to
2, meaning if a state is visited twice, the system assumes a cycle is forming.The CYCLE_THRESHOLD constant
The cycle detection threshold is configurable:- More aggressive cycle detection
- Intervenes quickly when repetition occurs
- May cause false positives (intervening when no real cycle exists)
- Good for environments where legitimate revisits are rare
- More conservative cycle detection
- Allows the agent to revisit states multiple times before intervening
- Fewer false positives
- Agent may waste more steps before cycle is detected
Forced exploration
When a cycle is detected, the demo uses forced random exploration:- Excludes the action that caused the cycle (the flawed policy’s choice)
- Randomly selects from the remaining 3 actions
- Gives the agent a chance to discover a better path
Visualizing cycle detection
In the live demo, watch the right panel (“Con Detección de Ciclos”) to see cycle detection in action:- The agent navigates to position
(1,2) - The flawed policy tells it to go left (into the wall)
- After visiting
(1,2)twice, cycle detection triggers - The log shows:
⚠️ Ciclo en (1,2) visitado 2x → exploración forzada - The agent tries a random alternative action (likely down or right)
- If it goes down, it reaches the goal!
When to use cycle detection
Cycle detection is most useful when: ✅ You have a deterministic policy — The same state always produces the same action ✅ Policy errors are possible — The policy might have bugs or edge cases ✅ You can’t modify the policy — You’re using a pre-trained or frozen policy ✅ Computational cost is acceptable — Tracking state history has memory overhead ❌ You already have exploration — If using ε-greedy or another exploration strategy, cycle detection may be redundant ❌ Legitimate revisits are common — In some environments, the optimal path revisits states (e.g., backtracking)Comparison with other techniques
| Technique | Detects cycles? | Prevents cycles? | Overhead |
|---|---|---|---|
| Cycle detection | ✅ Yes | ✅ Yes (via forced exploration) | Medium (history tracking) |
| Max steps | ❌ No | ✅ Yes (terminates episode) | Low (simple counter) |
| ε-greedy | ❌ No | ⚠️ Reduces likelihood | Low (random number) |
| Step penalty | ❌ No | ⚠️ Discourages long paths | None (built into rewards) |
Next steps
See it in action
Compare the scenarios side-by-side in the demo
Implementation details
Explore the source code for cycle detection