What are infinite loops in RL?
In Reinforcement Learning, an agent follows a policy that maps states to actions. When a policy has flaws, the agent can repeat the same sequence of actions indefinitely without ever reaching its goal. This is a well-known failure mode with real implications for RL systems.
An infinite loop occurs when an agent’s policy causes it to revisit the same states repeatedly without making progress toward the goal.
How agents get stuck
Agents get trapped in infinite loops when two conditions are met:
- Deterministic policy: The policy always returns the same action for a given state
- No exploration: The agent never deviates from the policy’s prescribed actions
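The two conditions above can be illustrated with a minimal sketch. The three-state chain, the `TRANSITIONS` table, the `POLICY` dictionary, and the `rollout` helper are all hypothetical names invented for this illustration; they are not part of the demo. The `max_steps` cap exists only so the example terminates.

```python
import random

# Hypothetical 3-state chain used only for illustration: A <-> B -> GOAL.
TRANSITIONS = {
    ("A", "left"): "A",
    ("A", "right"): "B",
    ("B", "left"): "A",
    ("B", "right"): "GOAL",
}
ACTIONS = ["left", "right"]

# Flawed deterministic policy: in B it goes back to A instead of on to GOAL.
POLICY = {"A": "right", "B": "left"}


def rollout(epsilon, max_steps=50, seed=0):
    """Follow POLICY from A; with probability epsilon take a random action.

    Returns the list of visited states. max_steps is a safety cap so the
    sketch terminates even when the agent is stuck.
    """
    rng = random.Random(seed)
    state = "A"
    visited = [state]
    for _ in range(max_steps):
        if state == "GOAL":
            break
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # exploration step
        else:
            action = POLICY[state]  # deterministic step
        state = TRANSITIONS[(state, action)]
        visited.append(state)
    return visited


# epsilon=0.0 means no exploration: the agent bounces between A and B.
stuck = rollout(epsilon=0.0)
print(stuck[:6], "reached goal:", stuck[-1] == "GOAL")
```

With `epsilon=0.0` the agent only ever visits A and B. A positive `epsilon` gives it a chance to pick `"right"` in B, which is one way it could break out of the cycle.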
The bug in this demo
The demo presents a 3×3 grid world where:
- 🤖 The agent starts at position (0,0)
- 🏆 The goal is at position (2,2)
- 🧱 A wall blocks position (1,1)
The policy has a bug at state (1,2), just one step above the goal. At (1,2), the policy returns action 3 (left) instead of action 2 (down). This causes the agent to:
- Move left from (1,2) to (1,1), but that’s the wall!
- Stay at (1,2) because the wall blocks movement
- Try to move left again from (1,2)
- Repeat forever
Real-world implications
Infinite loops aren’t just a theoretical problem. They occur in real RL systems when:
- Training policies have errors: Even well-trained policies can have edge cases
- Environment changes: A policy trained on one environment might loop in a modified version
- Sparse rewards: When rewards are rare, agents can get stuck exploring the same areas
- Deterministic execution: Production systems often use deterministic policies for reproducibility
Connection to LLM agents
The same infinite loop problem applies to LLM-based agents that use tools:
- An agent might repeatedly reformulate the same search query
- An agent might retry a failed API call without changing parameters
- An agent might loop through the same reasoning steps
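The second case above, retrying a failed call without changing parameters, can be sketched as follows. No real LLM or API is involved: `search_tool` and `naive_agent` are hypothetical stand-ins, and `max_turns` is a cap added so the example terminates.

```python
from collections import Counter


def search_tool(query):
    """Hypothetical tool that always fails, standing in for a failing API."""
    raise RuntimeError(f"no results for {query!r}")


def naive_agent(task, max_turns=5):
    """Hypothetical agent loop that retries the same call with the same arguments.

    max_turns is a cap added so the sketch terminates.
    """
    call_log = []
    for _ in range(max_turns):
        call = ("search_tool", task)  # same tool, same arguments every turn
        call_log.append(call)
        try:
            return search_tool(task), call_log
        except RuntimeError:
            continue  # nothing changes, so the next turn is identical
    return None, call_log


result, call_log = naive_agent("weather in Atlantis")
repeats = Counter(call_log)
print(result, repeats.most_common(1))
```

Every entry in `call_log` is identical, which is the tool-use analogue of the agent pushing against the wall at (1,2).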
Visualizing the problem
The live demo shows this problem in action. Watch the left panel (“Sin Protección”, Spanish for “No Protection”) to see the agent get stuck:
- The agent successfully navigates from (0,0) to (0,2)
- Then moves down to (1,2), one step from victory
- Gets stuck trying to move left forever
- The “Repeticiones” (“Repetitions”) counter climbs as the agent repeats the same failed action
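One way a repetition counter like the one in the panel might be computed is sketched below. How the demo actually counts is not stated, so `count_repetitions` and the `(state, action, next_state)` step format are assumptions made for illustration.

```python
def count_repetitions(steps):
    """Count steps that repeat the previous (state, action) pair without moving.

    `steps` is a list of (state, action, next_state) tuples. This format is
    hypothetical and chosen only for this sketch.
    """
    repetitions = 0
    previous = None
    for state, action, next_state in steps:
        if previous == (state, action) and next_state == state:
            repetitions += 1
        previous = (state, action)
    return repetitions


# Encoding as in the text (2 = down, 3 = left); 1 = right is an assumption.
RIGHT, DOWN, LEFT = 1, 2, 3
steps = [
    ((0, 0), RIGHT, (0, 1)),
    ((0, 1), RIGHT, (0, 2)),
    ((0, 2), DOWN, (1, 2)),
    ((1, 2), LEFT, (1, 2)),  # first blocked move
    ((1, 2), LEFT, (1, 2)),  # repeat
    ((1, 2), LEFT, (1, 2)),  # repeat
]
print(count_repetitions(steps))
```

The first blocked move is not counted as a repetition here; only the later identical attempts are. Other counting conventions are equally plausible.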
Next steps
Cycle detection
Learn how to detect and break out of infinite loops
Prevention strategies
Explore common solutions for avoiding infinite loops