Control Buttons
▶ Run Button
The Run button executes the demo continuously until the agent reaches the goal or hits the step limit.
Click to Start
Click the ▶ Ejecutar (Run) button to begin continuous execution. The agent will take actions automatically at the pace set by the speed slider.
Observe Behavior
Watch as the agent moves through the grid. Panel 1 will get stuck in a loop, while Panel 2 escapes cycles and reaches the goal.
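The stopping rule behind Run can be sketched as a small loop. This is only an illustration, not the demo's actual code: `run`, `step_fn`, and the injectable `sleep` parameter are hypothetical names, while the goal cell (2,2), the 30-step limit, and the 350ms default come from this guide.

```python
import time
from typing import Callable, Tuple

Pos = Tuple[int, int]

GOAL: Pos = (2, 2)   # goal cell named in this guide
STEP_LIMIT = 30      # the demo stops after 30 steps


def run(step_fn: Callable[[Pos], Pos], start: Pos = (0, 0),
        delay_ms: int = 350,
        sleep: Callable[[float], None] = time.sleep) -> Tuple[Pos, int, str]:
    """Call step_fn repeatedly until the goal or the step limit is reached."""
    pos, steps = start, 0
    while pos != GOAL and steps < STEP_LIMIT:
        pos = step_fn(pos)       # one action per iteration (hypothetical callback)
        steps += 1
        sleep(delay_ms / 1000)   # pause between actions, as set by the speed slider
    status = "goal" if pos == GOAL else "limit"
    return pos, steps, status
```

A policy that never moves ends with status "limit" after 30 steps, which mirrors Panel 1 getting stuck; a policy that makes progress ends with status "goal".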
→ Step Button
The Step button advances the agent by exactly one action, giving you fine-grained control. Use it to:
- Examine the agent’s behavior action-by-action
- See exactly when cycle detection triggers (Panel 2)
- Understand why the agent gets stuck (Panel 1)
Each step executes one action and updates the grid, stats, and log immediately. There’s no delay between your click and the result.
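As a rough sketch of what a single step might do internally, the function below applies one action and appends one log line in the format shown later in this guide. The action encoding (apart from action 3 meaning "left"), the 3x3 grid size, the `walls` set, and the function name `step` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Pos = Tuple[int, int]

# Assumed action encoding: the guide only confirms that action 3 means "left".
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}
ARROWS = {0: "↑", 1: "→", 2: "↓", 3: "←"}


def step(pos: Pos, action: int, step_no: int, walls: Set[Pos],
         log: List[str], size: int = 3) -> Pos:
    """Apply one action, append one log line, and return the new position."""
    d_row, d_col = MOVES[action]
    target = (pos[0] + d_row, pos[1] + d_col)
    in_bounds = 0 <= target[0] < size and 0 <= target[1] < size
    blocked = not in_bounds or target in walls
    new_pos = pos if blocked else target          # a blocked move leaves the agent in place
    suffix = " (bloqueado)" if blocked else ""    # same marker the demo log uses
    log.append(f"[{step_no}] {ARROWS[action]} → ({new_pos[0]},{new_pos[1]}){suffix}")
    return new_pos
```

Because the position and the log are updated inside the same call, the result is available as soon as the click handler returns, which matches the "no delay" behavior described above.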
↺ Reset Button
The Reset button returns the demo to its initial state. It resets:
- Agent position (returns to (0,0))
- Step count
- Reward accumulation
- Repeat/escape counters
- Movement history
- Log entries
Use Reset to:
- Start a fresh run after the agent completes or gets stuck
- Compare different scenarios from the same starting point
- Clear the log when it becomes too long
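One way to picture the state that Reset clears is a single record holding every item in the list above. The class name `DemoState` and its field names are hypothetical; only the set of things being reset and the (0,0) start position come from this guide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DemoState:
    position: Tuple[int, int] = (0, 0)   # agent starts at (0,0)
    steps: int = 0                        # step count
    reward: float = 0.0                   # accumulated reward
    repeats: int = 0                      # repeat counter
    escapes: int = 0                      # escape counter
    history: List[Tuple[int, int]] = field(default_factory=list)  # movement history
    log: List[str] = field(default_factory=list)                  # log entries

    def reset(self) -> None:
        """Restore every field to its initial value, in place."""
        self.__init__()  # re-runs the generated initializer with all defaults
```

Resetting in place, rather than creating a new object, means anything already holding a reference to the panel's state sees the cleared values immediately.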
Speed Slider
The speed slider controls how fast the agent moves during continuous execution:
- Range: 50ms to 800ms per step
- Default: 350ms per step
- Effect: Only applies when using the Run button, not the Step button
- Fast (50ms): Set the slider to the left for rapid execution. Good for seeing the overall behavior quickly, but harder to follow individual actions.
- Medium (350ms)
- Slow (800ms)
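To get a feel for what the slider range means in practice, the sketch below clamps a requested delay to the documented 50ms to 800ms range and estimates how long a full 30-step run would take. The function names are hypothetical, and the estimate assumes the per-step delay dominates (it ignores rendering and computation time).

```python
MIN_DELAY_MS = 50       # fastest slider setting
MAX_DELAY_MS = 800      # slowest slider setting
DEFAULT_DELAY_MS = 350  # default slider setting


def clamp_delay(ms: float) -> int:
    """Keep a requested delay inside the slider's documented range."""
    return int(min(MAX_DELAY_MS, max(MIN_DELAY_MS, ms)))


def run_duration_seconds(steps: int, delay_ms: float = DEFAULT_DELAY_MS) -> float:
    """Approximate wall-clock time for a run of `steps` actions."""
    return steps * clamp_delay(delay_ms) / 1000


# A run that hits the 30-step limit takes roughly:
#   1.5 s at 50ms, 10.5 s at 350ms, 24.0 s at 800ms
```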
Real-World Example
Here’s what happens when you interact with Panel 1 (no cycle detection):
- Click Run: Agent starts at (0,0) and moves right to (0,1)
- After 4 steps: Agent reaches position (1,2)
- Step 5 onward: Agent repeatedly tries to move left (action 3) but hits the wall
- Log shows: [5] ← → (1,2) (bloqueado) repeated (“bloqueado” means “blocked”)
- At step 30: Demo stops with “Límite de 30 pasos alcanzado” (“30-step limit reached”)
With Panel 2 (cycle detection enabled), the same run plays out differently:
- Steps 1-4: Same as Panel 1
- Step 5: Cycle detected at (1,2) after 2 visits
- Log shows: ⚠️ Ciclo en (1,2) visitado 2x → exploración forzada (“Cycle at (1,2) visited 2x → forced exploration”)
- Agent escapes: Takes a random action instead of the policy’s bad choice
- Reaches goal: Successfully navigates to (2,2)
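The Panel 2 behavior can be approximated with a visit counter: once a cell has been visited twice, the policy's action is replaced by a random one. This is a minimal sketch under stated assumptions, not the demo's implementation. The name `choose_action`, the four-action set, the exact moment at which a visit is counted, and the choice to exclude the policy's action from the random draw are all assumptions; only the 2-visit threshold and the log message come from this guide.

```python
import random
from collections import Counter
from typing import Optional, Tuple

Pos = Tuple[int, int]

CYCLE_THRESHOLD = 2  # the walkthrough reports a cycle after 2 visits


def choose_action(pos: Pos, policy_action: int, visits: Counter,
                  rng: random.Random, n_actions: int = 4) -> Tuple[int, Optional[str]]:
    """Return the action to take and an optional log line if a cycle was detected."""
    visits[pos] += 1
    if visits[pos] >= CYCLE_THRESHOLD:
        # Forced exploration: pick any action except the one the policy keeps choosing
        # (excluding the policy's action is an assumption of this sketch).
        alternatives = [a for a in range(n_actions) if a != policy_action]
        message = (f"⚠️ Ciclo en ({pos[0]},{pos[1]}) "
                   f"visitado {visits[pos]}x → exploración forzada")
        return rng.choice(alternatives), message
    return policy_action, None
```

On the first visit to a cell the policy's action passes through unchanged; on the second visit the function returns a different action along with the warning line, which is the point at which Panel 2 diverges from Panel 1.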
The controls are independent for each panel. You can run Panel 1 while stepping through Panel 2, or have them both running at different speeds.