The badPolicy Function
The agent’s behavior is controlled by the `badPolicy` function, which maps each grid position to an action. It works in three steps:
- Converts the position `[row, col]` to a string key like `"0,0"`
- Looks up the action in the policy mapping
- Returns the action, or defaults to `1` (right) if no mapping exists
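The implementation itself is not reproduced here; a minimal sketch consistent with the three steps above might look like this (the mapping’s exact entries, other than the buggy `"1,2": 3` described below, are assumptions):

```javascript
// Illustrative sketch: keys are "row,col" strings, values are actions 0-3.
const policy = {
  "0,0": 1, // right
  "0,1": 1, // right
  "0,2": 2, // down
  "1,2": 3, // left (the deliberate bug)
};

function badPolicy(pos) {
  const key = `${pos[0]},${pos[1]}`; // step 1: [row, col] -> "row,col"
  return policy[key] ?? 1;           // steps 2-3: look up, default to 1 (right)
}
```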
Policy Mapping Object
The policy is defined as a JavaScript object where keys are position strings and values are action numbers. Positions without an entry default to `1` (right) due to the nullish coalescing operator `??`.
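One detail worth noting about `??`: it falls back only on `null`/`undefined`, so a mapped action of `0` (up) is preserved, where `||` would wrongly replace it. A quick illustration (the object name here is made up):

```javascript
const mapping = { "0,0": 0, "1,2": 3 };

const a = mapping["2,0"] ?? 1; // no entry, so a === 1 (right)
const b = mapping["0,0"] ?? 1; // entry is 0 (up); ?? keeps it, b === 0
const c = mapping["0,0"] || 1; // || treats 0 as falsy, so c === 1 (wrong!)
```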
Action Encoding
Actions are encoded as integers:

- `0` = Up (decrease row)
- `1` = Right (increase column)
- `2` = Down (increase row)
- `3` = Left (decrease column)
| Action | Arrow | Movement | Grid Effect |
|---|---|---|---|
| 0 | ↑ | Up | row-- |
| 1 | → | Right | col++ |
| 2 | ↓ | Down | row++ |
| 3 | ← | Left | col-- |
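The table above translates directly into a delta lookup. A small helper along these lines could apply an action to a position (the names are assumptions, and no wall or bounds checking is shown):

```javascript
// Row/col deltas indexed by action: 0 = up, 1 = right, 2 = down, 3 = left.
const DELTAS = [[-1, 0], [0, 1], [1, 0], [0, -1]];

function applyAction([row, col], action) {
  const [dr, dc] = DELTAS[action];
  return [row + dr, col + dc];
}
```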
The Deliberate Bug at (1,2)
The bug is at position `(1,2)`, where the policy says to go left (action 3):
1. The agent reaches `(1,2)` from `(0,2)` by going down
2. The policy says: go left (action 3)
3. The agent tries to move to `(1,1)`, but there’s a wall at `(1,1)`
4. The wall collision returns the agent to `(1,2)`
5. The agent is now at `(1,2)` again → the policy says go left
6. Infinite loop: `(1,2)` → left → blocked by wall → `(1,2)` → repeat
The correct action at `(1,2)` would be `2` (down), which would move the agent to `(2,2)`, the goal.
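The cycle can be reproduced with a short simulation. This is a sketch built from the description above, not the actual source: the policy entries, the wall set, and the `step` function are all assumptions.

```javascript
const policy = { "0,0": 1, "0,1": 1, "0,2": 2, "1,2": 3 };
const badPolicy = (pos) => policy[`${pos[0]},${pos[1]}`] ?? 1;
const DELTAS = [[-1, 0], [0, 1], [1, 0], [0, -1]];
const WALLS = new Set(["1,1"]); // the wall the agent keeps bumping into

function step(pos) {
  const [dr, dc] = DELTAS[badPolicy(pos)];
  const next = [pos[0] + dr, pos[1] + dc];
  return WALLS.has(next.join(",")) ? pos : next; // blocked moves stay put
}

let pos = [0, 0];
const trace = [pos.join(",")];
for (let i = 0; i < 6; i++) {
  pos = step(pos);
  trace.push(pos.join(","));
}
// trace: 0,0 -> 0,1 -> 0,2 -> 1,2 -> 1,2 -> 1,2 -> 1,2 (stuck forever)
```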
Visualizing the Intended Path
The policy’s intended route runs right along the top row, then down to the goal at `(2,2)`.

Example: Modifying the Policy
You can fix the bug by changing the action at `(1,2)` from `3` to `2`:
- `(0,0)` → right → `(0,1)`
- `(0,1)` → right → `(0,2)`
- `(0,2)` → down → `(1,2)`
- `(1,2)` → down → `(2,2)` GOAL!
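A short simulation sketch confirms the corrected path (the policy entries other than the `"1,2"` fix are assumptions, and walls are omitted since the corrected route never hits one):

```javascript
const fixedPolicy = { "0,0": 1, "0,1": 1, "0,2": 2, "1,2": 2 }; // bug fixed: 3 -> 2
const act = (pos) => fixedPolicy[`${pos[0]},${pos[1]}`] ?? 1;
const DELTAS = [[-1, 0], [0, 1], [1, 0], [0, -1]];

let pos = [0, 0];
const path = [pos.join(",")];
while (pos.join(",") !== "2,2") {
  const [dr, dc] = DELTAS[act(pos)];
  pos = [pos[0] + dr, pos[1] + dc];
  path.push(pos.join(","));
}
// path: ["0,0", "0,1", "0,2", "1,2", "2,2"]
```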
You can also experiment by adding more mappings to the policy object, or by changing the default action from `1` to another value. Try making the agent take a longer path, or create different types of cycles.

Why Use a Function Instead of a Lookup Table?
The policy is implemented as a function rather than a pure lookup table for flexibility:

- You can add conditional logic based on state
- You can incorporate randomness or exploration strategies
- You can compute actions dynamically instead of storing all mappings
- It’s easier to extend to continuous state spaces later
A more general policy, for example, could take `pos` as input and return an action probability distribution rather than a single action.
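As an illustration of that direction (entirely hypothetical, not part of the source code), a stochastic policy could return a distribution over the four actions and sample from it:

```javascript
// Hypothetical: returns probabilities over actions [up, right, down, left].
function stochasticPolicy(pos) {
  return [0.0, 0.7, 0.3, 0.0]; // mostly right, sometimes down
}

// Sample an action index from a probability distribution.
function sampleAction(probs) {
  let r = Math.random();
  for (let a = 0; a < probs.length; a++) {
    r -= probs[a];
    if (r < 0) return a;
  }
  return probs.length - 1; // guard against floating-point rounding
}

const action = sampleAction(stochasticPolicy([0, 0])); // 1 or 2
```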