The envStep Function
The envStep function simulates the environment’s response to an agent action. It implements the core transition dynamics of the grid world. Here’s the complete implementation:
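The original listing is not shown; the sketch below is reconstructed from the steps described in this section (SIZE and WALL match names used later in the text; GOAL is an assumed constant name):

```javascript
const SIZE = 3;      // 3×3 grid
const WALL = [1, 1]; // immovable wall cell
const GOAL = [2, 2]; // goal cell; GOAL is an assumed name

function envStep(pos, action) {
  let [r, c] = pos;
  // Step 1: compute the intended move
  if (action === 0) r -= 1;      // up
  else if (action === 1) c += 1; // right
  else if (action === 2) r += 1; // down
  else if (action === 3) c -= 1; // left
  // Step 2: clamp to the grid boundaries
  r = Math.max(0, Math.min(SIZE - 1, r));
  c = Math.max(0, Math.min(SIZE - 1, c));
  // Step 3: wall collision: stay in place
  if (r === WALL[0] && c === WALL[1]) [r, c] = pos;
  // Step 4: sparse reward, +10 at the goal, -0.1 otherwise
  const done = r === GOAL[0] && c === GOAL[1];
  const reward = done ? 10 : -0.1;
  // Step 5: return the new state
  return { pos: [r, c], reward, done };
}
```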
Function Signature
Parameters:
- pos: Current position as a [row, col] array
- action: Integer action (0=up, 1=right, 2=down, 3=left)
Returns:
- pos: New position as a [row, col] array
- reward: Numerical reward for this transition
- done: Boolean indicating episode termination
Step 1: Movement Calculation
The function first calculates the intended new position based on the action: it destructures [row, col] into r and c, then modifies them according to the action.
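A sketch of this calculation, using the action encoding from the signature above (move is a hypothetical helper name):

```javascript
// Step 1 sketch: compute the intended new position from the action.
function move(pos, action) {
  let [r, c] = pos;              // destructure [row, col]
  if (action === 0) r -= 1;      // up
  else if (action === 1) c += 1; // right
  else if (action === 2) r += 1; // down
  else if (action === 3) c -= 1; // left
  return [r, c];
}
```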
Step 2: Boundary Checking
The grid has hard boundaries at [0, SIZE-1]. The function clamps positions to stay within bounds:
- Math.min(SIZE - 1, r): Ensures r ≤ 2 (prevents going off the bottom/right)
- Math.max(0, ...): Ensures r ≥ 0 (prevents going off the top/left)
For example, if the agent is at (0,1) and tries to go up (action 0), r would become -1, but Math.max(0, -1) clamps it back to 0.
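The clamping can be sketched as follows, with SIZE = 3 for this grid (clamp is a hypothetical helper name):

```javascript
const SIZE = 3; // 3×3 grid
// Hypothetical helper: keep (r, c) inside the grid boundaries.
function clamp(r, c) {
  return [
    Math.max(0, Math.min(SIZE - 1, r)),
    Math.max(0, Math.min(SIZE - 1, c)),
  ];
}
```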
Step 3: Wall Collision Detection
The grid has an immovable wall at position WALL = [1,1]. If the agent tries to enter the wall, it stays in place:
The function checks whether the new position (r, c) equals the wall position (1, 1). If so, it reverts to the original position pos.
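A sketch of this check, with WALL from the text (checkWall is a hypothetical helper name):

```javascript
const WALL = [1, 1]; // immovable wall position
// Hypothetical helper: revert to the original position on a wall hit.
function checkWall(pos, r, c) {
  if (r === WALL[0] && c === WALL[1]) return [...pos]; // collision: stay put
  return [r, c];
}
```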
Examples:
- Agent at (1,0), action right (1) → tries to go to (1,1) → wall collision → stays at (1,0)
- Agent at (1,2), action left (3) → tries to go to (1,1) → wall collision → stays at (1,2) ⚠️ This is where the cycle occurs
Step 4: Reward Structure
The reward function is sparse with a large goal reward:
- Reaching the goal (2,2): Reward = +10, episode ends (done = true)
- Any other step: Reward = -0.1 (small negative reward to encourage shorter paths)
This structure:
- Incentivizes reaching the goal quickly
- Penalizes wandering or getting stuck
- Creates a shortest-path optimal policy
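The reward logic above can be sketched as follows (GOAL and rewardFor are assumed names; the values are from this section):

```javascript
const GOAL = [2, 2]; // goal cell; GOAL is an assumed name
// Hypothetical helper: sparse reward, +10 at the goal, -0.1 otherwise.
function rewardFor(r, c) {
  const done = r === GOAL[0] && c === GOAL[1];
  return { reward: done ? 10 : -0.1, done };
}
```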
Step 5: Return Format
The function returns an object with three fields:
- pos: The new position after applying the action and checking collisions
- reward: The immediate reward for this transition
- done: Boolean flag (true only when the goal is reached)
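A caller can destructure this object directly; the literal below simply mirrors the shape described:

```javascript
// Example result object matching the shape described in this section.
const step = { pos: [2, 2], reward: 10, done: true };
const { pos, reward, done } = step; // pull out the three fields
```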
Example Transitions
Here are some example calls to envStep:
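The original calls are not shown; a plausible set, using a compact envStep reconstructed from this section so the examples run standalone (GOAL is an assumed name):

```javascript
const SIZE = 3, WALL = [1, 1], GOAL = [2, 2]; // GOAL is an assumed name
function envStep(pos, action) {
  let [r, c] = pos;
  if (action === 0) r -= 1; else if (action === 1) c += 1;
  else if (action === 2) r += 1; else if (action === 3) c -= 1;
  r = Math.max(0, Math.min(SIZE - 1, r));
  c = Math.max(0, Math.min(SIZE - 1, c));
  if (r === WALL[0] && c === WALL[1]) [r, c] = pos;
  const done = r === GOAL[0] && c === GOAL[1];
  return { pos: [r, c], reward: done ? 10 : -0.1, done };
}

console.log(envStep([0, 0], 1)); // normal move: pos [0,1], reward -0.1, done false
console.log(envStep([0, 1], 0)); // boundary clamp: stays at [0,1]
console.log(envStep([1, 0], 1)); // wall collision: stays at [1,0]
console.log(envStep([2, 1], 1)); // goal reached: pos [2,2], reward 10, done true
```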
Grid World Visualization
Here’s the 3×3 grid with coordinates, with the wall at (1,1) and the goal at (2,2):

    (0,0)  (0,1)  (0,2)
    (1,0)  WALL   (1,2)
    (2,0)  (2,1)  GOAL

Deterministic vs Stochastic Environments
This implementation is deterministic: the same action in the same state always produces the same result. In a stochastic environment, the same action could instead lead to different outcomes, for example if moves occasionally slip in a random direction.

The envStep function is pure: it doesn’t modify the input pos array and has no side effects. This makes it easy to test and reason about. You can call it repeatedly with the same inputs and always get the same output.

Extending the Environment
To create more complex environments, you could:
- Add multiple walls: Check against an array of wall positions
- Variable rewards: Different cells give different rewards
- Moving obstacles: Walls that change position over time
- Larger grids: Increase the SIZE constant
- Diagonal movement: Add actions 4-7 for diagonal moves
- Terminal states: Multiple goal positions with different rewards
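As a sketch of the first extension, the wall collision check could iterate over an array of wall positions (WALLS and hitsWall are hypothetical names; the layout is an example, not from the original):

```javascript
// Hypothetical extension: multiple walls instead of a single WALL.
const WALLS = [[1, 1], [0, 2]]; // example wall layout
function hitsWall(r, c) {
  // true if (r, c) matches any wall position
  return WALLS.some(([wr, wc]) => wr === r && wc === c);
}
```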