PufferLib includes a diverse collection of Ocean environments spanning classic control, Atari-style games, multi-agent scenarios, and custom simulations.

Classic control

Standard RL benchmark environments with high-performance C implementations.

Cartpole

Classic pole balancing task. Balance a pole on a moving cart. Supports both discrete and continuous action spaces.
  • Observation: 4D (position, velocity, angle, angular velocity)
  • Action: Discrete(2) or Continuous(1)
  • Performance: ~5M steps/sec
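The 4D state evolves under the standard cart-pole dynamics (Barto, Sutton & Anderson formulation). A minimal Euler-integration sketch for intuition — this is illustrative only, not PufferLib's C implementation; the constants mirror the constructor arguments (`cart_mass`, `pole_mass`, `gravity`), and `pole_length` and `dt` are assumed values:

```python
import math

def cartpole_step(state, force, cart_mass=1.0, pole_mass=0.1,
                  gravity=9.8, pole_length=0.5, dt=0.02):
    """One Euler step of the classic cart-pole dynamics (sketch)."""
    x, x_dot, theta, theta_dot = state
    total_mass = cart_mass + pole_mass
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    # Net force on the cart, including the pole's centrifugal term
    temp = (force + pole_mass * pole_length * theta_dot**2 * sin_t) / total_mass
    theta_acc = (gravity * sin_t - cos_t * temp) / (
        pole_length * (4.0 / 3.0 - pole_mass * cos_t**2 / total_mass))
    x_acc = temp - pole_mass * pole_length * theta_acc * cos_t / total_mass
    return (x + dt * x_dot, x_dot + dt * x_acc,
            theta + dt * theta_dot, theta_dot + dt * theta_acc)
```

Pushing right from the upright rest state accelerates the cart right and tips the pole the other way, which is the instability the agent must fight.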

Continuous

Continuous control sanity test. Simple continuous action space environment for testing.
  • Observation: Configurable
  • Action: Box (continuous)
  • Performance: ~8M steps/sec

Atari-style games

Classic arcade games reimplemented in high-performance C.

Asteroids

Fly a spaceship and destroy asteroids. Navigate space, shoot asteroids, avoid collisions. Asteroids split when hit.
  • Observation: 104D (player state + 20 nearest asteroids)
  • Action: Discrete(4) - forward, left, right, shoot
  • Performance: ~3M steps/sec

Breakout

Brick-breaking paddle game. Control a paddle to bounce a ball and break bricks.
  • Observation: 118D (paddle, ball, bricks state)
  • Action: Discrete(3) - left, stay, right
  • Performance: ~2M steps/sec

Pong

Two-player paddle ball game. Classic Pong with configurable physics.
  • Observation: Low-dimensional state
  • Action: Discrete(3)
  • Performance: ~4M steps/sec

Freeway

Cross the highway without getting hit. Navigate through traffic to reach the other side.
  • Observation: Grid-based
  • Action: Discrete(4)
  • Performance: ~3M steps/sec

Enduro

Racing game. Navigate through traffic at high speed.
  • Observation: Grid-based
  • Action: Discrete(3)
  • Performance: ~3M steps/sec

Blastar

Space shooter. Shoot enemies while avoiding obstacles.
  • Observation: Spatial
  • Action: Discrete
  • Performance: ~2M steps/sec

Grid-based games

Environments with 2D grid observations and discrete actions.

Snake

Multi-agent snake game. Highly optimized snake with thousands of concurrent snakes. Eat food, avoid collisions.
  • Observation: (2×vision+1, 2×vision+1) grid
  • Action: Discrete(4) - up, down, left, right
  • Agents: Configurable (default 4096)
  • Performance: ~10M steps/sec
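The observation is an egocentric window centered on each snake's head; with `vision=5` that gives the 11×11 grid used elsewhere on this page. A NumPy sketch of such a crop — illustrative only, not PufferLib's C code, and the `-1` padding value is an assumption:

```python
import numpy as np

def egocentric_view(grid, head_r, head_c, vision=5):
    """Crop a (2*vision+1, 2*vision+1) window centered on (head_r, head_c),
    padding cells outside the map with -1. Illustrative sketch."""
    side = 2 * vision + 1
    padded = np.pad(grid, vision, constant_values=-1)
    # Head (r, c) in the original grid is (r+vision, c+vision) in padded,
    # so the window starts at (head_r, head_c) in padded coordinates
    return padded[head_r:head_r + side, head_c:head_c + side]
```

A snake in the map corner sees padding on two sides, so walls and map edges look identical to the policy.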

Grid

Customizable grid world. Template for grid-based environments with configurable mechanics.
  • Observation: (11, 11) grid with 32 tile types
  • Action: Discrete or continuous
  • Performance: ~5M steps/sec

Tetris

Falling blocks puzzle. Stack and clear lines in the classic puzzle game.
  • Observation: Board state
  • Action: Discrete (move, rotate)
  • Performance: ~2M steps/sec

Pacman

Maze navigation with ghosts. Collect pellets while avoiding ghosts.
  • Observation: Grid-based
  • Action: Discrete(4)
  • Performance: ~3M steps/sec

Board games

Two-player strategy games with perfect information.

Connect4

Connect four in a row. Drop pieces to create four in a row horizontally, vertically, or diagonally.
  • Observation: Board state
  • Action: Discrete(7) - column selection
  • Performance: ~4M steps/sec
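The win condition — four in a row horizontally, vertically, or diagonally — can be checked by scanning each direction from every cell. A brute-force sketch (illustrative only, not PufferLib's implementation; board encoding as a 6×7 integer array is an assumption):

```python
import numpy as np

def four_in_a_row(board, player):
    """True if `player` has four connected pieces in any of the four
    line directions on the board. Illustrative sketch."""
    rows, cols = board.shape
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):  # →, ↓, ↘, ↙
        for r in range(rows):
            for c in range(cols):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < rows and 0 <= cc < cols
                       and board[rr, cc] == player for rr, cc in cells):
                    return True
    return False
```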

Checkers

Classic checkers/draughts. Jump opponent pieces and reach the far side.
  • Observation: Board state
  • Action: Discrete (legal moves)
  • Performance: ~2M steps/sec

Go

Ancient strategy board game. Surround territory on a grid.
  • Observation: 2 × board_size² + 2 (current/previous position + metadata)
  • Action: Discrete (board positions + pass)
  • Performance: ~1M steps/sec

2048

Tile merging puzzle. Combine tiles with the same number to reach 2048.
  • Observation: 16D board + metadata
  • Action: Discrete(4) - slide direction
  • Performance: ~3M steps/sec
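The merge rule — equal adjacent tiles combine once per move, then everything slides toward the chosen direction — can be sketched for a single row (illustrative only, not PufferLib's implementation):

```python
def slide_left(row):
    """Apply one 2048 slide to a row: compress nonzero tiles, merge
    equal adjacent pairs once left-to-right, pad with zeros."""
    tiles = [t for t in row if t != 0]
    merged, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            i += 2  # each tile merges at most once per move
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))
```

For example, `[2, 2, 2, 2]` becomes `[4, 4, 0, 0]`, not `[8, 0, 0, 0]`, because a freshly merged tile cannot merge again in the same move.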

Triple Triad

Card placement strategy game. Place cards to capture opponent pieces.
  • Observation: Card and board state
  • Action: Discrete (placement)
  • Performance: ~2M steps/sec

Multi-agent environments

Environments with multiple interacting agents.

Battle

Multi-army combat simulation. Large-scale multi-agent warfare with factories and armies.
  • Observation: (num_armies × 3 + 416 + 22 + 8)D per agent
  • Action: Box(3) - continuous movement
  • Agents: 512-2048 per environment
  • Performance: ~1M steps/sec

MOBA

Multiplayer online battle arena. Team-based combat with lanes and objectives.
  • Observation: Spatial + entity features
  • Action: MultiDiscrete (move + ability)
  • Performance: ~500K steps/sec

NMMO3

Neural MMO environment. Massively multi-agent survival and exploration.
  • Observation: 11×15×10 map + player features
  • Action: Discrete (movement + actions)
  • Performance: ~300K steps/sec

Slime Volleyball

Two-player volleyball. Competitive 1v1 or 2v2 volleyball.
  • Observation: Physics state
  • Action: Discrete(3)
  • Performance: ~2M steps/sec

Robotics and control

Environments inspired by robotics tasks.

Drone

Quadcopter control. Navigate a drone through 3D space.
  • Observation: State vector or Dict
  • Action: Continuous or MultiDiscrete
  • Performance: ~3M steps/sec

Drive

Autonomous driving. Navigate a road shared with other vehicles.
  • Observation: Ego (7D) + partners (63×7) + road (200×7)
  • Action: MultiDiscrete (steering + acceleration)
  • Performance: ~1M steps/sec

RWARE

Robot warehouse management. Coordinate robots to move items in a warehouse.
  • Observation: Grid-based
  • Action: Discrete
  • Performance: ~2M steps/sec

Trash Pickup

Multi-robot coordination. Pick up trash items on a grid.
  • Observation: 5×11×11 grid
  • Action: Discrete(4)
  • Performance: ~4M steps/sec

Research environments

Specialized environments for RL research.

Boids

Flocking behavior. Emergent swarm dynamics with multiple agents.
  • Observation: Variable (4 per neighbor)
  • Action: MultiDiscrete
  • Performance: ~5M steps/sec

Impulse Wars

Multi-drone combat. Advanced drone warfare with weapons and projectiles.
  • Observation: Map (CNN) + discrete + continuous features
  • Action: Continuous or MultiDiscrete
  • Performance: ~500K steps/sec

Terraform

Territory modification. Modify grid terrain to achieve objectives.
  • Observation: Local (2×11×11) + global (2×6×6) + 5D features
  • Action: MultiDiscrete
  • Performance: ~2M steps/sec

Matsci

Materials science simulation. Research-focused materials optimization.
  • Observation: Domain-specific
  • Action: Configurable
  • Performance: Varies

Tactical

Turn-based tactics. Strategic combat on a grid.
  • Observation: Grid + unit states
  • Action: MultiDiscrete
  • Performance: ~1M steps/sec

Tower Climb

3D climbing challenge. Navigate vertical structures.
  • Observation: 3D grid (5×5×9) + 3D player info
  • Action: Discrete
  • Performance: ~2M steps/sec

Shared Pool

Common pool resource management. Study cooperation in resource extraction.
  • Observation: Resource state
  • Action: Discrete or continuous
  • Performance: ~3M steps/sec

Conversion environments

Environments demonstrating mechanics or serving as templates.

Convert

Resource conversion. Convert resources between types.
  • Observation: State-based
  • Action: Discrete
  • Performance: ~4M steps/sec

Convert Circle

Circular conversion. Variant with circular resource dependencies.
  • Observation: State-based
  • Action: Discrete
  • Performance: ~4M steps/sec

Template

Environment template. Starting point for creating new Ocean environments.
  • Observation: Customizable
  • Action: Customizable
  • Performance: Reference implementation

Robocode

Robot combat programming. Program robots to battle in an arena.
  • Observation: Robot sensor data
  • Action: Movement and firing commands
  • Multi-agent robot combat

Rocket Lander

Rocket landing control. Land a rocket safely on a platform.
  • Observation: Position, velocity, fuel
  • Action: Thruster control
  • Continuous control task

Target

Target tracking and aiming. Track and hit moving targets.
  • Observation: Target positions
  • Action: Aiming directions
  • Precision control

TCG

Trading card game. Strategic card battle game.
  • Observation: Hand, board, deck state
  • Action: Card plays and targeting
  • Complex strategy game

Whisker Racer

Racing with sensor-based control. Race using whisker-style distance sensors.
  • Observation: Distance sensors
  • Action: Steering and acceleration
  • Sensor-based navigation

Sanity check environments

Simple environments for testing and debugging RL algorithms.

Squared

Distance-to-target test. Reach target positions in minimal steps.
  • Configurable targets and distances
  • Tests basic policy learning

PySquared

Pure Python version. Python implementation of Squared for comparison.
  • Same mechanics as Squared
  • Useful for performance benchmarking

Memory

Memory task. Remember and recall sequences.
  • Tests recurrent architectures
  • Configurable memory length

T-Maze

Memory-based navigation. Navigate a maze based on an initial cue.
  • Tests memory retention
  • Classic RL benchmark

Chain MDP

Sequential decision chain. Tests credit assignment over long horizons.
  • Configurable chain length
  • Sparse rewards
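The credit-assignment challenge is easy to see in a minimal chain: many correct actions in a row earn nothing until the final state pays off. A sketch of one common chain-MDP variant — illustrative only, not PufferLib's implementation, and the reset-on-wrong-action rule is an assumption:

```python
class ChainMDP:
    """Minimal chain MDP sketch: start at state 0, action 1 moves right,
    action 0 resets to the start; only the final state pays +1."""
    def __init__(self, length=10):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = self.state + 1 if action == 1 else 0
        done = self.state >= self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done
```

An agent must learn that every intermediate "move right" mattered, even though only the last step was rewarded.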

OneStateWorld

Single-state environment. Minimal environment for algorithm testing.
  • One state, multiple actions
  • Tests basic learning

OnlyFish

Simple foraging. Collect items in a minimal environment.
  • Basic reward mechanics
  • Quick iteration testing

Usage examples

Basic usage (Cartpole)

from pufferlib.ocean.cartpole import Cartpole

# Create environment
env = Cartpole(
    num_envs=4096,
    continuous=True,  # or False for discrete
    cart_mass=1.0,
    pole_mass=0.1,
    gravity=9.8,
)

obs, info = env.reset()

for _ in range(1000):
    actions = policy(obs)  # Your policy here
    obs, rewards, terms, truncs, info = env.step(actions)

env.close()
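The loop above leaves `policy` to you; for a quick smoke test, a batched random policy can stand in. A sketch assuming NumPy-batched observations and discrete Cartpole actions (Discrete(2), per the spec above):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_policy(obs, n_actions=2):
    # One random discrete action per environment in the batch
    return rng.integers(0, n_actions, size=obs.shape[0])
```

Swap it in for `policy` to verify the environment loop runs before plugging in a trained network.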

Multi-agent (Snake)

from pufferlib.ocean.snake import Snake

# Create 16 environments with 256 snakes each
env = Snake(
    num_envs=16,
    num_snakes=256,  # Per environment
    width=640,
    height=360,
    vision=5,
    reward_food=0.1,
    reward_death=-1.0,
)

obs, info = env.reset()

# obs.shape = (4096, 11, 11)  # 16 * 256 agents
# Each agent sees local 11x11 grid around itself

for _ in range(1000):
    actions = policy(obs)
    obs, rewards, terms, truncs, info = env.step(actions)

env.close()

Advanced (Battle)

from pufferlib.ocean.battle import Battle

env = Battle(
    num_envs=8,
    num_agents=1024,  # Per environment
    num_armies=4,
    width=1920,
    height=1080,
    render_mode='human',
)

obs, info = env.reset()
# obs.shape = (8192, obs_dim)  # 8 * 1024 agents

for _ in range(1000):
    actions = policy(obs)  # Continuous actions
    obs, rewards, terms, truncs, info = env.step(actions)

env.render()  # Visualize first environment
env.close()

Environment creator function

All Ocean environments can be created through the unified creator:

import pufferlib.ocean

# Get environment class
env_class = pufferlib.ocean.env_creator('puffer_snake')

# Create environment
env = env_class(num_envs=16, num_snakes=256)

Available environment names:
  • puffer_asteroids, puffer_battle, puffer_blastar, puffer_breakout
  • puffer_cartpole, puffer_connect4, puffer_convert, puffer_drone
  • puffer_enduro, puffer_freeway, puffer_go, puffer_grid
  • puffer_moba, puffer_nmmo3, puffer_pacman, puffer_pong
  • puffer_snake, puffer_tetris, puffer_terraform
  • And many more (see environment.py MAKE_FUNCTIONS dict)

Next steps

Understand architecture

Learn how Ocean environments achieve high performance

Create custom environments

Build your own high-performance C environment
