Classic control
Standard RL benchmark environments with high-performance C implementations.
Cartpole
Classic pole balancing task
Balance a pole on a moving cart. Supports both discrete and continuous action spaces.
- Observation: 4D (position, velocity, angle, angular velocity)
- Action: Discrete(2) or Continuous(1)
- Performance: ~5M steps/sec
Continuous
Continuous control sanity test
Simple continuous action space environment for testing.
- Observation: Configurable
- Action: Box (continuous)
- Performance: ~8M steps/sec
Atari-style games
Classic arcade games reimplemented in high-performance C.
Asteroids
Fly a spaceship and destroy asteroids
Navigate space, shoot asteroids, avoid collisions. Asteroids split when hit.
- Observation: 104D (player state + 20 nearest asteroids)
- Action: Discrete(4) - forward, left, right, shoot
- Performance: ~3M steps/sec
Breakout
Brick-breaking paddle game
Control a paddle to bounce a ball and break bricks.
- Observation: 118D (paddle, ball, bricks state)
- Action: Discrete(3) - left, stay, right
- Performance: ~2M steps/sec
Pong
Two-player paddle ball game
Classic Pong with configurable physics.
- Observation: Low-dimensional state
- Action: Discrete(3)
- Performance: ~4M steps/sec
Freeway
Cross the highway without getting hit
Navigate through traffic to reach the other side.
- Observation: Grid-based
- Action: Discrete(4)
- Performance: ~3M steps/sec
Enduro
Racing game
Navigate through traffic at high speed.
- Observation: Grid-based
- Action: Discrete(3)
- Performance: ~3M steps/sec
Blastar
Space shooter
Shoot enemies while avoiding obstacles.
- Observation: Spatial
- Action: Discrete
- Performance: ~2M steps/sec
Grid-based games
Environments with 2D grid observations and discrete actions.
Snake
Multi-agent snake game
Highly optimized snake with thousands of concurrent snakes. Eat food, avoid collisions.
- Observation: (2×vision+1, 2×vision+1) grid
- Action: Discrete(4) - up, down, left, right
- Agents: Configurable (default 4096)
- Performance: ~10M steps/sec
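The observation-window arithmetic above can be spelled out in a couple of lines (`vision` is assumed to be the radius of the egocentric crop around the snake's head):

```python
# Snake's observation is a square crop centered on the snake's head;
# with vision radius v, each side spans 2*v + 1 cells.
def snake_obs_shape(vision: int) -> tuple:
    side = 2 * vision + 1
    return (side, side)

print(snake_obs_shape(5))  # vision=5 gives an 11x11 window
```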
Grid
Customizable grid world
Template for grid-based environments with configurable mechanics.
- Observation: (11, 11) grid with 32 tile types
- Action: Discrete or continuous
- Performance: ~5M steps/sec
Tetris
Falling blocks puzzle
Stack and clear lines in the classic puzzle game.
- Observation: Board state
- Action: Discrete (move, rotate)
- Performance: ~2M steps/sec
Pacman
Maze navigation with ghosts
Collect pellets while avoiding ghosts.
- Observation: Grid-based
- Action: Discrete(4)
- Performance: ~3M steps/sec
Board games
Two-player strategy games with perfect information.
Connect4
Connect four in a row
Drop pieces to create four in a row horizontally, vertically, or diagonally.
- Observation: Board state
- Action: Discrete(7) - column selection
- Performance: ~4M steps/sec
Checkers
Classic checkers/draughts
Jump opponent pieces and reach the far side.
- Observation: Board state
- Action: Discrete (legal moves)
- Performance: ~2M steps/sec
Go
Ancient strategy board game
Surround territory on a grid.
- Observation: 2 × board_size² + 2 (current/previous position + metadata)
- Action: Discrete (board positions + pass)
- Performance: ~1M steps/sec
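The Go observation size follows directly from the formula above: two board planes (current and previous position) plus two metadata entries.

```python
# Go observation: two board_size x board_size planes (current and
# previous position) plus 2 metadata entries.
def go_obs_dim(board_size: int) -> int:
    return 2 * board_size ** 2 + 2

print(go_obs_dim(9))   # 9x9 board   -> 164
print(go_obs_dim(19))  # 19x19 board -> 724
```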
2048
Tile merging puzzle
Combine tiles with the same number to reach 2048.
- Observation: 16D board + metadata
- Action: Discrete(4) - slide direction
- Performance: ~3M steps/sec
Triple Triad
Card placement strategy game
Place cards to capture opponent pieces.
- Observation: Card and board state
- Action: Discrete (placement)
- Performance: ~2M steps/sec
Multi-agent environments
Environments with multiple interacting agents.
Battle
Multi-army combat simulation
Large-scale multi-agent warfare with factories and armies.
- Observation: (num_armies×3 + 416 + 22 + 8)D per agent
- Action: Box(3) - continuous movement
- Agents: 512-2048 per environment
- Performance: ~1M steps/sec
MOBA
Multiplayer online battle arena
Team-based combat with lanes and objectives.
- Observation: Spatial + entity features
- Action: MultiDiscrete (move + ability)
- Performance: ~500K steps/sec
NMMO3
Neural MMO environment
Massively multi-agent survival and exploration.
- Observation: 11×15×10 map + player features
- Action: Discrete (movement + actions)
- Performance: ~300K steps/sec
Slime Volleyball
Two-player volleyball
Competitive 1v1 or 2v2 volleyball.
- Observation: Physics state
- Action: Discrete(3)
- Performance: ~2M steps/sec
Robotics and control
Environments inspired by robotics tasks.
Drone
Quadcopter control
Navigate a drone through 3D space.
- Observation: State vector or Dict
- Action: Continuous or MultiDiscrete
- Performance: ~3M steps/sec
Drive
Autonomous driving
Navigate a road shared with other vehicles.
- Observation: Ego (7D) + partners (63×7) + road (200×7)
- Action: MultiDiscrete (steering + acceleration)
- Performance: ~1M steps/sec
RWARE
Robot warehouse management
Coordinate robots to move items in a warehouse.
- Observation: Grid-based
- Action: Discrete
- Performance: ~2M steps/sec
Trash Pickup
Multi-robot coordination
Pick up trash items on a grid.
- Observation: 5×11×11 grid
- Action: Discrete(4)
- Performance: ~4M steps/sec
Research environments
Specialized environments for RL research.
Boids
Flocking behavior
Emergent swarm dynamics with multiple agents.
- Observation: Variable (4 per neighbor)
- Action: MultiDiscrete
- Performance: ~5M steps/sec
Impulse Wars
Multi-drone combat
Advanced drone warfare with weapons and projectiles.
- Observation: Map (CNN) + discrete + continuous features
- Action: Continuous or MultiDiscrete
- Performance: ~500K steps/sec
Terraform
Territory modification
Modify grid terrain to achieve objectives.
- Observation: Local (2×11×11) + global (2×6×6) + 5D features
- Action: MultiDiscrete
- Performance: ~2M steps/sec
Matsci
Materials science simulation
Research-focused materials optimization.
- Observation: Domain-specific
- Action: Configurable
- Performance: Varies
Tactical
Turn-based tactics
Strategic combat on a grid.
- Observation: Grid + unit states
- Action: MultiDiscrete
- Performance: ~1M steps/sec
Tower Climb
3D climbing challenge
Navigate vertical structures.
- Observation: 3D grid (5×5×9) + 3D player info
- Action: Discrete
- Performance: ~2M steps/sec
Shared Pool
Common pool resource management
Study cooperation in resource extraction.
- Observation: Resource state
- Action: Discrete or continuous
- Performance: ~3M steps/sec
Conversion environments
Environments demonstrating mechanics or serving as templates.
Convert
Resource conversion
Convert resources between types.
- Observation: State-based
- Action: Discrete
- Performance: ~4M steps/sec
Convert Circle
Circular conversion
Variant with circular resource dependencies.
- Observation: State-based
- Action: Discrete
- Performance: ~4M steps/sec
Template
Environment template
Starting point for creating new Ocean environments.
- Observation: Customizable
- Action: Customizable
- Performance: Reference implementation
Robocode
Robot combat programming
Program robots to battle in an arena.
- Observation: Robot sensor data
- Action: Movement and firing commands
- Multi-agent robot combat
Rocket Lander
Rocket landing control
Land a rocket safely on a platform.
- Observation: Position, velocity, fuel
- Action: Thruster control
- Continuous control task
Target
Target tracking and aiming
Track and hit moving targets.
- Observation: Target positions
- Action: Aiming directions
- Precision control
TCG
Trading card game
Strategic card battles with hand, board, and deck management.
- Observation: Hand, board, deck state
- Action: Card plays and targeting
- Complex strategy game
Whisker Racer
Racing with sensor-based control
Race using whisker-style distance sensors.
- Observation: Distance sensors
- Action: Steering and acceleration
- Sensor-based navigation
Sanity check environments
Simple environments for testing and debugging RL algorithms.
Squared
Distance-to-target test
Reach target positions in minimal steps.
- Configurable targets and distances
- Tests basic policy learning
PySquared
Pure Python version
Python implementation of Squared for comparison.
- Same mechanics as Squared
- Useful for performance benchmarking
Memory
Memory task
Remember and recall sequences.
- Tests recurrent architectures
- Configurable memory length
T-Maze
Memory-based navigation
Navigate a maze based on an initial cue.
- Tests memory retention
- Classic RL benchmark
Chain MDP
Sequential decision chain
Tests credit assignment over long horizons.
- Configurable chain length
- Sparse rewards
OneStateWorld
Single-state environment
Minimal environment for algorithm testing.
- One state, multiple actions
- Tests basic learning
OnlyFish
Simple foraging
Collect items in a minimal environment.
- Basic reward mechanics
- Quick iteration testing
Usage examples
Basic usage (Cartpole)
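The exact Ocean constructor names may vary between PufferLib versions, so this sketch uses a hypothetical stand-in class (`StubCartpole`) that follows the same Gymnasium-style reset/step contract; in practice you would swap in the real Cartpole creator.

```python
import numpy as np

# Hypothetical stand-in mimicking the reset/step contract Ocean
# environments expose: 4D observation, Discrete(2) action.
class StubCartpole:
    def __init__(self):
        self.rng = np.random.default_rng(0)
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return self.rng.standard_normal(4).astype(np.float32), {}

    def step(self, action):
        assert action in (0, 1)        # Discrete(2)
        self.t += 1
        obs = self.rng.standard_normal(4).astype(np.float32)
        reward = 1.0                   # +1 per step while balanced
        terminated = self.t >= 200     # episode cap for the stub
        return obs, reward, terminated, False, {}

env = StubCartpole()
obs, info = env.reset(seed=0)
total, done = 0.0, False
while not done:
    action = int(obs[2] > 0)  # naive policy: push toward the lean
    obs, r, terminated, truncated, info = env.step(action)
    total += r
    done = terminated or truncated
print(total)
```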
Multi-agent (Snake)
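Multi-agent Ocean environments batch observations and actions along a leading agent axis. The stand-in below (`StubSnake`, `NUM_AGENTS`, and `VISION` are illustrative, not PufferLib names) shows the shapes involved for Snake's Discrete(4) per-agent actions and (2×vision+1)² windows.

```python
import numpy as np

NUM_AGENTS, VISION = 8, 5
SIDE = 2 * VISION + 1  # 11

# Stand-in illustrating the batched multi-agent contract.
class StubSnake:
    def reset(self, seed=None):
        return np.zeros((NUM_AGENTS, SIDE, SIDE), dtype=np.uint8), {}

    def step(self, actions):
        assert actions.shape == (NUM_AGENTS,)  # one Discrete(4) action each
        obs = np.zeros((NUM_AGENTS, SIDE, SIDE), dtype=np.uint8)
        rewards = np.zeros(NUM_AGENTS, dtype=np.float32)
        dones = np.zeros(NUM_AGENTS, dtype=bool)
        return obs, rewards, dones, dones, {}

env = StubSnake()
obs, _ = env.reset()
actions = np.random.default_rng(0).integers(0, 4, size=NUM_AGENTS)
obs, rewards, term, trunc, _ = env.step(actions)
print(obs.shape)  # (8, 11, 11)
```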
Advanced (Battle)
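Battle uses continuous Box(3) movement actions for hundreds of agents per environment. A common pattern (sketched here under the assumption that actions are bounded in [-1, 1]; check the environment's actual action space) is to squash unbounded policy outputs before stepping:

```python
import numpy as np

NUM_AGENTS = 512  # within Battle's 512-2048 range

# Policies typically emit unbounded logits/means; tanh squashes them
# into the assumed [-1, 1] Box(3) bounds, one 3-vector per agent.
rng = np.random.default_rng(0)
raw = rng.standard_normal((NUM_AGENTS, 3)).astype(np.float32)
actions = np.tanh(raw)

print(actions.shape)  # (512, 3)
```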
Environment creator function
All Ocean environments can be created through the unified creator:
- puffer_asteroids, puffer_battle, puffer_blastar, puffer_breakout
- puffer_cartpole, puffer_connect4, puffer_convert, puffer_drone
- puffer_enduro, puffer_freeway, puffer_go, puffer_grid
- puffer_moba, puffer_nmmo3, puffer_pacman, puffer_pong
- puffer_snake, puffer_tetris, puffer_terraform
- And many more (see environment.py MAKE_FUNCTIONS dict)
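The name-to-creator mapping works like an ordinary registry dict. This is an illustrative sketch of the pattern, not PufferLib's actual MAKE_FUNCTIONS implementation; the stub creators just return placeholder dicts.

```python
# Illustrative name -> creator registry, mirroring the MAKE_FUNCTIONS
# pattern described above (stub creators, not real environments).
def make_stub(name):
    def creator(**kwargs):
        return {"name": name, "config": kwargs}  # placeholder for a real env
    return creator

MAKE_FUNCTIONS = {
    f"puffer_{n}": make_stub(n)
    for n in ("cartpole", "snake", "breakout", "battle")
}

env = MAKE_FUNCTIONS["puffer_cartpole"](num_envs=4)
print(env["name"])  # cartpole
```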
Next steps
Understand architecture
Learn how Ocean environments achieve high performance
Create custom environments
Build your own high-performance C environment