Classic control
Standard RL benchmark environments with high-performance C implementations.
Cartpole
Classic pole balancing task
Balance a pole on a moving cart. Supports both discrete and continuous action spaces.
- Observation: 4D (position, velocity, angle, angular velocity)
- Action: Discrete(2) or Continuous(1)
- Performance: ~5M steps/sec
Continuous
Continuous control sanity test
Simple continuous action space environment for testing.
- Observation: Configurable
- Action: Box (continuous)
- Performance: ~8M steps/sec
Atari-style games
Classic arcade games reimplemented in high-performance C.
Asteroids
Fly a spaceship and destroy asteroids
Navigate space, shoot asteroids, avoid collisions. Asteroids split when hit.
- Observation: 104D (player state + 20 nearest asteroids)
- Action: Discrete(4) - forward, left, right, shoot
- Performance: ~3M steps/sec
Breakout
Brick-breaking paddle game
Control a paddle to bounce a ball and break bricks.
- Observation: 118D (paddle, ball, bricks state)
- Action: Discrete(3) - left, stay, right
- Performance: ~2M steps/sec
Pong
Two-player paddle ball game
Classic Pong with configurable physics.
- Observation: Low-dimensional state
- Action: Discrete(3)
- Performance: ~4M steps/sec
Freeway
Cross the highway without getting hit
Navigate through traffic to reach the other side.
- Observation: Grid-based
- Action: Discrete(4)
- Performance: ~3M steps/sec
Enduro
Racing game
Navigate through traffic at high speed.
- Observation: Grid-based
- Action: Discrete(3)
- Performance: ~3M steps/sec
Blastar
Space shooter
Shoot enemies while avoiding obstacles.
- Observation: Spatial
- Action: Discrete
- Performance: ~2M steps/sec
Grid-based games
Environments with 2D grid observations and discrete actions.
Snake
Multi-agent snake game
Highly optimized snake with thousands of concurrent snakes. Eat food, avoid collisions.
- Observation: (2×vision+1, 2×vision+1) grid
- Action: Discrete(4) - up, down, left, right
- Agents: Configurable (default 4096)
- Performance: ~10M steps/sec
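The observation-window arithmetic above can be spelled out in a couple of lines (`vision` is assumed to be the radius of the egocentric crop around the snake's head):

```python
# Snake's observation is a square crop centered on the snake's head;
# with vision radius v, each side spans 2*v + 1 cells.
def snake_obs_shape(vision: int) -> tuple:
    side = 2 * vision + 1
    return (side, side)

print(snake_obs_shape(5))  # vision=5 gives an 11x11 window
```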
Grid
Customizable grid world
Template for grid-based environments with configurable mechanics.
- Observation: (11, 11) grid with 32 tile types
- Action: Discrete or continuous
- Performance: ~5M steps/sec
Tetris
Falling blocks puzzle
Stack and clear lines in the classic puzzle game.
- Observation: Board state
- Action: Discrete (move, rotate)
- Performance: ~2M steps/sec
Pacman
Maze navigation with ghosts
Collect pellets while avoiding ghosts.
- Observation: Grid-based
- Action: Discrete(4)
- Performance: ~3M steps/sec
Board games
Two-player strategy games with perfect information.
Connect4
Connect four in a row
Drop pieces to create four in a row horizontally, vertically, or diagonally.
- Observation: Board state
- Action: Discrete(7) - column selection
- Performance: ~4M steps/sec
Checkers
Classic checkers/draughts
Jump opponent pieces and reach the far side.
- Observation: Board state
- Action: Discrete (legal moves)
- Performance: ~2M steps/sec
Go
Ancient strategy board game
Surround territory on a grid.
- Observation: 2 × board_size² + 2 (current/previous position + metadata)
- Action: Discrete (board positions + pass)
- Performance: ~1M steps/sec
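The Go observation size follows directly from the formula above: two board planes (current and previous position) plus two metadata entries.

```python
# Go observation: two board_size x board_size planes (current and
# previous position) plus 2 metadata entries.
def go_obs_dim(board_size: int) -> int:
    return 2 * board_size ** 2 + 2

print(go_obs_dim(9))   # 9x9 board   -> 164
print(go_obs_dim(19))  # 19x19 board -> 724
```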
2048
Tile merging puzzle
Combine tiles with the same number to reach 2048.
- Observation: 16D board + metadata
- Action: Discrete(4) - slide direction
- Performance: ~3M steps/sec
Triple Triad
Card placement strategy game
Place cards to capture opponent pieces.
- Observation: Card and board state
- Action: Discrete (placement)
- Performance: ~2M steps/sec
Multi-agent environments
Environments with multiple interacting agents.
Battle
Multi-army combat simulation
Large-scale multi-agent warfare with factories and armies.
- Observation: (num_armies×3 + 416 + 22 + 8)D per agent
- Action: Box(3) - continuous movement
- Agents: 512-2048 per environment
- Performance: ~1M steps/sec
MOBA
Multiplayer online battle arena
Team-based combat with lanes and objectives.
- Observation: Spatial + entity features
- Action: MultiDiscrete (move + ability)
- Performance: ~500K steps/sec
NMMO3
Neural MMO environment
Massively multi-agent survival and exploration.
- Observation: 11×15×10 map + player features
- Action: Discrete (movement + actions)
- Performance: ~300K steps/sec
Slime Volleyball
Two-player volleyball
Competitive 1v1 or 2v2 volleyball.
- Observation: Physics state
- Action: Discrete(3)
- Performance: ~2M steps/sec
Robotics and control
Environments inspired by robotics tasks.
Drone
Quadcopter control
Navigate a drone through 3D space.
- Observation: State vector or Dict
- Action: Continuous or MultiDiscrete
- Performance: ~3M steps/sec
Drive
Autonomous driving
Navigate a road shared with other vehicles.
- Observation: Ego (7D) + partners (63×7) + road (200×7)
- Action: MultiDiscrete (steering + acceleration)
- Performance: ~1M steps/sec
RWARE
Robot warehouse management
Coordinate robots to move items in a warehouse.
- Observation: Grid-based
- Action: Discrete
- Performance: ~2M steps/sec
Trash Pickup
Multi-robot coordination
Pick up trash items on a grid.
- Observation: 5×11×11 grid
- Action: Discrete(4)
- Performance: ~4M steps/sec
Research environments
Specialized environments for RL research.
Boids
Flocking behavior
Emergent swarm dynamics with multiple agents.
- Observation: Variable (4 per neighbor)
- Action: MultiDiscrete
- Performance: ~5M steps/sec
Impulse Wars
Multi-drone combat
Advanced drone warfare with weapons and projectiles.
- Observation: Map (CNN) + discrete + continuous features
- Action: Continuous or MultiDiscrete
- Performance: ~500K steps/sec
Terraform
Territory modification
Modify grid terrain to achieve objectives.
- Observation: Local (2×11×11) + global (2×6×6) + 5D features
- Action: MultiDiscrete
- Performance: ~2M steps/sec
Matsci
Materials science simulation
Research-focused materials optimization.
- Observation: Domain-specific
- Action: Configurable
- Performance: Varies
Tactical
Turn-based tactics
Strategic combat on a grid.
- Observation: Grid + unit states
- Action: MultiDiscrete
- Performance: ~1M steps/sec
Tower Climb
3D climbing challenge
Navigate vertical structures.
- Observation: 3D grid (5×5×9) + 3D player info
- Action: Discrete
- Performance: ~2M steps/sec
Shared Pool
Common pool resource management
Study cooperation in resource extraction.
- Observation: Resource state
- Action: Discrete or continuous
- Performance: ~3M steps/sec
Conversion environments
Environments demonstrating mechanics or serving as templates.
Convert
Resource conversion
Convert resources between types.
- Observation: State-based
- Action: Discrete
- Performance: ~4M steps/sec
Convert Circle
Circular conversion
Variant with circular resource dependencies.
- Observation: State-based
- Action: Discrete
- Performance: ~4M steps/sec
Template
Environment template
Starting point for creating new Ocean environments.
- Observation: Customizable
- Action: Customizable
- Performance: Reference implementation
Robocode
Robot combat programming
Program robots to battle in an arena.
- Observation: Robot sensor data
- Action: Movement and firing commands
- Multi-agent robot combat
Rocket Lander
Rocket landing control
Land a rocket safely on a platform.
- Observation: Position, velocity, fuel
- Action: Thruster control
- Continuous control task
Target
Target tracking and aiming
Track and hit moving targets.
- Observation: Target positions
- Action: Aiming directions
- Precision control
TCG
Trading card game
Strategic card battles with hand, board, and deck management.
- Observation: Hand, board, deck state
- Action: Card plays and targeting
- Complex strategy game
Whisker Racer
Racing with sensor-based control
Race using whisker-style distance sensors.
- Observation: Distance sensors
- Action: Steering and acceleration
- Sensor-based navigation
Sanity check environments
Simple environments for testing and debugging RL algorithms.
Squared
Distance-to-target test
Reach target positions in minimal steps.
- Configurable targets and distances
- Tests basic policy learning
PySquared
Pure Python version
Python implementation of Squared for comparison.
- Same mechanics as Squared
- Useful for performance benchmarking
Memory
Memory task
Remember and recall sequences.
- Tests recurrent architectures
- Configurable memory length
T-Maze
Memory-based navigation
Navigate a maze based on an initial cue.
- Tests memory retention
- Classic RL benchmark
Chain MDP
Sequential decision chain
Tests credit assignment over long horizons.
- Configurable chain length
- Sparse rewards
OneStateWorld
Single-state environment
Minimal environment for algorithm testing.
- One state, multiple actions
- Tests basic learning
OnlyFish
Simple foraging
Collect items in a minimal environment.
- Basic reward mechanics
- Quick iteration testing
Usage examples
Basic usage (Cartpole)
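The exact Ocean constructor names may vary between PufferLib versions, so this sketch uses a hypothetical stand-in class (`StubCartpole`) that follows the same Gymnasium-style reset/step contract; in practice you would swap in the real Cartpole creator.

```python
import numpy as np

# Hypothetical stand-in mimicking the reset/step contract Ocean
# environments expose: 4D observation, Discrete(2) action.
class StubCartpole:
    def __init__(self):
        self.rng = np.random.default_rng(0)
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return self.rng.standard_normal(4).astype(np.float32), {}

    def step(self, action):
        assert action in (0, 1)        # Discrete(2)
        self.t += 1
        obs = self.rng.standard_normal(4).astype(np.float32)
        reward = 1.0                   # +1 per step while balanced
        terminated = self.t >= 200     # episode cap for the stub
        return obs, reward, terminated, False, {}

env = StubCartpole()
obs, info = env.reset(seed=0)
total, done = 0.0, False
while not done:
    action = int(obs[2] > 0)  # naive policy: push toward the lean
    obs, r, terminated, truncated, info = env.step(action)
    total += r
    done = terminated or truncated
print(total)
```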
Multi-agent (Snake)
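Multi-agent Ocean environments batch observations and actions along a leading agent axis. The stand-in below (`StubSnake`, `NUM_AGENTS`, and `VISION` are illustrative, not PufferLib names) shows the shapes involved for Snake's Discrete(4) per-agent actions and (2×vision+1)² windows.

```python
import numpy as np

NUM_AGENTS, VISION = 8, 5
SIDE = 2 * VISION + 1  # 11

# Stand-in illustrating the batched multi-agent contract.
class StubSnake:
    def reset(self, seed=None):
        return np.zeros((NUM_AGENTS, SIDE, SIDE), dtype=np.uint8), {}

    def step(self, actions):
        assert actions.shape == (NUM_AGENTS,)  # one Discrete(4) action each
        obs = np.zeros((NUM_AGENTS, SIDE, SIDE), dtype=np.uint8)
        rewards = np.zeros(NUM_AGENTS, dtype=np.float32)
        dones = np.zeros(NUM_AGENTS, dtype=bool)
        return obs, rewards, dones, dones, {}

env = StubSnake()
obs, _ = env.reset()
actions = np.random.default_rng(0).integers(0, 4, size=NUM_AGENTS)
obs, rewards, term, trunc, _ = env.step(actions)
print(obs.shape)  # (8, 11, 11)
```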
Advanced (Battle)
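Battle uses continuous Box(3) movement actions for hundreds of agents per environment. A common pattern (sketched here under the assumption that actions are bounded in [-1, 1]; check the environment's actual action space) is to squash unbounded policy outputs before stepping:

```python
import numpy as np

NUM_AGENTS = 512  # within Battle's 512-2048 range

# Policies typically emit unbounded logits/means; tanh squashes them
# into the assumed [-1, 1] Box(3) bounds, one 3-vector per agent.
rng = np.random.default_rng(0)
raw = rng.standard_normal((NUM_AGENTS, 3)).astype(np.float32)
actions = np.tanh(raw)

print(actions.shape)  # (512, 3)
```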
Environment creator function
All Ocean environments can be created through the unified creator:
- puffer_asteroids, puffer_battle, puffer_blastar, puffer_breakout
- puffer_cartpole, puffer_connect4, puffer_convert, puffer_drone
- puffer_enduro, puffer_freeway, puffer_go, puffer_grid
- puffer_moba, puffer_nmmo3, puffer_pacman, puffer_pong
- puffer_snake, puffer_tetris, puffer_terraform
- And many more (see environment.py MAKE_FUNCTIONS dict)
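The name-to-creator mapping works like an ordinary registry dict. This is an illustrative sketch of the pattern, not PufferLib's actual MAKE_FUNCTIONS implementation; the stub creators just return placeholder dicts.

```python
# Illustrative name -> creator registry, mirroring the MAKE_FUNCTIONS
# pattern described above (stub creators, not real environments).
def make_stub(name):
    def creator(**kwargs):
        return {"name": name, "config": kwargs}  # placeholder for a real env
    return creator

MAKE_FUNCTIONS = {
    f"puffer_{n}": make_stub(n)
    for n in ("cartpole", "snake", "breakout", "battle")
}

env = MAKE_FUNCTIONS["puffer_cartpole"](num_envs=4)
print(env["name"])  # cartpole
```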
Next steps
Understand architecture
Learn how Ocean environments achieve high performance
Create custom environments
Build your own high-performance C environment