Ocean environments are PufferLib’s collection of high-performance reinforcement learning environments implemented in C and CUDA. They achieve millions of steps per second by running simulation logic directly in compiled code with minimal Python overhead.

What are Ocean environments?

Ocean environments are RL environments that:
  • Run in C/CUDA: Core simulation logic is implemented in C or CUDA, not Python
  • Scale massively: Support thousands to millions of parallel agents per environment
  • Minimize overhead: Observations and actions are written directly to shared memory buffers
  • Integrate seamlessly: Expose a standard Gymnasium-compatible Python interface

Performance

Ocean environments achieve 1M-10M+ steps per second on a single CPU core

Scalability

Run thousands of environments in parallel with minimal memory overhead

Flexibility

Custom C bindings allow arbitrary simulation complexity

Compatibility

Standard Gymnasium API works with any RL framework

Architecture

Ocean environments use an architecture that separates the Python interface from the simulation logic:
1. Python wrapper

A lightweight Python class (pufferlib.PufferEnv) handles the Gymnasium API and manages shared memory buffers for observations, actions, rewards, and terminals.

2. C binding layer

The env_binding.h template provides standardized functions for environment initialization, stepping, rendering, and cleanup. Each environment implements the required callbacks.

3. Simulation core

Pure C/CUDA code implements the actual environment logic. This code has zero Python dependencies and operates directly on memory buffers.

4. Vectorization

Multiple environment instances share the same compiled code but operate on different slices of the shared memory arrays, enabling efficient parallelization.
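The slicing in step 4 can be sketched in plain NumPy. This is an illustration only, not PufferLib code: the `c_step` function below is a stand-in for the compiled step function, and the buffer shapes are arbitrary.

```python
import numpy as np

NUM_ENVS = 4
OBS_DIM = 8

# One contiguous buffer shared by all environment instances
observations = np.zeros((NUM_ENVS, OBS_DIM), dtype=np.float32)

def c_step(obs_slice, env_id):
    # Stand-in for the compiled step function: each instance
    # writes only to its own slice of the shared array, in-place
    obs_slice[:] = env_id

for i in range(NUM_ENVS):
    c_step(observations[i], i)

# No instance allocated its own observation array; all of them
# wrote through views into the single shared buffer
```

Because `observations[i]` is a view, not a copy, the writes land directly in the array the policy will read, which is the same zero-copy pattern described below.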

Performance characteristics

Ocean environments achieve exceptional performance through several key optimizations:

Zero-copy memory sharing

Observations and actions are stored in NumPy arrays backed by shared memory. C code writes observations directly to these buffers without any copying or serialization:
# Observations array allocated once
obs = np.zeros((num_envs, obs_dim), dtype=np.float32)

# C code writes directly to obs buffer
binding.vec_step(c_envs)  # Updates obs in-place

# No copying - obs is immediately available
actions = policy(obs)

Minimal Python overhead

The Python wrapper is extremely thin - just a few lines per step:
def step(self, actions):
    self.actions[:] = actions  # Copy to shared buffer
    binding.vec_step(self.c_envs)  # C does all the work
    return (self.observations, self.rewards,
            self.terminals, self.truncations, [])

Efficient vectorization

Multiple environments run in parallel with minimal overhead:
env = Snake(
    num_envs=16,
    num_snakes=256,  # 256 agents per env
    width=640,
    height=360,
)
# Total: 4096 agents
# Performance: ~10M steps/second

Batch processing

All environments step simultaneously in a single C function call:
// In C: Process all environments in tight loop
void vec_step(VecEnv* vec) {
    for (int i = 0; i < vec->num_envs; i++) {
        c_step(vec->envs[i]);  // No Python calls
    }
}

Typical performance

Benchmark results on a modern CPU (single core):
| Environment | Agents     | Steps/second | Notes               |
|-------------|------------|--------------|---------------------|
| Cartpole    | 4,096      | 5M           | Simple dynamics     |
| Snake       | 4,096      | 10M          | Optimized grid      |
| Asteroids   | 4,096      | 3M           | Collision detection |
| Breakout    | 1,024      | 2M           | Rendering enabled   |
| Battle      | 512 × 1024 | 1M           | Complex multi-agent |
Performance varies by environment complexity and hardware. GPU-based environments (CUDA) can achieve even higher throughput.

C/CUDA optimization benefits

Compared to pure Python environments, Ocean environments provide:

10-1000x speedup over naive Python implementations
  • Eliminates Python interpreter overhead
  • Enables SIMD vectorization and compiler optimizations
  • Reduces memory allocations and garbage collection

Consistent performance
  • No garbage collection pauses
  • Predictable memory usage
  • Scales linearly with cores

Lower resource usage
  • Smaller memory footprint per environment
  • Efficient cache utilization
  • Better CPU utilization
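The interpreter-overhead point can be felt with a toy comparison: the same per-element update done in a Python loop versus one call into compiled code (here NumPy, playing the role of the C simulation core). The numbers and array sizes are arbitrary.

```python
import time
import numpy as np

N = 100_000
state = np.random.rand(N).astype(np.float32)
velocity = np.random.rand(N).astype(np.float32)

# Per-element Python loop: interpreter dispatch on every iteration
start = time.perf_counter()
out_loop = np.empty_like(state)
for i in range(N):
    out_loop[i] = state[i] + 0.01 * velocity[i]
loop_time = time.perf_counter() - start

# One call into compiled code, analogous to an Ocean env step
start = time.perf_counter()
out_vec = state + np.float32(0.01) * velocity
vec_time = time.perf_counter() - start

print(f"loop: {loop_time*1e3:.1f} ms, vectorized: {vec_time*1e3:.2f} ms")
```

The gap grows with the amount of work per step, which is why full environments with collision detection or multi-agent logic sit at the high end of the speedup range.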

When to use Ocean environments

Ocean environments are ideal for:

Large-scale training

Training with millions of environment steps per update

Rapid iteration

Quick experiments with fast turnaround times

Benchmarking

Standardized high-performance test environments

Custom simulations

Building new environments that need maximum speed

Example usage

import pufferlib
from pufferlib.ocean.cartpole import Cartpole

# Create vectorized environment
env = Cartpole(num_envs=4096, continuous=True)

# Standard Gymnasium API
obs, info = env.reset()

for _ in range(1000):
    # Your policy here
    actions = model(obs)
    
    # Step all 4096 environments at once
    obs, rewards, terminals, truncations, info = env.step(actions)

env.close()

Next steps

Browse environments

See all available Ocean environments and their features

Create custom environments

Learn how to build your own high-performance C environments
