Ocean environments are PufferLib’s collection of high-performance reinforcement learning environments implemented in C and CUDA. They achieve millions of steps per second by running simulation logic directly in compiled code with minimal Python overhead.

What are Ocean environments?

Ocean environments are RL environments that:
  • Run in C/CUDA: Core simulation logic is implemented in C or CUDA, not Python
  • Scale massively: Support thousands to millions of parallel agents per environment
  • Minimize overhead: Observations and actions are written directly to shared memory buffers
  • Integrate seamlessly: Expose a standard Gymnasium-compatible Python interface

Performance

Ocean environments achieve 1M-10M+ steps per second on a single CPU core

Scalability

Run thousands of environments in parallel with minimal memory overhead

Flexibility

Custom C bindings allow arbitrary simulation complexity

Compatibility

Standard Gymnasium API works with any RL framework

Architecture

Ocean environments use an architecture that separates the Python interface from the simulation logic:
1. Python wrapper

A lightweight Python class (pufferlib.PufferEnv) handles the Gymnasium API and manages shared memory buffers for observations, actions, rewards, and terminals.

2. C binding layer

The env_binding.h template provides standardized functions for environment initialization, stepping, rendering, and cleanup. Each environment implements the required callbacks.

3. Simulation core

Pure C/CUDA code implements the actual environment logic. This code has zero Python dependencies and operates directly on memory buffers.

4. Vectorization

Multiple environment instances share the same compiled code but operate on different slices of the shared memory arrays, enabling efficient parallelization.
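The slicing in step 4 can be sketched in plain NumPy. This is an illustration only, not PufferLib code: the `c_step` function below is a stand-in for the compiled step function, and the buffer shapes are arbitrary.

```python
import numpy as np

NUM_ENVS = 4
OBS_DIM = 8

# One contiguous buffer shared by all environment instances
observations = np.zeros((NUM_ENVS, OBS_DIM), dtype=np.float32)

def c_step(obs_slice, env_id):
    # Stand-in for the compiled step function: each instance
    # writes only to its own slice of the shared array, in-place
    obs_slice[:] = env_id

for i in range(NUM_ENVS):
    c_step(observations[i], i)

# No instance allocated its own observation array; all of them
# wrote through views into the single shared buffer
```

Because `observations[i]` is a view, not a copy, the writes land directly in the array the policy will read, which is the same zero-copy pattern described below.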

Performance characteristics

Ocean environments achieve exceptional performance through several key optimizations:

Zero-copy memory sharing

Observations and actions are stored in NumPy arrays backed by shared memory. C code writes observations directly to these buffers without any copying or serialization:
# Observations array allocated once
obs = np.zeros((num_envs, obs_dim), dtype=np.float32)

# C code writes directly to obs buffer
binding.vec_step(c_envs)  # Updates obs in-place

# No copying - obs is immediately available
actions = policy(obs)

Minimal Python overhead

The Python wrapper is extremely thin - just a few lines per step:
def step(self, actions):
    self.actions[:] = actions  # Copy to shared buffer
    binding.vec_step(self.c_envs)  # C does all the work
    return (self.observations, self.rewards,
            self.terminals, self.truncations, [])

Efficient vectorization

Multiple environments run in parallel with minimal overhead:
env = Snake(
    num_envs=16,
    num_snakes=256,  # 256 agents per env
    width=640,
    height=360,
)
# Total: 4096 agents
# Performance: ~10M steps/second

Batch processing

All environments step simultaneously in a single C function call:
// In C: Process all environments in tight loop
void vec_step(VecEnv* vec) {
    for (int i = 0; i < vec->num_envs; i++) {
        c_step(vec->envs[i]);  // No Python calls
    }
}

Typical performance

Benchmark results on a modern CPU (single core):
| Environment | Agents     | Steps/second | Notes               |
|-------------|------------|--------------|---------------------|
| Cartpole    | 4,096      | 5M           | Simple dynamics     |
| Snake       | 4,096      | 10M          | Optimized grid      |
| Asteroids   | 4,096      | 3M           | Collision detection |
| Breakout    | 1,024      | 2M           | Rendering enabled   |
| Battle      | 512 × 1024 | 1M           | Complex multi-agent |
Performance varies by environment complexity and hardware. GPU-based environments (CUDA) can achieve even higher throughput.

C/CUDA optimization benefits

Compared to pure Python environments, Ocean environments provide:

10-1000x speedup over naive Python implementations
  • Eliminates Python interpreter overhead
  • Enables SIMD vectorization and compiler optimizations
  • Reduces memory allocations and garbage collection

Consistent performance
  • No garbage collection pauses
  • Predictable memory usage
  • Scales linearly with cores

Lower resource usage
  • Smaller memory footprint per environment
  • Efficient cache utilization
  • Better CPU utilization
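The interpreter-overhead point can be felt with a toy comparison: the same per-element update done in a Python loop versus one call into compiled code (here NumPy, playing the role of the C simulation core). The numbers and array sizes are arbitrary.

```python
import time
import numpy as np

N = 100_000
state = np.random.rand(N).astype(np.float32)
velocity = np.random.rand(N).astype(np.float32)

# Per-element Python loop: interpreter dispatch on every iteration
start = time.perf_counter()
out_loop = np.empty_like(state)
for i in range(N):
    out_loop[i] = state[i] + 0.01 * velocity[i]
loop_time = time.perf_counter() - start

# One call into compiled code, analogous to an Ocean env step
start = time.perf_counter()
out_vec = state + np.float32(0.01) * velocity
vec_time = time.perf_counter() - start

print(f"loop: {loop_time*1e3:.1f} ms, vectorized: {vec_time*1e3:.2f} ms")
```

The gap grows with the amount of work per step, which is why full environments with collision detection or multi-agent logic sit at the high end of the speedup range.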

When to use Ocean environments

Ocean environments are ideal for:

Large-scale training

Training with millions of environment steps per update

Rapid iteration

Quick experiments with fast turnaround times

Benchmarking

Standardized high-performance test environments

Custom simulations

Building new environments that need maximum speed

Example usage

import pufferlib
from pufferlib.ocean.cartpole import Cartpole

# Create vectorized environment
env = Cartpole(num_envs=4096, continuous=True)

# Standard Gymnasium API
obs, info = env.reset()

for _ in range(1000):
    # Your policy here
    actions = model(obs)
    
    # Step all 4096 environments at once
    obs, rewards, terminals, truncations, info = env.step(actions)

env.close()

Next steps

Browse environments

See all available Ocean environments and their features

Create custom environments

Learn how to build your own high-performance C environments
