This guide shows you how to create custom Ocean environments using PufferLib’s C binding system. You’ll learn the architecture, required components, and best practices for building environments that run at millions of steps per second.

Environment structure

An Ocean environment consists of three main components:
  • Python wrapper - handles the Gymnasium API and shared memory
  • C binding - compiles the C code as a Python extension
  • C implementation - core simulation logic
Directory structure for a new environment:
pufferlib/ocean/myenv/
├── myenv.py          # Python wrapper
├── myenv.h           # C header with types and logic
├── myenv.c           # Main function / demo (optional)
├── binding.c         # Python binding (includes env_binding.h)
└── setup.py          # Build configuration

Step-by-step guide

Step 1: Define environment types and constants

Create myenv.h with your environment struct and simulation constants:
myenv.h
#pragma once
#include "raylib.h"
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Constants
#define MAX_STEPS 500

// Log structure - must contain only floats
typedef struct {
    float episode_length;
    float episode_return;
    float n;  // Number of episodes
} Log;

// Environment structure
typedef struct {
    // Required: shared buffers (passed from Python)
    float* observations;
    float* actions;
    float* rewards;
    unsigned char* terminals;
    
    // Required: logging
    Log log;
    
    // Your environment state
    float x;
    float y;
    float target_x;
    float target_y;
    int tick;
    float episode_return;
    
    // Optional: rendering client
    void* client;
} MyEnv;
The Log struct must contain only float fields. PufferLib’s binding code iterates over it as a float array.
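To see why this matters, here is a minimal, self-contained sketch (not PufferLib's actual binding code) of accumulating one all-float struct into another by walking both as float arrays:

```c
#include <stddef.h>

// Hypothetical Log mirroring the struct above: every field is a float,
// so the whole struct can be treated as a flat float array.
typedef struct {
    float episode_length;
    float episode_return;
    float n;
} Log;

// Accumulate src into dst by casting both structs to float*. This only
// works because the struct contains nothing but floats: the field count
// is exactly sizeof(Log) / sizeof(float), with no padding in between.
void log_accumulate(Log* dst, const Log* src) {
    float* d = (float*)dst;
    const float* s = (const float*)src;
    for (size_t i = 0; i < sizeof(Log) / sizeof(float); i++) {
        d[i] += s[i];
    }
}
```

Any non-float field would shift the offsets and silently corrupt the aggregation, which is why the rule exists.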
Step 2: Implement core functions

In myenv.h, implement the required environment functions:
myenv.h (continued)
// Required: Reset environment to initial state
void c_reset(MyEnv* env) {
    env->x = ((float)rand() / RAND_MAX) * 10.0f - 5.0f;
    env->y = ((float)rand() / RAND_MAX) * 10.0f - 5.0f;
    env->target_x = ((float)rand() / RAND_MAX) * 10.0f - 5.0f;
    env->target_y = ((float)rand() / RAND_MAX) * 10.0f - 5.0f;
    env->tick = 0;
    env->episode_return = 0.0f;
    
    // Write initial observation
    env->observations[0] = env->x;
    env->observations[1] = env->y;
    env->observations[2] = env->target_x;
    env->observations[3] = env->target_y;
}

// Required: Step environment one timestep
void c_step(MyEnv* env) {
    // Read action from shared buffer
    float dx = env->actions[0];
    float dy = env->actions[1];
    
    // Update state
    env->x += dx * 0.1f;
    env->y += dy * 0.1f;
    env->tick++;
    
    // Compute reward
    float dist = sqrtf(
        (env->x - env->target_x) * (env->x - env->target_x) +
        (env->y - env->target_y) * (env->y - env->target_y)
    );
    env->rewards[0] = -dist;
    env->episode_return += env->rewards[0];
    
    // Check termination
    int done = (env->tick >= MAX_STEPS) || (dist < 0.1f);
    env->terminals[0] = done ? 1 : 0;
    
    // Write observation
    env->observations[0] = env->x;
    env->observations[1] = env->y;
    env->observations[2] = env->target_x;
    env->observations[3] = env->target_y;
    
    // Update logs on episode end
    if (done) {
        env->log.episode_length += env->tick;
        env->log.episode_return += env->episode_return;
        env->log.n += 1.0f;
        c_reset(env);  // Auto-reset
    }
}

// Required: Render environment (can be empty)
void c_render(MyEnv* env) {
    if (!IsWindowReady()) {
        InitWindow(800, 600, "MyEnv");
        SetTargetFPS(60);
    }
    
    BeginDrawing();
    ClearBackground(BLACK);
    
    // Draw agent (blue)
    DrawCircle(
        400 + env->x * 40,
        300 + env->y * 40,
        5, BLUE
    );
    
    // Draw target (red)
    DrawCircle(
        400 + env->target_x * 40,
        300 + env->target_y * 40,
        8, RED
    );
    
    EndDrawing();
}

// Required: Cleanup resources
void c_close(MyEnv* env) {
    if (IsWindowReady()) {
        CloseWindow();
    }
}
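One detail the snippet above leaves implicit: c_reset() draws from rand(), so seed the C RNG during initialization using the seed passed in from Python. A minimal sketch using srand (a per-env RNG state stored in the env struct is the more robust choice for vectorized environments, since srand is process-global):

```c
#include <stdlib.h>

// Seed the C standard RNG once so episode randomization is reproducible.
// Note: srand() state is shared by the whole process, so with many envs
// you would typically store an independent RNG state per env instead.
void seed_env_rng(unsigned int seed) {
    srand(seed);
}
```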
Step 3: Create Python binding

Create binding.c that includes the PufferLib binding template:
binding.c
#define Env MyEnv
#include "myenv.h"
#include "../env_binding.h"
The env_binding.h template provides:
  • env_init() - Initialize single environment
  • vec_init() - Initialize vectorized environments
  • env_step(), vec_step() - Step functions
  • env_reset(), vec_reset() - Reset functions
  • env_render(), vec_render() - Render functions
  • env_close(), vec_close() - Cleanup functions
  • vec_log() - Aggregate logs from all environments
You can define custom methods by setting MY_METHODS, MY_GET, MY_PUT, etc. before including env_binding.h. See existing environments for examples.
Step 4: Create setup.py for compilation

setup.py
from setuptools import setup, Extension
import numpy as np

module = Extension(
    'pufferlib.ocean.myenv.binding',
    sources=['pufferlib/ocean/myenv/binding.c'],
    include_dirs=[
        np.get_include(),
        'pufferlib/ocean',
        '/usr/local/include',  # For raylib
    ],
    libraries=['raylib', 'm'],
    library_dirs=['/usr/local/lib'],
    extra_compile_args=['-O3', '-std=c11'],
)

setup(
    name='myenv',
    ext_modules=[module],
)
Build the extension:
python setup.py build_ext --inplace
Step 5: Create Python wrapper

Create myenv.py with a Gymnasium-compatible interface:
myenv.py
import gymnasium
import numpy as np
import pufferlib
from pufferlib.ocean.myenv import binding

class MyEnv(pufferlib.PufferEnv):
    def __init__(self, num_envs=1, render_mode=None, buf=None, seed=0):
        # Define spaces
        self.single_observation_space = gymnasium.spaces.Box(
            low=-np.inf, high=np.inf,
            shape=(4,),  # x, y, target_x, target_y
            dtype=np.float32
        )
        self.single_action_space = gymnasium.spaces.Box(
            low=-1.0, high=1.0,
            shape=(2,),  # dx, dy
            dtype=np.float32
        )
        
        self.render_mode = render_mode
        self.num_agents = num_envs
        
        # Initialize parent (allocates shared buffers)
        super().__init__(buf)
        
        # Initialize C environments
        self.c_envs = binding.vec_init(
            self.observations,
            self.actions,
            self.rewards,
            self.terminals,
            self.truncations,
            num_envs,
            seed,
        )
    
    def reset(self, seed=None):
        if seed is None:
            seed = 0
        binding.vec_reset(self.c_envs, seed)
        return self.observations, []
    
    def step(self, actions):
        self.actions[:] = actions
        binding.vec_step(self.c_envs)
        
        # Collect aggregated logs (vec_log averages across environments)
        info = [binding.vec_log(self.c_envs)]
        
        return (
            self.observations,
            self.rewards,
            self.terminals,
            self.truncations,
            info
        )
    
    def render(self):
        binding.vec_render(self.c_envs, 0)
    
    def close(self):
        binding.vec_close(self.c_envs)
Step 6: Register with PufferLib (optional)

Add your environment to pufferlib/ocean/environment.py:
environment.py
MAKE_FUNCTIONS = {
    # ... existing environments ...
    'myenv': 'MyEnv',
}
Now users can create it with:
env_class = pufferlib.ocean.env_creator('puffer_myenv')
env = env_class(num_envs=4096)

Understanding env_binding.h

The env_binding.h template handles the interface between Python and C:

Initialization

// Called once per environment
static int my_init(Env* env, PyObject* args, PyObject* kwargs)
Implement this function to parse custom kwargs:
#define MY_INIT
static int my_init(MyEnv* env, PyObject* args, PyObject* kwargs) {
    env->max_speed = unpack(kwargs, "max_speed");
    env->gravity = unpack(kwargs, "gravity");
    return 0;
}

Memory management

Buffers are allocated by Python and passed to C as pointers. Never malloc or free these buffers:
// ✅ Correct: Write directly to shared buffer
env->observations[0] = value;

// ❌ Wrong: Don't allocate new arrays
env->observations = malloc(...);

Vectorization

The binding handles vectorization automatically:
// vec_init creates num_envs environments
// Each gets a slice of the shared buffers:
env->observations = obs_buffer + i * obs_size;
env->actions = act_buffer + i * act_size;
// etc.
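The same slicing can be sketched in a few lines of standalone C (illustrative names, not the actual binding code). Because each env's pointer is just an offset into one contiguous buffer, writes through the env are immediately visible in Python's view of that buffer:

```c
#define OBS_SIZE 4

// Minimal stand-in env holding only its observation slice.
typedef struct {
    float* observations;
} SliceEnv;

// Point each env at its disjoint slice of one shared observation buffer.
// No copies: env i owns obs_buffer[i*OBS_SIZE .. (i+1)*OBS_SIZE - 1].
void assign_obs_slices(SliceEnv* envs, float* obs_buffer, int num_envs) {
    for (int i = 0; i < num_envs; i++) {
        envs[i].observations = obs_buffer + i * OBS_SIZE;
    }
}
```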

Logging

vec_log() aggregates logs across all environments:
typedef struct {
    float custom_metric;
    float episode_return;
    float n;  // Always include
} Log;

static int my_log(PyObject* dict, Log* log) {
    assign_to_dict(dict, "custom_metric", log->custom_metric);
    assign_to_dict(dict, "episode_return", log->episode_return);
    return 0;
}
Returned Python dict contains averaged values:
info = binding.vec_log(c_envs)
# {'custom_metric': 0.5, 'episode_return': 123.4, 'n': 42}
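The averaging itself is just the accumulated sums divided by the episode count n. A sketch of that final step (field names borrowed from the Log above; this is not the binding's actual code):

```c
// Accumulated sums across all envs, plus the episode count n.
typedef struct {
    float custom_metric;
    float episode_return;
    float n;
} AggLog;

// Convert accumulated sums into per-episode means, guarding against the
// case where no episode has finished yet (n == 0).
AggLog log_mean(AggLog sum) {
    AggLog mean = sum;
    if (sum.n > 0.0f) {
        mean.custom_metric = sum.custom_metric / sum.n;
        mean.episode_return = sum.episode_return / sum.n;
    }
    return mean;
}
```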

Performance optimization tips

Minimize Python calls

Keep all simulation logic in C:
// ✅ Good: Pure C
void c_step(Env* env) {
    update_physics(env);
    check_collisions(env);
    compute_reward(env);
}

// ❌ Bad: Calling Python from C
void c_step(Env* env) {
    PyObject* result = PyObject_CallMethod(...);
}

Use struct-of-arrays for many entities

// ❌ Slow: Array-of-structs (bad cache locality)
typedef struct {
    float x, y, vx, vy;
} Agent;
Agent agents[1000];

// ✅ Fast: Struct-of-arrays (cache-friendly)
typedef struct {
    float x[1000];
    float y[1000];
    float vx[1000];
    float vy[1000];
} Agents;
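The payoff is that per-field update loops stream through contiguous memory, which caches well and is easy for the compiler to auto-vectorize. A small self-contained sketch (hypothetical Agents layout):

```c
#define N_AGENTS 8

// Struct-of-arrays layout: each field is one contiguous float stream.
typedef struct {
    float x[N_AGENTS];
    float y[N_AGENTS];
    float vx[N_AGENTS];
    float vy[N_AGENTS];
} Agents;

// Integrate positions. Each loop reads and writes exactly two contiguous
// arrays, so consecutive iterations touch adjacent memory.
void integrate(Agents* a, float dt) {
    for (int i = 0; i < N_AGENTS; i++) a->x[i] += a->vx[i] * dt;
    for (int i = 0; i < N_AGENTS; i++) a->y[i] += a->vy[i] * dt;
}
```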

Avoid dynamic allocation

// ❌ Slow: malloc every step
void c_step(Env* env) {
    float* temp = malloc(100 * sizeof(float));
    // ...
    free(temp);
}

// ✅ Fast: Preallocate in env struct
typedef struct {
    float temp_buffer[100];
} Env;

void c_step(Env* env) {
    // Use env->temp_buffer
}

Use compiler optimizations

setup.py
extra_compile_args=[
    '-O3',              # Maximum optimization
    '-march=native',    # Use CPU-specific instructions
    '-ffast-math',      # Fast floating point (if safe)
]

Profile your code

Benchmark SPS (steps per second):
import time
import numpy as np

env = MyEnv(num_envs=4096)
env.reset()

actions = np.random.randn(4096, 2).astype(np.float32)

start = time.time()
for _ in range(1000):
    env.step(actions)
elapsed = time.time() - start

sps = (4096 * 1000) / elapsed
print(f"SPS: {sps:,.0f}")

Real-world examples

Simple: Cartpole

See pufferlib/ocean/cartpole/ for a complete simple environment:
  • Single agent per environment
  • Continuous actions
  • Simple physics
  • ~5M SPS

Medium: Snake

See pufferlib/ocean/snake/ for multi-agent:
  • 256+ agents per environment
  • Grid-based observations
  • Complex collision detection
  • ~10M SPS

Advanced: Battle

See pufferlib/ocean/battle/ for large-scale multi-agent:
  • 1024+ agents per environment
  • Continuous actions and observations
  • Spatial partitioning for efficiency
  • ~1M SPS with complex interactions

CUDA environments

For GPU acceleration, use CUDA instead of C:
myenv.cu
__global__ void step_kernel(Env* envs, int num_envs) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < num_envs) {
        // Step logic runs on GPU
        step_single(&envs[idx]);
    }
}

// Steps every environment in one kernel launch
void c_step_all(Env* env_array, int num_envs) {
    int threads = 256;
    int blocks = (num_envs + threads - 1) / threads;
    step_kernel<<<blocks, threads>>>(env_array, num_envs);
    cudaDeviceSynchronize();
}
Compile with nvcc:
setup.py
from setuptools import setup
import numpy as np
from torch.utils.cpp_extension import CUDAExtension, BuildExtension

module = CUDAExtension(
    name='pufferlib.ocean.myenv.binding',
    sources=['pufferlib/ocean/myenv/binding.cu'],
    include_dirs=[np.get_include()],
)

setup(
    name='myenv_cuda',
    ext_modules=[module],
    cmdclass={'build_ext': BuildExtension},
)

Common patterns

Multi-agent environments

For multiple agents per environment, flatten observations:
myenv.py
num_agents_per_env = 256
num_envs = 16
total_agents = num_envs * num_agents_per_env

self.num_agents = total_agents
super().__init__(buf)  # Allocates for total_agents

# Pass slices to each C environment
for i in range(num_envs):
    start = i * num_agents_per_env
    end = start + num_agents_per_env
    env_id = binding.env_init(
        self.observations[start:end],
        self.actions[start:end],
        # ...
    )

Variable-length observations

Pad observations to maximum size:
// Max 10 enemies, but usually fewer
for (int i = 0; i < 10; i++) {
    if (i < num_active_enemies) {
        env->observations[i*2 + 0] = enemies[i].x;
        env->observations[i*2 + 1] = enemies[i].y;
    } else {
        env->observations[i*2 + 0] = 0.0f;  // Padding
        env->observations[i*2 + 1] = 0.0f;
    }
}

Action masking

Return valid actions in info dict:
def step(self, actions):
    # ...
    masks = binding.get_action_masks(self.c_envs)
    info = [{'action_masks': masks}]
    return obs, rewards, terms, truncs, info

Troubleshooting

Segmentation faults

Common causes:
  • Buffer overflow: Check array indices
  • Null pointers: Verify initialization
  • Stale pointers: Don’t cache buffer addresses
Debug with gdb:
gdb python
(gdb) run your_script.py
(gdb) backtrace  # After crash

Performance issues

  • Profile with perf:
    perf record python script.py
    perf report
    
  • Check Python overhead: Most time should be in C
  • Verify compiler optimizations: Use -O3
  • Reduce memory allocations: Preallocate buffers

Build errors

  • Include paths: Add -I flags in setup.py
  • Missing libraries: Install raylib, numpy headers
  • Numpy version: Match numpy version at build and runtime

Next steps

  • Study existing environments - browse the Ocean environments for reference implementations
  • Learn the architecture - understand Ocean's performance optimizations
