The PufferEnv base class is the foundation of PufferLib’s environment system. It provides a standardized interface optimized for high-performance vectorized execution.

PufferEnv interface

All PufferLib environments inherit from PufferEnv and implement a vector-based interface:
pufferlib/pufferlib.py
class PufferEnv:
    def __init__(self, buf=None):
        # Subclasses must define these BEFORE calling super().__init__():
        #   self.single_observation_space
        #   self.single_action_space
        #   self.num_agents
        ...

    def reset(self, seed=None):
        # Reset environment and return observations, infos
        raise NotImplementedError
        
    def step(self, actions):
        # Execute actions and return (obs, rewards, terminals, truncations, infos)
        raise NotImplementedError
        
    def close(self):
        # Clean up resources
        raise NotImplementedError

Required attributes

Before calling super().__init__(), you must define three attributes:
self.single_observation_space = gymnasium.spaces.Box(
    low=0, high=255, shape=(84, 84), dtype=np.uint8
)
self.single_action_space = gymnasium.spaces.Discrete(4)
self.num_agents = 1
PufferEnvs use single_observation_space and single_action_space to describe a single agent. The base class automatically derives the joint observation_space and action_space for all agents combined.
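The relationship between the single-agent and joint shapes can be sketched in plain NumPy (illustrative only, not the actual PufferEnv internals): the joint observation buffer stacks the single-agent shape along a leading num_agents axis.

```python
import numpy as np

# Single-agent observation shape, as declared in single_observation_space
single_shape = (84, 84)
num_agents = 4

# Joint buffer: one row per agent, matching the derived observation_space
joint_shape = (num_agents, *single_shape)
observations = np.zeros(joint_shape, dtype=np.uint8)

print(observations.shape)  # (4, 84, 84)
```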

Space requirements

PufferLib enforces specific space types for native environments:
  • Observations: Must be Box spaces (continuous arrays)
  • Actions: Can be Discrete, MultiDiscrete, or Box spaces
This ensures efficient memory layout and vectorization. If you need complex observation spaces, use the emulation layer instead.

Creating a native PufferEnv

Here’s a minimal example of a native PufferEnv:
examples/puffer_env.py
import gymnasium
import pufferlib

class SamplePufferEnv(pufferlib.PufferEnv):
    def __init__(self, buf=None, seed=0):
        # Define spaces BEFORE calling super()
        self.single_observation_space = gymnasium.spaces.Box(
            low=-1, high=1, shape=(1,)
        )
        self.single_action_space = gymnasium.spaces.Discrete(2)
        self.num_agents = 2
        
        # This creates shared buffers and joint spaces
        super().__init__(buf)

    def reset(self, seed=None):
        # Write directly to pre-allocated buffer
        self.observations[:] = self.observation_space.sample()
        return self.observations, []

    def step(self, actions):
        # Update observations in-place
        self.observations[:] = self.observation_space.sample()
        infos = [{'info': 'list of dictionaries'}]
        return self.observations, self.rewards, self.terminals, self.truncations, infos
        
    def close(self):
        pass

Using the environment

# Create and use the environment
env = SamplePufferEnv()
observations, infos = env.reset()

# Actions for all agents
actions = env.action_space.sample()
observations, rewards, terminals, truncations, infos = env.step(actions)

print('Observations shape:', observations.shape)  # (2, 1)
print('Rewards shape:', rewards.shape)  # (2,)
print('Terminals shape:', terminals.shape)  # (2,)

Shared memory buffers

The buf parameter enables zero-copy vectorization. When provided, PufferEnv uses pre-allocated shared memory instead of creating new arrays:
pufferlib/pufferlib.py
def set_buffers(env, buf=None):
    if buf is None:
        # Allocate new buffers
        obs_space = env.single_observation_space
        env.observations = np.zeros((env.num_agents, *obs_space.shape), dtype=obs_space.dtype)
        env.rewards = np.zeros(env.num_agents, dtype=np.float32)
        env.terminals = np.zeros(env.num_agents, dtype=bool)
        env.truncations = np.zeros(env.num_agents, dtype=bool)
        env.masks = np.ones(env.num_agents, dtype=bool)
        # ... action buffer
    else:
        # Use shared buffers from vectorization
        env.observations = buf['observations']
        env.rewards = buf['rewards']
        env.terminals = buf['terminals']
        env.truncations = buf['truncations']
        env.masks = buf['masks']
        env.actions = buf['actions']
You don’t need to worry about the buf parameter when creating standalone environments. It’s automatically handled by the vectorization layer.
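The allocate-or-reuse pattern above can be sketched standalone with plain NumPy, using a dict in place of the real buffer object (the function name here is illustrative, not part of PufferLib):

```python
import numpy as np

def allocate_buffers(num_agents, obs_shape, obs_dtype, buf=None):
    # If no shared buffer dict is supplied, allocate fresh arrays;
    # otherwise reuse the pre-allocated ones so the environment writes
    # into memory shared with the vectorization layer.
    if buf is None:
        buf = {
            'observations': np.zeros((num_agents, *obs_shape), dtype=obs_dtype),
            'rewards': np.zeros(num_agents, dtype=np.float32),
            'terminals': np.zeros(num_agents, dtype=bool),
            'truncations': np.zeros(num_agents, dtype=bool),
            'masks': np.ones(num_agents, dtype=bool),
        }
    return buf

shared = allocate_buffers(2, (1,), np.float32)
reused = allocate_buffers(2, (1,), np.float32, buf=shared)
print(reused['observations'] is shared['observations'])  # True
```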

Multi-agent environments

PufferEnv naturally supports multi-agent environments. Simply set num_agents to the number of agents:
class MultiAgentEnv(pufferlib.PufferEnv):
    def __init__(self, num_agents=16, buf=None, seed=0):
        self.single_observation_space = gymnasium.spaces.Box(
            low=0, high=1, shape=(4,)
        )
        self.single_action_space = gymnasium.spaces.Discrete(3)
        self.num_agents = num_agents
        super().__init__(buf)
        
    def step(self, actions):
        # actions is a numpy array of shape (num_agents,)
        for i, action in enumerate(actions):
            # Process each agent's action
            self.observations[i] = self.compute_observation(i)
            self.rewards[i] = self.compute_reward(i)
            self.terminals[i] = self.check_done(i)
            
        return self.observations, self.rewards, self.terminals, self.truncations, []
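The per-agent loop above is the clearest form, but for large agent counts you can often replace it with vectorized NumPy operations on the buffers. A hypothetical sketch (the reward and termination rules here are made up for illustration):

```python
import numpy as np

num_agents = 16
observations = np.zeros((num_agents, 4), dtype=np.float32)
rewards = np.zeros(num_agents, dtype=np.float32)
terminals = np.zeros(num_agents, dtype=bool)

def step(actions):
    # Update every agent at once, writing in-place into the buffers
    observations[:] = np.random.rand(num_agents, 4)
    rewards[:] = (actions == 1).astype(np.float32)  # toy reward rule
    terminals[:] = observations[:, 0] > 0.99        # toy termination rule
    return observations, rewards, terminals

obs, rew, term = step(np.ones(num_agents, dtype=np.int64))
print(rew.sum())  # 16.0
```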

Advanced example: Snake

The Ocean Snake environment demonstrates a high-performance native PufferEnv with C++ bindings:
pufferlib/ocean/snake/snake.py
class Snake(pufferlib.PufferEnv):
    def __init__(self, num_envs=16, width=640, height=360,
            num_snakes=256, num_food=4096, vision=5,
            buf=None, seed=0):
        
        # Define observation space based on vision radius
        self.single_observation_space = gymnasium.spaces.Box(
            low=0, high=2, shape=(2*vision+1, 2*vision+1), dtype=np.int8
        )
        self.single_action_space = gymnasium.spaces.Discrete(4)
        self.num_agents = num_envs * num_snakes
        
        super().__init__(buf)
        
        # Initialize C++ backend that writes directly to buffers
        self.c_envs = self._init_c_backend(
            obs=self.observations,
            actions=self.actions,
            rewards=self.rewards,
            terminals=self.terminals,
            truncations=self.truncations,
            width=width,
            height=height,
            num_snakes=num_snakes
        )
 
    def step(self, actions):
        # C++ backend updates all buffers in-place
        self.actions[:] = actions
        binding.vec_step(self.c_envs)
        
        return (self.observations, self.rewards,
                self.terminals, self.truncations, [])
This environment achieves over 1M steps per second by:
  • Using compiled C++ for core logic
  • Writing directly to shared memory buffers
  • Vectorizing all operations across agents
  • Avoiding Python overhead in the inner loop

Return value conventions

reset()

Returns a tuple of (observations, infos):
  • observations: NumPy array of shape (num_agents, *obs_shape)
  • infos: List of info dictionaries (can be empty list)

step()

Returns a tuple of (observations, rewards, terminals, truncations, infos):
  • observations: NumPy array of shape (num_agents, *obs_shape)
  • rewards: NumPy array of shape (num_agents,) with dtype float32
  • terminals: NumPy array of shape (num_agents,) with dtype bool
  • truncations: NumPy array of shape (num_agents,) with dtype bool
  • infos: List of info dictionaries
Infos must be a list of dictionaries, not a single dictionary. This allows each agent or environment to have its own info.
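To catch convention violations early, a small validation helper can check a step() return value against these shapes and dtypes. This helper is not part of PufferLib; it is a sketch you might add to your own tests:

```python
import numpy as np

def check_step_return(result, num_agents, obs_shape):
    # Sanity-check the five-tuple returned by step()
    obs, rewards, terminals, truncations, infos = result
    assert obs.shape == (num_agents, *obs_shape)
    assert rewards.shape == (num_agents,) and rewards.dtype == np.float32
    assert terminals.shape == (num_agents,) and terminals.dtype == bool
    assert truncations.shape == (num_agents,) and truncations.dtype == bool
    assert isinstance(infos, list)  # list of dicts, never a bare dict
    return True

num_agents, obs_shape = 2, (1,)
fake_return = (
    np.zeros((num_agents, *obs_shape), dtype=np.float32),
    np.zeros(num_agents, dtype=np.float32),
    np.zeros(num_agents, dtype=bool),
    np.zeros(num_agents, dtype=bool),
    [],
)
print(check_step_return(fake_return, num_agents, obs_shape))  # True
```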

Properties

PufferEnv provides several useful properties:
pufferlib/pufferlib.py
@property
def emulated(self):
    '''Native envs do not use emulation'''
    return False

@property
def done(self):
    '''Native envs handle resets internally'''
    return False

@property
def driver_env(self):
    '''For compatibility with Multiprocessing'''
    return self
These properties help the vectorization layer determine how to handle the environment.

Best practices

  1. Pre-allocate everything: Avoid allocations in step() for maximum performance
  2. Write in-place: Always write to self.observations, self.rewards, etc. rather than creating new arrays
  3. Use appropriate dtypes: Match the dtypes from your space definitions
  4. Handle resets internally: Native PufferEnvs typically auto-reset agents when they terminate
  5. Return info as list: Even for single-agent environments, wrap info in a list
# Good: In-place update
def step(self, actions):
    self.observations[:] = compute_observations()
    return self.observations, self.rewards, self.terminals, self.truncations, []

# Bad: Creates new array
def step(self, actions):
    self.observations = compute_observations()  # Don't do this!
    return self.observations, self.rewards, self.terminals, self.truncations, []
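The cost of each pattern is easy to measure with a micro-benchmark. This sketch times both versions; absolute numbers vary by machine, and for small arrays the copy itself can dominate the allocation overhead:

```python
import numpy as np
import timeit

num_agents, obs_dim = 1024, 64
buf = np.zeros((num_agents, obs_dim), dtype=np.float32)
src = np.random.rand(num_agents, obs_dim).astype(np.float32)

def in_place():
    buf[:] = src       # writes into the pre-allocated buffer

def allocating():
    return src.copy()  # allocates a fresh array every call

t_inplace = timeit.timeit(in_place, number=1000)
t_alloc = timeit.timeit(allocating, number=1000)
print(f'in-place: {t_inplace:.4f}s, allocating: {t_alloc:.4f}s')
```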
