The PufferEnv base class is the foundation of PufferLib’s environment system. It provides a standardized interface optimized for high-performance vectorized execution.
PufferEnv interface
All PufferLib environments inherit from PufferEnv and implement a vector-based interface:
```python
class PufferEnv:
    def __init__(self, buf=None):
        # Must define before calling super().__init__():
        # self.single_observation_space
        # self.single_action_space
        # self.num_agents
        ...

    def reset(self, seed=None):
        # Reset environment and return (observations, infos)
        raise NotImplementedError

    def step(self, actions):
        # Execute actions and return (obs, rewards, terminals, truncations, infos)
        raise NotImplementedError

    def close(self):
        # Clean up resources
        raise NotImplementedError
```
Required attributes
Before calling super().__init__(), you must define three attributes:
single_observation_space: the observation space for a single agent
single_action_space: the action space for a single agent
num_agents: the number of agents in the environment
```python
self.single_observation_space = gymnasium.spaces.Box(
    low=0, high=255, shape=(84, 84), dtype=np.uint8
)
```
PufferEnvs use single_observation_space and single_action_space for one agent. The base class automatically creates observation_space and action_space for all agents combined.
Space requirements
PufferLib enforces specific space types for native environments:
Observations: must be Box spaces (continuous arrays)
Actions: can be Discrete, MultiDiscrete, or Box spaces
This ensures efficient memory layout and vectorization. If you need complex observation spaces, use the emulation layer instead.
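A quick numpy sketch of why the Box requirement matters: Box observations for all agents pack into one contiguous batch array, which is what shared-memory vectorization relies on. The shapes and agent count below are illustrative, not from PufferLib.

```python
import numpy as np

# Box observations for N agents pack into a single contiguous array:
# one allocation that vectorized code can slice and share without copies.
num_agents = 4
obs_shape = (84, 84)
observations = np.zeros((num_agents, *obs_shape), dtype=np.uint8)

print(observations.shape)                  # (4, 84, 84)
print(observations.flags['C_CONTIGUOUS'])  # True

# A dict-style observation (e.g. {'image': ..., 'state': ...}) has no single
# flat layout like this, which is why it must go through the emulation layer.
```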
Creating a native PufferEnv
Here’s a minimal example of a native PufferEnv:
```python
import gymnasium
import pufferlib

class SamplePufferEnv(pufferlib.PufferEnv):
    def __init__(self, buf=None, seed=0):
        # Define spaces BEFORE calling super()
        self.single_observation_space = gymnasium.spaces.Box(
            low=-1, high=1, shape=(1,)
        )
        self.single_action_space = gymnasium.spaces.Discrete(2)
        self.num_agents = 2

        # This creates shared buffers and joint spaces
        super().__init__(buf)

    def reset(self, seed=0):
        # Write directly to the pre-allocated buffer
        self.observations[:] = self.observation_space.sample()
        return self.observations, []

    def step(self, actions):
        # Update observations in-place
        self.observations[:] = self.observation_space.sample()
        infos = [{'info': 'list of dictionaries'}]
        return self.observations, self.rewards, self.terminals, self.truncations, infos

    def close(self):
        pass
```
Using the environment
```python
# Create and use the environment
env = SamplePufferEnv()
observations, infos = env.reset()

# Actions for all agents
actions = env.action_space.sample()
observations, rewards, terminals, truncations, infos = env.step(actions)

print('Observations shape:', observations.shape)  # (2, 1)
print('Rewards shape:', rewards.shape)            # (2,)
print('Terminals shape:', terminals.shape)        # (2,)
```
Shared memory buffers
The buf parameter enables zero-copy vectorization. When provided, PufferEnv uses pre-allocated shared memory instead of creating new arrays:
```python
import numpy as np

def set_buffers(env, buf=None):
    if buf is None:
        # Allocate new buffers
        obs_space = env.single_observation_space
        env.observations = np.zeros((env.num_agents, *obs_space.shape), dtype=obs_space.dtype)
        env.rewards = np.zeros(env.num_agents, dtype=np.float32)
        env.terminals = np.zeros(env.num_agents, dtype=bool)
        env.truncations = np.zeros(env.num_agents, dtype=bool)
        env.masks = np.ones(env.num_agents, dtype=bool)
        # ... action buffer
    else:
        # Use shared buffers from vectorization
        env.observations = buf['observations']
        env.rewards = buf['rewards']
        env.terminals = buf['terminals']
        env.truncations = buf['truncations']
        env.masks = buf['masks']
        env.actions = buf['actions']
```
You don’t need to worry about the buf parameter when creating standalone environments. It’s automatically handled by the vectorization layer.
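The zero-copy idea can be sketched in plain numpy. This is a simplified stand-in: real buf dicts come from the vectorization layer and also carry terminals, truncations, masks, and actions.

```python
import numpy as np

num_agents, obs_dim = 2, 1

# Vectorization-layer side: one big allocation per field...
observations = np.zeros((num_agents, obs_dim), dtype=np.float32)
rewards = np.zeros(num_agents, dtype=np.float32)

# ...handed to the environment as views, not copies
buf = {'observations': observations, 'rewards': rewards}

# Environment side: step() writes in-place through the views
buf['observations'][:] = 0.5
buf['rewards'][:] = 1.0

# The vectorization layer sees the results with no copying
print(observations[0, 0], rewards[0])  # 0.5 1.0
```

Because both sides hold views of the same memory, every `step()` result is visible to the trainer the instant it is written.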
Multi-agent environments
PufferEnv naturally supports multi-agent environments. Simply set num_agents to the number of agents:
```python
class MultiAgentEnv(pufferlib.PufferEnv):
    def __init__(self, num_agents=16, buf=None, seed=0):
        self.single_observation_space = gymnasium.spaces.Box(
            low=0, high=1, shape=(4,)
        )
        self.single_action_space = gymnasium.spaces.Discrete(3)
        self.num_agents = num_agents
        super().__init__(buf)

    def step(self, actions):
        # actions is a numpy array of shape (num_agents,)
        for i, action in enumerate(actions):
            # Process each agent's action
            self.observations[i] = self.compute_observation(i)
            self.rewards[i] = self.compute_reward(i)
            self.terminals[i] = self.check_done(i)

        return self.observations, self.rewards, self.terminals, self.truncations, []
```
Advanced example: Snake
The Ocean Snake environment demonstrates a high-performance native PufferEnv with C++ bindings:
pufferlib/ocean/snake/snake.py
```python
class Snake(pufferlib.PufferEnv):
    def __init__(self, num_envs=16, width=640, height=360,
                 num_snakes=256, num_food=4096, vision=5,
                 buf=None, seed=0):
        # Define observation space based on vision radius
        self.single_observation_space = gymnasium.spaces.Box(
            low=0, high=2, shape=(2*vision + 1, 2*vision + 1), dtype=np.int8
        )
        self.single_action_space = gymnasium.spaces.Discrete(4)
        self.num_agents = num_envs * num_snakes
        super().__init__(buf)

        # Initialize C++ backend that writes directly to buffers
        self.c_envs = self._init_c_backend(
            obs=self.observations,
            actions=self.actions,
            rewards=self.rewards,
            terminals=self.terminals,
            truncations=self.truncations,
            width=width,
            height=height,
            num_snakes=num_snakes,
        )

    def step(self, actions):
        # C++ backend updates all buffers in-place
        self.actions[:] = actions
        binding.vec_step(self.c_envs)
        return (self.observations, self.rewards,
                self.terminals, self.truncations, [])
```
This environment achieves over 1M steps per second by:
Using compiled C++ for core logic
Writing directly to shared memory buffers
Vectorizing all operations across agents
Avoiding Python overhead in the inner loop
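Snake's actual speed comes from compiled C++, but the "vectorize across agents" point can be illustrated in pure numpy. The toy movement rule below is invented for the demonstration and is not Snake's real logic.

```python
import numpy as np

num_agents = 4096
positions = np.zeros(num_agents, dtype=np.int32)
actions = np.random.randint(0, 2, size=num_agents)

# Per-agent Python loop: interpreter overhead on every agent, every step
looped = positions.copy()
for i in range(num_agents):
    looped[i] += 1 if actions[i] == 1 else -1

# Vectorized: one numpy call across all agents, no Python inner loop
vectorized = positions + np.where(actions == 1, 1, -1)

assert (looped == vectorized).all()
```

The two produce identical results; the vectorized form simply moves the per-agent loop out of the interpreter, which is the same shift the C++ backend makes for the whole environment.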
Return value conventions
reset()
Returns a tuple of (observations, infos):
observations: NumPy array of shape (num_agents, *obs_shape)
infos: List of info dictionaries (can be empty list)
step()
Returns a tuple of (observations, rewards, terminals, truncations, infos):
observations: NumPy array of shape (num_agents, *obs_shape)
rewards: NumPy array of shape (num_agents,) with dtype float32
terminals: NumPy array of shape (num_agents,) with dtype bool
truncations: NumPy array of shape (num_agents,) with dtype bool
infos: List of info dictionaries
Infos must be a list of dictionaries, not a single dictionary. This allows each agent or environment to have its own info.
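The conventions above can be checked mechanically. The helper below is a hypothetical sketch for illustration, not part of PufferLib:

```python
import numpy as np

def check_step_returns(num_agents, obs_shape, result):
    '''Sanity-check a (obs, rewards, terminals, truncations, infos) tuple
    from step(). Hypothetical helper, not a PufferLib API.'''
    obs, rewards, terminals, truncations, infos = result
    assert obs.shape == (num_agents, *obs_shape)
    assert rewards.shape == (num_agents,) and rewards.dtype == np.float32
    assert terminals.shape == (num_agents,) and terminals.dtype == bool
    assert truncations.shape == (num_agents,) and truncations.dtype == bool
    assert isinstance(infos, list)  # a list of dicts, never a bare dict

# Example with dummy arrays matching SamplePufferEnv's shapes
result = (
    np.zeros((2, 1), dtype=np.float32),
    np.zeros(2, dtype=np.float32),
    np.zeros(2, dtype=bool),
    np.zeros(2, dtype=bool),
    [],
)
check_step_returns(2, (1,), result)
```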
Properties
PufferEnv provides several useful properties:
```python
@property
def emulated(self):
    '''Native envs do not use emulation'''
    return False

@property
def done(self):
    '''Native envs handle resets internally'''
    return False

@property
def driver_env(self):
    '''For compatibility with Multiprocessing'''
    return self
```
These properties help the vectorization layer determine how to handle the environment.
Best practices
Pre-allocate everything: avoid allocations in step() for maximum performance
Write in-place: always write to self.observations, self.rewards, etc. rather than creating new arrays
Use appropriate dtypes: match the dtypes from your space definitions
Handle resets internally: native PufferEnvs typically auto-reset agents when they terminate
Return infos as a list: even for single-agent environments, wrap info in a list
```python
# Good: in-place update
def step(self, actions):
    self.observations[:] = compute_observations()
    return self.observations, self.rewards, self.terminals, self.truncations, []

# Bad: creates a new array
def step(self, actions):
    self.observations = compute_observations()  # Don't do this!
    return self.observations, self.rewards, self.terminals, self.truncations, []
```
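The reason reassignment is harmful is numpy aliasing: `arr[:] = x` writes through the existing view into shared memory, while `arr = x` rebinds the name to a brand-new array and silently leaves the shared buffer stale. A minimal sketch:

```python
import numpy as np

shared = np.zeros(4, dtype=np.float32)  # stands in for a shared buffer
view = shared                           # what the env holds as self.observations

# In-place write: the shared buffer sees the update
view[:] = 1.0
print(shared[0])  # 1.0

# Reassignment: 'view' now points at a brand-new array;
# the shared buffer is never touched again
view = np.full(4, 2.0, dtype=np.float32)
print(shared[0])  # still 1.0
```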