Overview
PufferEnv is the base class for creating native vectorized environments in PufferLib. Unlike traditional Gym/Gymnasium environments that operate on single agents, PufferEnv handles multiple agents simultaneously for maximum performance.

Class: PufferEnv
Required attributes
Before calling super().__init__(), your environment must define:
single_observation_space
pufferlib.spaces.Box
required
The observation space for a single agent. Must be a Box space.
single_action_space
pufferlib.spaces.Discrete | pufferlib.spaces.MultiDiscrete | pufferlib.spaces.Box
required
The action space for a single agent. Must be Discrete, MultiDiscrete, or Box.
num_agents
int
required
Number of agents in the environment. Must be >= 1.
Initialization
Optional pre-allocated buffer dictionary containing numpy arrays for observations, rewards, terminals, truncations, masks, and actions. Used internally for zero-copy vectorization.
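As a sketch of how the required attributes and the buffer dictionary fit together, the skeleton below mimics the allocation behavior with plain NumPy so it runs without PufferLib installed. The class name `MyEnv`, the `buf` parameter name, the space stand-ins, and the dtypes are illustrative assumptions; a real environment would subclass `pufferlib.PufferEnv` and let `super().__init__()` do this work.

```python
import numpy as np

class MyEnv:  # illustrative; a real env would subclass pufferlib.PufferEnv
    def __init__(self, num_agents=4, buf=None):
        # Required attributes -- set before super().__init__() in real code.
        # Plain Python values stand in for pufferlib.spaces objects here.
        self.single_observation_space = (3, 3)  # stand-in for a Box space
        self.single_action_space = 4            # stand-in for Discrete(4)
        self.num_agents = num_agents

        # super().__init__() would adopt a pre-allocated buffer dict
        # (zero-copy) or allocate fresh arrays; this mimics that behavior.
        if buf is None:
            buf = {
                'observations': np.zeros((num_agents, 3, 3), dtype=np.uint8),
                'rewards': np.zeros(num_agents, dtype=np.float32),
                'terminals': np.zeros(num_agents, dtype=bool),
                'truncations': np.zeros(num_agents, dtype=bool),
                'masks': np.ones(num_agents, dtype=bool),
                'actions': np.zeros(num_agents, dtype=np.int32),
            }
        for name, array in buf.items():
            setattr(self, name, array)  # buffers become env attributes
```

Passing a pre-allocated buffer dict is what lets a vectorizer stack many environments into one batch without copying: the environment writes straight into memory the vectorizer owns.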
Properties
After initialization, PufferEnv provides:

observations
Array of shape (num_agents, *obs_shape) containing current observations

rewards
Array of shape (num_agents,) containing rewards

terminals
Boolean array of shape (num_agents,) indicating terminal states

truncations
Boolean array of shape (num_agents,) indicating truncated episodes

masks
Boolean array of shape (num_agents,) indicating active agents

actions
Array of shape (num_agents, *action_shape) for storing actions

observation_space
Joint observation space for all agents (automatically created from single_observation_space)

action_space
Joint action space for all agents (automatically created from single_action_space)

agent_ids
Array of agent IDs (0 to num_agents - 1)

emulated
Always False for native environments. Indicates whether the environment uses emulation.

done
Always False for native environments. Native environments handle resets internally.

driver_env
Returns self. Used for compatibility with Multiprocessing.
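The zero-copy design these array properties support can be illustrated with NumPy alone: a vectorizer can allocate one batched array and hand each environment a view into it, so writes through an environment's observation buffer land directly in the shared batch. The variable names and shapes below are illustrative.

```python
import numpy as np

# One batch for 2 envs x 4 agents; each env gets a view, not a copy.
batch_obs = np.zeros((8, 3, 3), dtype=np.uint8)
env0_observations = batch_obs[0:4]   # view for env 0's agents
env1_observations = batch_obs[4:8]   # view for env 1's agents

env0_observations[:] = 7             # env 0 writes its observations in place
print(batch_obs[0, 0, 0])            # the shared batch sees the write: 7
print(batch_obs[4, 0, 0])            # env 1's slice is untouched: 0
```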
Methods
reset()
Parameters:
seed: Random seed for reproducibility

Returns:
Initial observations (written to self.observations)
List of info dicts, one per agent
You must implement this method in your subclass.
step()
Parameters:
actions: Actions for all agents, shape (num_agents,) or (num_agents, *action_shape)

Returns:
Next observations (written to self.observations)
Rewards for each agent (written to self.rewards)
Terminal flags for each agent (written to self.terminals)
Truncation flags for each agent (written to self.truncations)
List of info dicts, one per agent
You must implement this method in your subclass.
close()
You must implement this method in your subclass.
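Putting reset(), step(), and close() together, here is a minimal self-contained sketch of the contract using plain NumPy. The class, its counting dynamics, and the buffer dtypes are invented for illustration; a real environment subclasses pufferlib.PufferEnv and writes into the buffers the base class allocated, following the same in-place convention.

```python
import numpy as np

class CountingEnv:
    """Sketch of the PufferEnv reset/step/close contract (NumPy only)."""

    def __init__(self, num_agents=2):
        self.num_agents = num_agents
        # In a real PufferEnv these buffers come from super().__init__().
        self.observations = np.zeros((num_agents, 1), dtype=np.float32)
        self.rewards = np.zeros(num_agents, dtype=np.float32)
        self.terminals = np.zeros(num_agents, dtype=bool)
        self.truncations = np.zeros(num_agents, dtype=bool)

    def reset(self, seed=None):
        self.observations[:] = 0  # write into the shared buffer in place
        return self.observations, [{} for _ in range(self.num_agents)]

    def step(self, actions):
        self.observations[:, 0] += actions          # in-place update
        self.rewards[:] = actions                   # in-place rewards
        self.terminals[:] = self.observations[:, 0] >= 10
        self.truncations[:] = False
        infos = [{} for _ in range(self.num_agents)]
        return (self.observations, self.rewards, self.terminals,
                self.truncations, infos)

    def close(self):
        pass  # release any native resources here
```

The important habit the sketch shows: reset() and step() mutate the pre-allocated arrays rather than allocating new ones, so the vectorizer's batch stays current without copies.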
Async interface
PufferEnv provides an async-style interface for advanced vectorization:

async_reset()
Calls reset() internally and stores infos.

Parameters:
seed: Random seed for reproducibility
send()
Parameters:
actions: Actions for all agents
recv()

Returns a tuple of:
Current observations
Current rewards
Terminal flags
Truncation flags
Info dictionaries
Agent IDs
Active agent masks
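To make the async_reset()/send()/recv() handshake concrete, the stand-in below implements the same surface on top of plain NumPy: send() stashes actions, and the next recv() applies them and returns the tuple in the order listed above. The class name and dynamics are invented for illustration and this is not PufferLib's implementation.

```python
import numpy as np

class AsyncShim:
    """Sketch of the async handshake a PufferEnv exposes to vectorizers."""

    def __init__(self, num_agents=2):
        self.num_agents = num_agents
        self.agent_ids = np.arange(num_agents)
        self.observations = np.zeros((num_agents, 1), dtype=np.float32)
        self.rewards = np.zeros(num_agents, dtype=np.float32)
        self.terminals = np.zeros(num_agents, dtype=bool)
        self.truncations = np.zeros(num_agents, dtype=bool)
        self.masks = np.ones(num_agents, dtype=bool)
        self._infos = []
        self._actions = None

    def async_reset(self, seed=None):
        # Resets state and stores infos for the next recv()
        self.observations[:] = 0
        self._infos = [{} for _ in range(self.num_agents)]

    def send(self, actions):
        self._actions = np.asarray(actions)  # stash until recv()

    def recv(self):
        if self._actions is not None:        # step with stashed actions
            self.observations[:, 0] += self._actions
            self.rewards[:] = self._actions
            self._actions = None
        return (self.observations, self.rewards, self.terminals,
                self.truncations, self._infos, self.agent_ids, self.masks)

# Driver loop: decompose step() into send()/recv() pairs
env = AsyncShim()
env.async_reset(seed=0)
obs, *_ = env.recv()
env.send(np.array([1, 2]))
obs, rewards, terminals, truncations, infos, ids, masks = env.recv()
```

Splitting step() in two this way lets a vectorizer send actions to many environments and then collect results as they finish, instead of blocking on each environment in turn.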