Converting Gymnasium and PettingZoo environments to PufferEnv format
PufferLib’s emulation layer allows you to use any Gymnasium or PettingZoo environment with PufferLib’s vectorization and training systems. The emulation wrappers convert standard environment APIs to the PufferEnv interface while handling complex observation and action spaces.
Native PufferEnvs are designed around:
- Simple action spaces (Discrete, MultiDiscrete, or Box)
- Pre-allocated shared memory buffers
Many existing environments use:
- Dict or Tuple observation spaces
- Complex nested action spaces
- Dynamic allocation patterns
Emulation bridges this gap by:
- Converting complex spaces to flat arrays
- Managing serialization and deserialization
- Providing a compatible interface for vectorization
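The flattening step can be illustrated with a minimal sketch. This is not PufferLib's implementation; the `flatten_obs`/`unflatten_obs` helpers and the observation keys are hypothetical, and real emulation must also handle nested spaces and dtype bounds:

```python
import numpy as np

# Hypothetical Dict-style observation, as a Gymnasium env might return it
obs = {
    'position': np.array([1.0, 2.0], dtype=np.float32),
    'inventory': np.array([0, 3, 1], dtype=np.float32),
}

def flatten_obs(obs):
    """Concatenate leaf arrays in sorted key order into one flat vector."""
    return np.concatenate([obs[k].ravel() for k in sorted(obs)])

def unflatten_obs(flat, template):
    """Invert flatten_obs using shapes recorded in a template dict."""
    out, i = {}, 0
    for k in sorted(template):
        n = template[k].size
        out[k] = flat[i:i + n].reshape(template[k].shape)
        i += n
    return out

flat = flatten_obs(obs)              # flat float32 vector of length 5
restored = unflatten_obs(flat, obs)  # round-trips back to the dict form
```

Because the flat vector has a fixed length and dtype, it can live in a pre-allocated shared memory buffer, which is what vectorization requires.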
Emulation adds some overhead relative to native PufferEnvs, but it is often negligible compared to environment step time. Use native PufferEnvs when you need maximum performance.
The Gymnasium wrapper exposes the following attributes:

```python
puffer_env.num_agents                # Always 1 for Gymnasium
puffer_env.single_observation_space
puffer_env.single_action_space
puffer_env.observation_space         # Same as single_observation_space
puffer_env.action_space              # Same as single_action_space
puffer_env.emulated                  # Dict with emulation metadata
puffer_env.done                      # True if environment is done
```
PettingZooPufferEnv manages the mapping between agent names and array indices:
pufferlib/emulation.py
```python
class PettingZooPufferEnv:
    def __init__(self, ...):
        # Compute spaces from first agent
        single_agent = self.possible_agents[0]
        self.env_single_observation_space = self.env.observation_space(single_agent)
        self.env_single_action_space = self.env.action_space(single_agent)

        # Number of agents
        self.num_agents = len(self.possible_agents)

    def reset(self, seed=None):
        obs, info = self.env.reset(seed=seed)

        # Map agent observations to array indices
        for i, agent in enumerate(self.possible_agents):
            if agent in obs:
                self.observations[i] = obs[agent]
```
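The name-to-index mapping can be sketched in isolation. The agent names and observation width below are hypothetical, but the pattern is the same: fixed rows indexed by position in `possible_agents`, filled from whatever dict the underlying env returns:

```python
import numpy as np

possible_agents = ['archer_0', 'archer_1', 'knight_0']  # hypothetical names
agent_index = {name: i for i, name in enumerate(possible_agents)}

# One fixed row per possible agent, regardless of who is alive
observations = np.zeros((len(possible_agents), 4), dtype=np.float32)

# A step may only return observations for a subset of agents
obs_from_env = {
    'archer_0': np.ones(4, dtype=np.float32),
    'knight_0': np.full(4, 2.0, dtype=np.float32),
}

for name, ob in obs_from_env.items():
    observations[agent_index[name]] = ob  # rows of absent agents stay zero
```

Fixing each agent to a row index is what lets the learner treat a multi-agent env as a stack of single-agent transitions.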
PettingZoo environments can have agents that die or join during an episode. PufferLib handles this with masks:
pufferlib/emulation.py
```python
def step(self, actions):
    obs, rewards, dones, truncateds, infos = self.env.step(unpacked_actions)

    for i, agent in enumerate(self.possible_agents):
        if agent not in obs:
            # Agent is dead/inactive
            self.observations[i] = 0
            self.rewards[i] = 0
            self.terminals[i] = True
            self.truncations[i] = False
            self.masks[i] = False  # Mark as inactive
            continue

        # Agent is active
        self.observations[i] = obs[agent]
        self.rewards[i] = rewards[agent]
        self.terminals[i] = dones[agent]
        self.truncations[i] = truncateds[agent]
        self.masks[i] = True  # Mark as active
```
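The masking logic can be exercised standalone with fabricated step output. The agent names and env returns below are invented for illustration; this is a sketch of the pattern, not PufferLib's code:

```python
import numpy as np

possible_agents = ['a0', 'a1']  # hypothetical
n = len(possible_agents)
observations = np.zeros((n, 3), dtype=np.float32)
rewards = np.zeros(n, dtype=np.float32)
terminals = np.zeros(n, dtype=bool)
masks = np.zeros(n, dtype=bool)

# Fake step output from the wrapped env: 'a1' has died and is absent
obs = {'a0': np.ones(3, dtype=np.float32)}
rew = {'a0': 0.5}
done = {'a0': False}

for i, agent in enumerate(possible_agents):
    if agent not in obs:
        # Dead/inactive agent: zero its slots and mask it out
        observations[i] = 0
        rewards[i] = 0
        terminals[i] = True
        masks[i] = False
        continue
    # Active agent: copy its data into the fixed row
    observations[i] = obs[agent]
    rewards[i] = rew[agent]
    terminals[i] = done[agent]
    masks[i] = True
```

After this step, `masks` is `[True, False]`: the dead agent keeps its row, but downstream code can exclude it from training batches.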
The mask array indicates which agents are currently active:
```python
# After stepping
active_agents = puffer_env.masks.sum()                 # Count active agents
active_rewards = puffer_env.rewards[puffer_env.masks]  # Rewards for active agents only
```
The PettingZoo wrapper exposes the following attributes:

```python
puffer_env.num_agents       # Number of possible agents
puffer_env.possible_agents  # List of all possible agent names
puffer_env.agents           # Currently active agents
puffer_env.done             # True if all agents are done
puffer_env.masks            # Boolean array of active agents
```