PufferLib’s PettingZooPufferEnv wrapper converts PettingZoo parallel environments into the PufferEnv interface, handling variable numbers of agents and providing efficient vectorized batching.

Basic usage

Wrap any PettingZoo parallel environment:
from pettingzoo.butterfly import cooperative_pong_v5
import pufferlib.emulation

# Create a PettingZoo parallel environment
env = cooperative_pong_v5.parallel_env()

# Wrap it with PufferLib
env = pufferlib.emulation.PettingZooPufferEnv(env=env)

# Use the vectorized interface
obs, info = env.reset()
actions = env.action_space.sample()
obs, rewards, terminals, truncations, info = env.step(actions)
The wrapper batches observations and rewards across all agents:
print(obs.shape)      # (num_agents, obs_size)
print(rewards.shape)  # (num_agents,)
print(env.num_agents) # Number of agents in environment

Constructor parameters

The PettingZooPufferEnv wrapper accepts these parameters:
PettingZooPufferEnv(
    env=None,              # PettingZoo ParallelEnv instance
    env_creator=None,      # Or callable that creates environment
    env_args=[],           # Args for env_creator
    env_kwargs={},         # Kwargs for env_creator
    buf=None,              # Pre-allocated buffers (optional)
    seed=0                 # Random seed
)
For example, passing an existing environment instance:
from pettingzoo.butterfly import cooperative_pong_v5
import pufferlib.emulation

env = cooperative_pong_v5.parallel_env()
puffer_env = pufferlib.emulation.PettingZooPufferEnv(env=env)

Agent masking

PettingZoo environments can have variable numbers of active agents. PufferLib handles this with agent masks:
from pettingzoo.butterfly import pistonball_v6
import pufferlib.emulation

env = pistonball_v6.parallel_env()
env = pufferlib.emulation.PettingZooPufferEnv(env=env)

obs, info = env.reset()

# Check which agents are active
print(env.masks)  # Boolean array: [True, True, True, ...]

# Some agents may finish early
actions = env.action_space.sample()
obs, rewards, terminals, truncations, info = env.step(actions)

# Masks indicate which agents are still active
active_agents = env.masks.sum()
print(f"{active_agents} agents remaining")
Inactive agents have their observations zeroed, terminals set to True, and masks set to False.
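
Because inactive agents keep their slots in the batched arrays, the mask can be used to ignore their entries. A minimal NumPy sketch, assuming the mask and reward layout described above (illustrative helper, not PufferLib code):

```python
import numpy as np

def accumulate_masked(totals, rewards, masks):
    """Add this step's rewards to running totals, ignoring inactive agents."""
    return totals + np.where(masks, rewards, 0.0)

totals = np.zeros(3)
rewards = np.array([1.0, 0.5, 9.9])
masks = np.array([True, True, False])  # third agent already finished
totals = accumulate_masked(totals, rewards, masks)
print(totals)  # [1.  0.5 0. ]
```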

Space emulation

Like GymnasiumPufferEnv, PettingZooPufferEnv automatically emulates complex observation and action spaces.

Observation space emulation

All agents must share the same observation and action space structure. The wrapper flattens complex spaces into fixed-size arrays for neural network compatibility:
import gymnasium
import numpy as np
import pettingzoo
import pufferlib.emulation

class MyMultiAgentEnv(pettingzoo.ParallelEnv):
    def __init__(self):
        self.possible_agents = ['agent_0', 'agent_1', 'agent_2']
        self.agents = self.possible_agents[:]
    
    def observation_space(self, agent):
        return gymnasium.spaces.Dict({
            'image': gymnasium.spaces.Box(0, 255, (32, 32, 3), dtype=np.uint8),
            'vector': gymnasium.spaces.Box(-1, 1, (5,), dtype=np.float32)
        })
    
    def action_space(self, agent):
        return gymnasium.spaces.Discrete(4)

env = MyMultiAgentEnv()
puffer_env = pufferlib.emulation.PettingZooPufferEnv(env=env)

# Dict observation flattened for all agents
print(puffer_env.single_observation_space)
# Box(0, 255, (3077,), uint8)
print(puffer_env.is_obs_emulated)
# True
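
The flattened length is the total number of leaf elements in the Dict space; a quick check of the arithmetic behind the (3077,) shape:

```python
# Element counts for each leaf of the Dict observation space above.
image_elems = 32 * 32 * 3   # 'image': Box(0, 255, (32, 32, 3)) -> 3072 elements
vector_elems = 5            # 'vector': Box(-1, 1, (5,)) -> 5 elements

flat_size = image_elems + vector_elems
print(flat_size)  # 3077
```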

Action space emulation

import gymnasium
import pettingzoo
import pufferlib.emulation

class MyMultiAgentEnv(pettingzoo.ParallelEnv):
    def __init__(self):
        self.possible_agents = ['agent_0', 'agent_1']
        self.agents = self.possible_agents[:]
    
    def observation_space(self, agent):
        return gymnasium.spaces.Box(0, 1, (4,))
    
    def action_space(self, agent):
        # Complex action space
        return gymnasium.spaces.Tuple((
            gymnasium.spaces.Discrete(3),
            gymnasium.spaces.Discrete(2),
        ))

env = MyMultiAgentEnv()
puffer_env = pufferlib.emulation.PettingZooPufferEnv(env=env)

# Action space converted to MultiDiscrete
print(puffer_env.single_action_space)
# MultiDiscrete([3 2])

# Sample actions for all agents
actions = puffer_env.action_space.sample()
print(actions.shape)  # (2, 2) - (num_agents, action_dims)
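
The MultiDiscrete form lays out one integer per Tuple component; a sketch of that mapping (the helper names here are illustrative, not PufferLib internals):

```python
import numpy as np

def tuple_to_flat(action):
    """Tuple sample like (2, 1) -> flat array [2, 1], one int per sub-action."""
    return np.asarray(action, dtype=np.int64)

def flat_to_tuple(flat):
    """Flat array [2, 1] -> original Tuple sample (2, 1)."""
    return tuple(int(a) for a in flat)

flat = tuple_to_flat((2, 1))
print(flat)                 # [2 1]
print(flat_to_tuple(flat))  # (2, 1)
```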

Butterfly example

Butterfly environments are cooperative PettingZoo games. From pufferlib/environments/butterfly/environment.py:
from pettingzoo.utils.conversions import aec_to_parallel_wrapper
import pufferlib.emulation

def make(name='cooperative_pong_v5', buf=None):
    if name == 'cooperative_pong_v5':
        from pettingzoo.butterfly import cooperative_pong_v5 as pong
        env_cls = pong.raw_env
    elif name == 'knights_archers_zombies_v10':
        from pettingzoo.butterfly import knights_archers_zombies_v10 as kaz
        env_cls = kaz.raw_env
    else:
        raise ValueError(f'Unknown environment: {name}')
    
    # Convert AEC to parallel
    env = env_cls()
    env = aec_to_parallel_wrapper(env)
    
    return pufferlib.emulation.PettingZooPufferEnv(env=env, buf=buf)
Usage:
from pufferlib.environments import butterfly

env = butterfly.make('cooperative_pong_v5')
obs, info = env.reset()

for _ in range(100):
    # All agents act simultaneously
    actions = env.action_space.sample()
    obs, rewards, terminals, truncations, info = env.step(actions)
    
    if all(terminals) or all(truncations):
        print("Episode ended")
        obs, info = env.reset()

MAgent example

MAgent provides large-scale battle simulations. From pufferlib/environments/magent/environment.py:
from pettingzoo.utils.conversions import aec_to_parallel_wrapper
import pufferlib
import pufferlib.emulation

def make(name='battle_v4', buf=None):
    from pettingzoo.magent import battle_v4
    
    env = battle_v4.env()
    env = aec_to_parallel_wrapper(env)
    
    # PettingZoo changed truncation behavior
    env = pufferlib.PettingZooTruncatedWrapper(env)
    
    return pufferlib.emulation.PettingZooPufferEnv(env=env, buf=buf)
Usage:
from pufferlib.environments import magent

env = magent.make('battle_v4')
print(f"Agents: {env.num_agents}")  # Many agents!

obs, info = env.reset()
for _ in range(1000):
    actions = env.action_space.sample()
    obs, rewards, terminals, truncations, info = env.step(actions)
    
    # Check how many agents are still active
    active = env.masks.sum()
    print(f"Active agents: {active}/{env.num_agents}")
    
    if env.done:
        obs, info = env.reset()

Neural MMO example

Neural MMO is a massively multi-agent survival game. From pufferlib/environments/nmmo/environment.py:
import pufferlib
import pufferlib.emulation
import nmmo

class NMMOWrapper(pufferlib.PettingZooWrapper):
    """Remove task info spam"""
    def step(self, actions):
        obs, rewards, dones, truncateds, infos = self.env.step(actions)
        # Simplify info dict
        infos = {k: list(v['task'].values())[0] for k, v in infos.items()}
        return obs, rewards, dones, truncateds, infos

def make(name='nmmo', buf=None, **kwargs):
    env = nmmo.Env(**kwargs)
    env = NMMOWrapper(env)
    env = pufferlib.MultiagentEpisodeStats(env)
    env = pufferlib.MeanOverAgents(env)
    return pufferlib.emulation.PettingZooPufferEnv(env=env, buf=buf)
Usage:
from pufferlib.environments import nmmo

# Create environment with config
env = nmmo.make(
    'nmmo',
    num_agents=128,
    horizon=1024
)

obs, info = env.reset()
for _ in range(1000):
    actions = env.action_space.sample()
    obs, rewards, terminals, truncations, info = env.step(actions)

Converting AEC to parallel

Many PettingZoo environments use the AEC (Agent-Environment-Cycle) API. Convert them to parallel:
from pettingzoo.classic import rps_v2
from pettingzoo.utils.conversions import aec_to_parallel_wrapper
import pufferlib.emulation

# Create AEC environment
aec_env = rps_v2.env()

# Convert to parallel
parallel_env = aec_to_parallel_wrapper(aec_env)

# Wrap with PufferLib
puffer_env = pufferlib.emulation.PettingZooPufferEnv(env=parallel_env)

Multi-agent wrappers

PufferLib provides wrappers specifically for multi-agent environments:

MultiagentEpisodeStats

Tracks episode statistics per agent:
from pettingzoo.butterfly import cooperative_pong_v5
import pufferlib
import pufferlib.emulation

env = cooperative_pong_v5.parallel_env()
env = pufferlib.MultiagentEpisodeStats(env)
env = pufferlib.emulation.PettingZooPufferEnv(env=env)

obs, info = env.reset()
for _ in range(100):
    actions = env.action_space.sample()
    obs, rewards, terminals, truncations, info = env.step(actions)
    
    # Info contains per-agent statistics
    if any(terminals):
        for agent, agent_info in info.items():
            if 'episode_return' in agent_info:
                print(f"{agent}: {agent_info['episode_return']}")
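
Conceptually, the wrapper accumulates rewards per agent until that agent's episode ends; a rough sketch of that bookkeeping (not the actual MultiagentEpisodeStats implementation):

```python
from collections import defaultdict

episode_returns = defaultdict(float)

# Two steps of per-agent rewards, as a parallel PettingZoo env would emit them.
step_rewards = [
    {'paddle_0': 1.0, 'paddle_1': 0.5},
    {'paddle_0': 2.0, 'paddle_1': 0.5},
]
for rewards in step_rewards:
    for agent, r in rewards.items():
        episode_returns[agent] += r

print(dict(episode_returns))  # {'paddle_0': 3.0, 'paddle_1': 1.0}
```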

MeanOverAgents

Averages info dictionaries across all agents:
from pettingzoo.butterfly import cooperative_pong_v5
import pufferlib
import pufferlib.emulation

env = cooperative_pong_v5.parallel_env()
env = pufferlib.MultiagentEpisodeStats(env)
env = pufferlib.MeanOverAgents(env)  # Average stats
env = pufferlib.emulation.PettingZooPufferEnv(env=env)

obs, info = env.reset()
for _ in range(100):
    actions = env.action_space.sample()
    obs, rewards, terminals, truncations, info = env.step(actions)
    
    # Info now contains mean values
    if any(terminals):
        print(f"Mean return: {info.get('episode_return', 0)}")
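
The reduction is a plain mean across the agents' info values; a sketch of the idea (illustrative, not the actual MeanOverAgents code):

```python
def mean_over_agents(per_agent_infos):
    """Average each numeric key across all agents that report it."""
    keys = set().union(*(d.keys() for d in per_agent_infos.values()))
    return {
        k: sum(d[k] for d in per_agent_infos.values() if k in d)
           / sum(1 for d in per_agent_infos.values() if k in d)
        for k in keys
    }

infos = {'paddle_0': {'episode_return': 2.0}, 'paddle_1': {'episode_return': 4.0}}
print(mean_over_agents(infos))  # {'episode_return': 3.0}
```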

API reference

PettingZooPufferEnv

See pufferlib/emulation.py:244 for the full implementation.

Attributes

  • single_observation_space - Observation space for one agent
  • single_action_space - Action space for one agent
  • observation_space - Batched observation space (shape: (num_agents, *obs_shape))
  • action_space - Batched action space (shape: (num_agents, *atn_shape))
  • num_agents - Number of possible agents
  • possible_agents - List of all possible agent IDs
  • agents - List of currently active agent IDs
  • is_obs_emulated - Whether observations are emulated
  • is_atn_emulated - Whether actions are emulated
  • observations - NumPy array buffer (shape: (num_agents, *obs_shape))
  • rewards - NumPy array buffer (shape: (num_agents,))
  • terminals - NumPy array buffer (shape: (num_agents,))
  • truncations - NumPy array buffer (shape: (num_agents,))
  • masks - Boolean mask of active agents (shape: (num_agents,))

Methods

  • reset(seed=None) - Reset environment, returns (dict_obs, info)
  • step(actions) - Step environment with dict or array of actions, returns (dict_obs, rewards, terminals, truncations, info)
  • observation_space(agent) - Get observation space for specific agent
  • action_space(agent) - Get action space for specific agent
  • render() - Render environment
  • close() - Close environment

Properties

  • done - True if all agents are done or environment is finished
  • render_mode - Render mode from wrapped environment

Action format

Actions can be passed as either NumPy arrays or dictionaries:
# As NumPy array (recommended)
actions = env.action_space.sample()  # Shape: (num_agents, *action_shape)
obs, rewards, terminals, truncations, info = env.step(actions)

# As dictionary
actions = {
    agent: env.action_space(agent).sample() 
    for agent in env.possible_agents
}
obs, rewards, terminals, truncations, info = env.step(actions)
The array format is more efficient for vectorized training.
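
Converting between the two formats is a matter of ordering by possible_agents; a sketch of stacking dict actions into the batched array (illustrative, not PufferLib internals):

```python
import numpy as np

possible_agents = ['agent_0', 'agent_1']
dict_actions = {'agent_0': [2, 1], 'agent_1': [0, 0]}

# Stack per-agent actions in possible_agents order -> (num_agents, action_dims).
array_actions = np.stack([np.asarray(dict_actions[a]) for a in possible_agents])
print(array_actions.shape)  # (2, 2)
```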

Next steps

Custom wrappers

Create your own environment wrappers

Gymnasium integration

Learn about single-agent wrappers
