PufferLib’s PettingZooPufferEnv wrapper converts PettingZoo parallel environments into the PufferEnv interface, handling variable numbers of agents and providing efficient vectorized batching.

Basic usage

Wrap any PettingZoo parallel environment:
from pettingzoo.butterfly import cooperative_pong_v5
import pufferlib.emulation

# Create a PettingZoo parallel environment
env = cooperative_pong_v5.parallel_env()

# Wrap it with PufferLib
env = pufferlib.emulation.PettingZooPufferEnv(env=env)

# Use the vectorized interface
obs, info = env.reset()
actions = env.action_space.sample()
obs, rewards, terminals, truncations, info = env.step(actions)
The wrapper batches observations and rewards across all agents:
print(obs.shape)      # (num_agents, obs_size)
print(rewards.shape)  # (num_agents,)
print(env.num_agents) # Number of agents in environment

Constructor parameters

The PettingZooPufferEnv wrapper accepts these parameters:
PettingZooPufferEnv(
    env=None,              # PettingZoo ParallelEnv instance
    env_creator=None,      # Or callable that creates environment
    env_args=[],           # Args for env_creator
    env_kwargs={},         # Kwargs for env_creator
    buf=None,              # Pre-allocated buffers (optional)
    seed=0                 # Random seed
)
For example, passing an existing environment instance:
from pettingzoo.butterfly import cooperative_pong_v5
import pufferlib.emulation

env = cooperative_pong_v5.parallel_env()
puffer_env = pufferlib.emulation.PettingZooPufferEnv(env=env)

Agent masking

PettingZoo environments can have variable numbers of active agents. PufferLib handles this with agent masks:
from pettingzoo.butterfly import pistonball_v6
import pufferlib.emulation

env = pistonball_v6.parallel_env()
env = pufferlib.emulation.PettingZooPufferEnv(env=env)

obs, info = env.reset()

# Check which agents are active
print(env.masks)  # Boolean array: [True, True, True, ...]

# Some agents may finish early
actions = env.action_space.sample()
obs, rewards, terminals, truncations, info = env.step(actions)

# Masks indicate which agents are still active
active_agents = env.masks.sum()
print(f"{active_agents} agents remaining")
Inactive agents have their observations zeroed, terminals set to True, and masks set to False.
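
Because inactive agents keep their slots in the batched arrays, the mask can be used to ignore their entries. A minimal NumPy sketch, assuming the mask and reward layout described above (illustrative helper, not PufferLib code):

```python
import numpy as np

def accumulate_masked(totals, rewards, masks):
    """Add this step's rewards to running totals, ignoring inactive agents."""
    return totals + np.where(masks, rewards, 0.0)

totals = np.zeros(3)
rewards = np.array([1.0, 0.5, 9.9])
masks = np.array([True, True, False])  # third agent already finished
totals = accumulate_masked(totals, rewards, masks)
print(totals)  # [1.  0.5 0. ]
```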

Space emulation

Like GymnasiumPufferEnv, PettingZooPufferEnv automatically emulates complex observation and action spaces.

Observation space emulation

All agents must share the same observation and action space structure. The wrapper flattens complex spaces into fixed-size arrays for neural network compatibility:
import gymnasium
import numpy as np
import pettingzoo
import pufferlib.emulation

class MyMultiAgentEnv(pettingzoo.ParallelEnv):
    def __init__(self):
        self.possible_agents = ['agent_0', 'agent_1', 'agent_2']
        self.agents = self.possible_agents[:]
    
    def observation_space(self, agent):
        return gymnasium.spaces.Dict({
            'image': gymnasium.spaces.Box(0, 255, (32, 32, 3), dtype=np.uint8),
            'vector': gymnasium.spaces.Box(-1, 1, (5,), dtype=np.float32)
        })
    
    def action_space(self, agent):
        return gymnasium.spaces.Discrete(4)

env = MyMultiAgentEnv()
puffer_env = pufferlib.emulation.PettingZooPufferEnv(env=env)

# Dict observation flattened for all agents
print(puffer_env.single_observation_space)
# Box(0, 255, (3077,), uint8)
print(puffer_env.is_obs_emulated)
# True
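
The flattened length is the total number of leaf elements in the Dict space; a quick check of the arithmetic behind the (3077,) shape:

```python
# Element counts for each leaf of the Dict observation space above.
image_elems = 32 * 32 * 3   # 'image': Box(0, 255, (32, 32, 3)) -> 3072 elements
vector_elems = 5            # 'vector': Box(-1, 1, (5,)) -> 5 elements

flat_size = image_elems + vector_elems
print(flat_size)  # 3077
```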

Action space emulation

import gymnasium
import pettingzoo
import pufferlib.emulation

class MyMultiAgentEnv(pettingzoo.ParallelEnv):
    def __init__(self):
        self.possible_agents = ['agent_0', 'agent_1']
        self.agents = self.possible_agents[:]
    
    def observation_space(self, agent):
        return gymnasium.spaces.Box(0, 1, (4,))
    
    def action_space(self, agent):
        # Complex action space
        return gymnasium.spaces.Tuple((
            gymnasium.spaces.Discrete(3),
            gymnasium.spaces.Discrete(2),
        ))

env = MyMultiAgentEnv()
puffer_env = pufferlib.emulation.PettingZooPufferEnv(env=env)

# Action space converted to MultiDiscrete
print(puffer_env.single_action_space)
# MultiDiscrete([3 2])

# Sample actions for all agents
actions = puffer_env.action_space.sample()
print(actions.shape)  # (2, 2) - (num_agents, action_dims)
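
The MultiDiscrete form lays out one integer per Tuple component; a sketch of that mapping (the helper names here are illustrative, not PufferLib internals):

```python
import numpy as np

def tuple_to_flat(action):
    """Tuple sample like (2, 1) -> flat array [2, 1], one int per sub-action."""
    return np.asarray(action, dtype=np.int64)

def flat_to_tuple(flat):
    """Flat array [2, 1] -> original Tuple sample (2, 1)."""
    return tuple(int(a) for a in flat)

flat = tuple_to_flat((2, 1))
print(flat)                 # [2 1]
print(flat_to_tuple(flat))  # (2, 1)
```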

Butterfly example

Butterfly environments are cooperative PettingZoo games. From pufferlib/environments/butterfly/environment.py:
from pettingzoo.utils.conversions import aec_to_parallel_wrapper
import pufferlib.emulation

def make(name='cooperative_pong_v5', buf=None):
    if name == 'cooperative_pong_v5':
        from pettingzoo.butterfly import cooperative_pong_v5 as pong
        env_cls = pong.raw_env
    elif name == 'knights_archers_zombies_v10':
        from pettingzoo.butterfly import knights_archers_zombies_v10 as kaz
        env_cls = kaz.raw_env
    else:
        raise ValueError(f'Unknown environment: {name}')
    
    # Convert AEC to parallel
    env = env_cls()
    env = aec_to_parallel_wrapper(env)
    
    return pufferlib.emulation.PettingZooPufferEnv(env=env, buf=buf)
Usage:
from pufferlib.environments import butterfly

env = butterfly.make('cooperative_pong_v5')
obs, info = env.reset()

for _ in range(100):
    # All agents act simultaneously
    actions = env.action_space.sample()
    obs, rewards, terminals, truncations, info = env.step(actions)
    
    if all(terminals) or all(truncations):
        print("Episode ended")
        obs, info = env.reset()

MAgent example

MAgent provides large-scale battle simulations. From pufferlib/environments/magent/environment.py:
from pettingzoo.utils.conversions import aec_to_parallel_wrapper
import pufferlib
import pufferlib.emulation

def make(name='battle_v4', buf=None):
    from pettingzoo.magent import battle_v4
    
    env = battle_v4.env()
    env = aec_to_parallel_wrapper(env)
    
    # PettingZoo changed truncation behavior
    env = pufferlib.PettingZooTruncatedWrapper(env)
    
    return pufferlib.emulation.PettingZooPufferEnv(env=env, buf=buf)
Usage:
from pufferlib.environments import magent

env = magent.make('battle_v4')
print(f"Agents: {env.num_agents}")  # Many agents!

obs, info = env.reset()
for _ in range(1000):
    actions = env.action_space.sample()
    obs, rewards, terminals, truncations, info = env.step(actions)
    
    # Check how many agents are still active
    active = env.masks.sum()
    print(f"Active agents: {active}/{env.num_agents}")
    
    if env.done:
        obs, info = env.reset()

Neural MMO example

Neural MMO is a massively multi-agent survival game. From pufferlib/environments/nmmo/environment.py:
import pufferlib
import pufferlib.emulation
import nmmo

class NMMOWrapper(pufferlib.PettingZooWrapper):
    """Remove task info spam"""
    def step(self, actions):
        obs, rewards, dones, truncateds, infos = self.env.step(actions)
        # Simplify info dict
        infos = {k: list(v['task'].values())[0] for k, v in infos.items()}
        return obs, rewards, dones, truncateds, infos

def make(name='nmmo', buf=None, **kwargs):
    env = nmmo.Env(**kwargs)
    env = NMMOWrapper(env)
    env = pufferlib.MultiagentEpisodeStats(env)
    env = pufferlib.MeanOverAgents(env)
    return pufferlib.emulation.PettingZooPufferEnv(env=env, buf=buf)
Usage:
from pufferlib.environments import nmmo

# Create environment with config
env = nmmo.make(
    'nmmo',
    num_agents=128,
    horizon=1024
)

obs, info = env.reset()
for _ in range(1000):
    actions = env.action_space.sample()
    obs, rewards, terminals, truncations, info = env.step(actions)

Converting AEC to parallel

Many PettingZoo environments use the AEC (Agent-Environment-Cycle) API. Convert them to parallel:
from pettingzoo.classic import rps_v2
from pettingzoo.utils.conversions import aec_to_parallel_wrapper
import pufferlib.emulation

# Create AEC environment
aec_env = rps_v2.env()

# Convert to parallel
parallel_env = aec_to_parallel_wrapper(aec_env)

# Wrap with PufferLib
puffer_env = pufferlib.emulation.PettingZooPufferEnv(env=parallel_env)

Multi-agent wrappers

PufferLib provides wrappers specifically for multi-agent environments:

MultiagentEpisodeStats

Tracks episode statistics per agent:
from pettingzoo.butterfly import cooperative_pong_v5
import pufferlib
import pufferlib.emulation

env = cooperative_pong_v5.parallel_env()
env = pufferlib.MultiagentEpisodeStats(env)
env = pufferlib.emulation.PettingZooPufferEnv(env=env)

obs, info = env.reset()
for _ in range(100):
    actions = env.action_space.sample()
    obs, rewards, terminals, truncations, info = env.step(actions)
    
    # Info contains per-agent statistics
    if any(terminals):
        for agent, agent_info in info.items():
            if 'episode_return' in agent_info:
                print(f"{agent}: {agent_info['episode_return']}")
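
Conceptually, the wrapper accumulates rewards per agent until that agent's episode ends; a rough sketch of that bookkeeping (not the actual MultiagentEpisodeStats implementation):

```python
from collections import defaultdict

episode_returns = defaultdict(float)

# Two steps of per-agent rewards, as a parallel PettingZoo env would emit them.
step_rewards = [
    {'paddle_0': 1.0, 'paddle_1': 0.5},
    {'paddle_0': 2.0, 'paddle_1': 0.5},
]
for rewards in step_rewards:
    for agent, r in rewards.items():
        episode_returns[agent] += r

print(dict(episode_returns))  # {'paddle_0': 3.0, 'paddle_1': 1.0}
```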

MeanOverAgents

Averages info dictionaries across all agents:
from pettingzoo.butterfly import cooperative_pong_v5
import pufferlib
import pufferlib.emulation

env = cooperative_pong_v5.parallel_env()
env = pufferlib.MultiagentEpisodeStats(env)
env = pufferlib.MeanOverAgents(env)  # Average stats
env = pufferlib.emulation.PettingZooPufferEnv(env=env)

obs, info = env.reset()
for _ in range(100):
    actions = env.action_space.sample()
    obs, rewards, terminals, truncations, info = env.step(actions)
    
    # Info now contains mean values
    if any(terminals):
        print(f"Mean return: {info.get('episode_return', 0)}")
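
The reduction is a plain mean across the agents' info values; a sketch of the idea (illustrative, not the actual MeanOverAgents code):

```python
def mean_over_agents(per_agent_infos):
    """Average each numeric key across all agents that report it."""
    keys = set().union(*(d.keys() for d in per_agent_infos.values()))
    return {
        k: sum(d[k] for d in per_agent_infos.values() if k in d)
           / sum(1 for d in per_agent_infos.values() if k in d)
        for k in keys
    }

infos = {'paddle_0': {'episode_return': 2.0}, 'paddle_1': {'episode_return': 4.0}}
print(mean_over_agents(infos))  # {'episode_return': 3.0}
```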

API reference

PettingZooPufferEnv

See pufferlib/emulation.py:244 for the full implementation.

Attributes

  • single_observation_space - Observation space for one agent
  • single_action_space - Action space for one agent
  • observation_space - Batched observation space (shape: (num_agents, *obs_shape))
  • action_space - Batched action space (shape: (num_agents, *atn_shape))
  • num_agents - Number of possible agents
  • possible_agents - List of all possible agent IDs
  • agents - List of currently active agent IDs
  • is_obs_emulated - Whether observations are emulated
  • is_atn_emulated - Whether actions are emulated
  • observations - NumPy array buffer (shape: (num_agents, *obs_shape))
  • rewards - NumPy array buffer (shape: (num_agents,))
  • terminals - NumPy array buffer (shape: (num_agents,))
  • truncations - NumPy array buffer (shape: (num_agents,))
  • masks - Boolean mask of active agents (shape: (num_agents,))

Methods

  • reset(seed=None) - Reset environment, returns (dict_obs, info)
  • step(actions) - Step environment with dict or array of actions, returns (dict_obs, rewards, terminals, truncations, info)
  • observation_space(agent) - Get observation space for specific agent
  • action_space(agent) - Get action space for specific agent
  • render() - Render environment
  • close() - Close environment

Properties

  • done - True if all agents are done or environment is finished
  • render_mode - Render mode from wrapped environment

Action format

Actions can be passed as either NumPy arrays or dictionaries:
# As NumPy array (recommended)
actions = env.action_space.sample()  # Shape: (num_agents, *action_shape)
obs, rewards, terminals, truncations, info = env.step(actions)

# As dictionary
actions = {
    agent: env.action_space(agent).sample() 
    for agent in env.possible_agents
}
obs, rewards, terminals, truncations, info = env.step(actions)
The array format is more efficient for vectorized training.
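
Converting between the two formats is a matter of ordering by possible_agents; a sketch of stacking dict actions into the batched array (illustrative, not PufferLib internals):

```python
import numpy as np

possible_agents = ['agent_0', 'agent_1']
dict_actions = {'agent_0': [2, 1], 'agent_1': [0, 0]}

# Stack per-agent actions in possible_agents order -> (num_agents, action_dims).
array_actions = np.stack([np.asarray(dict_actions[a]) for a in possible_agents])
print(array_actions.shape)  # (2, 2)
```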

Next steps

Custom wrappers

Create your own environment wrappers

Gymnasium integration

Learn about single-agent wrappers
