PufferLib provides various wrappers for Gymnasium and PettingZoo environments to handle common preprocessing tasks and track episode statistics.

PufferEnv

Base class for native PufferLib environments that handle multiple agents with vectorized operations.

Constructor

PufferEnv(buf=None)
buf
dict
default:"None"
Optional buffer dictionary containing pre-allocated arrays. If None, buffers are created automatically.
Subclasses must define single_observation_space, single_action_space, and num_agents before calling super().__init__().

Required attributes

single_observation_space
gymnasium.spaces.Box
Observation space for a single agent (must be Box).
single_action_space
gymnasium.Space
Action space for a single agent (must be Discrete, MultiDiscrete, or Box).
num_agents
int
Number of agents (must be >= 1).

Properties

observation_space
gymnasium.Space
Joint observation space for all agents.
action_space
gymnasium.Space
Joint action space for all agents.
observations
np.ndarray
Buffer for observations, shape (num_agents, *obs_shape).
rewards
np.ndarray
Buffer for rewards, shape (num_agents,), dtype float32.
terminals
np.ndarray
Buffer for terminal flags, shape (num_agents,), dtype bool.
truncations
np.ndarray
Buffer for truncation flags, shape (num_agents,), dtype bool.
masks
np.ndarray
Buffer for agent masks, shape (num_agents,), dtype bool.
actions
np.ndarray
Buffer for actions.
agent_ids
np.ndarray
Array of agent IDs (0 to num_agents-1).
emulated
bool
Always False for native environments.
done
bool
Always False (native envs handle resets internally).

Methods

reset

reset(seed=None) -> tuple[np.ndarray, list]
Resets the environment. Must be implemented by subclasses.
seed
int
default:"None"
Optional random seed.
Returns: Tuple of (observations, infos)

step

step(actions) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, list]
Steps the environment. Must be implemented by subclasses.
actions
np.ndarray
Actions for all agents.
Returns: Tuple of (observations, rewards, terminals, truncations, infos)

close

close()
Closes the environment. Must be implemented by subclasses.

async_reset

async_reset(seed=None)
Asynchronous reset for compatibility with multiprocessing.

send

send(actions)
Sends actions asynchronously.

recv

recv() -> tuple
Receives results from async step. Returns: Tuple of (observations, rewards, terminals, truncations, infos, agent_ids, masks)

ResizeObservation

Downscales image observations using fast strided indexing.
Do not use gymnasium.wrappers.ResizeObservation - it relies on slow OpenCV resize operations.

Constructor

ResizeObservation(env, downscale=2)
env
gymnasium.Env
The environment to wrap.
downscale
int
default:"2"
Downscale factor. Observation dimensions must be divisible by this value.

Example

import gymnasium as gym
import pufferlib

env = gym.make('ALE/Pong-v5', obs_type='grayscale')  # (210, 160) frames
env = pufferlib.ResizeObservation(env, downscale=2)
# Original: (210, 160) -> Downscaled: (105, 80)

Methods

reset

reset(seed=None, options=None) -> tuple[np.ndarray, dict]
Resets the environment and returns downscaled observation.

step

step(action) -> tuple[np.ndarray, float, bool, bool, dict]
Steps the environment and returns downscaled observation.

ClipAction

Clips continuous actions to valid bounds for Box action spaces.

Constructor

ClipAction(env)
env
gymnasium.Env
Environment with Box action space.
This wrapper expands the action space bounds to dtype limits while clipping actual actions to the original bounds. Useful when your policy might output out-of-bounds values.

Example

import gymnasium as gym
import pufferlib
import numpy as np

env = gym.make('Pendulum-v1')
env = pufferlib.ClipAction(env)

# Actions outside [-2, 2] will be clipped
action = np.array([100.0])  # Will be clipped to [2.0]
obs, reward, term, trunc, info = env.step(action)

Methods

step

step(action) -> tuple[np.ndarray, float, bool, bool, dict]
Clips action to valid bounds and steps the environment.

EpisodeStats

Tracks episodic returns and lengths for single-agent environments.

Constructor

EpisodeStats(env)
env
gymnasium.Env
The environment to wrap.

Behavior

  • Accumulates rewards and step counts during episodes
  • Adds episode_return and episode_length to info on episode end
  • Aggregates nested info values (sums numeric values)

Example

import gymnasium as gym
import pufferlib

env = gym.make('CartPole-v1')
env = pufferlib.EpisodeStats(env)

obs, info = env.reset()
while True:
    obs, reward, term, trunc, info = env.step(env.action_space.sample())
    if term or trunc:
        print(f"Episode return: {info['episode_return']}")
        print(f"Episode length: {info['episode_length']}")
        break

Methods

reset

reset(seed=None, options=None) -> tuple[np.ndarray, dict]
Resets episode statistics and the environment.

step

step(action) -> tuple[np.ndarray, float, bool, bool, dict]
Steps environment and tracks statistics, populating info on episode end.

PettingZooWrapper

Base wrapper for PettingZoo parallel environments with proper attribute access.

Constructor

PettingZooWrapper(env)
env
pettingzoo.ParallelEnv
The PettingZoo parallel environment to wrap.

Methods

This wrapper forwards all methods to the wrapped environment:
  • reset(seed=None, options=None)
  • step(action)
  • observation_space(agent)
  • action_space(agent)
  • observe(agent)
  • state()
  • render()
  • close()

Properties

agents
list
Currently active agents.
possible_agents
list
All possible agents.
unwrapped
pettingzoo.ParallelEnv
The base unwrapped environment.

MeanOverAgents

Averages info values across agents in PettingZoo environments.

Constructor

MeanOverAgents(env)
env
pettingzoo.ParallelEnv
The PettingZoo environment to wrap.

Behavior

Converts per-agent info dicts to a single dict with mean values across agents. Numeric values are averaged; non-numeric values are skipped.

Example

from pettingzoo.butterfly import pistonball_v6
import pufferlib

env = pistonball_v6.parallel_env()
env = pufferlib.MeanOverAgents(env)

obs, infos = env.reset()
# infos is now a single dict with averaged values
# instead of a dict per agent

Methods

reset

reset(seed=None, options=None) -> tuple[dict, dict]
Resets environment and returns averaged infos.

step

step(actions) -> tuple[dict, dict, dict, dict, dict]
Steps environment and returns averaged infos.

MultiagentEpisodeStats

Tracks episodic statistics for each agent in PettingZoo environments.

Constructor

MultiagentEpisodeStats(env)
env
pettingzoo.ParallelEnv
The PettingZoo environment to wrap.

Behavior

  • Tracks episode_return and episode_length per agent
  • Adds statistics to info when each agent terminates or truncates
  • Aggregates nested info values for each agent

Example

from pettingzoo.butterfly import pistonball_v6
import pufferlib

env = pistonball_v6.parallel_env()
env = pufferlib.MultiagentEpisodeStats(env)

obs, infos = env.reset()
while env.agents:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    obs, rewards, terms, truncs, infos = env.step(actions)
    
    for agent in infos:
        if 'episode_return' in infos[agent]:
            print(f"{agent} episode return: {infos[agent]['episode_return']}")

Methods

reset

reset(seed=None, options=None) -> tuple[dict, dict]
Resets per-agent statistics and the environment.

step

step(actions) -> tuple[dict, dict, dict, dict, dict]
Steps environment and tracks per-agent statistics.

GymToGymnasium

Converts old Gym API environments to Gymnasium API.

Constructor

GymToGymnasium(env)
env
gym.Env
Old Gym environment (returns 4-tuple from step).

Behavior

  • Converts 4-tuple (obs, reward, done, info) to 5-tuple (obs, reward, terminated, truncated, info)
  • Sets truncated to always False
  • Wraps reset() to return (obs, {})

Example

import pufferlib

# Old gym environment
old_env = OldGymEnv()
env = pufferlib.GymToGymnasium(old_env)

# Now uses Gymnasium API
obs, info = env.reset()
action = env.action_space.sample()
obs, reward, term, trunc, info = env.step(action)

PettingZooTruncatedWrapper

Adds proper truncation support to PettingZoo environments.

Constructor

PettingZooTruncatedWrapper(env)
env
pettingzoo.ParallelEnv
The PettingZoo environment to wrap.

Behavior

  • Ensures reset returns empty info dicts for all agents
  • Properly forwards truncation flags from step

Methods

reset

reset(seed=None) -> tuple[dict, dict]
Resets environment with empty per-agent infos.

step

step(actions) -> tuple[dict, dict, dict, dict, dict]
Steps environment and returns full 5-tuple.

Utility functions

set_buffers

set_buffers(env, buf=None)
Sets up preallocated buffers for observations, rewards, terminals, truncations, masks, and actions.
env
PufferEnv
Environment to set buffers on.
buf
dict
default:"None"
Optional dictionary with pre-allocated arrays. If None, creates new buffers.
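
Example

set_buffers is normally invoked for you by PufferEnv.__init__. To illustrate the shape of a pre-allocated buf dictionary, here is a sketch built from the buffer names, shapes, and dtypes in the property table above; the agent count, observation shape, and action dtype are hypothetical, and in a real vectorized setup these arrays would typically live in shared memory.

```python
import numpy as np

num_agents, obs_shape = 4, (3,)

# Keys match the documented PufferEnv buffers
buf = dict(
    observations=np.zeros((num_agents, *obs_shape), dtype=np.float32),
    rewards=np.zeros(num_agents, dtype=np.float32),
    terminals=np.zeros(num_agents, dtype=bool),
    truncations=np.zeros(num_agents, dtype=bool),
    masks=np.ones(num_agents, dtype=bool),
    actions=np.zeros(num_agents, dtype=np.int32),
)
# A PufferEnv subclass constructed with this buf (e.g. MyEnv(buf=buf),
# a hypothetical class) writes its results into these arrays in place.
```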

unroll_nested_dict

unroll_nested_dict(d) -> generator
Flattens nested dictionaries with '/' separators.
d
dict
Dictionary to flatten.
Yields: Tuples of (flattened_key, value)

Example

import pufferlib

info = {'stats': {'score': 100, 'kills': 5}, 'level': 3}
for key, value in pufferlib.unroll_nested_dict(info):
    print(f"{key}: {value}")
# Output:
# stats/score: 100
# stats/kills: 5
# level: 3
