PufferLib provides various wrappers for Gymnasium and PettingZoo environments to handle common preprocessing tasks and track episode statistics.

PufferEnv

Base class for native PufferLib environments that handle multiple agents with vectorized operations.

Constructor

PufferEnv(buf=None)
buf
dict
default:"None"
Optional buffer dictionary containing pre-allocated arrays. If None, buffers are created automatically.
Subclasses must define single_observation_space, single_action_space, and num_agents before calling super().__init__().

Required attributes

single_observation_space
gymnasium.spaces.Box
Observation space for a single agent (must be Box).
single_action_space
gymnasium.Space
Action space for a single agent (must be Discrete, MultiDiscrete, or Box).
num_agents
int
Number of agents (must be >= 1).

Properties

observation_space
gymnasium.Space
Joint observation space for all agents.
action_space
gymnasium.Space
Joint action space for all agents.
observations
np.ndarray
Buffer for observations, shape (num_agents, *obs_shape).
rewards
np.ndarray
Buffer for rewards, shape (num_agents,), dtype float32.
terminals
np.ndarray
Buffer for terminal flags, shape (num_agents,), dtype bool.
truncations
np.ndarray
Buffer for truncation flags, shape (num_agents,), dtype bool.
masks
np.ndarray
Buffer for agent masks, shape (num_agents,), dtype bool.
actions
np.ndarray
Buffer for actions.
agent_ids
np.ndarray
Array of agent IDs (0 to num_agents-1).
emulated
bool
Always False for native environments.
done
bool
Always False (native envs handle resets internally).

Methods

reset

reset(seed=None) -> tuple[np.ndarray, list]
Resets the environment. Must be implemented by subclasses.
seed
int
default:"None"
Optional random seed.
Returns: Tuple of (observations, infos)

step

step(actions) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, list]
Steps the environment. Must be implemented by subclasses.
actions
np.ndarray
Actions for all agents.
Returns: Tuple of (observations, rewards, terminals, truncations, infos)

close

close()
Closes the environment. Must be implemented by subclasses.

async_reset

async_reset(seed=None)
Asynchronous reset for compatibility with multiprocessing.

send

send(actions)
Sends actions asynchronously.

recv

recv() -> tuple
Receives results from async step. Returns: Tuple of (observations, rewards, terminals, truncations, infos, agent_ids, masks)

ResizeObservation

Downscales image observations using fast strided indexing.
Do not use gymnasium.wrappers.ResizeObservation - it relies on slow OpenCV resize operations.

Constructor

ResizeObservation(env, downscale=2)
env
gymnasium.Env
The environment to wrap.
downscale
int
default:"2"
Downscale factor. Observation dimensions must be divisible by this value.

Example

import gymnasium as gym
import pufferlib

env = gym.make('ALE/Pong-v5', obs_type='grayscale')  # (210, 160) frames
env = pufferlib.ResizeObservation(env, downscale=2)
# Original: (210, 160) -> Downscaled: (105, 80)

Methods

reset

reset(seed=None, options=None) -> tuple[np.ndarray, dict]
Resets the environment and returns downscaled observation.

step

step(action) -> tuple[np.ndarray, float, bool, bool, dict]
Steps the environment and returns downscaled observation.

ClipAction

Clips continuous actions to valid bounds for Box action spaces.

Constructor

ClipAction(env)
env
gymnasium.Env
Environment with Box action space.
This wrapper expands the action space bounds to dtype limits while clipping actual actions to the original bounds. Useful when your policy might output out-of-bounds values.

Example

import gymnasium as gym
import pufferlib
import numpy as np

env = gym.make('Pendulum-v1')
env = pufferlib.ClipAction(env)

# Actions outside [-2, 2] will be clipped
action = np.array([100.0])  # Will be clipped to [2.0]
obs, reward, term, trunc, info = env.step(action)

Methods

step

step(action) -> tuple[np.ndarray, float, bool, bool, dict]
Clips action to valid bounds and steps the environment.

EpisodeStats

Tracks episodic returns and lengths for single-agent environments.

Constructor

EpisodeStats(env)
env
gymnasium.Env
The environment to wrap.

Behavior

  • Accumulates rewards and step counts during episodes
  • Adds episode_return and episode_length to info on episode end
  • Aggregates nested info values (sums numeric values)

Example

import gymnasium as gym
import pufferlib

env = gym.make('CartPole-v1')
env = pufferlib.EpisodeStats(env)

obs, info = env.reset()
while True:
    obs, reward, term, trunc, info = env.step(env.action_space.sample())
    if term or trunc:
        print(f"Episode return: {info['episode_return']}")
        print(f"Episode length: {info['episode_length']}")
        break

Methods

reset

reset(seed=None, options=None) -> tuple[np.ndarray, dict]
Resets episode statistics and the environment.

step

step(action) -> tuple[np.ndarray, float, bool, bool, dict]
Steps environment and tracks statistics, populating info on episode end.

PettingZooWrapper

Base wrapper for PettingZoo parallel environments with proper attribute access.

Constructor

PettingZooWrapper(env)
env
pettingzoo.ParallelEnv
The PettingZoo parallel environment to wrap.

Methods

This wrapper forwards all methods to the wrapped environment:
  • reset(seed=None, options=None)
  • step(action)
  • observation_space(agent)
  • action_space(agent)
  • observe(agent)
  • state()
  • render()
  • close()

Properties

agents
list
Currently active agents.
possible_agents
list
All possible agents.
unwrapped
pettingzoo.ParallelEnv
The base unwrapped environment.

MeanOverAgents

Averages info values across agents in PettingZoo environments.

Constructor

MeanOverAgents(env)
env
pettingzoo.ParallelEnv
The PettingZoo environment to wrap.

Behavior

Converts per-agent info dicts to a single dict with mean values across agents. Numeric values are averaged; non-numeric values are skipped.

Example

from pettingzoo.butterfly import pistonball_v6
import pufferlib

env = pistonball_v6.parallel_env()
env = pufferlib.MeanOverAgents(env)

obs, infos = env.reset()
# infos is now a single dict with averaged values
# instead of a dict per agent

Methods

reset

reset(seed=None, options=None) -> tuple[dict, dict]
Resets environment and returns averaged infos.

step

step(actions) -> tuple[dict, dict, dict, dict, dict]
Steps environment and returns averaged infos.

MultiagentEpisodeStats

Tracks episodic statistics for each agent in PettingZoo environments.

Constructor

MultiagentEpisodeStats(env)
env
pettingzoo.ParallelEnv
The PettingZoo environment to wrap.

Behavior

  • Tracks episode_return and episode_length per agent
  • Adds statistics to info when each agent terminates or truncates
  • Aggregates nested info values for each agent

Example

from pettingzoo.butterfly import pistonball_v6
import pufferlib

env = pistonball_v6.parallel_env()
env = pufferlib.MultiagentEpisodeStats(env)

obs, infos = env.reset()
while env.agents:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    obs, rewards, terms, truncs, infos = env.step(actions)
    
    for agent in infos:
        if 'episode_return' in infos[agent]:
            print(f"{agent} episode return: {infos[agent]['episode_return']}")

Methods

reset

reset(seed=None, options=None) -> tuple[dict, dict]
Resets per-agent statistics and the environment.

step

step(actions) -> tuple[dict, dict, dict, dict, dict]
Steps environment and tracks per-agent statistics.

GymToGymnasium

Converts old Gym API environments to Gymnasium API.

Constructor

GymToGymnasium(env)
env
gym.Env
Old Gym environment (returns 4-tuple from step).

Behavior

  • Converts 4-tuple (obs, reward, done, info) to 5-tuple (obs, reward, terminated, truncated, info)
  • Sets truncated to always False
  • Wraps reset() to return (obs, {})

Example

import pufferlib

# Old gym environment
old_env = OldGymEnv()
env = pufferlib.GymToGymnasium(old_env)

# Now uses Gymnasium API
obs, info = env.reset()
action = env.action_space.sample()
obs, reward, term, trunc, info = env.step(action)

PettingZooTruncatedWrapper

Adds proper truncation support to PettingZoo environments.

Constructor

PettingZooTruncatedWrapper(env)
env
pettingzoo.ParallelEnv
The PettingZoo environment to wrap.

Behavior

  • Ensures reset returns empty info dicts for all agents
  • Properly forwards truncation flags from step

Methods

reset

reset(seed=None) -> tuple[dict, dict]
Resets environment with empty per-agent infos.

step

step(actions) -> tuple[dict, dict, dict, dict, dict]
Steps environment and returns full 5-tuple.

Utility functions

set_buffers

set_buffers(env, buf=None)
Sets up preallocated buffers for observations, rewards, terminals, truncations, masks, and actions.
env
PufferEnv
Environment to set buffers on.
buf
dict
default:"None"
Optional dictionary with pre-allocated arrays. If None, creates new buffers.
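
Example

set_buffers is normally invoked for you by PufferEnv.__init__. To illustrate the shape of a pre-allocated buf dictionary, here is a sketch built from the buffer names, shapes, and dtypes in the property table above; the agent count, observation shape, and action dtype are hypothetical, and in a real vectorized setup these arrays would typically live in shared memory.

```python
import numpy as np

num_agents, obs_shape = 4, (3,)

# Keys match the documented PufferEnv buffers
buf = dict(
    observations=np.zeros((num_agents, *obs_shape), dtype=np.float32),
    rewards=np.zeros(num_agents, dtype=np.float32),
    terminals=np.zeros(num_agents, dtype=bool),
    truncations=np.zeros(num_agents, dtype=bool),
    masks=np.ones(num_agents, dtype=bool),
    actions=np.zeros(num_agents, dtype=np.int32),
)
# A PufferEnv subclass constructed with this buf (e.g. MyEnv(buf=buf),
# a hypothetical class) writes its results into these arrays in place.
```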

unroll_nested_dict

unroll_nested_dict(d) -> generator
Flattens nested dictionaries with '/' separators.
d
dict
Dictionary to flatten.
Yields: Tuples of (flattened_key, value)

Example

import pufferlib

info = {'stats': {'score': 100, 'kills': 5}, 'level': 3}
for key, value in pufferlib.unroll_nested_dict(info):
    print(f"{key}: {value}")
# Output:
# stats/score: 100
# stats/kills: 5
# level: 3
