Overview

PufferLib provides multiple vectorization backends to run environments in parallel. The two main classes are Serial (single-process) and Multiprocessing (multi-process); a Ray backend is also available for distributed setups.

make()

The recommended way to create vectorized environments:
import pufferlib.vector

vecenv = pufferlib.vector.make(
    env_creator=lambda: MyEnv(),
    num_envs=16,
    num_workers=8,
    batch_size=16,
    backend=pufferlib.vector.Multiprocessing,
)
  • env_creator (callable, required): Function that creates a single environment instance. Can also be a list of callables (one per env).
  • env_args (list | list[list]): Positional arguments to pass to env_creator. Either a single list or a list of lists (one per env).
  • env_kwargs (dict | list[dict]): Keyword arguments to pass to env_creator. Either a single dict or a list of dicts (one per env).
  • num_envs (int, required): Total number of environment instances to create.
  • num_workers (int | 'auto'): Number of worker processes (Multiprocessing only). Default is num_envs. Set to 'auto' for automatic selection.
  • batch_size (int | 'auto'): Number of agents per batch. Default is num_envs. Must be divisible by (num_envs / num_workers).
  • backend (type): Vectorization backend class: Serial, Multiprocessing, or Ray. Default is PufferEnv (native single env).
  • zero_copy (bool, default True): Use zero-copy shared memory (Multiprocessing only). Requires batch_size to divide num_envs evenly.
  • sync_traj (bool, default True): Synchronize trajectory collection across workers (Multiprocessing only).
  • overwork (bool, default False): Allow num_workers > CPU cores (Multiprocessing only). Not recommended.
  • seed (int, default 0): Base random seed. Each worker gets seed + worker_id.
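The seeding rule above (seed + worker_id) can be sketched in plain Python. This is an illustration of the documented scheme, not PufferLib's internal code:

```python
def worker_seeds(base_seed: int, num_workers: int) -> list[int]:
    """Seeds the workers would receive under the seed + worker_id rule."""
    return [base_seed + worker_id for worker_id in range(num_workers)]

# With the default seed=0 and 8 workers:
print(worker_seeds(0, 8))   # [0, 1, 2, 3, 4, 5, 6, 7]
print(worker_seeds(42, 3))  # [42, 43, 44]
```

Distinct per-worker seeds keep parallel environments decorrelated while the run as a whole stays reproducible from a single base seed.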

Class: Serial

Single-process vectorization. Runs all environments sequentially on one CPU core.

When to use

  • Debugging and development
  • Very fast environments where multiprocessing overhead dominates
  • Platforms without multiprocessing support

Initialization

from pufferlib.vector import Serial

vecenv = Serial(
    env_creators=[lambda: MyEnv() for _ in range(4)],
    env_args=[[] for _ in range(4)],
    env_kwargs=[{} for _ in range(4)],
    num_envs=4,
    seed=42,
)
  • env_creators (list[callable], required): List of environment creator functions, one per environment.
  • env_args (list[list], required): List of argument lists, one per environment.
  • env_kwargs (list[dict], required): List of keyword argument dicts, one per environment.
  • num_envs (int, required): Number of environments to create.
  • buf (dict | None): Pre-allocated buffer dictionary (advanced usage).
  • seed (int, default 0): Random seed for environments.

Properties

  • num_envs (int): Total number of agents across all environments (same as agents_per_batch).
  • agents_per_batch (int): Number of agents returned per batch.
  • num_agents (int): Total number of agents (same as agents_per_batch).
  • single_observation_space (gymnasium.spaces.Space): Observation space for a single agent.
  • single_action_space (gymnasium.spaces.Space): Action space for a single agent.
  • observation_space (gymnasium.spaces.Space): Joint observation space for all agents in a batch.
  • action_space (gymnasium.spaces.Space): Joint action space for all agents in a batch.
  • driver_env (PufferEnv): The first environment instance (useful for inspecting properties).
  • emulated (bool): Whether environments use Gymnasium/PettingZoo emulation.

Methods

reset()

def reset(vecenv, seed=42) -> tuple[numpy.ndarray, dict]

Reset all environments.

Arguments:
  • seed (int, default 42): Random seed.

Returns:
  • observations (numpy.ndarray): Initial observations for all agents.
  • infos (dict): Aggregated info dictionary.

step()

def step(vecenv, actions) -> tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray, dict]

Step all environments.

Arguments:
  • actions (numpy.ndarray, required): Actions for all agents.

Returns:
  • observations (numpy.ndarray): Next observations.
  • rewards (numpy.ndarray): Rewards.
  • terminals (numpy.ndarray): Terminal flags.
  • truncations (numpy.ndarray): Truncation flags.
  • infos (dict): Aggregated info dictionary.

async_reset()

def async_reset(seed=None)
Asynchronously reset all environments.

send()

def send(actions)
Send actions to environments.

recv()

def recv() -> tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray, dict, numpy.ndarray, numpy.ndarray]
Receive results from environments: observations, rewards, terminals, truncations, infos, agent_ids, and masks.
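Compared with step(), recv() additionally returns agent_ids and masks. A hedged sketch of how a collection loop might use the mask to drop inactive (padded) agents before buffering, assuming masks is a boolean or 0/1 array aligned with the batch dimension; the shapes here are illustrative, not PufferLib's actual layout:

```python
import numpy as np

# Hypothetical batch as returned by recv(); 4 agents with 2-dim observations.
obs = np.arange(8, dtype=np.float32).reshape(4, 2)
rewards = np.array([1.0, 0.5, 0.0, 2.0], dtype=np.float32)
masks = np.array([1, 1, 0, 1], dtype=bool)  # agent 2 is inactive this step

# Keep only transitions from active agents before adding them to a buffer.
active_obs = obs[masks]
active_rewards = rewards[masks]
print(active_obs.shape)             # (3, 2)
print(float(active_rewards.sum()))  # 3.5
```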

close()

def close()
Close all environments.

Class: Multiprocessing

Multi-process vectorization with optimized shared memory for maximum performance.

When to use

  • Production training with multiple CPU cores
  • CPU-intensive environments
  • Maximum throughput requirements

Initialization

from pufferlib.vector import Multiprocessing

vecenv = Multiprocessing(
    env_creators=[lambda: MyEnv() for _ in range(16)],
    env_args=[[] for _ in range(16)],
    env_kwargs=[{} for _ in range(16)],
    num_envs=16,
    num_workers=8,
    batch_size=16,
    zero_copy=True,
    seed=42,
)
  • env_creators (list[callable], required): List of environment creator functions.
  • env_args (list[list], required): List of argument lists.
  • env_kwargs (list[dict], required): List of keyword argument dicts.
  • num_envs (int, required): Total number of environments.
  • num_workers (int): Number of worker processes. Default is num_envs. Must divide num_envs evenly.
  • batch_size (int): Agents per batch. Default is num_envs.
  • zero_copy (bool, default True): Use zero-copy shared memory. Requires batch_size to divide num_envs evenly.
  • sync_traj (bool, default True): Synchronize trajectory collection.
  • overwork (bool, default False): Allow more workers than CPU cores.
  • seed (int, default 0): Base random seed.

Properties

  • num_envs (int): Total number of agents (same as agents_per_batch).
  • num_environments (int): Total number of environment instances.
  • num_workers (int): Number of worker processes.
  • num_agents (int): Total number of agents across all environments.
  • agents_per_batch (int): Number of agents per batch.
  • envs_per_worker (int): Number of environments per worker process.
  • workers_per_batch (int): Number of workers per batch.
  • single_observation_space (gymnasium.spaces.Space): Observation space for a single agent.
  • single_action_space (gymnasium.spaces.Space): Action space for a single agent.
  • observation_space (gymnasium.spaces.Space): Joint observation space for a batch.
  • action_space (gymnasium.spaces.Space): Joint action space for a batch.
  • driver_env (PufferEnv): Environment instance for property inspection.
  • emulated (bool): Whether environments use emulation.
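The layout properties above follow from simple arithmetic on the constructor arguments. A sketch of the implied relationships, assuming one agent per environment (multi-agent envs scale these by agents per env); this mirrors the property table, not PufferLib's internal code:

```python
def derived_layout(num_envs: int, num_workers: int, batch_size: int) -> dict:
    """Per-worker layout implied by the properties above (single-agent envs)."""
    assert num_envs % num_workers == 0, "num_workers must divide num_envs"
    envs_per_worker = num_envs // num_workers
    # Each batch gathers results from enough workers to fill batch_size envs.
    workers_per_batch = batch_size // envs_per_worker
    return {"envs_per_worker": envs_per_worker,
            "workers_per_batch": workers_per_batch}

print(derived_layout(num_envs=16, num_workers=8, batch_size=16))
# {'envs_per_worker': 2, 'workers_per_batch': 8}
```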

Methods

Same as Serial: reset(), step(), async_reset(), send(), recv(), close(). Additionally:

notify()

def notify()
Notify all worker processes (used for advanced coordination).

Usage examples

Basic usage

import pufferlib.vector

# Create vectorized environments
vecenv = pufferlib.vector.make(
    env_creator=lambda: MyEnv(),
    num_envs=16,
    num_workers=8,
    batch_size=16,
    backend=pufferlib.vector.Multiprocessing,
)

# Reset
obs, infos = vecenv.reset()

# Training loop
for _ in range(1000):
    actions = vecenv.action_space.sample()
    obs, rewards, terminals, truncations, infos = vecenv.step(actions)

vecenv.close()

Async interface

# Reset asynchronously
vecenv.async_reset(seed=42)

for _ in range(1000):
    # Get observations from previous step
    obs, rewards, terminals, truncations, infos, agent_ids, masks = vecenv.recv()
    
    # Compute actions (e.g., with a neural network)
    actions = policy(obs)
    
    # Send actions to environments
    vecenv.send(actions)

vecenv.close()

Different environments per worker

# Create different environments
env_creators = [
    lambda: MyEnv(difficulty=1),
    lambda: MyEnv(difficulty=2),
    lambda: MyEnv(difficulty=3),
    lambda: MyEnv(difficulty=4),
]

vecenv = pufferlib.vector.make(
    env_creator=env_creators,
    num_envs=4,
    backend=pufferlib.vector.Serial,
)
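If you generate such a creator list in a loop rather than writing the lambdas out by hand, bind the loop variable as a default argument; otherwise every lambda captures the same (final) value. A minimal sketch with a stand-in MyEnv class (illustrative only, not PufferLib code):

```python
class MyEnv:
    # Stand-in environment for illustration; real envs come from your code.
    def __init__(self, difficulty=1):
        self.difficulty = difficulty

# Wrong: every lambda looks up d after the loop ends, so all see d == 4.
wrong = [lambda: MyEnv(difficulty=d) for d in range(1, 5)]

# Right: default-argument binding freezes d at definition time.
right = [lambda d=d: MyEnv(difficulty=d) for d in range(1, 5)]

print([f().difficulty for f in wrong])  # [4, 4, 4, 4]
print([f().difficulty for f in right])  # [1, 2, 3, 4]
```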

Performance tuning

autotune()

Automatically find optimal vectorization parameters:
from pufferlib.vector import autotune

autotune(
    env_creator=lambda: MyEnv(),
    batch_size=512,
    max_envs=1024,
    time_per_test=5,
)
  • env_creator (callable, required): Function to create test environments.
  • batch_size (int, required): Desired batch size.
  • max_envs (int, default 194): Maximum number of environments to test.
  • model_forward_s (float, default 0.0): Simulated model forward pass time, in seconds.
  • max_env_ram_gb (float, default 32): Maximum RAM for environments, in GB.
  • max_batch_vram_gb (float, default 0.05): Maximum VRAM per batch, in GB.
  • time_per_test (int, default 5): Seconds to run each test.

Common errors

APIUsageError: num_workers > hardware cores
By default, PufferLib prevents creating more workers than physical CPU cores. Set overwork=True to override (not recommended).

APIUsageError: num_envs must be divisible by batch_size
With zero_copy=True, ensure num_envs is evenly divisible by batch_size.

APIUsageError: Call reset before stepping
Always call async_reset() or reset() before using send() or step().
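The first two constraints can be checked up front, before any worker processes are spawned. A sketch reproducing the documented rules (assumed from the error messages above, not PufferLib's actual validation code):

```python
import os

def check_vector_args(num_envs, num_workers, batch_size,
                      zero_copy=True, overwork=False):
    """Raise early for the misconfigurations listed above."""
    cores = os.cpu_count() or 1
    if num_workers > cores and not overwork:
        raise ValueError("num_workers exceeds CPU cores; set overwork=True")
    if num_envs % num_workers != 0:
        raise ValueError("num_workers must divide num_envs evenly")
    if zero_copy and num_envs % batch_size != 0:
        raise ValueError("with zero_copy=True, batch_size must divide num_envs")

check_vector_args(num_envs=4, num_workers=1, batch_size=2)  # passes silently
```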
