Overview

PufferLib provides multiple vectorization backends to run environments in parallel. The two main classes are Serial (single-process) and Multiprocessing (multi-process); a Ray backend is also available for distributed setups.

make()

The recommended way to create vectorized environments:
import pufferlib.vector

vecenv = pufferlib.vector.make(
    env_creator=lambda: MyEnv(),
    num_envs=16,
    num_workers=8,
    batch_size=16,
    backend=pufferlib.vector.Multiprocessing,
)
  • env_creator (callable, required): Function that creates a single environment instance. Can also be a list of callables (one per env).
  • env_args (list | list[list]): Positional arguments to pass to env_creator. Either a single list or a list of lists (one per env).
  • env_kwargs (dict | list[dict]): Keyword arguments to pass to env_creator. Either a single dict or a list of dicts (one per env).
  • num_envs (int, required): Total number of environment instances to create.
  • num_workers (int | 'auto'): Number of worker processes (Multiprocessing only). Default is num_envs. Set to 'auto' for automatic selection.
  • batch_size (int | 'auto'): Number of agents per batch. Default is num_envs. Must be divisible by (num_envs / num_workers).
  • backend (type): Vectorization backend class: Serial, Multiprocessing, or Ray. Default is PufferEnv (native single env).
  • zero_copy (bool, default True): Use zero-copy shared memory (Multiprocessing only). Requires batch_size to divide num_envs evenly.
  • sync_traj (bool, default True): Synchronize trajectory collection across workers (Multiprocessing only).
  • overwork (bool, default False): Allow num_workers > CPU cores (Multiprocessing only). Not recommended.
  • seed (int, default 0): Base random seed. Each worker gets seed + worker_id.
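The seeding rule above (seed + worker_id) can be sketched in plain Python. This is an illustration of the documented scheme, not PufferLib's internal code:

```python
def worker_seeds(base_seed: int, num_workers: int) -> list[int]:
    """Seeds the workers would receive under the seed + worker_id rule."""
    return [base_seed + worker_id for worker_id in range(num_workers)]

# With the default seed=0 and 8 workers:
print(worker_seeds(0, 8))   # [0, 1, 2, 3, 4, 5, 6, 7]
print(worker_seeds(42, 3))  # [42, 43, 44]
```

Distinct per-worker seeds keep parallel environments decorrelated while the run as a whole stays reproducible from a single base seed.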

Class: Serial

Single-process vectorization. Runs all environments sequentially on one CPU core.

When to use

  • Debugging and development
  • Very fast environments where multiprocessing overhead dominates
  • Platforms without multiprocessing support

Initialization

from pufferlib.vector import Serial

vecenv = Serial(
    env_creators=[lambda: MyEnv() for _ in range(4)],
    env_args=[[] for _ in range(4)],
    env_kwargs=[{} for _ in range(4)],
    num_envs=4,
    seed=42,
)
  • env_creators (list[callable], required): List of environment creator functions, one per environment.
  • env_args (list[list], required): List of argument lists, one per environment.
  • env_kwargs (list[dict], required): List of keyword argument dicts, one per environment.
  • num_envs (int, required): Number of environments to create.
  • buf (dict | None): Pre-allocated buffer dictionary (advanced usage).
  • seed (int, default 0): Random seed for environments.

Properties

  • num_envs (int): Total number of agents across all environments (same as agents_per_batch).
  • agents_per_batch (int): Number of agents returned per batch.
  • num_agents (int): Total number of agents (same as agents_per_batch).
  • single_observation_space (gymnasium.spaces.Space): Observation space for a single agent.
  • single_action_space (gymnasium.spaces.Space): Action space for a single agent.
  • observation_space (gymnasium.spaces.Space): Joint observation space for all agents in a batch.
  • action_space (gymnasium.spaces.Space): Joint action space for all agents in a batch.
  • driver_env (PufferEnv): The first environment instance (useful for inspecting properties).
  • emulated (bool): Whether environments use Gymnasium/PettingZoo emulation.

Methods

reset()

def reset(vecenv, seed=42) -> tuple[numpy.ndarray, dict]

Reset all environments.

Arguments:
  • seed (int, default 42): Random seed.

Returns:
  • observations (numpy.ndarray): Initial observations for all agents.
  • infos (dict): Aggregated info dictionary.

step()

def step(vecenv, actions) -> tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray, dict]

Step all environments.

Arguments:
  • actions (numpy.ndarray, required): Actions for all agents.

Returns:
  • observations (numpy.ndarray): Next observations.
  • rewards (numpy.ndarray): Rewards.
  • terminals (numpy.ndarray): Terminal flags.
  • truncations (numpy.ndarray): Truncation flags.
  • infos (dict): Aggregated info dictionary.

async_reset()

def async_reset(seed=None)
Asynchronously reset all environments.

send()

def send(actions)
Send actions to environments.

recv()

def recv() -> tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray, dict, numpy.ndarray, numpy.ndarray]
Receive results from environments: observations, rewards, terminals, truncations, infos, agent_ids, and masks.
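Compared with step(), recv() additionally returns agent_ids and masks. A hedged sketch of how a collection loop might use the mask to drop inactive (padded) agents before buffering, assuming masks is a boolean or 0/1 array aligned with the batch dimension; the shapes here are illustrative, not PufferLib's actual layout:

```python
import numpy as np

# Hypothetical batch as returned by recv(); 4 agents with 2-dim observations.
obs = np.arange(8, dtype=np.float32).reshape(4, 2)
rewards = np.array([1.0, 0.5, 0.0, 2.0], dtype=np.float32)
masks = np.array([1, 1, 0, 1], dtype=bool)  # agent 2 is inactive this step

# Keep only transitions from active agents before adding them to a buffer.
active_obs = obs[masks]
active_rewards = rewards[masks]
print(active_obs.shape)             # (3, 2)
print(float(active_rewards.sum()))  # 3.5
```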

close()

def close()
Close all environments.

Class: Multiprocessing

Multi-process vectorization with optimized shared memory for maximum performance.

When to use

  • Production training with multiple CPU cores
  • CPU-intensive environments
  • Maximum throughput requirements

Initialization

from pufferlib.vector import Multiprocessing

vecenv = Multiprocessing(
    env_creators=[lambda: MyEnv() for _ in range(16)],
    env_args=[[] for _ in range(16)],
    env_kwargs=[{} for _ in range(16)],
    num_envs=16,
    num_workers=8,
    batch_size=16,
    zero_copy=True,
    seed=42,
)
  • env_creators (list[callable], required): List of environment creator functions.
  • env_args (list[list], required): List of argument lists.
  • env_kwargs (list[dict], required): List of keyword argument dicts.
  • num_envs (int, required): Total number of environments.
  • num_workers (int): Number of worker processes. Default is num_envs. Must divide num_envs evenly.
  • batch_size (int): Agents per batch. Default is num_envs.
  • zero_copy (bool, default True): Use zero-copy shared memory. Requires batch_size to divide num_envs evenly.
  • sync_traj (bool, default True): Synchronize trajectory collection.
  • overwork (bool, default False): Allow more workers than CPU cores.
  • seed (int, default 0): Base random seed.

Properties

  • num_envs (int): Total number of agents (same as agents_per_batch).
  • num_environments (int): Total number of environment instances.
  • num_workers (int): Number of worker processes.
  • num_agents (int): Total number of agents across all environments.
  • agents_per_batch (int): Number of agents per batch.
  • envs_per_worker (int): Number of environments per worker process.
  • workers_per_batch (int): Number of workers per batch.
  • single_observation_space (gymnasium.spaces.Space): Observation space for a single agent.
  • single_action_space (gymnasium.spaces.Space): Action space for a single agent.
  • observation_space (gymnasium.spaces.Space): Joint observation space for a batch.
  • action_space (gymnasium.spaces.Space): Joint action space for a batch.
  • driver_env (PufferEnv): Environment instance for property inspection.
  • emulated (bool): Whether environments use emulation.
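The layout properties above follow from simple arithmetic on the constructor arguments. A sketch of the implied relationships, assuming one agent per environment (multi-agent envs scale these by agents per env); this mirrors the property table, not PufferLib's internal code:

```python
def derived_layout(num_envs: int, num_workers: int, batch_size: int) -> dict:
    """Per-worker layout implied by the properties above (single-agent envs)."""
    assert num_envs % num_workers == 0, "num_workers must divide num_envs"
    envs_per_worker = num_envs // num_workers
    # Each batch gathers results from enough workers to fill batch_size envs.
    workers_per_batch = batch_size // envs_per_worker
    return {"envs_per_worker": envs_per_worker,
            "workers_per_batch": workers_per_batch}

print(derived_layout(num_envs=16, num_workers=8, batch_size=16))
# {'envs_per_worker': 2, 'workers_per_batch': 8}
```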

Methods

Same as Serial: reset(), step(), async_reset(), send(), recv(), close(). Additionally:

notify()

def notify()
Notify all worker processes (used for advanced coordination).

Usage examples

Basic usage

import pufferlib.vector

# Create vectorized environments
vecenv = pufferlib.vector.make(
    env_creator=lambda: MyEnv(),
    num_envs=16,
    num_workers=8,
    batch_size=16,
    backend=pufferlib.vector.Multiprocessing,
)

# Reset
obs, infos = vecenv.reset()

# Training loop
for _ in range(1000):
    actions = vecenv.action_space.sample()
    obs, rewards, terminals, truncations, infos = vecenv.step(actions)

vecenv.close()

Async interface

# Reset asynchronously
vecenv.async_reset(seed=42)

for _ in range(1000):
    # Get observations from previous step
    obs, rewards, terminals, truncations, infos, agent_ids, masks = vecenv.recv()
    
    # Compute actions (e.g., with a neural network)
    actions = policy(obs)
    
    # Send actions to environments
    vecenv.send(actions)

vecenv.close()

Different environments per worker

# Create different environments
env_creators = [
    lambda: MyEnv(difficulty=1),
    lambda: MyEnv(difficulty=2),
    lambda: MyEnv(difficulty=3),
    lambda: MyEnv(difficulty=4),
]

vecenv = pufferlib.vector.make(
    env_creator=env_creators,
    num_envs=4,
    backend=pufferlib.vector.Serial,
)
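If you generate such a creator list in a loop rather than writing the lambdas out by hand, bind the loop variable as a default argument; otherwise every lambda captures the same (final) value. A minimal sketch with a stand-in MyEnv class (illustrative only, not PufferLib code):

```python
class MyEnv:
    # Stand-in environment for illustration; real envs come from your code.
    def __init__(self, difficulty=1):
        self.difficulty = difficulty

# Wrong: every lambda looks up d after the loop ends, so all see d == 4.
wrong = [lambda: MyEnv(difficulty=d) for d in range(1, 5)]

# Right: default-argument binding freezes d at definition time.
right = [lambda d=d: MyEnv(difficulty=d) for d in range(1, 5)]

print([f().difficulty for f in wrong])  # [4, 4, 4, 4]
print([f().difficulty for f in right])  # [1, 2, 3, 4]
```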

Performance tuning

autotune()

Automatically find optimal vectorization parameters:
from pufferlib.vector import autotune

autotune(
    env_creator=lambda: MyEnv(),
    batch_size=512,
    max_envs=1024,
    time_per_test=5,
)
  • env_creator (callable, required): Function to create test environments.
  • batch_size (int, required): Desired batch size.
  • max_envs (int, default 194): Maximum number of environments to test.
  • model_forward_s (float, default 0.0): Simulated model forward pass time, in seconds.
  • max_env_ram_gb (float, default 32): Maximum RAM for environments, in GB.
  • max_batch_vram_gb (float, default 0.05): Maximum VRAM per batch, in GB.
  • time_per_test (int, default 5): Seconds to run each test.

Common errors

APIUsageError: num_workers > hardware cores
By default, PufferLib prevents creating more workers than physical CPU cores. Set overwork=True to override (not recommended).

APIUsageError: num_envs must be divisible by batch_size
With zero_copy=True, ensure num_envs is evenly divisible by batch_size.

APIUsageError: Call reset before stepping
Always call async_reset() or reset() before using send() or step().
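The first two constraints can be checked up front, before any worker processes are spawned. A sketch reproducing the documented rules (assumed from the error messages above, not PufferLib's actual validation code):

```python
import os

def check_vector_args(num_envs, num_workers, batch_size,
                      zero_copy=True, overwork=False):
    """Raise early for the misconfigurations listed above."""
    cores = os.cpu_count() or 1
    if num_workers > cores and not overwork:
        raise ValueError("num_workers exceeds CPU cores; set overwork=True")
    if num_envs % num_workers != 0:
        raise ValueError("num_workers must divide num_envs evenly")
    if zero_copy and num_envs % batch_size != 0:
        raise ValueError("with zero_copy=True, batch_size must divide num_envs")

check_vector_args(num_envs=4, num_workers=1, batch_size=2)  # passes silently
```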
