PufferLib’s vectorization layer runs multiple environment instances in parallel, dramatically increasing training throughput. The pufferlib.vector.make() function creates vectorized environments with different backends optimized for various use cases.

Vector backends

PufferLib provides three vectorization backends, selected via the backend argument of pufferlib.vector.make(). For example:
# Single-process execution
vecenv = pufferlib.vector.make(
    env_creator,
    num_envs=4,
    backend=pufferlib.vector.Serial
)

Serial backend

The Serial backend runs all environments sequentially in a single process. It’s useful for:
  • Debugging environment implementations
  • Development on machines with limited cores
  • Environments with low computational cost
  • Testing before scaling to multiprocessing
examples/vectorization.py
import pufferlib.vector

serial_vecenv = pufferlib.vector.make(
    SamplePufferEnv,
    num_envs=2,
    backend=pufferlib.vector.Serial
)

observations, infos = serial_vecenv.reset()
actions = serial_vecenv.action_space.sample()
o, r, d, t, i = serial_vecenv.step(actions)

Serial implementation details

The Serial backend creates a list of environment instances and steps them sequentially:
pufferlib/vector.py
class Serial:
    def __init__(self, env_creators, env_args, env_kwargs, num_envs, buf=None, seed=0, **kwargs):
        self.driver_env = env_creators[0](*env_args[0], **env_kwargs[0])
        self.agents_per_batch = self.driver_env.num_agents * num_envs
        
        # Pre-allocate shared buffers
        set_buffers(self, buf)
        
        # Create environments with buffer slices
        self.envs = []
        ptr = 0
        for i in range(num_envs):
            end = ptr + self.driver_env.num_agents
            buf_i = dict(
                observations=self.observations[ptr:end],
                rewards=self.rewards[ptr:end],
                terminals=self.terminals[ptr:end],
                truncations=self.truncations[ptr:end],
                masks=self.masks[ptr:end],
                actions=self.actions[ptr:end]
            )
            env = env_creators[i](*env_args[i], buf=buf_i, **env_kwargs[i])
            self.envs.append(env)
            ptr = end
Even in Serial mode, environments write to shared buffers, making it easy to switch backends without code changes.
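The same view-based slicing can be demonstrated with plain NumPy (a standalone sketch, not PufferLib's actual code): slices of a preallocated array are views, so each environment's writes land directly in the shared batch buffer.

```python
import numpy as np

num_envs, agents_per_env, obs_dim = 2, 3, 4
num_agents = num_envs * agents_per_env

# One batch-sized buffer, preallocated up front
observations = np.zeros((num_agents, obs_dim), dtype=np.float32)

# Hand each "environment" a view of its slice, as Serial does
env_slices = [observations[i * agents_per_env:(i + 1) * agents_per_env]
              for i in range(num_envs)]

# A write into a slice mutates the shared batch buffer directly
env_slices[1][:] = 1.0
assert observations[3:6].sum() == agents_per_env * obs_dim
assert env_slices[1].base is observations  # a view, not a copy
```

Because each environment fills a view of one array, the step loop never has to assemble a batch; the batch already exists.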

Multiprocessing backend

The Multiprocessing backend is the workhorse of PufferLib. It runs environments in parallel across CPU cores with zero-copy shared memory:
vecenv = pufferlib.vector.make(
    env_creator,
    num_envs=128,           # Total number of environments
    num_workers=8,          # Number of worker processes
    batch_size=32,          # Environments per training batch
    backend=pufferlib.vector.Multiprocessing
)

Key parameters

  • num_envs: Total number of environment instances to run
  • num_workers: Number of parallel worker processes (typically = CPU cores)
  • batch_size: Number of environments to collect before returning data
  • zero_copy: Enable zero-copy mode (requires num_envs % batch_size == 0)
  • overwork: Allow num_workers > cpu_cores (disabled by default)
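These parameters interact: num_envs is split evenly across workers, and batch_size determines how many workers must report back before a batch is ready. A small sketch of the arithmetic (illustrative checks only, not PufferLib's actual validation logic):

```python
def check_vec_config(num_envs, num_workers, batch_size, zero_copy=True):
    """Illustrative parameter checks (not PufferLib's real validation)."""
    if num_envs % num_workers != 0:
        raise ValueError('num_envs must divide evenly across workers')
    envs_per_worker = num_envs // num_workers
    if zero_copy and num_envs % batch_size != 0:
        raise ValueError('zero_copy requires num_envs % batch_size == 0')
    # Workers that must finish before batch_size environments are ready
    workers_per_batch = max(1, batch_size // envs_per_worker)
    return envs_per_worker, workers_per_batch

# 128 envs on 8 workers with 32-env batches: 16 envs/worker, 2 workers/batch
print(check_vec_config(128, 8, 32))  # (16, 2)
```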

Shared memory architecture

Multiprocessing uses shared memory buffers to avoid data serialization:
pufferlib/vector.py
from multiprocessing import RawArray

self.shm = dict(
    observations=RawArray(obs_ctype, num_agents * int(np.prod(obs_shape))),
    actions=RawArray(atn_ctype, num_agents * int(np.prod(atn_shape))),
    rewards=RawArray('f', num_agents),
    terminals=RawArray('b', num_agents),
    truncateds=RawArray('b', num_agents),
    masks=RawArray('b', num_agents),
    semaphores=RawArray('c', num_workers),
    notify=RawArray('b', num_workers),
)
Worker processes access these buffers directly:
buf = dict(
    observations=np.ndarray((*shape, *obs_shape),
        dtype=obs_dtype, buffer=shm['observations'])[worker_idx],
    rewards=np.ndarray(shape, dtype=np.float32, buffer=shm['rewards'])[worker_idx],
    # ...
)
Shared memory eliminates serialization overhead. Data written by workers is instantly visible to the main process without copying.
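You can observe the no-copy behavior in a few lines of stdlib Python plus NumPy (a toy demonstration, independent of PufferLib): a NumPy view created over a RawArray shares its memory, so writes on either side are immediately visible on the other.

```python
import numpy as np
from multiprocessing import RawArray

num_agents = 4
rewards_shm = RawArray('f', num_agents)                  # lock-free shared memory
rewards = np.frombuffer(rewards_shm, dtype=np.float32)   # a view, not a copy

# A write through the ctypes array appears in the NumPy view...
rewards_shm[2] = 1.5
assert rewards[2] == 1.5

# ...and vice versa: no serialization in either direction
rewards[:] = 0.25
assert abs(rewards_shm[0] - 0.25) < 1e-6
```

In the real Multiprocessing backend the RawArray is inherited by worker processes, so the same aliasing holds across process boundaries.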

Synchronization modes

PufferLib supports three synchronization strategies:

1. Full sync (batch_size = num_envs)

Wait for all workers before returning data:
vecenv = pufferlib.vector.make(
    env_creator,
    num_envs=128,
    num_workers=8,
    batch_size=128,  # Same as num_envs
    backend=pufferlib.vector.Multiprocessing
)
Pros: Predictable timing, easy to reason about.
Cons: Slowest worker determines throughput.

2. Partial sync (zero_copy=True)

Wait for contiguous blocks of workers:
vecenv = pufferlib.vector.make(
    env_creator,
    num_envs=128,
    num_workers=8,
    batch_size=32,
    zero_copy=True,  # Requires num_envs % batch_size == 0
    backend=pufferlib.vector.Multiprocessing
)
Pros: Lower latency than full sync, zero-copy efficiency.
Cons: Still waits for contiguous worker blocks.

3. Full async (zero_copy=False)

Return data from any available workers:
vecenv = pufferlib.vector.make(
    env_creator,
    num_envs=128,
    num_workers=8,
    batch_size=32,
    zero_copy=False,  # Allow non-contiguous workers
    backend=pufferlib.vector.Multiprocessing
)
Pros: Minimum latency, maximum throughput.
Cons: Small copy overhead for non-contiguous data.

Async API

Multiprocessing supports an async API for maximum control:
examples/vectorization.py
vecenv = pufferlib.vector.make(
    SamplePufferEnv,
    num_envs=2,
    num_workers=2,
    batch_size=1,
    backend=pufferlib.vector.Multiprocessing
)

# Async reset
vecenv.async_reset()
o, r, d, t, i, env_ids, masks = vecenv.recv()

# Async step
actions = vecenv.action_space.sample()
vecenv.send(actions)

# Do other work here (e.g., policy inference)
# while environments run in the background

# Get results when ready
o, r, d, t, i, env_ids, masks = vecenv.recv()
The async API returns additional data:
  • env_ids: Which environments produced this batch
  • masks: Which agents are active (for variable-agent environments)
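For variable-agent environments, masks lets you exclude padded agent slots when aggregating statistics. A toy NumPy sketch (the values here are made up for illustration):

```python
import numpy as np

# Rewards for 4 agent slots; slot 2 is inactive this step
rewards = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
masks = np.array([1, 1, 0, 1], dtype=bool)

# Aggregate over active agents only
mean_reward = rewards[masks].mean()  # averages over the three active slots
```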

Performance characteristics

Here’s when to use each backend.

Use Serial when:
  • Debugging environment code
  • Environments are very fast (< 0.1ms per step), so parallelism overhead isn’t worth it
  • You’re on a single-core machine
  • You’re developing and testing
Serial performance:
  • No parallelism overhead
  • Easy to profile and debug
  • Step time grows linearly with num_envs
For real training runs, use the Multiprocessing backend, which parallelizes environment stepping across CPU cores.

Passing arguments to environments

You can pass arguments to environment constructors in several ways:

Same arguments for all environments

examples/vectorization.py
vecenv = pufferlib.vector.make(
    SamplePufferEnv,
    num_envs=2,
    backend=pufferlib.vector.Serial,
    env_args=[3],              # Positional args
    env_kwargs={'bar': 4}      # Keyword args
)

Different arguments per environment

examples/vectorization.py
vecenv = pufferlib.vector.make(
    [SamplePufferEnv, SamplePufferEnv],  # List of creators
    num_envs=2,
    backend=pufferlib.vector.Serial,
    env_args=[[3], [4]],                 # Different args per env
    env_kwargs=[{'bar': 4}, {'bar': 5}]  # Different kwargs per env
)
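Internally, the shared-argument form has to be broadcast to one argument set per environment. A hypothetical helper showing one way that normalization could work (normalize_env_args is not part of PufferLib's API):

```python
def normalize_env_args(num_envs, env_args=None, env_kwargs=None):
    """Hypothetical: broadcast shared args/kwargs to per-env lists."""
    # A flat list like [3] is shared; a list of lists is already per-env
    if env_args is None:
        env_args = [[] for _ in range(num_envs)]
    elif not all(isinstance(a, (list, tuple)) for a in env_args):
        env_args = [list(env_args) for _ in range(num_envs)]
    # A single dict is shared; a list of dicts is already per-env
    if env_kwargs is None:
        env_kwargs = [{} for _ in range(num_envs)]
    elif isinstance(env_kwargs, dict):
        env_kwargs = [dict(env_kwargs) for _ in range(num_envs)]
    return env_args, env_kwargs

args, kwargs = normalize_env_args(2, env_args=[3], env_kwargs={'bar': 4})
print(args, kwargs)  # [[3], [3]] [{'bar': 4}, {'bar': 4}]
```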

Autotune

PufferLib includes an autotune function to find optimal vectorization parameters:
pufferlib/vector.py
configs = pufferlib.vector.autotune(
    env_creator,
    batch_size=128,
    max_envs=256,
    time_per_test=5
)
Autotune profiles your environment and tests different configurations to find:
  • Optimal num_envs
  • Best num_workers setting
  • Whether zero_copy helps
  • Expected throughput (steps per second)
Run autotune once per environment to determine the best configuration for your hardware. Results vary based on environment complexity and CPU architecture.

Common patterns

Maximizing throughput

import psutil

vecenv = pufferlib.vector.make(
    env_creator,
    num_envs=128,
    num_workers=psutil.cpu_count(logical=False),  # Physical cores
    batch_size=128,
    zero_copy=True,
    backend=pufferlib.vector.Multiprocessing
)

Minimizing latency

vecenv = pufferlib.vector.make(
    env_creator,
    num_envs=32,
    num_workers=8,
    batch_size=8,        # Small batches
    zero_copy=False,     # Full async
    backend=pufferlib.vector.Multiprocessing
)

Development and debugging

vecenv = pufferlib.vector.make(
    env_creator,
    num_envs=1,
    backend=pufferlib.vector.Serial  # Easy to debug
)
