Welcome to PufferLib
PufferLib is a high-performance reinforcement learning framework designed to make working with complex game environments effortless. It combines blazing-fast parallel simulation, a unified compatibility layer, and production-ready training tools to help you go from idea to trained agent in minutes.

Quickstart
Train your first model in under 5 minutes
Core concepts
Understand PufferLib’s architecture and design
Training
Learn about PufferRL and PPO implementation
API reference
Explore the complete API documentation
Why PufferLib?
PufferLib was built to solve real pain points in reinforcement learning research and development:

Blazing fast performance
PufferLib achieves 1M+ environment steps per second through optimized parallel simulation, CUDA extensions, and careful performance engineering. Custom Ocean environments are written in C and compiled to native code for maximum throughput.

Universal compatibility
Stop wrestling with different environment APIs. PufferLib provides a unified interface for:
- Gymnasium environments
- Legacy Gym environments
- PettingZoo multi-agent environments
- Custom environments with complex action/observation spaces
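The core idea behind a unified interface can be sketched in a few lines: normalize legacy Gym's 4-tuple `step()` and Gymnasium's 5-tuple `step()` behind one signature. This is an illustrative sketch with a toy environment, not PufferLib's actual wrapper code (the real wrappers live in `pufferlib.emulation`):

```python
# Conceptual sketch: present one consistent reset/step signature over
# both legacy 4-tuple and modern 5-tuple environments.
# Illustrative only; PufferLib's real wrappers live in pufferlib.emulation.

class UnifiedEnv:
    def __init__(self, env):
        self.env = env

    def reset(self, seed=None):
        out = self.env.reset()
        # Modern envs return (obs, info); legacy envs return obs alone.
        if isinstance(out, tuple) and len(out) == 2:
            return out
        return out, {}

    def step(self, action):
        out = self.env.step(action)
        if len(out) == 5:            # Gymnasium: obs, rew, term, trunc, info
            return out
        obs, rew, done, info = out   # Legacy Gym: map done -> terminated
        return obs, rew, done, False, info


class LegacyCounter:
    """Minimal legacy-style env: step() returns a 4-tuple."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}


env = UnifiedEnv(LegacyCounter())
obs, info = env.reset()
obs, rew, terminated, truncated, info = env.step(0)  # always a 5-tuple
```

Downstream training code can then assume the 5-tuple form regardless of which API the underlying environment speaks.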
Production-ready training
PufferRL includes a battle-tested PPO implementation with:
- Distributed training support via PyTorch DDP
- LSTM and recurrent policy architectures
- Automatic hyperparameter tuning with sweeps
- Rich terminal dashboards and experiment tracking
- Integration with Weights & Biases and Neptune
Built for complex environments
PufferLib’s emulation system handles:
- Variable-length action/observation spaces
- Multi-agent scenarios
- Procedurally generated environments
- Pixel-based and structured observations
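A common trick behind handling variable-length spaces is padding: ragged observations are padded to a fixed maximum shape, with a mask marking the real entries, so they can be stacked into one dense batch. A simplified numpy sketch of that idea (not PufferLib's exact implementation):

```python
import numpy as np

# Sketch: pad variable-length observations to a fixed maximum size and
# return a mask marking which entries are real. This lets ragged
# observations (e.g. a variable number of visible entities) be stacked
# into one dense batch. Simplified illustration, not PufferLib's code.

def pad_observation(obs, max_len, dtype=np.float32):
    padded = np.zeros(max_len, dtype=dtype)
    mask = np.zeros(max_len, dtype=bool)
    n = min(len(obs), max_len)
    padded[:n] = obs[:n]
    mask[:n] = True
    return padded, mask

ragged = [np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0])]
batch = np.stack([pad_observation(o, 4)[0] for o in ragged])
# batch has shape (2, 4): every row is the same fixed length
```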
Key features
High-performance parallel simulation
Run environments at 1M+ steps/second with optimized vectorization backends:
- Serial backend for debugging
- Multiprocessing backend for CPU parallelism
- Async send/recv API for maximum throughput
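The send/recv split decouples submitting a batch of actions from collecting results, which is what enables asynchronous stepping. A minimal serial sketch of that interface with a toy environment (illustrative only; PufferLib's real backends live in `pufferlib.vector`):

```python
# Sketch of a send/recv vectorization interface over N toy environments.
# send() submits a batch of actions; recv() collects the resulting batch.
# In an async backend, recv() could return whichever environments finish
# first; this serial version simply steps all of them.

class ToyEnv:
    def __init__(self):
        self.state = 0
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state += action
        return self.state, float(action), False

class SerialVector:
    def __init__(self, num_envs):
        self.envs = [ToyEnv() for _ in range(num_envs)]
        self._pending = None
    def reset(self):
        return [e.reset() for e in self.envs]
    def send(self, actions):
        self._pending = [e.step(a) for e, a in zip(self.envs, actions)]
    def recv(self):
        results, self._pending = self._pending, None
        obs, rewards, dones = zip(*results)
        return list(obs), list(rewards), list(dones)

vec = SerialVector(num_envs=4)
vec.reset()
vec.send([1, 2, 3, 4])
obs, rewards, dones = vec.recv()
```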
Unified environment interface
Work with Gym, Gymnasium, and PettingZoo through a single API:
- Automatic space conversion
- Consistent reset/step signatures
- Unified info dictionary format
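"Automatic space conversion" typically means flattening structured observations into a single vector the policy can consume. A simplified numpy sketch of that idea (deterministic key ordering keeps the layout stable across steps; not PufferLib's exact code):

```python
import numpy as np

# Sketch: flatten a dict-structured observation into one flat vector,
# concatenating leaves in sorted key order so the layout is deterministic.
# Simplified illustration of space flattening, not PufferLib's exact code.

def flatten_obs(obs: dict) -> np.ndarray:
    parts = [np.asarray(obs[key], dtype=np.float32).ravel()
             for key in sorted(obs)]
    return np.concatenate(parts)

obs = {"position": np.array([0.5, -0.5]), "health": np.array([1.0])}
flat = flatten_obs(obs)
# flat is [1.0, 0.5, -0.5]: 'health' sorts before 'position'
```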
PufferRL training system
Built-in PPO trainer with modern RL best practices:
- Generalized Advantage Estimation (GAE)
- Value function clipping
- Action entropy regularization
- CUDA-accelerated advantage computation
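Generalized Advantage Estimation itself is a standard recurrence and easy to sketch in numpy; a CUDA-accelerated version computes the same quantity, just much faster over large batches. A reference sketch of the recurrence (not PufferRL's kernel):

```python
import numpy as np

# Reference GAE: A[t] = delta[t] + gamma * lam * (1 - done[t]) * A[t+1],
# with delta[t] = r[t] + gamma * V[t+1] * (1 - done[t]) - V[t].
# Pure-numpy sketch of the standard recurrence; a CUDA implementation
# computes the same values with a parallel scan.

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    next_value, next_adv = last_value, 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        next_adv = delta + gamma * lam * nonterminal * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages, advantages + values  # advantages, returns

rewards = np.array([1.0, 1.0, 1.0])
values = np.array([0.5, 0.5, 0.5])
dones = np.array([0.0, 0.0, 1.0])
adv, returns = compute_gae(rewards, values, dones, last_value=0.0)
```

The returns (advantages plus value estimates) are the regression targets for the value function, where value clipping is applied during the update.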
Ocean environments
Collection of 35+ custom-built environments:
- Classic games (Breakout, Asteroids, Snake)
- RL benchmarks (Cartpole, Chain MDP)
- Multi-agent scenarios (Connect4, Go, Checkers)
- All implemented in C for maximum performance
Flexible vectorization
Scale from single-threaded debugging to massive parallelism:
- Configurable number of workers and environments
- Batched environment execution
- Support for both sync and async APIs
Command-line interface
Train, evaluate, and sweep hyperparameters from the terminal:
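Recent PufferLib releases expose a `puffer` entry point; the exact commands and flags vary by version, so treat the following as an assumed shape and check `puffer --help` for your installed version:

```shell
# Assumed CLI shape for a recent PufferLib release; verify against
# `puffer --help` in your installed version.
puffer train puffer_breakout   # train a policy on an Ocean environment
puffer eval puffer_breakout    # evaluate / render a trained policy
puffer sweep puffer_breakout   # launch a hyperparameter sweep
```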
Performance benchmarks
PufferLib achieves industry-leading performance across a range of environments:

| Environment | Steps/second | Hardware |
|---|---|---|
| Ocean Breakout | 1.2M+ | RTX 4090 + 32 cores |
| Atari (vectorized) | 800K+ | RTX 4090 + 32 cores |
| Procgen | 600K+ | RTX 4090 + 32 cores |
| NetHack | 400K+ | RTX 4090 + 32 cores |
These benchmarks reflect end-to-end training throughput including policy inference, not just environment stepping.
Architecture overview
PufferLib is organized into several key components: the emulation layer for environment compatibility, the vectorization layer for parallel simulation, the PufferRL training system, and the Ocean environment suite.

Who uses PufferLib?
PufferLib powers RL research and applications at:
- Academic labs conducting multi-agent research
- Game AI companies building intelligent agents
- Robotics teams prototyping control policies
- Independent researchers exploring new algorithms