Welcome to PufferLib

PufferLib is a high-performance reinforcement learning framework designed to make working with complex game environments effortless. It combines blazing-fast parallel simulation, a unified compatibility layer, and production-ready training tools to help you go from idea to trained agent in minutes.

Quickstart

Train your first model in under 5 minutes

Core concepts

Understand PufferLib’s architecture and design

Training

Learn about PufferRL and its PPO implementation

API reference

Explore the complete API documentation

Why PufferLib?

PufferLib was built to solve real pain points in reinforcement learning research and development:

Blazing fast performance

PufferLib achieves 1M+ environment steps per second through optimized parallel simulation, CUDA extensions, and careful performance engineering. Custom Ocean environments are written in C and compiled to native code for maximum throughput.

Universal compatibility

Stop wrestling with different environment APIs. PufferLib provides a unified interface for:
  • Gymnasium environments
  • Legacy Gym environments
  • PettingZoo multi-agent environments
  • Custom environments with complex action/observation spaces
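To illustrate the idea behind a compatibility layer like this, here is a minimal sketch (not PufferLib's actual code) of an adapter that normalizes a legacy Gym environment's signatures to the Gymnasium convention. The class names `GymToGymnasiumShim` and `LegacyCounter` are hypothetical:

```python
class GymToGymnasiumShim:
    """Adapts a legacy Gym env (obs-only reset, 4-tuple step)
    to the Gymnasium convention ((obs, info) reset, 5-tuple step)."""

    def __init__(self, env):
        self.env = env

    def reset(self, seed=None):
        obs = self.env.reset()          # legacy: returns obs only
        return obs, {}                  # Gymnasium: (obs, info)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)  # legacy 4-tuple
        # Gymnasium splits `done` into terminated/truncated
        return obs, reward, done, False, info


class LegacyCounter:
    """Tiny stand-in for a legacy Gym environment."""
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += action
        return self.t, float(action), self.t >= 3, {}
```

Code written against the shim's interface works the same regardless of which API the underlying environment speaks.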

Production-ready training

PufferRL includes a battle-tested PPO implementation with:
  • Distributed training support via PyTorch DDP
  • LSTM and recurrent policy architectures
  • Automatic hyperparameter tuning with sweeps
  • Rich terminal dashboards and experiment tracking
  • Integration with Weights & Biases and Neptune

Built for complex environments

PufferLib’s emulation system handles:
  • Variable-length action/observation spaces
  • Multi-agent scenarios
  • Procedurally generated environments
  • Pixel-based and structured observations
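One common trick for handling variable-length observations is to pad them to a fixed shape and track validity with a mask. The sketch below (a generic illustration, not PufferLib's emulation code; `pad_observations` is a hypothetical name) shows the idea:

```python
import numpy as np

def pad_observations(obs_list, max_len):
    """Pad variable-length 1-D observations to a fixed shape and
    return a boolean mask marking which entries are real."""
    batch = np.zeros((len(obs_list), max_len), dtype=np.float32)
    mask = np.zeros((len(obs_list), max_len), dtype=bool)
    for i, obs in enumerate(obs_list):
        n = len(obs)
        batch[i, :n] = obs
        mask[i, :n] = True
    return batch, mask
```

The fixed-shape batch can then be fed to a policy network, with the mask used to ignore padded entries.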

Key features

Run environments at 1M+ steps/second with optimized vectorization backends:
  • Serial backend for debugging
  • Multiprocessing backend for CPU parallelism
  • Async send/recv API for maximum throughput
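To make the backend concept concrete, here is a minimal serial vectorization sketch (an illustration only, not PufferLib's vector module; `SerialVecEnv` and `ToyEnv` are hypothetical names). It steps each environment in a Python loop, stacks the results, and auto-resets finished episodes:

```python
import numpy as np

class SerialVecEnv:
    """Minimal serial backend: steps each env in a loop and
    stacks results into batched arrays."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d = env.step(action)
            if d:                      # auto-reset finished envs
                o = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return np.stack(obs), np.array(rewards), np.array(dones)


class ToyEnv:
    """Counts up; episode ends at t = 2."""
    def reset(self):
        self.t = 0
        return np.float32(self.t)

    def step(self, action):
        self.t += 1
        return np.float32(self.t), 1.0, self.t >= 2
```

A multiprocessing backend keeps the same interface but moves the loop into worker processes; that is what makes swapping backends transparent to training code.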
Work with Gym, Gymnasium, and PettingZoo through a single API:
  • Automatic space conversion
  • Consistent reset/step signatures
  • Unified info dictionary format
Built-in PPO trainer with modern RL best practices:
  • Generalized Advantage Estimation (GAE)
  • Value function clipping
  • Action entropy regularization
  • CUDA-accelerated advantage computation
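For reference, GAE itself is a short backward recursion over a trajectory. The NumPy sketch below shows the standard computation (a generic reference implementation, not PufferLib's CUDA kernel; `compute_gae` is a hypothetical name):

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.
    `values` carries one extra bootstrap entry at the end."""
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially-weighted sum of residuals
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    returns = advantages + values[:T]
    return advantages, returns
```

With `gamma = lam = 1` and zero value estimates, the advantages reduce to reward-to-go, which is a handy sanity check.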
Collection of 35+ custom-built environments:
  • Classic games (Breakout, Asteroids, Snake)
  • RL benchmarks (Cartpole, Chain MDP)
  • Multi-agent scenarios (Connect4, Go, Checkers)
  • All implemented in C for maximum performance
Scale from single-threaded debugging to massive parallelism:
  • Configurable number of workers and environments
  • Batched environment execution
  • Support for both sync and async APIs
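The async calling convention splits `step` into a non-blocking `send` and a `recv`, so policy inference can overlap with simulation. The toy sketch below runs serially but keeps the interface shape (an illustration only; `AsyncStyleVecEnv` and `StepCounter` are hypothetical names):

```python
class AsyncStyleVecEnv:
    """Illustrates the async send/recv calling convention.
    A real backend would hand actions to worker processes in
    `send` and block in `recv` only until results are ready."""

    def __init__(self, envs):
        self.envs = envs
        self._pending = None

    def send(self, actions):
        # Dispatch actions; a real backend returns immediately here.
        self._pending = [env.step(a) for env, a in zip(self.envs, actions)]

    def recv(self):
        # Collect the results of the last dispatch.
        results, self._pending = self._pending, None
        return results


class StepCounter:
    """Tiny stand-in environment: accumulates actions."""
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += action
        return self.t
```

In a real pipeline, the trainer calls `send` with the current batch of actions, runs inference for the next batch, then calls `recv`, hiding simulation latency behind GPU work.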
Train, evaluate, and sweep hyperparameters from the terminal:
puffer train puffer_breakout
puffer eval puffer_breakout --policy checkpoints/latest.pt
puffer sweep puffer_breakout --learning-rate 0.001,0.003,0.01

Performance benchmarks

PufferLib achieves industry-leading performance across a range of environments:
Environment          Steps/second   Hardware
Ocean Breakout       1.2M+          RTX 4090 + 32 cores
Atari (vectorized)   800K+          RTX 4090 + 32 cores
Procgen              600K+          RTX 4090 + 32 cores
NetHack              400K+          RTX 4090 + 32 cores
These benchmarks reflect end-to-end training throughput including policy inference, not just environment stepping.

Architecture overview

PufferLib is organized into several key components:
pufferlib/
├── PufferEnv          # Base environment class
├── emulation          # Compatibility layer for Gym/Gymnasium/PettingZoo
├── vector             # Vectorization and parallelization
├── pufferl            # Training system (PPO implementation)
├── models             # Policy architectures (MLP, CNN, LSTM)
├── ocean              # Custom high-performance environments
└── spaces             # Unified space definitions

Who uses PufferLib?

PufferLib powers RL research and applications at:
  • Academic labs conducting multi-agent research
  • Game AI companies building intelligent agents
  • Robotics teams prototyping control policies
  • Independent researchers exploring new algorithms

Getting started

Ready to dive in? Start with the installation guide to set up your environment, then follow the quickstart tutorial to train your first agent. Have questions? Join the Discord community or check out the GitHub repository.