Welcome to PufferLib
PufferLib is a high-performance reinforcement learning framework designed to make working with complex game environments effortless. It combines blazing-fast parallel simulation, a unified compatibility layer, and production-ready training tools to help you go from idea to trained agent in minutes.

Quickstart
Train your first model in under 5 minutes
Core concepts
Understand PufferLib’s architecture and design
Training
Learn about PufferRL and PPO implementation
API reference
Explore the complete API documentation
Why PufferLib?
PufferLib was built to solve real pain points in reinforcement learning research and development:

Blazing fast performance
PufferLib achieves 1M+ environment steps per second through optimized parallel simulation, CUDA extensions, and careful performance engineering. Custom Ocean environments are written in C and compiled to native code for maximum throughput.

Universal compatibility
Stop wrestling with different environment APIs. PufferLib provides a unified interface for:
- Gymnasium environments
- Legacy Gym environments
- PettingZoo multi-agent environments
- Custom environments with complex action/observation spaces
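The core idea behind a unified interface can be sketched in a few lines: normalize legacy Gym's 4-tuple `step()` and Gymnasium's 5-tuple `step()` behind one signature. This is an illustrative sketch with a toy environment, not PufferLib's actual wrapper code (the real wrappers live in `pufferlib.emulation`):

```python
# Conceptual sketch: present one consistent reset/step signature over
# both legacy 4-tuple and modern 5-tuple environments.
# Illustrative only; PufferLib's real wrappers live in pufferlib.emulation.

class UnifiedEnv:
    def __init__(self, env):
        self.env = env

    def reset(self, seed=None):
        out = self.env.reset()
        # Modern envs return (obs, info); legacy envs return obs alone.
        if isinstance(out, tuple) and len(out) == 2:
            return out
        return out, {}

    def step(self, action):
        out = self.env.step(action)
        if len(out) == 5:            # Gymnasium: obs, rew, term, trunc, info
            return out
        obs, rew, done, info = out   # Legacy Gym: map done -> terminated
        return obs, rew, done, False, info


class LegacyCounter:
    """Minimal legacy-style env: step() returns a 4-tuple."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}


env = UnifiedEnv(LegacyCounter())
obs, info = env.reset()
obs, rew, terminated, truncated, info = env.step(0)  # always a 5-tuple
```

Downstream training code can then assume the 5-tuple form regardless of which API the underlying environment speaks.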
Production-ready training
PufferRL includes a battle-tested PPO implementation with:
- Distributed training support via PyTorch DDP
- LSTM and recurrent policy architectures
- Automatic hyperparameter tuning with sweeps
- Rich terminal dashboards and experiment tracking
- Integration with Weights & Biases and Neptune
Built for complex environments
PufferLib’s emulation system handles:
- Variable-length action/observation spaces
- Multi-agent scenarios
- Procedurally generated environments
- Pixel-based and structured observations
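A common trick behind handling variable-length spaces is padding: ragged observations are padded to a fixed maximum shape, with a mask marking the real entries, so they can be stacked into one dense batch. A simplified numpy sketch of that idea (not PufferLib's exact implementation):

```python
import numpy as np

# Sketch: pad variable-length observations to a fixed maximum size and
# return a mask marking which entries are real. This lets ragged
# observations (e.g. a variable number of visible entities) be stacked
# into one dense batch. Simplified illustration, not PufferLib's code.

def pad_observation(obs, max_len, dtype=np.float32):
    padded = np.zeros(max_len, dtype=dtype)
    mask = np.zeros(max_len, dtype=bool)
    n = min(len(obs), max_len)
    padded[:n] = obs[:n]
    mask[:n] = True
    return padded, mask

ragged = [np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0])]
batch = np.stack([pad_observation(o, 4)[0] for o in ragged])
# batch has shape (2, 4): every row is the same fixed length
```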
Key features
High-performance parallel simulation
Run environments at 1M+ steps/second with optimized vectorization backends:
- Serial backend for debugging
- Multiprocessing backend for CPU parallelism
- Async send/recv API for maximum throughput
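The send/recv split decouples submitting a batch of actions from collecting results, which is what enables asynchronous stepping. A minimal serial sketch of that interface with a toy environment (illustrative only; PufferLib's real backends live in `pufferlib.vector`):

```python
# Sketch of a send/recv vectorization interface over N toy environments.
# send() submits a batch of actions; recv() collects the resulting batch.
# In an async backend, recv() could return whichever environments finish
# first; this serial version simply steps all of them.

class ToyEnv:
    def __init__(self):
        self.state = 0
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state += action
        return self.state, float(action), False

class SerialVector:
    def __init__(self, num_envs):
        self.envs = [ToyEnv() for _ in range(num_envs)]
        self._pending = None
    def reset(self):
        return [e.reset() for e in self.envs]
    def send(self, actions):
        self._pending = [e.step(a) for e, a in zip(self.envs, actions)]
    def recv(self):
        results, self._pending = self._pending, None
        obs, rewards, dones = zip(*results)
        return list(obs), list(rewards), list(dones)

vec = SerialVector(num_envs=4)
vec.reset()
vec.send([1, 2, 3, 4])
obs, rewards, dones = vec.recv()
```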
Unified environment interface
Work with Gym, Gymnasium, and PettingZoo through a single API:
- Automatic space conversion
- Consistent reset/step signatures
- Unified info dictionary format
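"Automatic space conversion" typically means flattening structured observations into a single vector the policy can consume. A simplified numpy sketch of that idea (deterministic key ordering keeps the layout stable across steps; not PufferLib's exact code):

```python
import numpy as np

# Sketch: flatten a dict-structured observation into one flat vector,
# concatenating leaves in sorted key order so the layout is deterministic.
# Simplified illustration of space flattening, not PufferLib's exact code.

def flatten_obs(obs: dict) -> np.ndarray:
    parts = [np.asarray(obs[key], dtype=np.float32).ravel()
             for key in sorted(obs)]
    return np.concatenate(parts)

obs = {"position": np.array([0.5, -0.5]), "health": np.array([1.0])}
flat = flatten_obs(obs)
# flat is [1.0, 0.5, -0.5]: 'health' sorts before 'position'
```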
PufferRL training system
Built-in PPO trainer with modern RL best practices:
- Generalized Advantage Estimation (GAE)
- Value function clipping
- Action entropy regularization
- CUDA-accelerated advantage computation
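Generalized Advantage Estimation itself is a standard recurrence and easy to sketch in numpy; a CUDA-accelerated version computes the same quantity, just much faster over large batches. A reference sketch of the recurrence (not PufferRL's kernel):

```python
import numpy as np

# Reference GAE: A[t] = delta[t] + gamma * lam * (1 - done[t]) * A[t+1],
# with delta[t] = r[t] + gamma * V[t+1] * (1 - done[t]) - V[t].
# Pure-numpy sketch of the standard recurrence; a CUDA implementation
# computes the same values with a parallel scan.

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    next_value, next_adv = last_value, 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        next_adv = delta + gamma * lam * nonterminal * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages, advantages + values  # advantages, returns

rewards = np.array([1.0, 1.0, 1.0])
values = np.array([0.5, 0.5, 0.5])
dones = np.array([0.0, 0.0, 1.0])
adv, returns = compute_gae(rewards, values, dones, last_value=0.0)
```

The returns (advantages plus value estimates) are the regression targets for the value function, where value clipping is applied during the update.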
Ocean environments
Collection of 35+ custom-built environments:
- Classic games (Breakout, Asteroids, Snake)
- RL benchmarks (Cartpole, Chain MDP)
- Multi-agent scenarios (Connect4, Go, Checkers)
- All implemented in C for maximum performance
Flexible vectorization
Scale from single-threaded debugging to massive parallelism:
- Configurable number of workers and environments
- Batched environment execution
- Support for both sync and async APIs
Command-line interface
Train, evaluate, and sweep hyperparameters from the terminal:
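Recent PufferLib releases expose a `puffer` entry point; the exact commands and flags vary by version, so treat the following as an assumed shape and check `puffer --help` for your installed version:

```shell
# Assumed CLI shape for a recent PufferLib release; verify against
# `puffer --help` in your installed version.
puffer train puffer_breakout   # train a policy on an Ocean environment
puffer eval puffer_breakout    # evaluate / render a trained policy
puffer sweep puffer_breakout   # launch a hyperparameter sweep
```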
Performance benchmarks
PufferLib achieves industry-leading performance across a range of environments:

| Environment | Steps/second | Hardware |
|---|---|---|
| Ocean Breakout | 1.2M+ | RTX 4090 + 32 cores |
| Atari (vectorized) | 800K+ | RTX 4090 + 32 cores |
| Procgen | 600K+ | RTX 4090 + 32 cores |
| NetHack | 400K+ | RTX 4090 + 32 cores |
These benchmarks reflect end-to-end training throughput including policy inference, not just environment stepping.
Architecture overview
PufferLib is organized into several key components: the emulation layer for environment compatibility, the vectorization layer for parallel simulation, the PufferRL training system, and the Ocean environment suite.

Who uses PufferLib?
PufferLib powers RL research and applications at:
- Academic labs conducting multi-agent research
- Game AI companies building intelligent agents
- Robotics teams prototyping control policies
- Independent researchers exploring new algorithms