PufferLib provides a command-line interface for training, evaluation, hyperparameter sweeps, and utilities.

Basic syntax

puffer <command> <env_name> [options]
Available commands:
  • train - Train a policy
  • eval - Evaluate a trained policy
  • sweep - Run hyperparameter sweeps
  • autotune - Optimize vectorization settings
  • profile - Profile training performance
  • export - Export model weights

Training

Train a policy on an environment:
puffer train puffer_breakout

With custom parameters

puffer train puffer_breakout \
  --train.learning-rate 0.001 \
  --train.total-timesteps 50000000 \
  --vec.num-envs 4 \
  --policy.hidden-size 512

With custom config file

puffer train puffer_breakout --config my_config.ini
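A config file groups the same options you would otherwise pass as flags, using the sections shown throughout this page ([train], [vec], [policy]). As a sketch, a hypothetical my_config.ini mirroring the flag example above might look like this (the exact keys available depend on your PufferLib version; run puffer train --help to see them):

```ini
[train]
learning_rate = 0.001
total_timesteps = 50000000

[vec]
num_envs = 4

[policy]
hidden_size = 512
```

Flags passed on the command line alongside --config take precedence over values in the file.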

With logging

puffer train puffer_breakout \
  --wandb \
  --wandb-project my-project \
  --wandb-group experiment-1

Common training options

  • --train.total-timesteps (int, default: 10000000) - Total environment steps to train for.
  • --train.learning-rate (float, default: 0.015) - Learning rate for the optimizer.
  • --train.device (str, default: cuda) - Device to train on (cuda or cpu).
  • --train.batch-size (int, default: auto) - Training batch size.
  • --train.bptt-horizon (int, default: 64) - Backpropagation-through-time horizon.
  • --vec.num-envs (int, default: 2) - Number of parallel environment processes.
  • --policy.hidden-size (int, default: 128) - Hidden layer size for the policy network.
  • --load-model-path (str, default: None) - Path to a pretrained checkpoint to resume from.
  • --tag (str, default: None) - Tag for experiment tracking.

Evaluation

Evaluate a trained policy:
puffer eval puffer_breakout --load-model-path experiments/model.pt

Load from wandb/Neptune

puffer eval puffer_breakout \
  --wandb \
  --load-id abc123xyz

Save evaluation video

puffer eval puffer_breakout \
  --load-model-path experiments/model.pt \
  --save-frames 1000 \
  --gif-path eval.gif \
  --fps 30

Evaluation options

  • --load-model-path (str, default: None) - Path to a model checkpoint. Use latest to load the most recent checkpoint.
  • --load-id (str, default: None) - Load a model from a wandb or Neptune run ID.
  • --render-mode (str, default: auto) - Rendering mode. Options: auto, human, ansi, rgb_array, raylib, None.
  • --save-frames (int, default: 0) - Number of frames to save for video generation.
  • --gif-path (str, default: eval.gif) - Path to save the evaluation video.
  • --fps (float, default: 15) - Frames per second for video/rendering.

Hyperparameter sweeps

Run automated hyperparameter optimization:
puffer sweep puffer_breakout --wandb --max-runs 100

Sweep configuration

Sweeps are configured in the [sweep] section of your config file:
[sweep]
method = Protein
metric = score
metric_distribution = linear
goal = maximize
max_suggestion_cost = 3600
downsample = 5
early_stop_quantile = 0.3

[sweep.train.learning_rate]
distribution = log_normal
min = 0.00001
max = 0.1
scale = 0.5

[sweep.train.gamma]
distribution = logit_normal
min = 0.8
max = 0.9999
scale = auto

Sweep methods

  • --sweep.method (str, default: Protein) - Sweep algorithm. Options: Protein (Bayesian optimization), Random, Grid.

Sweep options

  • --max-runs (int, default: 200) - Maximum number of sweep runs.
  • --sweep.metric (str, default: score) - Metric to optimize (from the environment info dict).
  • --sweep.goal (str, default: maximize) - Optimization goal (maximize or minimize).
  • --sweep.early-stop-quantile (float, default: 0.3) - Early-stop runs performing below this quantile.

Note: sweeps require either --wandb or --neptune for logging.

Distributed training

Train with PyTorch DDP using torchrun:
torchrun --standalone --nnodes=1 --nproc-per-node=4 \
  -m pufferlib.pufferl train puffer_breakout

Multi-node training

# On rank 0 node
torchrun --nnodes=2 --nproc-per-node=4 --node-rank=0 \
  --master-addr=192.168.1.1 --master-port=29500 \
  -m pufferlib.pufferl train puffer_breakout

# On rank 1 node
torchrun --nnodes=2 --nproc-per-node=4 --node-rank=1 \
  --master-addr=192.168.1.1 --master-port=29500 \
  -m pufferlib.pufferl train puffer_breakout

Utilities

Autotune vectorization

Find optimal vectorization settings:
puffer autotune puffer_breakout --train.env-batch-size 4096
Tests different num_envs and num_workers combinations to maximize throughput.

Profile performance

Generate performance profile:
puffer profile puffer_breakout
Creates trace.json for viewing in chrome://tracing.

Export weights

Export model weights to binary file:
puffer export puffer_breakout --load-model-path experiments/model.pt
Creates {env_name}_weights.bin for deployment.

Complete examples

Train Atari Breakout

puffer train puffer_breakout \
  --train.total-timesteps 50000000 \
  --train.learning-rate 0.0003 \
  --vec.num-envs 4 \
  --wandb \
  --wandb-project atari-experiments \
  --tag breakout-baseline

Train with LSTM

puffer train puffer_breakout \
  --rnn-name LSTMWrapper \
  --rnn.hidden-size 512 \
  --policy.hidden-size 512 \
  --train.bptt-horizon 64

High-performance training

puffer train puffer_breakout \
  --vec.backend PufferEnv \
  --vec.num-envs 1 \
  --env.num-envs 8192 \
  --train.batch-size 524288 \
  --train.minibatch-size 32768 \
  --train.precision bfloat16 \
  --train.compile True

Resume from checkpoint

puffer train puffer_breakout \
  --load-model-path experiments/puffer_breakout_12345/model_puffer_breakout_000200.pt

Evaluate and record

puffer eval puffer_breakout \
  --load-model-path latest \
  --save-frames 3000 \
  --gif-path breakout_eval.gif \
  --fps 30 \
  --render-mode rgb_array

Run parameter sweep

puffer sweep puffer_breakout \
  --wandb \
  --wandb-project breakout-sweep \
  --max-runs 50 \
  --sweep.metric score \
  --sweep.goal maximize

Help and documentation

Get help for any command:
puffer train --help
This shows all available options with their default values and descriptions.

Python module invocation

You can also invoke the CLI as a Python module:
python -m pufferlib.pufferl train puffer_breakout
This is equivalent to the puffer command.
