PufferLib provides a command-line interface for training, evaluation, hyperparameter sweeps, and utilities.
## Basic syntax

```bash
puffer <command> <env_name> [options]
```
Available commands:

- `train` - Train a policy
- `eval` - Evaluate a trained policy
- `sweep` - Run hyperparameter sweeps
- `autotune` - Optimize vectorization settings
- `profile` - Profile training performance
- `export` - Export model weights
## Training

Train a policy on an environment:

```bash
puffer train puffer_breakout
```
```bash
# With custom parameters
puffer train puffer_breakout \
    --train.learning-rate 0.001 \
    --train.total-timesteps 50000000 \
    --vec.num-envs 4 \
    --policy.hidden-size 512
```
```bash
# With custom config file
puffer train puffer_breakout --config my_config.ini
```
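The config file is plain INI. A hypothetical `my_config.ini` setting the same options as the CLI flags above might look like the following — this assumes the `--train.*`/`--vec.*`/`--policy.*` flag prefixes map to sections of the same name, mirroring how the `[sweep.train.learning_rate]` sections later in this page are named:

```ini
[train]
learning_rate = 0.001
total_timesteps = 50000000

[vec]
num_envs = 4

[policy]
hidden_size = 512
```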
```bash
# With logging
puffer train puffer_breakout \
    --wandb \
    --wandb-project my-project \
    --wandb-group experiment-1
```
### Common training options

- `--train.total-timesteps` - Total environment steps to train for.
- `--train.learning-rate` - Learning rate for the optimizer.
- `--train.device` - Device to train on (`cuda` or `cpu`).
- `--train.bptt-horizon` - Backpropagation-through-time horizon.
- `--vec.num-envs` - Number of parallel environment processes.
- `--policy.hidden-size` - Hidden layer size for the policy network.
- `--load-model-path` - Path to a pretrained checkpoint to resume from.
- `--tag` - Tag for experiment tracking.
## Evaluation

Evaluate a trained policy:

```bash
puffer eval puffer_breakout --load-model-path experiments/model.pt
```
```bash
# Load from wandb/Neptune
puffer eval puffer_breakout \
    --wandb \
    --load-id abc123xyz
```
```bash
# Save evaluation video
puffer eval puffer_breakout \
    --load-model-path experiments/model.pt \
    --save-frames 1000 \
    --gif-path eval.gif \
    --fps 30
```
### Evaluation options

- `--load-model-path` - Path to a model checkpoint. Use `latest` to load the most recent checkpoint.
- `--load-id` - Load a model from a wandb or Neptune run ID.
- `--render-mode` - Rendering mode. Options: `auto`, `human`, `ansi`, `rgb_array`, `raylib`, `None`.
- `--save-frames` - Number of frames to save for video generation.
- `--gif-path` - Path to save the evaluation video.
- `--fps` - Frames per second for video/rendering.
## Hyperparameter sweeps

Run automated hyperparameter optimization:

```bash
puffer sweep puffer_breakout --wandb --max-runs 100
```
### Sweep configuration

Sweeps are configured in the `[sweep]` section of your config file:

```ini
[sweep]
method = Protein
metric = score
metric_distribution = linear
goal = maximize
max_suggestion_cost = 3600
downsample = 5
early_stop_quantile = 0.3

[sweep.train.learning_rate]
distribution = log_normal
min = 0.00001
max = 0.1
scale = 0.5

[sweep.train.gamma]
distribution = logit_normal
min = 0.8
max = 0.9999
scale = auto
```
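Because the sweep layout above is plain INI, you can inspect or generate it with Python's standard `configparser` — the dotted section names like `[sweep.train.learning_rate]` are just ordinary section strings. A minimal sketch (this only parses the file format; it does not touch the PufferLib API):

```python
import configparser

# The sweep configuration from the example above, abbreviated.
SWEEP_INI = """
[sweep]
method = Protein
metric = score
goal = maximize
max_suggestion_cost = 3600

[sweep.train.learning_rate]
distribution = log_normal
min = 0.00001
max = 0.1
scale = 0.5
"""

parser = configparser.ConfigParser()
parser.read_string(SWEEP_INI)

# Top-level sweep settings.
method = parser["sweep"]["method"]
goal = parser["sweep"]["goal"]

# Per-parameter search spaces live in the dotted subsections.
lr_space = parser["sweep.train.learning_rate"]
lo, hi = float(lr_space["min"]), float(lr_space["max"])
print(method, goal, lo, hi)
```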
### Sweep methods

- `method` - Sweep algorithm. Options: `Protein` (Bayesian optimization), `Random`, `Grid`.

### Sweep options

- `--max-runs` - Maximum number of sweep runs.
- `--sweep.metric` - Metric to optimize (taken from the environment info dict).
- `--sweep.goal` - Optimization goal (`maximize` or `minimize`).
- `--sweep.early-stop-quantile` - Early-stop runs performing below this quantile.

Sweeps require either `--wandb` or `--neptune` for logging.
## Distributed training

Train with PyTorch DDP using torchrun:

```bash
torchrun --standalone --nnodes=1 --nproc-per-node=4 \
    -m pufferlib.pufferl train puffer_breakout
```
### Multi-node training

```bash
# On rank 0 node
torchrun --nnodes=2 --nproc-per-node=4 --node-rank=0 \
    --master-addr=192.168.1.1 --master-port=29500 \
    -m pufferlib.pufferl train puffer_breakout

# On rank 1 node
torchrun --nnodes=2 --nproc-per-node=4 --node-rank=1 \
    --master-addr=192.168.1.1 --master-port=29500 \
    -m pufferlib.pufferl train puffer_breakout
```
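torchrun hands each spawned process its identity through environment variables, which is how any DDP entry point decides which GPU to bind. A minimal sketch of reading them with single-process fallbacks — the variable names (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`) are standard torchrun behavior, not PufferLib-specific:

```python
import os

def ddp_info():
    """Read the rank/world-size environment variables set by torchrun.

    Falls back to single-process defaults when run without torchrun.
    """
    return {
        "rank": int(os.environ.get("RANK", 0)),              # global rank across all nodes
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),  # rank within this node (GPU index)
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),  # total number of processes
    }

info = ddp_info()
# Typical use: bind this process to its GPU, e.g. torch.device(f"cuda:{info['local_rank']}")
print(info)
```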
## Utilities

### Autotune vectorization

Find optimal vectorization settings:

```bash
puffer autotune puffer_breakout --train.env-batch-size 4096
```

Tests different `num_envs` and `num_workers` combinations to maximize throughput.
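Conceptually, this kind of search is a small grid: for each `(num_envs, num_workers)` candidate, step the vectorized environment for a fixed interval, measure steps per second, and keep the best. A toy sketch with a stand-in `step_batch` function in place of a real environment — everything below is illustrative, not the actual autotune implementation:

```python
import time
from itertools import product

def step_batch(num_envs, num_workers):
    """Stand-in for stepping a vectorized env once; returns steps completed."""
    # Toy model: throughput scales with envs but saturates past 4 workers.
    return num_envs * min(num_workers, 4)

def autotune(env_counts, worker_counts, budget_s=0.01):
    """Grid-search (num_envs, num_workers), keeping the highest steps/sec."""
    best = None
    for num_envs, num_workers in product(env_counts, worker_counts):
        steps, start = 0, time.perf_counter()
        while time.perf_counter() - start < budget_s:
            steps += step_batch(num_envs, num_workers)
        sps = steps / budget_s  # measured steps per second for this config
        if best is None or sps > best[0]:
            best = (sps, num_envs, num_workers)
    return best

sps, num_envs, num_workers = autotune([64, 256, 1024], [1, 2, 4, 8])
print(num_envs, num_workers)
```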
### Profile

Generate a performance profile:

```bash
puffer profile puffer_breakout
```

Creates `trace.json` for viewing in `chrome://tracing`.
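`chrome://tracing` (and its successor, Perfetto) reads a simple JSON array of events, so you can also emit trace files from your own instrumentation. A minimal sketch of the Trace Event format using complete (`"ph": "X"`) events with microsecond timestamps — this illustrates the file format only, and the event names are made up, not what `puffer profile` records:

```python
import json

# Each complete event: name, category, phase "X",
# start timestamp (us), duration (us), process id, thread id.
events = [
    {"name": "rollout", "cat": "train", "ph": "X",
     "ts": 0, "dur": 1500, "pid": 1, "tid": 1},
    {"name": "optimize", "cat": "train", "ph": "X",
     "ts": 1500, "dur": 900, "pid": 1, "tid": 1},
]

with open("trace.json", "w") as f:
    json.dump(events, f)  # chrome://tracing accepts a bare event array
```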
### Export weights

Export model weights to a binary file:

```bash
puffer export puffer_breakout --load-model-path experiments/model.pt
```

Creates `{env_name}_weights.bin` for deployment.
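The exported file is raw weight data meant for loading outside Python. Assuming it is a flat buffer of float32 values (an assumption about the format, not something documented here), the standard-library `array` module is enough to read it back. The sketch writes a tiny stand-in file first so it is self-contained:

```python
from array import array

# Write a tiny stand-in weights file the same way we'll read the real one.
weights = array("f", [0.5, -1.25, 3.0])  # "f" = 32-bit floats
with open("example_weights.bin", "wb") as f:
    weights.tofile(f)

# Read the flat float32 buffer back.
loaded = array("f")
with open("example_weights.bin", "rb") as f:
    f.seek(0, 2)                          # seek to end to learn the file size
    count = f.tell() // loaded.itemsize   # number of float32 values stored
    f.seek(0)
    loaded.fromfile(f, count)
print(list(loaded))
```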
## Complete examples

### Train Atari Breakout

```bash
puffer train puffer_breakout \
    --train.total-timesteps 50000000 \
    --train.learning-rate 0.0003 \
    --vec.num-envs 4 \
    --wandb \
    --wandb-project atari-experiments \
    --tag breakout-baseline
```
### Train with LSTM

```bash
puffer train puffer_breakout \
    --rnn-name LSTMWrapper \
    --rnn.hidden-size 512 \
    --policy.hidden-size 512 \
    --train.bptt-horizon 64
```
### High-throughput training

```bash
puffer train puffer_breakout \
    --vec.backend PufferEnv \
    --vec.num-envs 1 \
    --env.num-envs 8192 \
    --train.batch-size 524288 \
    --train.minibatch-size 32768 \
    --train.precision bfloat16 \
    --train.compile True
```
### Resume from checkpoint

```bash
puffer train puffer_breakout \
    --load-model-path experiments/puffer_breakout_12345/model_puffer_breakout_000200.pt
```
### Evaluate and record

```bash
puffer eval puffer_breakout \
    --load-model-path latest \
    --save-frames 3000 \
    --gif-path breakout_eval.gif \
    --fps 30 \
    --render-mode rgb_array
```
### Run parameter sweep

```bash
puffer sweep puffer_breakout \
    --wandb \
    --wandb-project breakout-sweep \
    --max-runs 50 \
    --sweep.metric score \
    --sweep.goal maximize
```
## Help and documentation

Get help for any command:

```bash
puffer <command> --help
```

This shows all available options with their default values and descriptions.
## Python module invocation

You can also invoke the CLI as a Python module:

```bash
python -m pufferlib.pufferl train puffer_breakout
```

This is equivalent to the `puffer` command.