PufferLib provides a command-line interface for training, evaluation, hyperparameter sweeps, and utilities.

Basic syntax

puffer <command> <env_name> [options]
Available commands:
  • train - Train a policy
  • eval - Evaluate a trained policy
  • sweep - Run hyperparameter sweeps
  • autotune - Optimize vectorization settings
  • profile - Profile training performance
  • export - Export model weights

Training

Train a policy on an environment:
puffer train puffer_breakout

With custom parameters

puffer train puffer_breakout \
  --train.learning-rate 0.001 \
  --train.total-timesteps 50000000 \
  --vec.num-envs 4 \
  --policy.hidden-size 512

With custom config file

puffer train puffer_breakout --config my_config.ini
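A config file groups the same options you would otherwise pass as flags, using the sections shown throughout this page ([train], [vec], [policy]). As a sketch, a hypothetical my_config.ini mirroring the flag example above might look like this (the exact keys available depend on your PufferLib version; run puffer train --help to see them):

```ini
[train]
learning_rate = 0.001
total_timesteps = 50000000

[vec]
num_envs = 4

[policy]
hidden_size = 512
```

Flags passed on the command line alongside --config take precedence over values in the file.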

With logging

puffer train puffer_breakout \
  --wandb \
  --wandb-project my-project \
  --wandb-group experiment-1

Common training options

  • --train.total-timesteps (int, default: 10000000) - Total environment steps to train for.
  • --train.learning-rate (float, default: 0.015) - Learning rate for the optimizer.
  • --train.device (str, default: cuda) - Device to train on (cuda or cpu).
  • --train.batch-size (int, default: auto) - Training batch size.
  • --train.bptt-horizon (int, default: 64) - Backpropagation-through-time horizon.
  • --vec.num-envs (int, default: 2) - Number of parallel environment processes.
  • --policy.hidden-size (int, default: 128) - Hidden layer size for the policy network.
  • --load-model-path (str, default: None) - Path to a pretrained checkpoint to resume from.
  • --tag (str, default: None) - Tag for experiment tracking.

Evaluation

Evaluate a trained policy:
puffer eval puffer_breakout --load-model-path experiments/model.pt

Load from wandb/Neptune

puffer eval puffer_breakout \
  --wandb \
  --load-id abc123xyz

Save evaluation video

puffer eval puffer_breakout \
  --load-model-path experiments/model.pt \
  --save-frames 1000 \
  --gif-path eval.gif \
  --fps 30

Evaluation options

  • --load-model-path (str, default: None) - Path to a model checkpoint. Use latest to load the most recent checkpoint.
  • --load-id (str, default: None) - Load a model from a wandb or Neptune run ID.
  • --render-mode (str, default: auto) - Rendering mode. Options: auto, human, ansi, rgb_array, raylib, None.
  • --save-frames (int, default: 0) - Number of frames to save for video generation.
  • --gif-path (str, default: eval.gif) - Path to save the evaluation video.
  • --fps (float, default: 15) - Frames per second for video/rendering.

Hyperparameter sweeps

Run automated hyperparameter optimization:
puffer sweep puffer_breakout --wandb --max-runs 100

Sweep configuration

Sweeps are configured in the [sweep] section of your config file:
[sweep]
method = Protein
metric = score
metric_distribution = linear
goal = maximize
max_suggestion_cost = 3600
downsample = 5
early_stop_quantile = 0.3

[sweep.train.learning_rate]
distribution = log_normal
min = 0.00001
max = 0.1
scale = 0.5

[sweep.train.gamma]
distribution = logit_normal
min = 0.8
max = 0.9999
scale = auto

Sweep methods

  • --sweep.method (str, default: Protein) - Sweep algorithm. Options: Protein (Bayesian optimization), Random, Grid.

Sweep options

  • --max-runs (int, default: 200) - Maximum number of sweep runs.
  • --sweep.metric (str, default: score) - Metric to optimize (from the environment info dict).
  • --sweep.goal (str, default: maximize) - Optimization goal (maximize or minimize).
  • --sweep.early-stop-quantile (float, default: 0.3) - Early-stop runs performing below this quantile.

Note: sweeps require either --wandb or --neptune for logging.

Distributed training

Train with PyTorch DDP using torchrun:
torchrun --standalone --nnodes=1 --nproc-per-node=4 \
  -m pufferlib.pufferl train puffer_breakout

Multi-node training

# On rank 0 node
torchrun --nnodes=2 --nproc-per-node=4 --node-rank=0 \
  --master-addr=192.168.1.1 --master-port=29500 \
  -m pufferlib.pufferl train puffer_breakout

# On rank 1 node
torchrun --nnodes=2 --nproc-per-node=4 --node-rank=1 \
  --master-addr=192.168.1.1 --master-port=29500 \
  -m pufferlib.pufferl train puffer_breakout

Utilities

Autotune vectorization

Find optimal vectorization settings:
puffer autotune puffer_breakout --train.env-batch-size 4096
Tests different num_envs and num_workers combinations to maximize throughput.

Profile performance

Generate performance profile:
puffer profile puffer_breakout
Creates trace.json for viewing in chrome://tracing.

Export weights

Export model weights to binary file:
puffer export puffer_breakout --load-model-path experiments/model.pt
Creates {env_name}_weights.bin for deployment.

Complete examples

Train Atari Breakout

puffer train puffer_breakout \
  --train.total-timesteps 50000000 \
  --train.learning-rate 0.0003 \
  --vec.num-envs 4 \
  --wandb \
  --wandb-project atari-experiments \
  --tag breakout-baseline

Train with LSTM

puffer train puffer_breakout \
  --rnn-name LSTMWrapper \
  --rnn.hidden-size 512 \
  --policy.hidden-size 512 \
  --train.bptt-horizon 64

High-performance training

puffer train puffer_breakout \
  --vec.backend PufferEnv \
  --vec.num-envs 1 \
  --env.num-envs 8192 \
  --train.batch-size 524288 \
  --train.minibatch-size 32768 \
  --train.precision bfloat16 \
  --train.compile True

Resume from checkpoint

puffer train puffer_breakout \
  --load-model-path experiments/puffer_breakout_12345/model_puffer_breakout_000200.pt

Evaluate and record

puffer eval puffer_breakout \
  --load-model-path latest \
  --save-frames 3000 \
  --gif-path breakout_eval.gif \
  --fps 30 \
  --render-mode rgb_array

Run parameter sweep

puffer sweep puffer_breakout \
  --wandb \
  --wandb-project breakout-sweep \
  --max-runs 50 \
  --sweep.metric score \
  --sweep.goal maximize

Help and documentation

Get help for any command:
puffer train --help
This shows all available options with their default values and descriptions.

Python module invocation

You can also invoke the CLI as a Python module:
python -m pufferlib.pufferl train puffer_breakout
This is equivalent to the puffer command.
