PufferLib uses INI configuration files to manage training hyperparameters, environment settings, and vectorization options.

Configuration structure

Configuration is organized into sections:
  • [base] - Package and environment selection
  • [vec] - Vectorization settings
  • [env] - Environment-specific parameters
  • [policy] - Policy network architecture
  • [rnn] - Recurrent network wrapper settings
  • [train] - Training hyperparameters
  • [sweep] - Hyperparameter sweep configuration

Loading configuration

From environment name

from pufferlib import pufferl

args = pufferl.load_config('puffer_breakout')
This loads the default configuration and merges it with the environment-specific config.

From custom file

args = pufferl.load_config_file('my_config.ini', fill_in_default=True)
Set fill_in_default=False to use only your config without merging defaults.

Modifying in code

args = pufferl.load_config('puffer_breakout')

# Modify configuration
args['train']['learning_rate'] = 0.001
args['train']['total_timesteps'] = 50_000_000
args['vec']['num_envs'] = 4
args['policy']['hidden_size'] = 512

Default configuration

Here’s the default configuration from config/default.ini:
[base]
package = None
env_name = None
policy_name = Policy
rnn_name = None

[vec]
backend = Multiprocessing
num_envs = 2
num_workers = auto
batch_size = auto
zero_copy = True
seed = 42

[env]
# Environment-specific parameters go here

[policy]
# Policy-specific parameters go here

[rnn]
# RNN wrapper parameters go here

[train]
name = pufferai
project = ablations

seed = 42
torch_deterministic = True
cpu_offload = False
device = cuda
optimizer = muon
precision = float32
total_timesteps = 10_000_000
learning_rate = 0.015
anneal_lr = True
min_lr_ratio = 0.0
gamma = 0.995
gae_lambda = 0.90
update_epochs = 1
clip_coef = 0.2
vf_coef = 2.0
vf_clip_coef = 0.2
max_grad_norm = 1.5
ent_coef = 0.001
adam_beta1 = 0.95
adam_beta2 = 0.999
adam_eps = 1e-12

data_dir = experiments
checkpoint_interval = 200
batch_size = auto
minibatch_size = 8192
max_minibatch_size = 32768
bptt_horizon = 64
compile = False
compile_mode = max-autotune-no-cudagraphs

vtrace_rho_clip = 1.0
vtrace_c_clip = 1.0
prio_alpha = 0.8
prio_beta0 = 0.2

Configuration options

Base section

package
str
default:"None"
Environment package name (e.g., ‘atari’, ‘procgen’, ‘ocean’).
env_name
str
default:"None"
Specific environment name within the package.
policy_name
str
default:"Policy"
Policy class name to use (e.g., ‘Policy’, ‘Convolutional’, ‘ProcgenResnet’).
rnn_name
str
default:"None"
RNN wrapper class name (e.g., ‘LSTMWrapper’). Set to None to disable recurrence.

Vectorization section

backend
str
default:"Multiprocessing"
Vectorization backend. Options: Serial, Multiprocessing, Ray, PufferEnv.
num_envs
int
default:"2"
Number of parallel environment processes.
num_workers
int | str
default:"auto"
Number of worker processes. Set to auto to match num_envs.
batch_size
int | str
default:"auto"
Batch size for vectorized environments. Set to auto for automatic sizing.
zero_copy
bool
default:"True"
Enable zero-copy shared memory for faster data transfer.
seed
int
default:"42"
Random seed for environment initialization.

Training section

Core settings

device
str
default:"cuda"
Device for training. Options: cuda, cpu.
total_timesteps
int
default:"10_000_000"
Total number of environment steps to train for.
batch_size
int | str
default:"auto"
Total batch size for training. Auto-calculated from num_envs * bptt_horizon.
bptt_horizon
int
default:"64"
Backpropagation through time horizon (sequence length).
minibatch_size
int
default:"8192"
Size of minibatches for gradient updates.
max_minibatch_size
int
default:"32768"
Maximum minibatch size before gradient accumulation kicks in.

Optimizer settings

optimizer
str
default:"muon"
Optimizer to use. Options: adam, muon.
learning_rate
float
default:"0.015"
Learning rate for optimizer.
anneal_lr
bool
default:"True"
Enable cosine learning rate annealing.
min_lr_ratio
float
default:"0.0"
Minimum learning rate as a ratio of the initial learning rate.
adam_beta1
float
default:"0.95"
Adam beta1 parameter (momentum).
adam_beta2
float
default:"0.999"
Adam beta2 parameter (RMSprop).
adam_eps
float
default:"1e-12"
Adam epsilon for numerical stability.
max_grad_norm
float
default:"1.5"
Maximum gradient norm for clipping.
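The anneal_lr and min_lr_ratio options describe a cosine decay from the initial learning rate down to learning_rate * min_lr_ratio. A minimal sketch of that schedule, assuming the standard cosine form (PufferLib's implementation may differ in detail):

```python
import math

def cosine_lr(step, total_steps, base_lr=0.015, min_lr_ratio=0.0):
    """Cosine annealing from base_lr down to base_lr * min_lr_ratio."""
    min_lr = base_lr * min_lr_ratio
    progress = step / total_steps
    return min_lr + (base_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))

# Starts at the full learning rate and decays to base_lr * min_lr_ratio.
print(cosine_lr(0, 1000))     # 0.015
print(cosine_lr(1000, 1000))  # 0.0 (min_lr_ratio = 0.0)
```

With min_lr_ratio = 0.0 the rate anneals all the way to zero; set it to e.g. 0.1 to floor the schedule at 10% of the initial rate.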

PPO hyperparameters

gamma
float
default:"0.995"
Discount factor for rewards.
gae_lambda
float
default:"0.90"
Lambda parameter for Generalized Advantage Estimation.
update_epochs
int
default:"1"
Number of epochs to update the policy per batch.
clip_coef
float
default:"0.2"
Clipping coefficient for PPO surrogate loss.
vf_coef
float
default:"2.0"
Coefficient for value function loss.
vf_clip_coef
float
default:"0.2"
Clipping coefficient for value function.
ent_coef
float
default:"0.001"
Entropy coefficient for exploration.
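For reference, clip_coef enters the textbook PPO clipped surrogate objective. A minimal per-sample sketch of that objective (standard PPO, not necessarily PufferLib's exact loss code):

```python
def ppo_clip_objective(ratio, advantage, clip_coef=0.2):
    """Per-sample PPO clipped surrogate (averaged and negated to form a loss)."""
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + clip_coef), 1 - clip_coef) * advantage
    return min(unclipped, clipped)

# Positive advantage: the benefit of raising the ratio is capped at 1 + clip_coef.
print(ppo_clip_objective(1.5, 1.0))   # 1.2
# Negative advantage: min() keeps the pessimistic (clipped) estimate.
print(ppo_clip_objective(0.5, -1.0))  # -0.8
```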

V-trace parameters

vtrace_rho_clip
float
default:"1.0"
Clipping threshold for V-trace importance sampling ratio.
vtrace_c_clip
float
default:"1.0"
Clipping threshold for V-trace trace coefficient.
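These are the standard V-trace truncation thresholds from IMPALA. A sketch of how they act on the importance sampling ratio (illustrative only, not PufferLib's implementation):

```python
import math

def vtrace_weights(logp_target, logp_behavior, rho_clip=1.0, c_clip=1.0):
    """Clipped importance weights at the core of V-trace off-policy correction."""
    ratio = math.exp(logp_target - logp_behavior)
    rho = min(ratio, rho_clip)  # scales the one-step TD error
    c = min(ratio, c_clip)      # gates how far corrections propagate backwards
    return rho, c

# On-policy data: both weights are 1 and V-trace reduces to the on-policy target.
print(vtrace_weights(0.0, 0.0))  # (1.0, 1.0)
```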

Prioritization parameters

prio_alpha
float
default:"0.8"
Prioritization exponent (0 = uniform, 1 = full prioritization).
prio_beta0
float
default:"0.2"
Initial importance sampling correction factor.
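prio_alpha and prio_beta0 match the usual prioritized-sampling scheme: priorities raised to alpha give sampling probabilities, and a beta-powered importance-sampling weight corrects the resulting bias. An illustrative sketch assuming that standard formulation:

```python
def prio_weights(priorities, alpha=0.8, beta=0.2):
    """Sampling probabilities and max-normalized IS corrections for prioritized sampling."""
    scaled = [p ** alpha for p in priorities]        # alpha = 0 -> uniform sampling
    total = sum(scaled)
    probs = [s / total for s in scaled]
    n = len(priorities)
    weights = [(n * p) ** -beta for p in probs]      # correct the non-uniform sampling bias
    max_w = max(weights)
    return probs, [w / max_w for w in weights]

# Uniform priorities: uniform probabilities, all corrections equal to 1.
print(prio_weights([1.0, 1.0]))  # ([0.5, 0.5], [1.0, 1.0])
```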

Performance settings

precision
str
default:"float32"
Training precision. Options: float32, bfloat16.
compile
bool
default:"False"
Enable torch.compile for policy and sampling functions.
compile_mode
str
default:"max-autotune-no-cudagraphs"
Torch compile mode. Options: default, reduce-overhead, max-autotune, max-autotune-no-cudagraphs.
cpu_offload
bool
default:"False"
Offload observation buffers to CPU to save GPU memory.

Checkpointing

data_dir
str
default:"experiments"
Directory for saving checkpoints and logs.
checkpoint_interval
int
default:"200"
Save checkpoint every N epochs.

Miscellaneous

seed
int
default:"42"
Random seed for reproducibility.
torch_deterministic
bool
default:"True"
Enable deterministic CUDA operations.

Command-line arguments

All configuration options can be overridden via command-line arguments:
puffer train puffer_breakout \
  --train.learning-rate 0.001 \
  --train.total-timesteps 50000000 \
  --vec.num-envs 4 \
  --policy.hidden-size 512
Argument format:
  • Section options: --section.key value
  • Base options: --key value
  • Underscores become hyphens: learning_rate → --train.learning-rate
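The mapping above can be sketched as a small helper (to_flag is a hypothetical illustration, not part of PufferLib):

```python
def to_flag(section, key):
    """Map an INI section/key pair to its command-line flag name."""
    name = key.replace('_', '-')       # underscores become hyphens
    if section == 'base':
        return f"--{name}"             # base options drop the section prefix
    return f"--{section}.{name}"

print(to_flag('train', 'learning_rate'))  # --train.learning-rate
print(to_flag('base', 'env_name'))        # --env-name
```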

Creating custom configs

Create a custom .ini file:
my_config.ini
[base]
package = ocean
env_name = puffer_breakout
policy_name = Convolutional
rnn_name = LSTMWrapper

[vec]
num_envs = 4
batch_size = 4096

[policy]
hidden_size = 512
framestack = 4
flat_size = 3136

[rnn]
input_size = 512
hidden_size = 512

[train]
total_timesteps = 50_000_000
learning_rate = 0.0003
batch_size = 262144
minibatch_size = 16384
bptt_horizon = 64
Load it:
args = pufferl.load_config_file('my_config.ini')
When fill_in_default=True, your config is merged with default.ini. Values in your file override defaults.
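The override-merge semantics can be demonstrated with Python's standard configparser (a self-contained illustration of the behavior, not PufferLib's actual loader):

```python
import configparser

# Inline stand-ins for default.ini and a user config file.
DEFAULT = """
[train]
learning_rate = 0.015
gamma = 0.995
"""
USER = """
[train]
learning_rate = 0.0003
"""

cfg = configparser.ConfigParser()
cfg.read_string(DEFAULT)  # load defaults first...
cfg.read_string(USER)     # ...then the user file overrides matching keys

print(cfg['train']['learning_rate'])  # 0.0003 (overridden by the user file)
print(cfg['train']['gamma'])          # 0.995  (inherited from the defaults)
```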

Auto values

Some parameters support auto for automatic configuration:
  • batch_size = auto - Calculated as num_envs * bptt_horizon
  • bptt_horizon = auto - Calculated as batch_size / num_envs
  • num_workers = auto - Set to match num_envs
At least one of batch_size and bptt_horizon must be set explicitly; they cannot both be auto.
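A sketch of these resolution rules (resolve_auto is a hypothetical helper; the actual loader may differ):

```python
def resolve_auto(num_envs, batch_size='auto', bptt_horizon='auto', num_workers='auto'):
    """Resolve 'auto' placeholders using the relationships described above."""
    if batch_size == 'auto' and bptt_horizon == 'auto':
        raise ValueError("batch_size and bptt_horizon cannot both be 'auto'")
    if batch_size == 'auto':
        batch_size = num_envs * bptt_horizon      # batch_size = num_envs * bptt_horizon
    elif bptt_horizon == 'auto':
        bptt_horizon = batch_size // num_envs     # bptt_horizon = batch_size / num_envs
    if num_workers == 'auto':
        num_workers = num_envs                    # one worker per environment
    return batch_size, bptt_horizon, num_workers

print(resolve_auto(4, bptt_horizon=64))  # (256, 64, 4)
```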

Example configurations

Atari with CNN + LSTM

[base]
package = ocean
env_name = puffer_breakout
policy_name = Convolutional
rnn_name = LSTMWrapper

[policy]
hidden_size = 512
framestack = 1
flat_size = 3136

[rnn]
input_size = 512
hidden_size = 512

[train]
total_timesteps = 50_000_000
bptt_horizon = 64

Continuous control

[base]
package = mujoco
env_name = HalfCheetah-v4
policy_name = Policy

[policy]
hidden_size = 256

[train]
total_timesteps = 1_000_000
learning_rate = 0.0003
gamma = 0.99
gae_lambda = 0.95
ent_coef = 0.0

High-throughput training

[vec]
backend = PufferEnv
num_envs = 1

[env]
num_envs = 8192

[train]
batch_size = 524288
minibatch_size = 32768
max_minibatch_size = 65536
precision = bfloat16
compile = True
