PufferLib uses an INI-based configuration system with hierarchical sections and command-line overrides.
Configuration functions
load_config()
Load configuration for a specific environment.
import pufferlib.pufferl as pufferl
# Load default config for an environment
args = pufferl.load_config('puffer_atari_breakout')
# Load with custom overrides
args = pufferl.load_config('puffer_atari_breakout', cli_args=['--train.learning_rate', '0.001'])
Name of the environment (e.g., 'puffer_atari_breakout', 'procgen_coinrun')
List of command-line style arguments to override config values
Configuration dictionary with all settings organized by section
load_config_file()
Load configuration from a custom .ini file.
# Load custom config
args = pufferl.load_config_file('my_config.ini', fill_in_default=True)
# Load without defaults
args = pufferl.load_config_file('my_config.ini', fill_in_default=False)
Path to .ini configuration file
Whether to fill in missing values from default.ini
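The fill_in_default behavior can be illustrated with Python's standard configparser. This is a sketch of the merge semantics under the assumption that defaults are section-by-section key fills; load_ini and its arguments are hypothetical names, not PufferLib's actual implementation:

```python
import configparser

def load_ini(path, defaults_path=None, fill_in_default=True):
    """Parse an INI file into a {section: {key: value}} dict, optionally
    filling in keys missing from each section using a defaults file.
    Illustrative sketch only."""
    def to_dict(ini_path):
        parser = configparser.ConfigParser()
        parser.read(ini_path)
        return {s: dict(parser[s]) for s in parser.sections()}

    config = to_dict(path)
    if fill_in_default and defaults_path is not None:
        for section, values in to_dict(defaults_path).items():
            merged = dict(values)             # start from defaults
            merged.update(config.get(section, {}))  # user file wins
            config[section] = merged
    return config
```

Keys set in the user's file take precedence; defaults only fill gaps.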
Configuration structure
Configurations are organized into sections:
Base section
General settings and metadata.
[base]
mode = train
env_name = puffer_cartpole
env_module = pufferlib.ocean
max_runs = 1
Execution mode: 'train', 'eval', or 'sweep'
Python module containing the environment
Number of training runs (for sweeps)
Vec section
Vectorization settings.
[vec]
backend = serial
num_envs = 1
envs_per_worker = auto
envs_per_batch = auto
Vectorization backend: 'serial', 'multiprocessing', or 'ray'
Total number of environment instances
Environments per worker process (int or 'auto', default 'auto'; used with multiprocessing)
Environments per training batch (int or 'auto', default 'auto')
Env section
Environment-specific settings.
[env]
num_agents = 1
render_mode = rgb_array
Number of agents per environment
Render mode: 'rgb_array', 'ansi', or 'human'
Additional environment-specific parameters can be added and will be passed to the environment constructor.
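The pass-through of extra [env] keys can be sketched as plain keyword-argument forwarding. MyEnv, its parameters, and the key names here are hypothetical, chosen only to illustrate the mechanism:

```python
# Hypothetical environment: any extra [env] keys must match constructor
# parameters, since the section is forwarded as keyword arguments.
class MyEnv:
    def __init__(self, num_agents=1, render_mode=None, max_steps=1000):
        self.num_agents = num_agents
        self.render_mode = render_mode
        self.max_steps = max_steps

# Parsed [env] section, including a custom max_steps key
env_section = {'num_agents': 2, 'render_mode': 'ansi', 'max_steps': 500}
env = MyEnv(**env_section)
```

A key in [env] that the constructor does not accept would raise a TypeError, so custom keys must line up with the environment's signature.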
Policy section
Neural network policy settings.
[policy]
hidden_size = 128
activation = relu
Hidden layer size for policy network
Activation function: 'relu', 'tanh', or 'gelu'
RNN section
Recurrent network settings (if using LSTM).
[rnn]
input_size = auto
hidden_size = 256
num_layers = 1
LSTM input size (int or 'auto', default 'auto'; auto-detected from policy hidden_size)
Train section
Training hyperparameters.
[train]
device = cuda
total_timesteps = 10000000
batch_size = auto
bptt_horizon = 16
learning_rate = 0.0003
clip_coef = 0.1
gamma = 0.99
gae_lambda = 0.95
Training device: 'cuda' or 'cpu'
Batch size for training (int or 'auto', default 'auto')
Backpropagation through time horizon
See the Configuration guide for all available training parameters.
Sweep section
Hyperparameter sweep settings.
[sweep]
method = Protein
metric = episode_return
goal = maximize
downsample = 100
Sweep method: 'Protein', 'GridSearch', 'RandomSearch', or 'Sobol'
Command-line overrides
Override configuration values from the command line:
# Override single values
puffer train puffer_cartpole --train.learning_rate 0.001
# Override multiple values
puffer train puffer_cartpole \
--train.learning_rate 0.001 \
--train.clip_coef 0.2 \
--vec.num_envs 8
# Override nested values
puffer train puffer_atari_breakout \
--env.frame_stack 4 \
--policy.hidden_size 512
Override syntax
cli_args = [
'--section.parameter', 'value',
'--train.learning_rate', '0.001',
'--train.use_rnn', 'True',
'--vec.backend', 'multiprocessing'
]
args = pufferl.load_config('puffer_cartpole', cli_args=cli_args)
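The '--section.parameter value' convention maps naturally onto a nested dict. A minimal sketch of that mapping, assuming literal-style type coercion (numbers and booleans parsed, everything else kept as a string); apply_overrides is a hypothetical helper and PufferLib's actual parser may coerce types differently:

```python
import ast

def apply_overrides(config, cli_args):
    """Apply '--section.key value' pairs to a nested config dict.
    Illustrative sketch of the override mechanism."""
    for flag, raw in zip(cli_args[::2], cli_args[1::2]):
        section, key = flag.lstrip('-').split('.', 1)
        try:
            value = ast.literal_eval(raw)  # '0.001' -> 0.001, 'True' -> True
        except (ValueError, SyntaxError):
            value = raw                    # non-literal strings kept as-is
        config.setdefault(section, {})[key] = value
    return config
```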
Configuration examples
Custom configuration file
[base]
env_name = my_custom_env
env_module = my_package.environments
[vec]
backend = multiprocessing
num_envs = 16
envs_per_worker = 4
[env]
num_agents = 1
max_steps = 1000
[policy]
hidden_size = 256
[train]
device = cuda
total_timesteps = 50000000
learning_rate = 0.0003
batch_size = 32768
bptt_horizon = 16
use_rnn = True
[rnn]
hidden_size = 512
num_layers = 1
Loading and modifying config
import pufferlib.pufferl as pufferl
# Load base config
args = pufferl.load_config('puffer_cartpole')
# Modify settings
args['vec']['num_envs'] = 8
args['train']['learning_rate'] = 0.001
args['train']['use_rnn'] = True
args['rnn']['hidden_size'] = 256
# Use modified config
vecenv = pufferl.load_env('puffer_cartpole', args)
policy = pufferl.load_policy(args, vecenv, 'puffer_cartpole')
trainer = pufferl.PuffeRL(args['train'], vecenv, policy)
Auto values
Some parameters support 'auto' mode:
batch_size = auto: Computed as num_envs * num_agents * bptt_horizon
bptt_horizon = auto: Computed as batch_size / (num_envs * num_agents)
envs_per_worker = auto: Distributed evenly across workers
policy.input_size = auto: Inferred from observation space
rnn.input_size = auto: Set to policy.hidden_size
You must specify either batch_size or bptt_horizon; they cannot both be 'auto'.
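The batch_size and bptt_horizon relations above can be sketched directly, including the constraint that at most one of them is 'auto'; resolve_auto is a hypothetical helper written from those formulas, not PufferLib's internal code:

```python
def resolve_auto(num_envs, num_agents, batch_size='auto', bptt_horizon='auto'):
    """Resolve 'auto' values using:
      batch_size   = num_envs * num_agents * bptt_horizon
      bptt_horizon = batch_size / (num_envs * num_agents)
    Exactly one of the two may be 'auto'. Illustrative sketch."""
    if batch_size == 'auto' and bptt_horizon == 'auto':
        raise ValueError("batch_size and bptt_horizon cannot both be 'auto'")
    if batch_size == 'auto':
        batch_size = num_envs * num_agents * bptt_horizon
    elif bptt_horizon == 'auto':
        bptt_horizon = batch_size // (num_envs * num_agents)
    return batch_size, bptt_horizon
```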
Use args = pufferl.load_config(env_name) to see the complete configuration with all defaults filled in.