PufferLib uses an INI-based configuration system with hierarchical sections and command-line overrides.

## Configuration functions

### load_config()

Load configuration for a specific environment.

```python
import pufferlib.pufferl as pufferl

# Load default config for an environment
args = pufferl.load_config('puffer_atari_breakout')

# Load with custom overrides
args = pufferl.load_config('puffer_atari_breakout', cli_args=['--train.learning_rate', '0.001'])
```

**Parameters**

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `env_name` | `str` | required | Name of the environment (e.g., `puffer_atari_breakout`, `procgen_coinrun`) |
| `cli_args` | `list[str]` | — | List of command-line-style arguments that override config values |

**Returns**

| Name | Type | Description |
| --- | --- | --- |
| `args` | `dict` | Configuration dictionary with all settings organized by section |

### load_config_file()

Load configuration from a custom `.ini` file.

```python
# Load custom config
args = pufferl.load_config_file('my_config.ini', fill_in_default=True)

# Load without defaults
args = pufferl.load_config_file('my_config.ini', fill_in_default=False)
```

**Parameters**

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `config_file` | `str` | required | Path to `.ini` configuration file |
| `fill_in_default` | `bool` | `True` | Whether to fill in missing values from `default.ini` |
| `cli_args` | `list[str]` | — | Command-line overrides |

**Returns**

| Name | Type | Description |
| --- | --- | --- |
| `args` | `dict` | Configuration dictionary |

## Configuration structure

Configurations are organized into sections:

### Base section

General settings and metadata.

```ini
[base]
mode = train
env_name = puffer_cartpole
env_module = pufferlib.ocean
max_runs = 1
```

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `mode` | `str` | `train` | Execution mode: `train`, `eval`, or `sweep` |
| `env_name` | `str` | — | Environment identifier |
| `env_module` | `str` | — | Python module containing the environment |
| `max_runs` | `int` | `1` | Number of training runs (for sweeps) |

### Vec section

Vectorization settings.

```ini
[vec]
backend = serial
num_envs = 1
envs_per_worker = auto
envs_per_batch = auto
```

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `backend` | `str` | `serial` | Vectorization backend: `serial`, `multiprocessing`, or `ray` |
| `num_envs` | `int` | required | Total number of environment instances |
| `envs_per_worker` | `int` or `auto` | `auto` | Environments per worker process (for multiprocessing) |
| `envs_per_batch` | `int` or `auto` | `auto` | Environments per training batch |

### Env section

Environment-specific settings.

```ini
[env]
num_agents = 1
render_mode = rgb_array
```

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `num_agents` | `int` | `1` | Number of agents per environment |
| `render_mode` | `str` | — | Render mode: `rgb_array`, `ansi`, or `human` |

Additional environment-specific parameters can be added and will be passed to the environment constructor.
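One way to picture that forwarding: any `[env]` key the constructor does not name arrives as an extra keyword argument. This is a simplified sketch; the `MyEnv` class and the `max_steps` key are hypothetical, not PufferLib internals:

```python
# Hypothetical environment class, for illustration only
class MyEnv:
    def __init__(self, num_agents=1, render_mode=None, **kwargs):
        self.num_agents = num_agents
        self.render_mode = render_mode
        # Extra [env] keys (e.g. a custom max_steps) land in kwargs
        self.options = kwargs

# A parsed [env] section, including one custom parameter
env_section = {'num_agents': 1, 'render_mode': 'rgb_array', 'max_steps': 1000}

# The section is unpacked into the constructor
env = MyEnv(**env_section)
print(env.options)
```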

### Policy section

Neural network policy settings.

```ini
[policy]
hidden_size = 128
activation = relu
```

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `hidden_size` | `int` | `128` | Hidden layer size for policy network |
| `activation` | `str` | `relu` | Activation function: `relu`, `tanh`, or `gelu` |

### RNN section

Recurrent network settings (if using an LSTM).

```ini
[rnn]
input_size = auto
hidden_size = 256
num_layers = 1
```

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `input_size` | `int` or `auto` | `auto` | LSTM input size (auto-detected from the policy `hidden_size`) |
| `hidden_size` | `int` | `256` | LSTM hidden state size |
| `num_layers` | `int` | `1` | Number of LSTM layers |

### Train section

Training hyperparameters.

```ini
[train]
device = cuda
total_timesteps = 10000000
batch_size = auto
bptt_horizon = 16
learning_rate = 0.0003
clip_coef = 0.1
gamma = 0.99
gae_lambda = 0.95
```

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `device` | `str` | `cuda` | Training device: `cuda` or `cpu` |
| `total_timesteps` | `int` | required | Total training timesteps |
| `batch_size` | `int` or `auto` | `auto` | Batch size for training |
| `bptt_horizon` | `int` | `16` | Backpropagation-through-time horizon |
| `learning_rate` | `float` | `0.0003` | Learning rate |
| `clip_coef` | `float` | `0.1` | PPO clipping coefficient |
| `gamma` | `float` | `0.99` | Discount factor |
| `gae_lambda` | `float` | `0.95` | GAE lambda parameter |

See the Configuration guide for all available training parameters.

### Sweep section

Hyperparameter sweep settings.

```ini
[sweep]
method = Protein
metric = episode_return
goal = maximize
downsample = 100
```

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `method` | `str` | — | Sweep method: `Protein`, `GridSearch`, `RandomSearch`, or `Sobol` |
| `metric` | `str` | — | Metric to optimize |
| `goal` | `str` | — | `maximize` or `minimize` |
| `downsample` | `int` | `100` | Sample every N timesteps |

## Command-line overrides

Override configuration values from the command line:

```sh
# Override a single value
puffer train puffer_cartpole --train.learning_rate 0.001

# Override multiple values
puffer train puffer_cartpole \
  --train.learning_rate 0.001 \
  --train.clip_coef 0.2 \
  --vec.num_envs 8

# Override nested values
puffer train puffer_atari_breakout \
  --env.frame_stack 4 \
  --policy.hidden_size 512
```

### Override syntax

The same overrides can be passed programmatically as a flat list of `'--section.parameter', 'value'` pairs:

```python
cli_args = [
    '--section.parameter', 'value',  # each override is a flag followed by its value
    '--train.learning_rate', '0.001',
    '--train.use_rnn', 'True',
    '--vec.backend', 'multiprocessing'
]

args = pufferl.load_config('puffer_cartpole', cli_args=cli_args)
```
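To see how the dotted flags map onto the sectioned dict, here is a minimal, hypothetical parser; `apply_overrides` is an illustration of the mapping, not PufferLib's actual implementation:

```python
def apply_overrides(config, cli_args):
    """Apply ['--section.key', 'value', ...] pairs to a nested config dict."""
    # Walk the list as (flag, value) pairs
    for flag, value in zip(cli_args[0::2], cli_args[1::2]):
        # '--train.learning_rate' -> section 'train', key 'learning_rate'
        section, key = flag.lstrip('-').split('.', 1)
        config.setdefault(section, {})[key] = value
    return config

config = {'train': {'learning_rate': '0.0003'}}
apply_overrides(config, ['--train.learning_rate', '0.001',
                         '--vec.backend', 'multiprocessing'])
print(config)
# {'train': {'learning_rate': '0.001'}, 'vec': {'backend': 'multiprocessing'}}
```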

## Configuration examples

### Custom configuration file

```ini
; my_config.ini
[base]
env_name = my_custom_env
env_module = my_package.environments

[vec]
backend = multiprocessing
num_envs = 16
envs_per_worker = 4

[env]
num_agents = 1
max_steps = 1000

[policy]
hidden_size = 256

[train]
device = cuda
total_timesteps = 50000000
learning_rate = 0.0003
batch_size = 32768
bptt_horizon = 16
use_rnn = True

[rnn]
hidden_size = 512
num_layers = 1
```

### Loading and modifying config

```python
import pufferlib.pufferl as pufferl

# Load base config
args = pufferl.load_config('puffer_cartpole')

# Modify settings
args['vec']['num_envs'] = 8
args['train']['learning_rate'] = 0.001
args['train']['use_rnn'] = True
args['rnn']['hidden_size'] = 256

# Use modified config
vecenv = pufferl.load_env('puffer_cartpole', args)
policy = pufferl.load_policy(args, vecenv, 'puffer_cartpole')
trainer = pufferl.PuffeRL(args['train'], vecenv, policy)
```

## Auto values

Some parameters support an `auto` mode:

- `batch_size = auto`: computed as `num_envs * num_agents * bptt_horizon`
- `bptt_horizon = auto`: computed as `batch_size / (num_envs * num_agents)`
- `envs_per_worker = auto`: environments are distributed evenly across workers
- `policy.input_size = auto`: inferred from the observation space
- `rnn.input_size = auto`: set to `policy.hidden_size`

You must specify either `batch_size` or `bptt_horizon`; they cannot both be `auto`.

Use `args = pufferl.load_config(env_name)` to see the complete configuration with all defaults filled in.
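The two derived quantities are simple arithmetic; a quick sketch of the relationship in plain Python (the example numbers are arbitrary):

```python
num_envs, num_agents = 8, 1

# batch_size = auto: derived from the horizon
bptt_horizon = 16
batch_size = num_envs * num_agents * bptt_horizon
print(batch_size)  # 128

# bptt_horizon = auto: derived from a given batch size
batch_size = 32768
bptt_horizon = batch_size // (num_envs * num_agents)
print(bptt_horizon)  # 4096
```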
