The sweep module provides Bayesian optimization for hyperparameter tuning using Gaussian processes and acquisition functions.

Sweep methods

PufferLib supports multiple sweep methods for hyperparameter optimization:

Protein

Bayesian optimization with GP models and acquisition functions

GridSearch

Exhaustive grid search over parameter space

RandomSearch

Random sampling from parameter distributions

Sobol

Quasi-random Sobol sequences for efficient sampling

Space classes

Define search spaces for hyperparameters:

Linear

Linear spacing between min and max values.
min (float, required): Minimum value of the parameter
max (float, required): Maximum value of the parameter
scale (float | 'auto', default 'auto'): Scaling factor for the parameter ('auto' resolves to 0.5)
is_integer (bool, default False): Whether to round values to integers
from pufferlib.sweep import Linear

# Integer linear space
batch_size = Linear(min=64, max=1024, scale='auto', is_integer=True)

# Float linear space
learning_rate = Linear(min=0.0001, max=0.01, scale='auto')
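For intuition, the linear mapping can be sketched in plain Python. The helper below is hypothetical, not part of pufferlib.sweep:

```python
# Hypothetical sketch of linear spacing (not PufferLib's implementation):
# map a unit sample u onto [min, max], optionally rounding to an integer.

def linear_sample(u: float, lo: float, hi: float, is_integer: bool = False):
    """Map u in [0, 1] linearly onto [lo, hi]."""
    x = lo + u * (hi - lo)
    return round(x) if is_integer else x

# Midpoint of the batch_size space above
print(linear_sample(0.5, 64, 1024, is_integer=True))  # -> 544
```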

Log

Logarithmic spacing for parameters that vary over orders of magnitude.
min (float, required): Minimum value (must be > 0)
max (float, required): Maximum value (must be > 0)
scale (float | 'auto' | 'time', default 'auto'): Scaling factor. Use 'time' for time-based scaling.
from pufferlib.sweep import Log

# Learning rate with log scaling
lr = Log(min=1e-5, max=1e-2, scale='auto')

# Entropy coefficient
ent_coef = Log(min=0.0001, max=0.1, scale='time')
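To see why log spacing suits such parameters, here is a hypothetical sketch (not PufferLib's internals) that samples uniformly in log10 space:

```python
import math

# Hypothetical sketch of log spacing: sample uniformly in log10 space,
# so each order of magnitude is equally likely to be explored.

def log_sample(u: float, lo: float, hi: float) -> float:
    """Map u in [0, 1] onto [lo, hi] with uniform spacing in log10 space."""
    return 10 ** (math.log10(lo) + u * (math.log10(hi) - math.log10(lo)))

# The midpoint is the geometric mean sqrt(1e-5 * 1e-2), not the arithmetic mean
print(log_sample(0.5, 1e-5, 1e-2))  # ~3.16e-4
```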

Pow2

Power-of-2 spacing for batch sizes and similar parameters.
from pufferlib.sweep import Pow2

# Batch size in powers of 2
batch = Pow2(min=64, max=4096, scale='auto', is_integer=True)
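A hypothetical sketch of the idea (not PufferLib's internals): sample the exponent linearly and round it, so only powers of two are produced:

```python
import math

def pow2_sample(u: float, lo: int, hi: int) -> int:
    """Map u in [0, 1] onto powers of two between lo and hi (both powers of 2)."""
    e_lo, e_hi = int(math.log2(lo)), int(math.log2(hi))
    return 2 ** round(e_lo + u * (e_hi - e_lo))

print([pow2_sample(u, 64, 4096) for u in (0.0, 0.5, 1.0)])  # [64, 512, 4096]
```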

Logit

Logit-transformed spacing for parameters in (0, 1) range.
from pufferlib.sweep import Logit

# Discount factor
gamma = Logit(min=0.9, max=0.999, scale='auto')

# GAE lambda
gae_lambda = Logit(min=0.8, max=0.99, scale='auto')
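The logit transform allocates extra resolution near both ends of (0, 1), which matters for discount factors close to 1. A hypothetical sketch (not PufferLib's internals):

```python
import math

def logit(p: float) -> float:
    """Log-odds transform: (0, 1) -> (-inf, inf)."""
    return math.log(p / (1 - p))

def sigmoid(x: float) -> float:
    """Inverse of logit: (-inf, inf) -> (0, 1)."""
    return 1 / (1 + math.exp(-x))

def logit_sample(u: float, lo: float, hi: float) -> float:
    """Map u in [0, 1] onto (lo, hi) uniformly in log-odds space."""
    return sigmoid(logit(lo) + u * (logit(hi) - logit(lo)))

# The midpoint lands well above the arithmetic midpoint 0.9495,
# reflecting the extra resolution allocated near 1.
print(logit_sample(0.5, 0.9, 0.999))  # ~0.9896
```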

Protein optimizer

Bayesian optimization using Gaussian processes.

Configuration

sweep_config = {
    'method': 'Protein',
    'metric': 'episode_return',
    'goal': 'maximize',
    'sweep_only': 'train.learning_rate,train.clip_coef',
    'downsample': 100,  # Sample every N timesteps
    'use_gpu': True,
    'early_stop_quantile': 0.1,
    'max_suggestion_cost': 1e7,
    
    # Parameter search spaces
    'train.learning_rate': {
        'distribution': 'log',
        'min': 1e-5,
        'max': 1e-2,
        'scale': 'auto'
    },
    'train.clip_coef': {
        'distribution': 'linear',
        'min': 0.1,
        'max': 0.5,
        'scale': 'auto'
    }
}
method (str, required): Optimization method: 'Protein', 'GridSearch', 'RandomSearch', or 'Sobol'
metric (str, required): Metric to optimize (e.g., 'episode_return', 'episode_length')
goal (str, default 'maximize'): Optimization goal: 'maximize' or 'minimize'
sweep_only (str): Comma-separated list of parameters to sweep over
downsample (int, default 100): Sample the metric every N timesteps
use_gpu (bool, default True): Use GPU for GP model training
early_stop_quantile (float, default 0.1): Quantile threshold for early stopping
max_suggestion_cost (float): Maximum cost (timesteps) for suggestions

Running sweeps

Command line

# Run hyperparameter sweep
puffer sweep puffer_cartpole

# With custom config
puffer sweep puffer_cartpole --sweep.method Protein --sweep.metric episode_return

Python API

import pufferlib.pufferl as pufferl

# Load config with sweep settings
args = pufferl.load_config('puffer_cartpole')

# Configure sweep
args['sweep'] = {
    'method': 'Protein',
    'metric': 'episode_return',
    'goal': 'maximize',
    'downsample': 100,
    'train.learning_rate': {
        'distribution': 'log',
        'min': 1e-5,
        'max': 1e-2
    }
}

# Run sweep (requires wandb or neptune)
args['wandb'] = True
pufferl.sweep(args)

Sweep methods reference

Protein.suggest()

Suggest next hyperparameter configuration using Bayesian optimization.
args (dict, required): Configuration dictionary to update with suggestions

Returns None; updates args in-place with the suggested values.

Protein.observe()

Record observation from a training run.
params (dict, required): Dictionary of hyperparameter values
metric (float, required): Observed metric value
cost (float, required): Cost (e.g., timesteps) of the run

Protein.should_stop()

Check if run should be stopped early.
metric (float, required): Current metric value
cost (float, required): Current cost

Returns bool: True if the run should be stopped.
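The three methods form a suggest/train/observe loop. The mock below implements the same protocol with random search and a quantile early-stop rule in place of Protein's GP model; the class and its internals are illustrative assumptions, not PufferLib source:

```python
import random

class RandomSweep:
    """Mock optimizer following the suggest/observe/should_stop protocol."""

    def __init__(self, early_stop_quantile=0.1):
        self.quantile = early_stop_quantile
        self.history = []  # (params, metric, cost) tuples

    def suggest(self, args):
        # Fill args in-place, mirroring Protein.suggest()
        args['learning_rate'] = 10 ** random.uniform(-5, -2)
        args['clip_coef'] = random.uniform(0.1, 0.5)

    def observe(self, params, metric, cost):
        self.history.append((dict(params), metric, cost))

    def should_stop(self, metric, cost):
        # Stop if the current metric falls below the chosen quantile of past runs
        if len(self.history) < 5:
            return False
        metrics = sorted(m for _, m, _ in self.history)
        cutoff = metrics[int(self.quantile * len(metrics))]
        return metric < cutoff

opt = RandomSweep()
for _ in range(10):
    params = {}
    opt.suggest(params)
    score = -(params['learning_rate'] - 1e-3) ** 2  # stand-in for a training run
    if not opt.should_stop(score, cost=1e5):
        opt.observe(params, metric=score, cost=1e5)
```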

Examples

Basic sweep

import pufferlib.sweep as sweep

# Define parameter spaces
spaces = {
    'learning_rate': sweep.Log(1e-5, 1e-2, 'auto'),
    'batch_size': sweep.Pow2(64, 4096, 'auto', is_integer=True),
    'gamma': sweep.Logit(0.9, 0.999, 'auto')
}

# Create Protein optimizer
optimizer = sweep.Protein({
    'method': 'Protein',
    'metric': 'episode_return',
    'goal': 'maximize',
    **spaces
})

# Suggest and observe
params = {}
optimizer.suggest(params)
print(f"Suggested params: {params}")

# After training
optimizer.observe(params, metric=150.0, cost=1e6)

Multi-objective optimization

# Optimize for return while minimizing training time
config = {
    'method': 'Protein',
    'metric': 'episode_return',
    'goal': 'maximize',
    'prune_pareto': True,  # Pareto front pruning
    'max_suggestion_cost': 5e6,  # Max 5M timesteps
    
    'train.learning_rate': {
        'distribution': 'log',
        'min': 1e-5,
        'max': 1e-2
    },
    'train.num_minibatches': {
        'distribution': 'pow2',
        'min': 1,
        'max': 16,
        'is_integer': True
    }
}
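With prune_pareto, runs are judged on the (cost, metric) trade-off. A generic sketch of Pareto-front filtering follows; the helper is hypothetical, not PufferLib's prune_pareto implementation:

```python
def pareto_front(runs):
    """Keep runs not dominated by any other (lower-or-equal cost AND
    higher-or-equal metric) run."""
    return [
        (c, m) for c, m in runs
        if not any(c2 <= c and m2 >= m and (c2, m2) != (c, m) for c2, m2 in runs)
    ]

runs = [(1e6, 100.0), (5e6, 150.0), (5e6, 90.0), (2e6, 120.0)]
front = pareto_front(runs)  # drops the dominated (5e6, 90.0) run
print(front)
```

A run survives only if no other run achieved at least its return for no more cost.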
Sweeps require either wandb or neptune for tracking. Set args['wandb'] = True or args['neptune'] = True before running.
The Protein optimizer uses GPU-accelerated Gaussian processes. For large parameter spaces, keep use_gpu=True and ensure adequate GPU memory is available.
