The sweep module provides Bayesian optimization for hyperparameter tuning using Gaussian processes and acquisition functions.
## Sweep methods

PufferLib supports multiple sweep methods for hyperparameter optimization:

- **Protein**: Bayesian optimization with GP models and acquisition functions
- **GridSearch**: Exhaustive grid search over the parameter space
- **RandomSearch**: Random sampling from parameter distributions
- **Sobol**: Quasi-random Sobol sequences for efficient sampling
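As a rough illustration of how the non-Bayesian methods cover a space — plain Python with hypothetical parameter values, not the PufferLib API:

```python
import itertools
import random

# Two hypothetical parameters and their candidate values
lr_values = [1e-4, 1e-3, 1e-2]
clip_values = [0.1, 0.2, 0.3]

# GridSearch: exhaustive cross-product of all candidate values
grid = list(itertools.product(lr_values, clip_values))
print(len(grid))  # 9 configurations (3 x 3)

# RandomSearch: independent draws, here log-uniform for the learning rate
rng = random.Random(0)
samples = [(10 ** rng.uniform(-4, -2), rng.uniform(0.1, 0.3))
           for _ in range(9)]
```

Grid search scales exponentially with the number of parameters, which is why random and quasi-random (Sobol) sampling are usually preferred beyond two or three dimensions.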
## Space classes

Define search spaces for hyperparameters:

### Linear

Linear spacing between min and max values.

- `min`: Minimum value of the parameter
- `max`: Maximum value of the parameter
- `scale` (`float | 'auto'`, default: `'auto'`): Scaling factor for the parameter (`'auto'` resolves to 0.5)
- `is_integer`: Whether to round values to integers
```python
from pufferlib.sweep import Linear

# Integer linear space
batch_size = Linear(min=64, max=1024, scale='auto', is_integer=True)

# Float linear space
learning_rate = Linear(min=0.0001, max=0.01, scale='auto')
```
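Conceptually, a linear space maps a unit-interval sample straight onto `[min, max]`, rounding when `is_integer` is set. A minimal sketch of that mapping (`linear_sample` is illustrative, not part of PufferLib):

```python
def linear_sample(u, lo, hi, is_integer=False):
    """Map u in [0, 1] linearly onto [lo, hi]."""
    value = lo + u * (hi - lo)
    return round(value) if is_integer else value

print(linear_sample(0.5, 64, 1024, is_integer=True))  # 544
print(linear_sample(0.5, 0.0001, 0.01))
```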
### Log

Logarithmic spacing for parameters that vary over orders of magnitude.

- `min`: Minimum value (must be > 0)
- `max`: Maximum value (must be > 0)
- `scale` (`float | 'auto' | 'time'`, default: `'auto'`): Scaling factor. Use `'time'` for time-based scaling.
```python
from pufferlib.sweep import Log

# Learning rate with log scaling
lr = Log(min=1e-5, max=1e-2, scale='auto')

# Entropy coefficient
ent_coef = Log(min=0.0001, max=0.1, scale='time')
```
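The log transform makes sampling uniform over orders of magnitude rather than raw values. A hedged sketch of the idea (`log_sample` is illustrative, not the library's internals):

```python
import math

def log_sample(u, lo, hi):
    """Map u in [0, 1] onto [lo, hi] uniformly in log space."""
    return math.exp(math.log(lo) + u * (math.log(hi) - math.log(lo)))

# The log-space midpoint of [1e-5, 1e-2] is the geometric mean,
# about 3.16e-4, rather than the linear midpoint of ~5e-3
mid = log_sample(0.5, 1e-5, 1e-2)
```

This is why learning rates are almost always swept in log space: the difference between 1e-5 and 1e-4 matters as much as the difference between 1e-3 and 1e-2.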
### Pow2

Power-of-2 spacing for batch sizes and similar parameters.

```python
from pufferlib.sweep import Pow2

# Batch size in powers of 2
batch = Pow2(min=64, max=4096, scale='auto', is_integer=True)
```
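Pow2 can be thought of as sampling uniformly over exponents and emitting `2 ** exponent`. A sketch under that assumption (not PufferLib's implementation):

```python
def pow2_sample(u, lo, hi):
    """Map u in [0, 1] onto powers of 2 between lo and hi (both powers of 2)."""
    lo_exp = lo.bit_length() - 1  # exact log2 for powers of two
    hi_exp = hi.bit_length() - 1
    return 2 ** round(lo_exp + u * (hi_exp - lo_exp))

print(pow2_sample(0.0, 64, 4096))  # 64
print(pow2_sample(0.5, 64, 4096))  # 512
```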
### Logit

Logit-transformed spacing for parameters in the (0, 1) range.

```python
from pufferlib.sweep import Logit

# Discount factor
gamma = Logit(min=0.9, max=0.999, scale='auto')

# GAE lambda
gae_lambda = Logit(min=0.8, max=0.99, scale='auto')
```
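Logit spacing stretches the ends of (0, 1), giving finer resolution near 1, where discount-style parameters matter most. A rough sketch of the transform (illustrative, not the library's code):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def logit_sample(u, lo, hi):
    """Map u in [0, 1] onto (lo, hi) uniformly in logit space."""
    z = logit(lo) + u * (logit(hi) - logit(lo))
    return 1 / (1 + math.exp(-z))  # sigmoid = inverse logit

# The logit-space midpoint of [0.9, 0.999] is ~0.9896,
# much closer to 1 than the linear midpoint 0.9495
g = logit_sample(0.5, 0.9, 0.999)
```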
## Protein optimizer

Bayesian optimization using Gaussian processes.

### Configuration

```python
sweep_config = {
    'method': 'Protein',
    'metric': 'episode_return',
    'goal': 'maximize',
    'sweep_only': 'train.learning_rate,train.clip_coef',
    'downsample': 100,  # Sample the metric every N timesteps
    'use_gpu': True,
    'early_stop_quantile': 0.1,
    'max_suggestion_cost': 1e7,

    # Parameter search spaces
    'train.learning_rate': {
        'distribution': 'log',
        'min': 1e-5,
        'max': 1e-2,
        'scale': 'auto'
    },
    'train.clip_coef': {
        'distribution': 'linear',
        'min': 0.1,
        'max': 0.5,
        'scale': 'auto'
    }
}
```
- `method`: Optimization method: `'Protein'`, `'GridSearch'`, `'RandomSearch'`, or `'Sobol'`
- `metric`: Metric to optimize (e.g., `'episode_return'`, `'episode_length'`)
- `goal`: Optimization goal: `'maximize'` or `'minimize'`
- `sweep_only`: Comma-separated list of parameters to sweep over
- `downsample`: Sample the metric every N timesteps
- `use_gpu`: Use the GPU for GP model training
- `early_stop_quantile`: Quantile threshold for early stopping
- `max_suggestion_cost`: Maximum cost (in timesteps) for suggestions
## Running sweeps

### Command line

```bash
# Run a hyperparameter sweep
puffer sweep puffer_cartpole

# With custom config
puffer sweep puffer_cartpole --sweep.method Protein --sweep.metric episode_return
```
### Python API

```python
import pufferlib.pufferl as pufferl

# Load config with sweep settings
args = pufferl.load_config('puffer_cartpole')

# Configure the sweep
args['sweep'] = {
    'method': 'Protein',
    'metric': 'episode_return',
    'goal': 'maximize',
    'downsample': 100,
    'train.learning_rate': {
        'distribution': 'log',
        'min': 1e-5,
        'max': 1e-2
    }
}

# Run the sweep (requires wandb or neptune)
args['wandb'] = True
pufferl.sweep(args)
```
## Sweep methods reference

### Protein.suggest()

Suggest the next hyperparameter configuration using Bayesian optimization.

- **Parameters**: configuration dictionary to update with suggestions
- **Returns**: updates the dictionary in-place with suggested values

### Protein.observe()

Record an observation from a training run.

- **Parameters**: a dictionary of hyperparameter values, the observed metric value, and the cost (e.g., timesteps) of the run

### Protein.should_stop()

Check whether a run should be stopped early.

- **Returns**: `True` if the run should be stopped
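The role of `early_stop_quantile` in this decision can be sketched roughly — a hedged illustration of the idea, not PufferLib's actual logic: a run is cut when its score falls below the chosen quantile of previously completed runs.

```python
def should_stop(current_score, completed_scores, quantile=0.1):
    """Stop when current_score is below the given quantile of prior runs.
    Illustration of the early_stop_quantile idea, not PufferLib's code."""
    if not completed_scores:
        return False
    ranked = sorted(completed_scores)
    cutoff = ranked[int(quantile * len(ranked))]
    return current_score < cutoff

history = [50.0, 80.0, 100.0, 120.0, 150.0, 90.0, 110.0, 70.0, 130.0, 60.0]
print(should_stop(40.0, history))  # True: below the bottom 10% of runs
print(should_stop(95.0, history))  # False: comfortably mid-pack
```

Culling clearly bad runs early frees the compute budget for more promising suggestions.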
## Examples

### Basic sweep

```python
import pufferlib.sweep as sweep

# Define parameter spaces
spaces = {
    'learning_rate': sweep.Log(1e-5, 1e-2, 'auto'),
    'batch_size': sweep.Pow2(64, 4096, 'auto', is_integer=True),
    'gamma': sweep.Logit(0.9, 0.999, 'auto')
}

# Create the Protein optimizer
optimizer = sweep.Protein({
    'method': 'Protein',
    'metric': 'episode_return',
    'goal': 'maximize',
    **spaces
})

# Suggest and observe
params = {}
optimizer.suggest(params)
print(f"Suggested params: {params}")

# After training
optimizer.observe(params, metric=150.0, cost=1e6)
```
### Multi-objective optimization

```python
# Optimize for return while limiting training time
config = {
    'method': 'Protein',
    'metric': 'episode_return',
    'goal': 'maximize',
    'prune_pareto': True,  # Pareto front pruning
    'max_suggestion_cost': 5e6,  # Max 5M timesteps per run
    'train.learning_rate': {
        'distribution': 'log',
        'min': 1e-5,
        'max': 1e-2
    },
    'train.num_minibatches': {
        'distribution': 'pow2',
        'min': 1,
        'max': 16,
        'is_integer': True
    }
}
```
**Note**: Sweeps require either wandb or neptune for tracking. Set `args['wandb'] = True` or `args['neptune'] = True` before running.

**Note**: The Protein optimizer can use GPU-accelerated Gaussian processes. For large parameter spaces, set `use_gpu=True` and ensure adequate GPU memory is available.