The sweep module provides Bayesian optimization for hyperparameter tuning using Gaussian processes and acquisition functions.

Sweep methods

PufferLib supports multiple sweep methods for hyperparameter optimization:

Protein

Bayesian optimization with GP models and acquisition functions

GridSearch

Exhaustive grid search over parameter space

RandomSearch

Random sampling from parameter distributions

Sobol

Quasi-random Sobol sequences for efficient sampling

Space classes

Define search spaces for hyperparameters:

Linear

Linear spacing between min and max values.
min (float, required): Minimum value of the parameter
max (float, required): Maximum value of the parameter
scale (float | 'auto', default 'auto'): Scaling factor for the parameter ('auto' resolves to 0.5)
is_integer (bool, default False): Whether to round values to integers
from pufferlib.sweep import Linear

# Integer linear space
batch_size = Linear(min=64, max=1024, scale='auto', is_integer=True)

# Float linear space
learning_rate = Linear(min=0.0001, max=0.01, scale='auto')
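For intuition, the linear mapping can be sketched in plain Python. The helper below is hypothetical, not part of pufferlib.sweep:

```python
# Hypothetical sketch of linear spacing (not PufferLib's implementation):
# map a unit sample u onto [min, max], optionally rounding to an integer.

def linear_sample(u: float, lo: float, hi: float, is_integer: bool = False):
    """Map u in [0, 1] linearly onto [lo, hi]."""
    x = lo + u * (hi - lo)
    return round(x) if is_integer else x

# Midpoint of the batch_size space above
print(linear_sample(0.5, 64, 1024, is_integer=True))  # -> 544
```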

Log

Logarithmic spacing for parameters that vary over orders of magnitude.
min (float, required): Minimum value (must be > 0)
max (float, required): Maximum value (must be > 0)
scale (float | 'auto' | 'time', default 'auto'): Scaling factor. Use 'time' for time-based scaling.
from pufferlib.sweep import Log

# Learning rate with log scaling
lr = Log(min=1e-5, max=1e-2, scale='auto')

# Entropy coefficient
ent_coef = Log(min=0.0001, max=0.1, scale='time')
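To see why log spacing suits such parameters, here is a hypothetical sketch (not PufferLib's internals) that samples uniformly in log10 space:

```python
import math

# Hypothetical sketch of log spacing: sample uniformly in log10 space,
# so each order of magnitude is equally likely to be explored.

def log_sample(u: float, lo: float, hi: float) -> float:
    """Map u in [0, 1] onto [lo, hi] with uniform spacing in log10 space."""
    return 10 ** (math.log10(lo) + u * (math.log10(hi) - math.log10(lo)))

# The midpoint is the geometric mean sqrt(1e-5 * 1e-2), not the arithmetic mean
print(log_sample(0.5, 1e-5, 1e-2))  # ~3.16e-4
```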

Pow2

Power-of-2 spacing for batch sizes and similar parameters.
from pufferlib.sweep import Pow2

# Batch size in powers of 2
batch = Pow2(min=64, max=4096, scale='auto', is_integer=True)
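A hypothetical sketch of the idea (not PufferLib's internals): sample the exponent linearly and round it, so only powers of two are produced:

```python
import math

def pow2_sample(u: float, lo: int, hi: int) -> int:
    """Map u in [0, 1] onto powers of two between lo and hi (both powers of 2)."""
    e_lo, e_hi = int(math.log2(lo)), int(math.log2(hi))
    return 2 ** round(e_lo + u * (e_hi - e_lo))

print([pow2_sample(u, 64, 4096) for u in (0.0, 0.5, 1.0)])  # [64, 512, 4096]
```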

Logit

Logit-transformed spacing for parameters in (0, 1) range.
from pufferlib.sweep import Logit

# Discount factor
gamma = Logit(min=0.9, max=0.999, scale='auto')

# GAE lambda
gae_lambda = Logit(min=0.8, max=0.99, scale='auto')
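The logit transform allocates extra resolution near both ends of (0, 1), which matters for discount factors close to 1. A hypothetical sketch (not PufferLib's internals):

```python
import math

def logit(p: float) -> float:
    """Log-odds transform: (0, 1) -> (-inf, inf)."""
    return math.log(p / (1 - p))

def sigmoid(x: float) -> float:
    """Inverse of logit: (-inf, inf) -> (0, 1)."""
    return 1 / (1 + math.exp(-x))

def logit_sample(u: float, lo: float, hi: float) -> float:
    """Map u in [0, 1] onto (lo, hi) uniformly in log-odds space."""
    return sigmoid(logit(lo) + u * (logit(hi) - logit(lo)))

# The midpoint lands well above the arithmetic midpoint 0.9495,
# reflecting the extra resolution allocated near 1.
print(logit_sample(0.5, 0.9, 0.999))  # ~0.9896
```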

Protein optimizer

Bayesian optimization using Gaussian processes.

Configuration

sweep_config = {
    'method': 'Protein',
    'metric': 'episode_return',
    'goal': 'maximize',
    'sweep_only': 'train.learning_rate,train.clip_coef',
    'downsample': 100,  # Sample every N timesteps
    'use_gpu': True,
    'early_stop_quantile': 0.1,
    'max_suggestion_cost': 1e7,
    
    # Parameter search spaces
    'train.learning_rate': {
        'distribution': 'log',
        'min': 1e-5,
        'max': 1e-2,
        'scale': 'auto'
    },
    'train.clip_coef': {
        'distribution': 'linear',
        'min': 0.1,
        'max': 0.5,
        'scale': 'auto'
    }
}
method (str, required): Optimization method: 'Protein', 'GridSearch', 'RandomSearch', or 'Sobol'
metric (str, required): Metric to optimize (e.g., 'episode_return', 'episode_length')
goal (str, default 'maximize'): Optimization goal: 'maximize' or 'minimize'
sweep_only (str): Comma-separated list of parameters to sweep over
downsample (int, default 100): Sample the metric every N timesteps
use_gpu (bool, default True): Use GPU for GP model training
early_stop_quantile (float, default 0.1): Quantile threshold for early stopping
max_suggestion_cost (float): Maximum cost (timesteps) for suggestions

Running sweeps

Command line

# Run hyperparameter sweep
puffer sweep puffer_cartpole

# With custom config
puffer sweep puffer_cartpole --sweep.method Protein --sweep.metric episode_return

Python API

import pufferlib.pufferl as pufferl

# Load config with sweep settings
args = pufferl.load_config('puffer_cartpole')

# Configure sweep
args['sweep'] = {
    'method': 'Protein',
    'metric': 'episode_return',
    'goal': 'maximize',
    'downsample': 100,
    'train.learning_rate': {
        'distribution': 'log',
        'min': 1e-5,
        'max': 1e-2
    }
}

# Run sweep (requires wandb or neptune)
args['wandb'] = True
pufferl.sweep(args)

Sweep methods reference

Protein.suggest()

Suggest next hyperparameter configuration using Bayesian optimization.
args (dict, required): Configuration dictionary to update with suggestions

Returns None; updates args in-place with the suggested values.

Protein.observe()

Record observation from a training run.
params (dict, required): Dictionary of hyperparameter values
metric (float, required): Observed metric value
cost (float, required): Cost (e.g., timesteps) of the run

Protein.should_stop()

Check if run should be stopped early.
metric (float, required): Current metric value
cost (float, required): Current cost

Returns bool: True if the run should be stopped.
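The three methods form a suggest/train/observe loop. The mock below implements the same protocol with random search and a quantile early-stop rule in place of Protein's GP model; the class and its internals are illustrative assumptions, not PufferLib source:

```python
import random

class RandomSweep:
    """Mock optimizer following the suggest/observe/should_stop protocol."""

    def __init__(self, early_stop_quantile=0.1):
        self.quantile = early_stop_quantile
        self.history = []  # (params, metric, cost) tuples

    def suggest(self, args):
        # Fill args in-place, mirroring Protein.suggest()
        args['learning_rate'] = 10 ** random.uniform(-5, -2)
        args['clip_coef'] = random.uniform(0.1, 0.5)

    def observe(self, params, metric, cost):
        self.history.append((dict(params), metric, cost))

    def should_stop(self, metric, cost):
        # Stop if the current metric falls below the chosen quantile of past runs
        if len(self.history) < 5:
            return False
        metrics = sorted(m for _, m, _ in self.history)
        cutoff = metrics[int(self.quantile * len(metrics))]
        return metric < cutoff

opt = RandomSweep()
for _ in range(10):
    params = {}
    opt.suggest(params)
    score = -(params['learning_rate'] - 1e-3) ** 2  # stand-in for a training run
    if not opt.should_stop(score, cost=1e5):
        opt.observe(params, metric=score, cost=1e5)
```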

Examples

Basic sweep

import pufferlib.sweep as sweep

# Define parameter spaces
spaces = {
    'learning_rate': sweep.Log(1e-5, 1e-2, 'auto'),
    'batch_size': sweep.Pow2(64, 4096, 'auto', is_integer=True),
    'gamma': sweep.Logit(0.9, 0.999, 'auto')
}

# Create Protein optimizer
optimizer = sweep.Protein({
    'method': 'Protein',
    'metric': 'episode_return',
    'goal': 'maximize',
    **spaces
})

# Suggest and observe
params = {}
optimizer.suggest(params)
print(f"Suggested params: {params}")

# After training
optimizer.observe(params, metric=150.0, cost=1e6)

Multi-objective optimization

# Optimize for return while minimizing training time
config = {
    'method': 'Protein',
    'metric': 'episode_return',
    'goal': 'maximize',
    'prune_pareto': True,  # Pareto front pruning
    'max_suggestion_cost': 5e6,  # Max 5M timesteps
    
    'train.learning_rate': {
        'distribution': 'log',
        'min': 1e-5,
        'max': 1e-2
    },
    'train.num_minibatches': {
        'distribution': 'pow2',
        'min': 1,
        'max': 16,
        'is_integer': True
    }
}
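With prune_pareto, runs are judged on the (cost, metric) trade-off. A generic sketch of Pareto-front filtering follows; the helper is hypothetical, not PufferLib's prune_pareto implementation:

```python
def pareto_front(runs):
    """Keep runs not dominated by any other (lower-or-equal cost AND
    higher-or-equal metric) run."""
    return [
        (c, m) for c, m in runs
        if not any(c2 <= c and m2 >= m and (c2, m2) != (c, m) for c2, m2 in runs)
    ]

runs = [(1e6, 100.0), (5e6, 150.0), (5e6, 90.0), (2e6, 120.0)]
front = pareto_front(runs)  # drops the dominated (5e6, 90.0) run
print(front)
```

A run survives only if no other run achieved at least its return for no more cost.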
Sweeps require either wandb or neptune for tracking. Set args['wandb'] = True or args['neptune'] = True before running.
The Protein optimizer uses GPU-accelerated Gaussian processes. For large parameter spaces, keep use_gpu=True and ensure adequate GPU memory is available.
