The zuko.bayesian module provides utilities for Bayesian deep learning, enabling variational inference over model parameters with mean-field Gaussian posteriors.
BayesianModel
Creates a Bayesian wrapper around a base model that maintains a variational posterior over its parameters. The posterior is a mean-field Gaussian factorization

$$q(\theta) = \prod_i \mathcal{N}(\theta_i; \mu_i, \sigma_i^2)$$

where $\mu_i$ and $\sigma_i^2$ are learned variational parameters.

Arguments:
- A base PyTorch model.
- The initial value of the log-variance parameters (controls the initialization uncertainty).
- include_params: a list of parameter name prefixes to include in the posterior. Use * to match alphanumeric strings and ** to match dot-separated paths. By default, all parameters are included.
- exclude_params: a list of parameter name prefixes to exclude from the posterior.
Methods
sample_params
Returns model parameters sampled from the posterior.

Returns: A dictionary mapping parameter names to sampled tensors.

sample_model
Returns a standalone model sampled from the posterior. The returned model is a deep copy of the base model with sampled parameters. It can be used independently but does not propagate gradients to the Bayesian model.

Returns: A sampled model instance.

Warning: sample_model() should not be used during training, as gradients do not flow back to the variational parameters. Use reparameterize() instead.

reparameterize
Context manager that temporarily reparameterizes the base model with a sample from the posterior. Within this context, the base model behaves deterministically (same inputs → same outputs), and gradients flow through the variational parameters.

Arguments:
- local_trick: whether to use the local reparameterization trick for linear layers, which reduces the variance of gradient estimates.
Local Reparameterization Trick: When local_trick=True, instead of sampling weights and then computing outputs, the method samples the activations directly from their induced distribution, which reduces gradient variance.

Reference: Variational Dropout and the Local Reparameterization Trick (Kingma et al., 2015) - arxiv.org/abs/1506.02557

kl_divergence

Computes the KL divergence between the posterior and a zero-mean Gaussian prior $p(\theta) = \prod_i \mathcal{N}(\theta_i; 0, \sigma_p^2)$:

$$\mathrm{KL}(q \,\|\, p) = \frac{1}{2} \sum_i \left( \frac{\mu_i^2 + \sigma_i^2}{\sigma_p^2} - 1 + \log \sigma_p^2 - \log \sigma_i^2 \right)$$

Arguments:
- The variance $\sigma_p^2$ of the Gaussian prior.
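The closed-form Gaussian KL can be checked against `torch.distributions` (a standalone sketch, independent of `zuko.bayesian`; the values below are arbitrary):

```python
import torch
from torch.distributions import Normal, kl_divergence

mu = torch.tensor([0.3, -1.2, 0.0])        # variational means
log_var = torch.tensor([-0.5, 0.1, -2.0])  # variational log-variances
prior_var = 2.0                            # prior variance sigma_p^2

var = log_var.exp()
# Closed form: 1/2 * sum_i ((mu_i^2 + sigma_i^2)/sigma_p^2 - 1 + log sigma_p^2 - log sigma_i^2)
kl_closed = 0.5 * ((mu**2 + var) / prior_var - 1
                   + torch.log(torch.tensor(prior_var)) - log_var).sum()

# Reference: elementwise KL between factorized Gaussians, summed
q = Normal(mu, var.sqrt())
p = Normal(torch.zeros(3), torch.full((3,), prior_var).sqrt())
kl_ref = kl_divergence(q, p).sum()

print(torch.allclose(kl_closed, kl_ref))  # → True
```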
Usage Example
Basic Variational Inference
Multiple Posterior Samples
Selective Parameter Inference
Local Reparameterization Trick
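The mechanism behind local_trick=True can be illustrated in plain PyTorch, independently of `zuko.bayesian`: for a linear layer with factorized Gaussian weights, the pre-activations are themselves Gaussian and can be sampled directly. A standalone sketch:

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 4)           # a batch of inputs
mu = torch.randn(4, 3)          # weight means
sigma = 0.1 * torch.rand(4, 3)  # weight standard deviations

# Global reparameterization: sample the weights, then compute activations
w = mu + sigma * torch.randn(4, 3)
y_global = x @ w

# Local reparameterization: sample the activations directly from their
# induced Gaussian, y ~ N(x @ mu, x^2 @ sigma^2), skipping the weight sample
act_mean = x @ mu
act_var = (x ** 2) @ (sigma ** 2)
y_local = act_mean + act_var.sqrt() * torch.randn(8, 3)
```

Both samples are drawn from the same distribution over activations, but the local form yields lower-variance gradient estimates per the reference above.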
Bayesian Normalizing Flows
Combining Bayesian inference with normalizing flows enables uncertainty quantification over flow parameters.

Pattern Matching
The include_params and exclude_params arguments support glob-like patterns:

- "layer1" - matches parameters starting with "layer1"
- "*.weight" - matches weight parameters (the single wildcard * matches [a-zA-Z0-9_]+)
- "encoder.**" - matches all parameters under the "encoder" module (the double wildcard ** matches [a-zA-Z0-9_\.]+)
- "encoder.*.bias" - matches biases in direct children of encoder
Notes
ELBO Optimization: The evidence lower bound (ELBO) for variational inference is

$$\mathrm{ELBO} = \mathbb{E}_{q(\theta)}[\log p(x \mid \theta)] - \mathrm{KL}(q(\theta) \,\|\, p(\theta))$$

In practice, maximize the ELBO by minimizing

nll + kl_divergence() / num_datapoints

where nll is the average negative log-likelihood over a batch. The KL term is scaled by the dataset size to balance the two terms appropriately.

Memory Efficiency: The reparameterize() context manager temporarily replaces parameters without creating permanent copies, making it memory-efficient for large models.