Overview
Masked Autoregressive Flow (MAF) uses masked neural networks to create autoregressive transformations. It is a flexible and widely used architecture that serves as the foundation for many other flows, such as NSF.

Reference

Masked Autoregressive Flow for Density Estimation (Papamakarios et al., 2017): https://arxiv.org/abs/1705.07057
Class Definition
Parameters
features: The number of features in the data.
context: The number of context features for conditional density estimation.
transforms: The number of autoregressive transformations to stack.
randperm: Whether features are randomly permuted between transformations. If False, features are in ascending order for even transformations and descending order for odd transformations.
Additional keyword arguments are passed to MaskedAutoregressiveTransform:
- hidden_features: List of hidden layer sizes (default: [64, 64])
- activation: Activation function (default: ReLU)
- passes: Number of passes for the inverse (default: features, i.e. fully autoregressive)
- univariate: The univariate transformation constructor (default: MonotonicAffineTransform)
- shapes: Parameter shapes for the univariate transformations
Usage Example
Conditional Flow
Training Example
Coupling Transformations
Methods
forward(c=None)
Returns a normalizing flow distribution.
Arguments:
c (Tensor, optional): Context tensor of shape (*, context)

Returns:
NormalizingFlow: A distribution object with:
- sample(shape): Sample from the distribution
- log_prob(x): Compute the log-probability of samples
- rsample(shape): Reparameterized sampling (supports gradients)
When to Use MAF
Good for:
- General-purpose density estimation
- Fast training (forward pass is parallel)
- Flexible baseline for custom flows
- When you need a simple, well-understood architecture
Limitations:
- Slow sampling (inverse is sequential)
- Less expressive than NSF or NAF for complex distributions
- Affine transformations may be limiting for multimodal data
Tips
- Number of transformations: Start with 3-5. More transformations increase expressivity but add computational cost.
- Random permutations: Set randperm=True for better mixing when features have structure.
- Hidden layer sizes: Use [128, 128] or [256, 256] for complex datasets.
- Coupling for speed: Use passes=2 for a faster inverse when you have many features.
Architecture Details
MAF consists of:
- Base distribution: Diagonal Gaussian N(0, I)
- Transformations: Affine transformations with autoregressive conditioning
- Neural network: Masked MLP that enforces the autoregressive structure
- Parameters per feature: 2 (location and scale)

Each transformation computes y_i = x_i * exp(s_i) + t_i, where s_i (scale) and t_i (translation) depend on x_1, ..., x_{i-1} and the context c.
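The parallel-forward / sequential-inverse asymmetry can be illustrated with a self-contained NumPy sketch, using strictly lower-triangular linear maps as stand-ins for the masked networks that produce s and t:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Strictly lower-triangular weights: row i only sees x_1, ..., x_{i-1},
# mimicking the masked MLP's autoregressive structure.
Ws = np.tril(rng.normal(size=(d, d)) * 0.1, k=-1)
Wt = np.tril(rng.normal(size=(d, d)), k=-1)

def forward(x):
    # Density direction: every y_i is computed in parallel.
    s, t = Ws @ x, Wt @ x
    return x * np.exp(s) + t

def inverse(y):
    # Sampling direction: x_i is recovered one feature at a time.
    x = np.zeros_like(y)
    for i in range(d):
        s_i, t_i = Ws[i] @ x, Wt[i] @ x  # only the filled x[:i] entries contribute
        x[i] = (y[i] - t_i) * np.exp(-s_i)
    return x

x = rng.normal(size=d)
assert np.allclose(inverse(forward(x)), x)  # the transform is exactly invertible
```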
Comparison with Other Flows
| Property | MAF | RealNVP | NSF |
|---|---|---|---|
| Training | Fast | Fast | Medium |
| Sampling | Slow | Fast | Slow |
| Expressivity | Medium | Medium | High |
| Complexity | Low | Low | Medium |
Advanced Usage
Custom Univariate Transformations
Custom Masking
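The masks themselves follow the MADE recipe (Germain et al., 2015): assign a degree to every unit and keep only the connections that preserve the autoregressive ordering. A NumPy sketch of the construction (this degree assignment is one common choice, not the only one):

```python
import numpy as np

d, h = 4, 8
in_deg = np.arange(1, d + 1)          # input i carries degree i
hid_deg = np.arange(h) % (d - 1) + 1  # hidden degrees cycle through 1..d-1
out_deg = np.arange(1, d + 1)

# A hidden unit may see inputs of degree <= its own; an output of degree i
# may see hidden units of degree < i.
mask_in = hid_deg[:, None] >= in_deg[None, :]   # shape (h, d)
mask_out = out_deg[:, None] > hid_deg[None, :]  # shape (d, h)

# Composed connectivity: output i depends only on inputs 1..i-1,
# i.e. the matrix is strictly lower triangular.
conn = (mask_out.astype(int) @ mask_in.astype(int)) > 0
assert not conn[np.triu_indices(d)].any()
```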
Related
- NSF - MAF with spline transformations
- NAF - MAF with neural transformations
- MaskedAutoregressiveTransform - The underlying transformation
