Overview
Neural Autoregressive Flow (NAF) uses monotonic neural networks (MNNs) to build universal function approximators for autoregressive transformations. Unlike MAF, which uses simple affine transformations, NAF can represent arbitrary monotonic functions.

Reference
Neural Autoregressive Flows (Huang et al., 2018): https://arxiv.org/abs/1804.00779
Class Definition
Parameters
- features: The number of features in the data.
- context: The number of context features for conditional density estimation.
- transforms: The number of autoregressive transformations to stack.
- randperm: Whether features are randomly permuted between transformations. If False, features alternate between ascending and descending order.
- signal: The number of signal features for the monotonic neural network. Higher values increase expressivity but add computational cost.
- network: Keyword arguments passed to the MNN (monotonic neural network) constructor:
  - hidden_features: Hidden layer sizes for the monotonic network
  - activation: Activation function
- kwargs: Additional keyword arguments passed to MaskedAutoregressiveTransform:
  - hidden_features: Hidden layer sizes for the autoregressive network
  - activation: Activation function
Usage Example
Conditional Flow
Training Example
Methods
forward(c=None)
Returns a normalizing flow distribution.
Arguments:
c (Tensor, optional): Context tensor of shape (*, context)

Returns:
NormalizingFlow: A distribution with:
- sample(shape): Sample from the distribution
- log_prob(x): Compute the log probability of samples
- rsample(shape): Reparameterized sampling
When to Use NAF
Good for:
- Complex, highly nonlinear distributions
- High-dimensional data
- When you need universal approximation
- Maximum expressivity in autoregressive flows

Avoid when:
- You need fast sampling (use RealNVP)
- Your data is outside [-10, 10] and cannot be standardized (use MAF or NSF)
- You have limited compute (use MAF or NSF)
- You need smooth, well-behaved transformations (use NSF)
Tips
- Standardize your data: NAF requires features in [-10, 10]. Always normalize to zero mean and unit variance.
- Tune the signal dimension: Start with signal=16. Increase to 32 or 64 for more complex data.
- Use softclip: NAF automatically includes SoftclipTransform layers between transformations to keep values bounded.
- Balance network sizes: The autoregressive network (hidden_features) predicts signals, while the monotonic network (network['hidden_features']) performs transformations.
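The standardization tip can be sketched in plain PyTorch, independent of the flow library:

```python
import torch

data = 5.0 + 3.0 * torch.randn(1000, 4)  # raw data on an arbitrary scale

mean = data.mean(dim=0)
std = data.std(dim=0)
x = (data - mean) / std                  # zero mean, unit variance per feature

# After training on x, samples can be mapped back to the original scale with
# samples * std + mean.
```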
Architecture Details
NAF consists of:
- Base distribution: Diagonal Gaussian N(0, I)
- Transformation: Monotonic neural networks with autoregressive structure
- Signal network: Masked MLP predicts signal vectors autoregressively
- Monotonic network: MLP with positive weights computes transformations
- Softclip layers: Inserted between transformations to maintain bounds

Each output dimension is computed as y_i = MNN(x_i; signal_i), where MNN is a monotonic neural network and signal_i is predicted autoregressively from x_{<i}.
Monotonic Neural Networks
The key innovation in NAF is the use of monotonic neural networks:
- Positive weights: All weights in the network are positive, ensuring monotonicity
- Flexible: Can approximate any continuous monotonic function
- Signal-based: Behavior is modulated by signal vectors rather than by changing weights
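The positive-weight idea can be illustrated with a minimal PyTorch sketch. This is a simplification for intuition, not zuko's actual MNN implementation: exponentiating an unconstrained parameter guarantees positive effective weights, and composing positive-weight layers with monotonic activations yields a monotonic network.

```python
import torch
import torch.nn as nn

class PositiveLinear(nn.Module):
    """Linear layer with positive effective weights (illustrative sketch)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.log_weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # exp() makes every weight positive, so the layer is non-decreasing
        return x @ torch.exp(self.log_weight).t() + self.bias

# Positive weights + monotonic activations (tanh) => a monotonic network
mnn = nn.Sequential(PositiveLinear(1, 16), nn.Tanh(), PositiveLinear(16, 1))

t = torch.linspace(-3, 3, 100).unsqueeze(-1)
y = mnn(t).squeeze(-1)  # non-decreasing in t, whatever the parameter values
```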
NAF vs Other Flows
| Property | NAF | MAF | NSF |
|---|---|---|---|
| Transformation | Neural network | Affine | Spline |
| Expressivity | Very high | Medium | High |
| Training speed | Slow | Fast | Medium |
| Sampling speed | Slow | Slow | Slow |
| Memory usage | High | Low | Medium |
| Domain | [-10, 10] | Unbounded | [-5, 5] |
Advanced Usage
Custom Monotonic Network
High-Dimensional Data
Fine-Grained Control
Computational Considerations
NAF is computationally expensive:
- Parameters: More than MAF, due to the monotonic networks
- Forward pass: Slower, due to neural network evaluations
- Memory: Higher, due to signal vectors and network activations

To reduce the cost:
- Use smaller signal dimensions (8-16)
- Use coupling (passes=2) for high dimensions
- Reduce the monotonic network depth
- Use mixed precision training
Related
- UNAF - Unconstrained variant with integration
- MAF - Simpler affine alternative
- NSF - Spline-based alternative
- MonotonicTransform - The underlying transformation
