Overview
Neural Spline Flow (NSF) uses monotonic rational-quadratic spline transformations to create highly expressive normalizing flows. The splines provide smooth, invertible transformations that can model complex distributions.
Spline transformations are defined over the domain [-5, 5]. Features outside this range are not transformed. It is recommended to standardize features (zero mean, unit variance) before training.
Reference
Neural Spline Flows (Durkan et al., 2019): https://arxiv.org/abs/1906.04032
Class Definition
Parameters
The number of features in the data.
The number of context features for conditional density estimation.
The number of bins K in the rational-quadratic spline. More bins allow for more complex transformations but increase computational cost.
The number of autoregressive transformations to stack. More transformations increase expressivity.
Whether features are randomly permuted between transformations. If False, features alternate between ascending and descending order.
Additional keyword arguments passed to MaskedAutoregressiveTransform, such as:
- hidden_features: List of hidden layer sizes (default: [64, 64])
- activation: Activation function (default: ReLU)
- passes: Number of passes for coupling (default: features, for fully autoregressive)
Usage Example
Conditional Flow
Coupling Transformations
Methods
forward(c=None)
Returns a normalizing flow distribution.
Arguments:
- c (Tensor, optional): Context tensor of shape (*, context)
Returns:
NormalizingFlow: A distribution with the following methods:
- sample(shape): Sample from the distribution
- log_prob(x): Compute log probability of samples
- rsample(shape): Reparameterized sampling (supports gradients)
When to Use NSF
Good for:
- General-purpose density estimation
- Complex, multimodal distributions
- Smooth, continuous data
- When you need high expressivity
Consider alternatives when:
- You need very fast sampling (use RealNVP)
- Your features are outside [-5, 5] and can’t be standardized
- You have limited compute (use MAF, or NSF with fewer bins)
Tips
- Standardize your data: NSF works best when features are normalized to have zero mean and unit variance.
- Tune the number of bins: Start with 8-16 bins. More bins = more expressivity but slower.
- Adjust transformations: Use 3-5 transformations for most tasks. More helps for very complex distributions.
- Use coupling for high dimensions: Set passes=2 when features > 50 for faster computation.
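The standardization tip above can be sketched in plain Python. The helper names (`fit_standardizer`, `standardize`, `fraction_inside`) and the toy rows are hypothetical; with tensors, the same statistics would be computed along the batch dimension. The sketch also checks what fraction of values falls inside the spline domain before and after scaling.

```python
import statistics


def fit_standardizer(rows):
    """Return per-feature means and standard deviations from training rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Shift and scale every feature to zero mean and unit variance."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fraction_inside(rows, bound=5.0):
    """Fraction of values that lie inside the spline domain [-bound, bound]."""
    values = [v for row in rows for v in row]
    return sum(abs(v) <= bound for v in values) / len(values)


# Toy data with two features on very different scales.
raw = [[120.0, 0.002], [95.0, 0.004], [143.0, 0.001], [110.0, 0.003]]
means, stds = fit_standardizer(raw)
scaled = standardize(raw, means, stds)

print(fraction_inside(raw), fraction_inside(scaled))
```

The statistics should be fitted on the training set only and reused unchanged for validation data and for mapping samples back to the original scale.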
Architecture Details
NSF is built on top of Masked Autoregressive Flow (MAF) with rational-quadratic spline transformations:
- Base distribution: Diagonal Gaussian N(0, I)
- Transformation: Monotonic rational-quadratic splines with bins segments
- Neural network: Masked MLP that predicts spline parameters autoregressively
- Parameters per feature: 3 * bins - 1 (bins widths, bins heights, bins - 1 interior derivatives)
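To make the parameter count concrete, here is a dependency-free sketch of a single-feature rational-quadratic spline. It follows the parameterization in Durkan et al. (softmax for widths and heights, softplus for derivatives, boundary derivatives fixed to 1); the library's internal constraints may differ, and the names `build_knots` and `rq_spline` are hypothetical.

```python
import math
from bisect import bisect_right
from itertools import accumulate

BOUND = 5.0  # the spline acts on [-BOUND, BOUND]; outside it is the identity


def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softplus(value):
    return math.log1p(math.exp(value))


def build_knots(raw, bins):
    """Turn 3 * bins - 1 unconstrained values into knots and knot derivatives."""
    assert len(raw) == 3 * bins - 1
    raw_w, raw_h, raw_d = raw[:bins], raw[bins:2 * bins], raw[2 * bins:]
    widths = [2 * BOUND * w for w in softmax(raw_w)]   # positive, sum to 2 * BOUND
    heights = [2 * BOUND * h for h in softmax(raw_h)]  # positive, sum to 2 * BOUND
    xs = [-BOUND] + [-BOUND + c for c in accumulate(widths)]
    ys = [-BOUND] + [-BOUND + c for c in accumulate(heights)]
    xs[-1] = ys[-1] = BOUND  # remove rounding error at the right edge
    # Boundary derivatives are 1 so the spline joins the identity tails smoothly.
    ds = [1.0] + [softplus(d) for d in raw_d] + [1.0]
    return xs, ys, ds


def rq_spline(x, xs, ys, ds):
    """Evaluate the monotonic spline at x; values outside the domain pass through."""
    if x <= -BOUND or x >= BOUND:
        return x
    k = bisect_right(xs, x) - 1          # index of the bin that contains x
    width = xs[k + 1] - xs[k]
    height = ys[k + 1] - ys[k]
    slope = height / width
    xi = (x - xs[k]) / width             # relative position inside the bin
    numerator = height * (slope * xi * xi + ds[k] * xi * (1 - xi))
    denominator = slope + (ds[k + 1] + ds[k] - 2 * slope) * xi * (1 - xi)
    return ys[k] + numerator / denominator


bins = 8
# Stand-in for the masked MLP output for one feature: 3 * 8 - 1 = 23 values.
raw = [0.3 * math.sin(i) for i in range(3 * bins - 1)]
xs, ys, ds = build_knots(raw, bins)

grid = [-6 + 0.25 * i for i in range(49)]  # covers [-6, 6]
values = [rq_spline(x, xs, ys, ds) for x in grid]
```

Because widths, heights and derivatives are all positive, the resulting map is strictly increasing and therefore invertible, which is the property the flow relies on.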
Related
- MAF - The underlying autoregressive architecture
- NCSF - Circular variant for periodic data
- MonotonicRQSTransform - The spline transformation
