
Overview

Unconstrained Neural Autoregressive Flow (UNAF) uses unconstrained monotonic neural networks (UMNNs), which obtain monotonicity through numerical integration of a positive integrand rather than through positivity constraints on the weights. This allows for more flexible monotonic transformations without architectural constraints on the network itself.
Invertibility is only guaranteed for features within the interval [-10, 10]. It is recommended to standardize features (zero mean, unit variance) before training.
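Standardization is straightforward; a minimal stdlib-only sketch (in practice you would apply the same shift-and-scale to your torch tensors using statistics computed on the training set):

```python
import statistics

def standardize(values):
    """Shift to zero mean and scale to unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

data = [12.0, 15.0, 9.0, 14.0, 10.0]
scaled = standardize(data)
# scaled now has zero mean and unit variance, well inside [-10, 10]
```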

Reference

Unconstrained Monotonic Neural Networks (Wehenkel et al., 2019)
https://arxiv.org/abs/1908.05164

Class Definition

zuko.flows.UNAF(
    features: int,
    context: int = 0,
    transforms: int = 3,
    randperm: bool = False,
    signal: int = 16,
    network: dict = {},
    **kwargs
)

Parameters

features
int
required
The number of features in the data.
context
int
default:"0"
The number of context features for conditional density estimation.
transforms
int
default:"3"
The number of autoregressive transformations to stack.
randperm
bool
default:"False"
Whether features are randomly permuted between transformations. If False, features alternate between ascending and descending order.
signal
int
default:"16"
The number of signal features for the integrand neural network.
network
dict
default:"{}"
Keyword arguments passed to the UMNN (unconstrained monotonic neural network) constructor:
  • hidden_features: Hidden layer sizes for the integrand network
  • activation: Activation function (default: ELU)
**kwargs
dict
Additional keyword arguments passed to MaskedAutoregressiveTransform:
  • hidden_features: Hidden layer sizes for the autoregressive network
  • activation: Activation function

Usage Example

import torch
import zuko

# Create an unconditional UNAF
flow = zuko.flows.UNAF(
    features=5,
    transforms=5,
    signal=32,
    hidden_features=[128, 128]
)

# Sample from the flow
dist = flow()
samples = dist.sample((1000,))
print(samples.shape)  # torch.Size([1000, 5])

# Compute log probabilities
log_prob = dist.log_prob(samples)
print(log_prob.shape)  # torch.Size([1000])

Conditional Flow

# Create a conditional UNAF
flow = zuko.flows.UNAF(
    features=3,
    context=5,
    transforms=5,
    signal=24
)

context = torch.randn(5)
dist = flow(context)
samples = dist.sample((100,))

Training Example

import torch.optim as optim

flow = zuko.flows.UNAF(
    features=10,
    transforms=5,
    signal=32,
    hidden_features=[256, 256],
    network={'hidden_features': [64, 64, 64], 'activation': torch.nn.ELU}
)

optimizer = optim.Adam(flow.parameters(), lr=1e-3)

for epoch in range(100):
    for x in dataloader:
        optimizer.zero_grad()
        
        # Clamp as a safety net; standardizing the data beforehand is preferred
        x = torch.clamp(x, -10, 10)
        
        loss = -flow().log_prob(x).mean()
        loss.backward()
        optimizer.step()

Methods

forward(c=None)

Returns a normalizing flow distribution. Arguments:
  • c (Tensor, optional): Context tensor of shape (*, context)
Returns:
  • NormalizingFlow: A distribution with:
    • sample(shape): Sample from the distribution
    • log_prob(x): Compute log probability of samples
    • rsample(shape): Reparameterized sampling

When to Use UNAF

Good for:
  • Maximum flexibility in monotonic transformations
  • Complex, highly nonlinear distributions
  • Research and experimentation
  • When architectural constraints are limiting
Consider alternatives if:
  • You need fast training (use NAF or NSF)
  • You need fast sampling (use RealNVP)
  • Your data is outside [-10, 10] and can’t be standardized
  • You want simpler, more interpretable models (use MAF or NSF)

Tips

  1. Standardize your data: UNAF requires features in [-10, 10]. Always normalize inputs.
  2. Use ELU activation: UNAF uses ELU activation by default, which works well with integration.
  3. Tune signal dimension: Start with signal=16. Increase for more complex distributions.
  4. Be patient: UNAF can be slow to train due to integration but is very expressive.

Architecture Details

UNAF uses integration to create monotonic transformations:
  • Base distribution: Diagonal Gaussian N(0, I)
  • Transformation: Integration of positive integrand functions
  • Signal network: Masked MLP predicts signal vectors and constants
  • Integrand network: Unconstrained MLP whose integral defines the transformation
  • Softclip layers: Inserted between transformations
Each transformation:
y_i = ∫[0 to x_i] g(t; signal_i) dt + constant_i
where g is the integrand network (always positive via exp transformation) and signal_i, constant_i are predicted autoregressively.
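Why a positive integrand guarantees monotonicity can be checked numerically. A self-contained sketch, with a fixed positive function standing in for the network's exp-transformed integrand and the trapezoidal rule standing in for the actual solver:

```python
import math

def g(t):
    # Stand-in for the integrand network's output: strictly positive everywhere
    return math.exp(math.sin(t))

def transform(x, steps=200):
    # y = integral of g from 0 to x, via the trapezoidal rule
    h = x / steps
    total = 0.5 * (g(0.0) + g(x))
    for i in range(1, steps):
        total += g(i * h)
    return total * h

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [transform(x) for x in xs]
# A positive integrand makes y strictly increasing in x
assert all(a < b for a, b in zip(ys, ys[1:]))
```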

Unconstrained Monotonic Networks

Key differences from NAF:
  • No weight constraints: Weights can be any value
  • Integration-based: Monotonicity via integration, not positive weights
  • Integrand function: Models dy/dx instead of y directly
  • Numerical integration: Uses ODE solvers for inversion
# Simplified UMNN concept (illustrative only; MLP and integrate are placeholders)
class UMNN(nn.Module):
    def __init__(self, signal_dim):
        super().__init__()
        # Unconstrained network modelling the derivative dy/dx
        self.integrand = MLP(1 + signal_dim, 1)

    def forward(self, x, signal, constant):
        # Exponentiating the output makes the integrand always positive
        def g(t):
            inp = torch.cat([t, signal], dim=-1)
            return torch.exp(self.integrand(inp))

        # Numerically integrate g from 0 to x
        integral = integrate(g, 0, x)
        return integral + constant

UNAF vs NAF

Property        UNAF            NAF
Weights         Unconstrained   Positive only
Method          Integration     Direct evaluation
Flexibility     Higher          Lower
Training speed  Slower          Faster
Inversion       Numerical       Numerical
Stability       Good            Good

Integration Details

UNAF uses numerical integration:
  • Forward pass: Integrate from 0 to x
  • Inverse pass: Root finding to solve integral equation
  • Log determinant: Computed from integrand evaluations
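Because the forward map is strictly increasing, the inverse pass can be solved with simple root finding. A hedged sketch using bisection on a toy monotonic transform (zuko's actual solver may differ):

```python
import math

def forward(x):
    # Example strictly monotonic transform: x plus a bounded nonlinearity
    return x + math.tanh(x)

def invert(y, lo=-10.0, hi=10.0, tol=1e-8):
    # Bisection: shrink [lo, hi] until forward(mid) matches y
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if forward(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = 1.25
x_rec = invert(forward(x))
assert abs(x_rec - x) < 1e-6
```

The search interval here mirrors the [-10, 10] invertibility interval noted in the overview.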
The integrand is made positive:
g(t) = exp(network(t, signal) / (1 + abs(network(t, signal) / 7)))
This keeps g(t) in the range [1e-3, 1e3] for numerical stability.
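This clipping behaviour is easy to verify. A small sketch of the softclip-then-exponentiate step, using the /7 bound quoted above (exp(±7) is roughly 1.1e3 and 9.1e-4, i.e. approximately [1e-3, 1e3]):

```python
import math

def softclip(x, bound=7.0):
    # Smoothly squashes x into the open interval (-bound, bound)
    return x / (1.0 + abs(x / bound))

def g(raw):
    return math.exp(softclip(raw))

# Even extreme raw network outputs stay in a numerically safe range
for raw in [-1e6, -50.0, 0.0, 50.0, 1e6]:
    assert math.exp(-7) <= g(raw) <= math.exp(7)
```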

Advanced Usage

Custom Integrand Network

flow = zuko.flows.UNAF(
    features=10,
    transforms=5,
    signal=32,
    network={
        'hidden_features': [128, 128, 128, 128],
        'activation': torch.nn.Tanh  # Custom activation
    }
)

High-Dimensional Data

# Use coupling for efficiency
flow = zuko.flows.UNAF(
    features=100,
    transforms=3,
    passes=2,  # Coupling transformations
    signal=32
)

Manual Construction

from zuko.flows.autoregressive import MaskedAutoregressiveTransform
from zuko.flows.neural import UMNN

transform = MaskedAutoregressiveTransform(
    features=10,
    context=5,
    univariate=UMNN(
        signal=32,
        hidden_features=[64, 64, 64],
        activation=torch.nn.ELU
    ),
    shapes=[(32,), ()],  # Signal and constant shapes
    hidden_features=[256, 256]
)

Computational Considerations

UNAF is computationally intensive:
  • Training: Slower than NAF due to integration
  • Memory: Higher due to integration state
  • Inversion: Requires numerical root finding
  • Gradients: Computed via adjoint method
Optimization strategies:
  1. Use smaller networks for the integrand
  2. Reduce signal dimensions
  3. Use coupling for high-dimensional data
  4. Adjust integration tolerances
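The accuracy/cost trade-off behind strategy 4 can be illustrated with a toy quadrature, using the trapezoidal rule as a stand-in for the actual solver: more integration steps (a tighter tolerance) cost more evaluations but shrink the error.

```python
import math

def integrate(g, a, b, steps):
    # Trapezoidal rule: error shrinks as the step count grows
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for i in range(1, steps):
        total += g(a + i * h)
    return total * h

g = math.exp                # positive integrand with a known integral
exact = math.e - 1.0        # integral of exp on [0, 1]

coarse = abs(integrate(g, 0.0, 1.0, 8) - exact)
fine = abs(integrate(g, 0.0, 1.0, 256) - exact)
assert fine < coarse
```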

Comparison with Other Flows

Property        UNAF       NAF        NSF      MAF
Expressivity    Very high  Very high  High     Medium
Flexibility     Highest    High       Medium   Low
Training speed  Very slow  Slow       Medium   Fast
Sampling speed  Slow       Slow       Slow     Slow
Implementation  Complex    Medium     Medium   Simple

Research Applications

UNAF is particularly useful for:
  • Research: Exploring limits of flow expressivity
  • Benchmarking: Comparing against other architectures
  • Complex distributions: Multi-modal, long-tailed, irregular
  • Ablation studies: Understanding monotonic transformations
