Overview
Masked Autoregressive Flow (MAF) uses masked neural networks to create autoregressive transformations. It is a flexible and widely used architecture that serves as the foundation for many other flows, such as NSF.

Reference

Masked Autoregressive Flow for Density Estimation (Papamakarios et al., 2017): https://arxiv.org/abs/1705.07057
Class Definition
Parameters
features: The number of features in the data.
context: The number of context features for conditional density estimation.
transforms: The number of autoregressive transformations to stack.
randperm: Whether features are randomly permuted between transformations. If False, features are in ascending order for even transformations and descending order for odd transformations.
Additional keyword arguments are passed to MaskedAutoregressiveTransform:
- hidden_features: List of hidden layer sizes (default: [64, 64])
- activation: Activation function (default: ReLU)
- passes: Number of passes for the inverse (default: features, i.e. fully autoregressive)
- univariate: The univariate transformation constructor (default: MonotonicAffineTransform)
- shapes: Parameter shapes for the univariate transformations
Usage Example
Conditional Flow
Training Example
Coupling Transformations
Methods
forward(c=None)
Returns a normalizing flow distribution.
Arguments:
c (Tensor, optional): Context tensor of shape (*, context)

Returns:
NormalizingFlow: A distribution object with:
- sample(shape): Sample from the distribution
- log_prob(x): Compute the log-probability of samples
- rsample(shape): Reparameterized sampling (supports gradients)
When to Use MAF
Good for:
- General-purpose density estimation
- Fast training (forward pass is parallel)
- Flexible baseline for custom flows
- When you need a simple, well-understood architecture
Limitations:
- Slow sampling (inverse is sequential)
- Less expressive than NSF or NAF for complex distributions
- Affine transformations may be limiting for multimodal data
Tips
- Number of transformations: Start with 3-5. More transformations increase expressivity but add computational cost.
- Random permutations: Set randperm=True for better mixing when features have structure.
- Hidden layer sizes: Use [128, 128] or [256, 256] for complex datasets.
- Coupling for speed: Use passes=2 for a faster inverse when you have many features.
Architecture Details
MAF consists of:
- Base distribution: Diagonal Gaussian N(0, I)
- Transformations: Affine transformations with autoregressive conditioning
- Neural network: Masked MLP that enforces the autoregressive structure
- Parameters per feature: 2 (location and scale)

Each transformation computes y_i = x_i * exp(s_i) + t_i, where s_i (scale) and t_i (translation) depend on x_1, ..., x_{i-1} and the context c.
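The parallel-forward / sequential-inverse asymmetry can be illustrated with a self-contained NumPy sketch, using strictly lower-triangular linear maps as stand-ins for the masked networks that produce s and t:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Strictly lower-triangular weights: row i only sees x_1, ..., x_{i-1},
# mimicking the masked MLP's autoregressive structure.
Ws = np.tril(rng.normal(size=(d, d)) * 0.1, k=-1)
Wt = np.tril(rng.normal(size=(d, d)), k=-1)

def forward(x):
    # Density direction: every y_i is computed in parallel.
    s, t = Ws @ x, Wt @ x
    return x * np.exp(s) + t

def inverse(y):
    # Sampling direction: x_i is recovered one feature at a time.
    x = np.zeros_like(y)
    for i in range(d):
        s_i, t_i = Ws[i] @ x, Wt[i] @ x  # only the filled x[:i] entries contribute
        x[i] = (y[i] - t_i) * np.exp(-s_i)
    return x

x = rng.normal(size=d)
assert np.allclose(inverse(forward(x)), x)  # the transform is exactly invertible
```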
Comparison with Other Flows
| Property | MAF | RealNVP | NSF |
|---|---|---|---|
| Training | Fast | Fast | Medium |
| Sampling | Slow | Fast | Slow |
| Expressivity | Medium | Medium | High |
| Complexity | Low | Low | Medium |
Advanced Usage
Custom Univariate Transformations
Custom Masking
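The masks themselves follow the MADE recipe (Germain et al., 2015): assign a degree to every unit and keep only the connections that preserve the autoregressive ordering. A NumPy sketch of the construction (this degree assignment is one common choice, not the only one):

```python
import numpy as np

d, h = 4, 8
in_deg = np.arange(1, d + 1)          # input i carries degree i
hid_deg = np.arange(h) % (d - 1) + 1  # hidden degrees cycle through 1..d-1
out_deg = np.arange(1, d + 1)

# A hidden unit may see inputs of degree <= its own; an output of degree i
# may see hidden units of degree < i.
mask_in = hid_deg[:, None] >= in_deg[None, :]   # shape (h, d)
mask_out = out_deg[:, None] > hid_deg[None, :]  # shape (d, h)

# Composed connectivity: output i depends only on inputs 1..i-1,
# i.e. the matrix is strictly lower triangular.
conn = (mask_out.astype(int) @ mask_in.astype(int)) > 0
assert not conn[np.triu_indices(d)].any()
```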
Related
- NSF - MAF with spline transformations
- NAF - MAF with neural transformations
- MaskedAutoregressiveTransform - The underlying transformation
