This guide shows you how to create custom normalizing flow architectures by composing transformations, using different base distributions, and building complex conditional models.

Understanding Flow Components

A normalizing flow in Zuko consists of two main components:
  1. Transformations: Bijective functions that map between spaces
  2. Base Distribution: The starting distribution (usually Gaussian)
from zuko.flows import Flow
from zuko.lazy import UnconditionalDistribution, UnconditionalTransform

flow = Flow(
    transform=[...],  # List of transformations
    base=...          # Base distribution
)

Building Custom Flows

Simple Custom Flow

Here’s a basic example combining affine and rotation transformations:
import torch
import zuko
from zuko.flows import Flow, MaskedAutoregressiveTransform
from zuko.lazy import UnconditionalDistribution, UnconditionalTransform
from zuko.transforms import RotationTransform
from zuko.distributions import DiagNormal

flow = Flow(
    transform=[
        MaskedAutoregressiveTransform(
            features=3,
            context=5,
            hidden_features=(64, 64)
        ),
        UnconditionalTransform(
            RotationTransform,
            torch.randn(3, 3)  # Random initialization
        ),
        MaskedAutoregressiveTransform(
            features=3,
            context=5,
            hidden_features=(64, 64)
        ),
    ],
    base=UnconditionalDistribution(
        DiagNormal,
        torch.zeros(3),  # loc
        torch.ones(3),   # scale
        buffer=True      # Not trainable
    )
)
This example is similar to the custom flow shown in the README. The transformations are composed in order: the first autoregressive transform, then the rotation, then the second autoregressive transform.

Transformation Types

Masked Autoregressive Transform

The most flexible transformation for building expressive flows:
from zuko.flows import MaskedAutoregressiveTransform
from zuko.transforms import MonotonicAffineTransform

transform = MaskedAutoregressiveTransform(
    features=5,
    context=8,
    univariate=MonotonicAffineTransform,  # Element-wise transformation
    hidden_features=(128, 128, 128),      # Hyper-network architecture
    activation=torch.nn.ReLU              # Activation function
)

Coupling Transform

Faster but less expressive than autoregressive:
from zuko.flows import GeneralCouplingTransform
from zuko.transforms import MonotonicRQSTransform

transform = GeneralCouplingTransform(
    features=5,
    context=8,
    univariate=MonotonicRQSTransform,  # Rational-quadratic spline
    shapes=([8], [8], [7]),            # Parameters for 8 bins
    hidden_features=(256, 256)
)

Spline Transformations

For highly flexible element-wise transformations:
from zuko.flows import MaskedAutoregressiveTransform
from zuko.transforms import MonotonicRQSTransform

# Used as the univariate transform inside an autoregressive or coupling layer
transform = MaskedAutoregressiveTransform(
    features=3,
    context=0,
    univariate=MonotonicRQSTransform,
    shapes=([16], [16], [15]),  # 16 bins for higher capacity
    hidden_features=(128, 128)
)
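The shapes argument mirrors the parameters of MonotonicRQSTransform: a spline with n bins has n widths, n heights, and n - 1 derivatives at the internal knots. A small (hypothetical) helper makes the pattern explicit:

```python
def rqs_shapes(bins: int) -> tuple:
    """Parameter shapes for a rational-quadratic spline with `bins` bins:
    widths, heights, and derivatives at the internal knots."""
    return ([bins], [bins], [bins - 1])

print(rqs_shapes(8))   # ([8], [8], [7])  -- as in the coupling example
print(rqs_shapes(16))  # ([16], [16], [15]) -- as used above
```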

Unconditional Transformations

Fixed Transformations

Add non-trainable transformations:
from zuko.transforms import AffineTransform, SigmoidTransform

flow = Flow(
    transform=[
        # Map [0, 255] to (0, 1)
        UnconditionalTransform(
            AffineTransform,
            torch.tensor(1/512),   # loc
            torch.tensor(1/256),   # scale
            buffer=True            # Not trainable
        ),
        # Map (0, 1) to (-inf, inf)
        UnconditionalTransform(
            lambda: SigmoidTransform().inv
        ),
        # ... more transforms
    ],
    base=...
)
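To see why these constants are chosen: the affine map sends pixel values in [0, 255] strictly inside (0, 1), so the subsequent logit (inverse sigmoid) stays finite. A plain PyTorch check:

```python
import torch

x = torch.tensor([0.0, 127.0, 255.0])  # raw pixel values
y = 1 / 512 + x * (1 / 256)            # same map as AffineTransform(1/512, 1/256)
z = torch.logit(y)                     # what SigmoidTransform().inv computes

# y lies strictly inside (0, 1): roughly (0.0020, 0.4980, 0.9980),
# so z is finite even at the extreme pixel values 0 and 255
```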

Trainable Linear Transformations

from zuko.transforms import RotationTransform

flow = Flow(
    transform=[
        MaskedAutoregressiveTransform(...),
        UnconditionalTransform(
            RotationTransform,
            torch.randn(5, 5)  # Trainable rotation matrix
        ),
        MaskedAutoregressiveTransform(...),
    ],
    base=...
)
Adding rotation transforms between autoregressive layers can improve expressivity by mixing features.

Custom Base Distributions

Standard Gaussian

The most common choice:
from zuko.distributions import DiagNormal

base = UnconditionalDistribution(
    DiagNormal,
    torch.zeros(features),  # loc
    torch.ones(features),   # scale
    buffer=True
)

Uniform Distribution

Useful for bounded domains:
from zuko.distributions import BoxUniform

base = UnconditionalDistribution(
    BoxUniform,
    torch.full([features], -3.0),  # lower bound
    torch.full([features], +3.0),  # upper bound
    buffer=True
)

Learnable Base Distribution

# Learnable mean and scale
base = UnconditionalDistribution(
    DiagNormal,
    torch.randn(features),  # Trainable loc
    torch.ones(features),   # Trainable scale
    buffer=False  # Register as trainable parameters
)
Note that the tensors are passed positionally: positional arguments are registered on the module (as parameters, or as buffers when buffer=True), while keyword arguments are forwarded to the distribution unchanged.

Advanced Architectures

Deep Architectures with Permutations

For high-dimensional data, stack several transforms and permute the features between them, so that every dimension can influence every other despite the fixed autoregressive ordering:
import torch
from zuko.flows import Flow, MaskedAutoregressiveTransform
from zuko.lazy import UnconditionalDistribution, UnconditionalTransform
from zuko.transforms import PermutationTransform
from zuko.distributions import DiagNormal

features = 64

flow = Flow(
    transform=[
        # First block
        MaskedAutoregressiveTransform(features, 0, hidden_features=(256,)),
        UnconditionalTransform(PermutationTransform, torch.randperm(features), buffer=True),  # fixed ordering

        # Second block
        MaskedAutoregressiveTransform(features, 0, hidden_features=(256,)),
        UnconditionalTransform(PermutationTransform, torch.randperm(features), buffer=True),

        # Third block
        MaskedAutoregressiveTransform(features, 0, hidden_features=(256,)),
    ],
    base=UnconditionalDistribution(
        DiagNormal,
        torch.zeros(features),
        torch.ones(features),
        buffer=True
    )
)
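The permutations between blocks are what let information cross the fixed autoregressive ordering; the operation itself is just an exactly invertible, volume-preserving reindexing. A plain PyTorch illustration:

```python
import torch

features = 64
perm = torch.randperm(features)  # random feature ordering
inv_perm = torch.argsort(perm)   # its exact inverse

x = torch.randn(features)
y = x[perm]                      # apply the permutation

# y[inv_perm] recovers x exactly: no numerical error, zero log-determinant
```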

Mixing Coupling and Autoregressive Layers

Interleave coupling and autoregressive transforms to trade off sampling speed and expressivity:
from zuko.flows import Flow, GeneralCouplingTransform, MaskedAutoregressiveTransform

flow = Flow(
    transform=[
        GeneralCouplingTransform(features=5, context=0, hidden_features=(128,)),
        MaskedAutoregressiveTransform(features=5, context=0, hidden_features=(128,)),
        GeneralCouplingTransform(features=5, context=0, hidden_features=(128,)),
        MaskedAutoregressiveTransform(features=5, context=0, hidden_features=(128,)),
    ],
    base=...
)

Conditional Flows

Context-Dependent Flows

All transformations can accept context:
flow = Flow(
    transform=[
        MaskedAutoregressiveTransform(
            features=3,
            context=5,  # Context dimension
            hidden_features=(128, 128)
        ),
    ],
    base=UnconditionalDistribution(...)  # Base ignores context
)

# Use with context
batch_size = 32
c = torch.randn(batch_size, 5)
dist = flow(c)
x = dist.sample()

Conditional Base Distribution

For conditional base distributions, create a custom LazyDistribution:
import torch.nn as nn
from torch.distributions import Independent, Normal
from zuko.lazy import LazyDistribution

class ConditionalBase(LazyDistribution):
    def __init__(self, features: int, context: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(context, 64),
            nn.ReLU(),
            nn.Linear(64, 2 * features)
        )
    
    def forward(self, c):
        params = self.net(c)
        mu, log_sigma = params.chunk(2, dim=-1)
        return Independent(Normal(mu, log_sigma.exp()), 1)

flow = Flow(
    transform=[...],
    base=ConditionalBase(features=3, context=5)
)

Inverse Transforms

Easily create inverse transformations:
transform = MaskedAutoregressiveTransform(features=5, context=0)

flow = Flow(
    transform=[
        transform,
        transform.inv,  # Inverse transformation
        transform,
    ],
    base=...
)
Be careful with inverse transforms in autoregressive flows. Inverting a masked autoregressive transform turns MAF into IAF (inverse autoregressive flow), which swaps the computational costs: sampling becomes a single parallel pass, while density evaluation becomes sequential.

Complete Custom Example

Here’s a sophisticated custom flow combining multiple techniques:
import torch
import zuko
from zuko.flows import Flow, MaskedAutoregressiveTransform, GeneralCouplingTransform
from zuko.lazy import UnconditionalDistribution, UnconditionalTransform
from zuko.transforms import (
    AffineTransform,
    MonotonicRQSTransform,
    RotationTransform,
    SigmoidTransform,
)
from zuko.distributions import BoxUniform

flow = Flow(
    transform=[
        # Preprocessing: [0, 255] -> R
        UnconditionalTransform(
            AffineTransform,
            torch.tensor(1/512),
            torch.tensor(1/256),
            buffer=True
        ),
        UnconditionalTransform(lambda: SigmoidTransform().inv),
        
        # First autoregressive block with splines
        MaskedAutoregressiveTransform(
            features=5,
            context=8,
            univariate=MonotonicRQSTransform,
            shapes=([8], [8], [7]),  # 8 bins
            hidden_features=(128, 128),
            passes=5  # Fully autoregressive
        ),
        
        # Rotation for mixing
        UnconditionalTransform(RotationTransform, torch.randn(5, 5)),
        
        # Coupling transform (faster than autoregressive)
        GeneralCouplingTransform(
            features=5,
            context=8,
            univariate=MonotonicRQSTransform,
            shapes=([8], [8], [7]),
            hidden_features=(256, 256),
            activation=torch.nn.ELU
        ).inv,  # Use inverse
    ],
    base=UnconditionalDistribution(
        BoxUniform,
        torch.full([5], -3.0),
        torch.full([5], +3.0),
        buffer=True
    )
)

# Train as usual
optimizer = torch.optim.Adam(flow.parameters(), lr=1e-3)

Tips for Custom Flows

Start Simple: Begin with pre-built flows like NSF or MAF, then customize only when needed.
Alternate Orders: When stacking autoregressive transforms, alternate the feature ordering so that every dimension eventually conditions on every other. The pre-built MAF flow does this automatically; when composing transforms by hand, pass the order argument explicitly:
transform=[
    MaskedAutoregressiveTransform(features=5, order=torch.arange(5)),          # Forward order
    MaskedAutoregressiveTransform(features=5, order=torch.arange(5).flip(0)),  # Reverse order
    MaskedAutoregressiveTransform(features=5, order=torch.arange(5)),          # Forward order
]
Splines vs Affine: Use spline transforms (MonotonicRQSTransform) when you need extra expressivity, and affine transforms (MonotonicAffineTransform) when you want faster, more stable training.
