
Overview

RealNVP is a normalizing flow built from affine coupling transformations. Unlike autoregressive flows, which are sequential in one direction, RealNVP evaluates both the forward and inverse transformations in parallel, making it fast for both density evaluation (training) and sampling.
RealNVP is an alias for NICE with affine (scale-and-shift) coupling transformations; the two classes are otherwise equivalent.

Reference

Density estimation using Real NVP (Dinh et al., 2016)
https://arxiv.org/abs/1605.08803

Class Definition

zuko.flows.RealNVP(
    features: int,
    context: int = 0,
    transforms: int = 3,
    randmask: bool = False,
    **kwargs
)

Parameters

features
int
required
The number of features in the data. Must be at least 2 for coupling to work.
context
int
default:"0"
The number of context features for conditional density estimation.
transforms
int
default:"3"
The number of coupling transformations to stack.
randmask
bool
default:"False"
Whether random coupling masks are used. If False, alternating checkered masks are used.
**kwargs
dict
Additional keyword arguments passed to GeneralCouplingTransform:
  • hidden_features: List of hidden layer sizes (default: [64, 64])
  • activation: Activation function (default: ReLU)
  • univariate: Univariate transformation constructor (default: MonotonicAffineTransform)

Usage Example

import torch
import zuko

# Create an unconditional RealNVP
flow = zuko.flows.RealNVP(
    features=10,
    transforms=5,
    hidden_features=[128, 128]
)

# Sample from the flow (fast!)
dist = flow()
samples = dist.sample((1000,))
print(samples.shape)  # torch.Size([1000, 10])

# Compute log probabilities
log_prob = dist.log_prob(samples)
print(log_prob.shape)  # torch.Size([1000])

Conditional Flow

# Create a conditional RealNVP
flow = zuko.flows.RealNVP(
    features=5,
    context=3,
    transforms=4
)

# Sample conditioned on context
context = torch.randn(3)
dist = flow(context)
samples = dist.sample((100,))

Training Example

import torch.optim as optim

flow = zuko.flows.RealNVP(
    features=8,
    transforms=5,
    hidden_features=[256, 256]
)

optimizer = optim.Adam(flow.parameters(), lr=1e-3)

for epoch in range(100):
    for x in dataloader:
        optimizer.zero_grad()
        
        # Maximum likelihood training
        loss = -flow().log_prob(x).mean()
        
        loss.backward()
        optimizer.step()
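
The loop above assumes an existing `dataloader` yielding batches of shape (batch, 8). A minimal stand-in built from synthetic data (torch only; note that `TensorDataset` yields 1-tuples, so batches should be unpacked with `for (x,) in dataloader`):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in: 1024 samples with 8 features, matching features=8 above
data = torch.randn(1024, 8)
dataloader = DataLoader(TensorDataset(data), batch_size=64, shuffle=True)

# TensorDataset yields 1-tuples, so unpack each batch
(x,) = next(iter(dataloader))
print(x.shape)  # torch.Size([64, 8])
```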

Random Masks

# Use random masks instead of alternating
flow = zuko.flows.RealNVP(
    features=20,
    transforms=5,
    randmask=True  # Better mixing for structured data
)

Methods

forward(c=None)

Returns a normalizing flow distribution. Arguments:
  • c (Tensor, optional): Context tensor of shape (*, context)
Returns:
  • NormalizingFlow: A distribution with the following methods:
    • sample(shape): Sample from the distribution (fast, parallel)
    • log_prob(x): Compute log probability (fast, parallel)
    • rsample(shape): Reparameterized sampling
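
`rsample` draws reparameterized samples, so gradients flow from a downstream cost back to the distribution's parameters. The mechanism is the same as for any reparameterizable torch distribution; the sketch below illustrates it with a plain `Normal` (not a zuko flow) to keep the example self-contained:

```python
import torch
from torch.distributions import Normal

# Learnable parameters of a toy base distribution
loc = torch.zeros(3, requires_grad=True)
log_scale = torch.zeros(3, requires_grad=True)
dist = Normal(loc, log_scale.exp())

z = dist.rsample((128,))            # shape (128, 3), stays in the autograd graph
cost = (z ** 2).sum(-1).mean()      # example cost: pull samples toward the origin
cost.backward()

print(loc.grad is not None)         # True: gradients reached the parameters
```

With a zuko flow, the same pattern applies to `flow().rsample(...)`, which is what makes flows usable inside variational objectives.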

When to Use RealNVP

Good for:
  • Real-time generation and sampling
  • Large-scale datasets (fast training)
  • Applications requiring bidirectional speed
  • Image generation (with spatial coupling)
Consider alternatives if:
  • You need maximum expressivity (use NSF or NAF)
  • You have very complex, multimodal distributions
  • Features are highly correlated in complex ways

Tips

  1. More transformations: Since each layer only transforms half the features, use more transformations (5-10) compared to autoregressive flows.
  2. Random masks: Set randmask=True when your features have inherent structure or ordering.
  3. Deeper networks: Coupling layers have less capacity, so use larger hidden layers [256, 256] or [512, 512].
  4. Preprocessing: Combine with activation normalization or batch normalization for better performance.
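
The activation normalization mentioned in tip 4 amounts to a per-feature affine layer whose parameters are initialized from data so that the first batch comes out with zero mean and unit variance. A minimal, framework-free sketch of that data-dependent initialization (illustrative only, not zuko's implementation):

```python
import math

def actnorm_init(batch):
    """Per-feature (shift, scale) so that scale * (x + shift) standardizes the batch."""
    n = len(batch)
    dims = len(batch[0])
    shift, scale = [], []
    for d in range(dims):
        col = [row[d] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        shift.append(-mean)
        scale.append(1.0 / math.sqrt(var + 1e-6))  # eps avoids division by zero
    return shift, scale

batch = [[2.0, -1.0], [4.0, 3.0], [6.0, 1.0]]
shift, scale = actnorm_init(batch)
normalized = [[s * (x + t) for x, t, s in zip(row, shift, scale)] for row in batch]
```

After initialization, `shift` and `scale` are treated as ordinary learnable parameters.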

Architecture Details

RealNVP uses coupling transformations:
  • Base distribution: Diagonal Gaussian N(0, I)
  • Coupling mechanism: Split features into two groups using a binary mask
  • Transformation: First group unchanged, second group transformed based on first
  • Neural network: Standard MLP (no masking needed)
Each coupling layer computes:
# A binary mask splits the features into x_a and x_b
y_a = x_a                                # masked half passes through unchanged
y_b = x_b * exp(s(x_a, c)) + t(x_a, c)   # remaining half is scaled and shifted
where s and t are neural networks and c is optional context. Because the Jacobian is triangular, its log-determinant is simply the sum of the log-scales, sum(s(x_a, c)).
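
The affine coupling above can be written out directly. A plain-Python sketch with fixed, hypothetical scale and shift values (in RealNVP they would come from the networks s and t), showing that the inverse is exact and the log-determinant is just the sum of the log-scales:

```python
import math

def coupling_forward(x_a, x_b, s, t):
    y_a = list(x_a)                                                   # unchanged half
    y_b = [xb * math.exp(si) + ti for xb, si, ti in zip(x_b, s, t)]   # scale and shift
    log_det = sum(s)                                                  # log|det J| = sum of log-scales
    return y_a, y_b, log_det

def coupling_inverse(y_a, y_b, s, t):
    x_a = list(y_a)
    x_b = [(yb - ti) * math.exp(-si) for yb, si, ti in zip(y_b, s, t)]
    return x_a, x_b

# Hypothetical fixed values for illustration
x_a, x_b = [1.0, 2.0], [3.0, 4.0]
s, t = [0.5, -0.2], [1.0, 0.0]
y_a, y_b, log_det = coupling_forward(x_a, x_b, s, t)
rx_a, rx_b = coupling_inverse(y_a, y_b, s, t)  # recovers x_a, x_b
```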

Coupling vs. Autoregressive

Property          RealNVP (Coupling)    MAF (Autoregressive)
Forward pass      Parallel              Parallel
Inverse pass      Parallel              Sequential
Training speed    Fast                  Fast
Sampling speed    Fast                  Slow
Expressivity      Medium                Medium
Layers needed     More (5-10)           Fewer (3-5)
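
The "Sequential" inverse is the crucial difference: in an autoregressive transform, recovering x_i requires the previously recovered x_{i-1}. A toy additive autoregressive transform (hypothetical, much simpler than MAF) makes the dependency explicit:

```python
import math

def ar_forward(x):
    # y_i = x_i + tanh(x_{i-1}); all outputs computable in parallel given x
    return [x[i] + (math.tanh(x[i - 1]) if i > 0 else 0.0) for i in range(len(x))]

def ar_inverse(y):
    # x_i = y_i - tanh(x_{i-1}); each step needs the previous *recovered* x,
    # so the loop cannot be parallelized
    x = []
    for i in range(len(y)):
        prev = math.tanh(x[i - 1]) if i > 0 else 0.0
        x.append(y[i] - prev)
    return x

x = [0.5, -1.0, 2.0, 0.0]
y = ar_forward(x)
assert all(abs(a - b) < 1e-12 for a, b in zip(ar_inverse(y), x))
```

A coupling inverse, by contrast, is a single vectorized expression over each half of the features, which is why RealNVP samples as fast as it trains.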

Advanced Usage

Custom Masks

import torch

# Define custom coupling mask
mask = torch.tensor([0, 1, 0, 1, 0, 1], dtype=torch.bool)

from zuko.flows.coupling import GeneralCouplingTransform

transform = GeneralCouplingTransform(
    features=6,
    mask=mask
)

Multi-Scale Architecture

# For high-dimensional data, RealNVP is typically used in a multi-scale
# architecture: coupling blocks interleaved with squeeze operations, with
# half of the dimensions factored out at each scale

from zuko.flows import RealNVP
import torch.nn as nn

class MultiScaleRealNVP(nn.Module):
    def __init__(self, features):
        super().__init__()
        self.flow1 = RealNVP(features, transforms=3)
        # Factor out half of the dimensions here (model them directly with
        # the base distribution), then continue on the remaining half
        self.flow2 = RealNVP(features // 2, transforms=3)

Image Modeling

For image data, RealNVP traditionally alternates spatial coupling patterns (checkerboard and channel-wise masks). With flattened pixels, the default alternating mask plays a similar role:

H, W, C = 28, 28, 1  # MNIST dimensions
features = H * W * C

flow = zuko.flows.RealNVP(
    features=features,
    transforms=8,
    hidden_features=[1024, 1024],
    randmask=False  # Use structured masks for spatial data
)
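
Before fitting a flow to images, RealNVP-style models typically dequantize the discrete pixel values and map them into an unbounded space with a logit transform, so the flow never has to place mass outside a bounded interval. A hedged, torch-only sketch of that preprocessing (the alpha value is a common but arbitrary choice):

```python
import torch

def preprocess(x, alpha=0.05):
    """Dequantize 8-bit pixels in [0, 255] and map them to logit space."""
    x = (x + torch.rand_like(x)) / 256.0    # uniform dequantization -> (0, 1)
    x = alpha + (1 - 2 * alpha) * x         # squeeze into (alpha, 1 - alpha)
    return torch.log(x) - torch.log1p(-x)   # logit transform -> unbounded values

pixels = torch.randint(0, 256, (4, 28 * 28)).float()
z = preprocess(pixels)
print(z.shape)  # torch.Size([4, 784])
```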
See Also

  • NICE - Predecessor with additive coupling transformations
  • MAF - Autoregressive alternative
  • NSF - More expressive but slower sampling
