The AutoregressiveTransform implements an autoregressive transformation scheme where each output dimension is computed based on previous dimensions.
The autoregressive transformation applies a conditional transformation to each dimension:
y_i = f(x_i | x_{<i})
where x_{<i} represents all dimensions before i. This creates a triangular Jacobian structure, making the log-determinant efficient to compute.
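To make the triangular-Jacobian claim concrete, here is a minimal pure-PyTorch sketch (the toy transform and its conditioner are illustrative, not part of zuko): each y_i depends only on x_i and x_{<i}, so the Jacobian is lower triangular and its log-determinant reduces to a sum along the diagonal.

```python
import torch

def toy_autoregressive(x):
    # y_i = x_i * scale_i + shift_i, where scale_i and shift_i
    # depend only on x_{<i} (here, via a shifted cumulative sum)
    prev = torch.cat([torch.zeros(1), torch.cumsum(x, 0)[:-1]])
    scale = torch.exp(torch.tanh(prev))  # positive, conditioned on x_{<i}
    shift = torch.sin(prev)              # conditioned on x_{<i}
    return x * scale + shift

x = torch.randn(5)
J = torch.autograd.functional.jacobian(toy_autoregressive, x)

# The Jacobian is lower triangular ...
assert torch.allclose(J, torch.tril(J))

# ... so log|det J| is just the sum of the log-diagonal entries.
logdet = torch.log(torch.abs(torch.diagonal(J))).sum()
assert torch.isclose(logdet, torch.logdet(J))
```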
Class Definition
class AutoregressiveTransform(Transform)
Transform via an autoregressive scheme.
meta
Callable[[Tensor], Transform]
A function which returns a transformation f given x. This meta-function receives the current state and produces the transformation to apply.
passes
int
The number of passes for the inverse transformation. Since the inverse requires iterative computation, additional passes improve accuracy.
Properties
Domain: constraints.real_vector
Codomain: constraints.real_vector
Bijective: True
Implementation Details
Forward Pass
The forward pass evaluates the transformation in a single pass:
```python
def _call(self, x: Tensor) -> Tensor:
    return self.meta(x)(x)
```
The meta-function receives the input and produces a transformation that is then applied to the same input.
Inverse Pass
The inverse requires iterative refinement:
```python
def _inverse(self, y: Tensor) -> Tensor:
    x = torch.zeros_like(y)
    for _ in range(self.passes):
        x = self.meta(x).inv(y)
    return x
```
Multiple passes are needed because the inverse depends on the unknown input x.
Usage Examples
```python
import torch
import torch.nn as nn
import zuko

class AutoregressiveNet(nn.Module):
    """Neural network for autoregressive transformation.

    Note: this conditioner is not masked, so it is not truly
    autoregressive; see the masked variant below.
    """

    def __init__(self, features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(features, 64),
            nn.ReLU(),
            nn.Linear(64, features * 2),  # shift and scale parameters
        )

    def forward(self, x):
        params = self.net(x)
        shift, scale = params.chunk(2, dim=-1)
        return zuko.transforms.MonotonicAffineTransform(shift, scale)

# Create autoregressive transform
net = AutoregressiveNet(features=5)
transform = zuko.transforms.AutoregressiveTransform(
    meta=net,
    passes=3,  # use 3 passes for the inverse
)

# Apply transformation
x = torch.randn(32, 5)
y = transform(x)

# Inverse transformation (requires multiple passes)
x_reconstructed = transform.inv(y)

# Log determinant
ladj = transform.log_abs_det_jacobian(x, y)
print(f"Log determinant shape: {ladj.shape}")  # [32]
```
Autoregressive with Masking
For proper autoregressive structure, use masking in your neural network:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import zuko

class MaskedLinear(nn.Linear):
    """Linear layer with a fixed binary mask on its weights."""

    def __init__(self, in_features, out_features, mask):
        super().__init__(in_features, out_features)
        self.register_buffer('mask', mask)  # shape (out_features, in_features)

    def forward(self, x):
        return F.linear(x, self.weight * self.mask, self.bias)

class AutoregressiveMaskedNet(nn.Module):
    def __init__(self, features: int, hidden: int = 64):
        super().__init__()
        # MADE-style degrees: input i has degree i, hidden units cycle
        # through degrees 0..features-2, output i has degree i.
        in_deg = torch.arange(features)
        hid_deg = torch.arange(hidden) % (features - 1)
        out_deg = in_deg.repeat(2)  # shift and scale for each feature

        # Hidden unit j may see input i iff hid_deg[j] >= in_deg[i];
        # output k may see hidden unit j iff out_deg[k] > hid_deg[j].
        # Composed, output i depends only on inputs strictly before i.
        mask1 = (hid_deg[:, None] >= in_deg[None, :]).float()
        mask2 = (hid_deg[:, None] >= hid_deg[None, :]).float()
        mask3 = (out_deg[:, None] > hid_deg[None, :]).float()

        self.masked1 = MaskedLinear(features, hidden, mask1)
        self.masked2 = MaskedLinear(hidden, hidden, mask2)
        self.output = MaskedLinear(hidden, 2 * features, mask3)

    def forward(self, x):
        h = F.relu(self.masked1(x))
        h = F.relu(self.masked2(h))
        shift, scale = self.output(h).chunk(2, dim=-1)
        return zuko.transforms.MonotonicAffineTransform(shift, scale)

# Create masked autoregressive transform
net = AutoregressiveMaskedNet(features=5)
transform = zuko.transforms.AutoregressiveTransform(meta=net, passes=3)

# Apply to data
x = torch.randn(32, 5)
y, ladj = transform.call_and_ladj(x)
```
Autoregressive Flow with Multiple Layers
```python
import torch
import torch.nn as nn
import zuko

class AutoregressiveFlow(nn.Module):
    """Multi-layer autoregressive flow."""

    def __init__(self, features: int, layers: int = 3):
        super().__init__()
        self.transforms = []
        for _ in range(layers):
            # Alternate between autoregressive and permutation layers
            net = AutoregressiveNet(features)  # conditioner defined above
            self.transforms.append(
                zuko.transforms.AutoregressiveTransform(meta=net, passes=3)
            )
            # Permute between layers so every dimension can condition every other
            perm = torch.randperm(features)
            self.transforms.append(zuko.transforms.PermutationTransform(perm))
        # Compose all transforms
        self.flow = zuko.transforms.ComposedTransform(*self.transforms)

    def forward(self, x):
        return self.flow(x)

# Create and use flow
flow = AutoregressiveFlow(features=10, layers=3)
x = torch.randn(64, 10)
y = flow(x)
print(f"Output shape: {y.shape}")  # [64, 10]
```
Key Considerations
Inverse Computation
The inverse transformation requires multiple passes because:
The transformation depends on the input: y_i = f(x_i | x_{<i})
To compute x from y, we need x itself in the conditioning
We iteratively refine x, starting from an initial guess (zeros)
Number of Passes
More passes improve inverse accuracy but increase computation:
1 pass: fast but may be inaccurate
3 passes: a good balance (recommended)
5+ passes: high accuracy but slower
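As a rough illustration of this trade-off, the sketch below applies the same fixed-point scheme as _inverse to a toy shift-only autoregressive map in pure PyTorch (the transform and its conditioner are illustrative, not part of zuko) and measures the reconstruction error for different pass counts:

```python
import torch

def shift(x):
    # Toy conditioner: shift_i depends only on x_{<i}
    prev = torch.cumsum(x, dim=-1)
    prev = torch.cat([torch.zeros_like(prev[..., :1]), prev[..., :-1]], dim=-1)
    return 0.5 * torch.tanh(prev)

def forward(x):
    return x + shift(x)  # y_i = x_i + shift(x_{<i})

def inverse(y, passes):
    x = torch.zeros_like(y)
    for _ in range(passes):  # same scheme as AutoregressiveTransform._inverse
        x = y - shift(x)
    return x

x = torch.randn(32, 5)
y = forward(x)

# Maximum reconstruction error for 1, 3, and 5 passes
errors = [(inverse(y, p) - x).abs().max().item() for p in (1, 3, 5)]
print(errors)
```

Each pass fixes one more dimension exactly (dimension i depends only on dimensions before it), so with 5 features the error is already at floating-point precision after 5 passes.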
Comparison with Coupling
Autoregressive transforms have advantages and disadvantages compared to coupling transforms:
Advantages:
More expressive (each dimension can depend on all previous ones)
Single forward pass is efficient
Disadvantages:
The inverse requires multiple passes
Inversion is inherently sequential, which limits parallelization when sampling
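The contrast is visible in how the inverses are computed. A coupling transform (sketched below with a toy shift-only coupling, not zuko code) passes half the dimensions through unchanged, so its inverse is exact in a single pass, whereas the autoregressive inverse needs the iterative loop from _inverse:

```python
import torch

def coupling_forward(x):
    # First half conditions, second half is transformed
    a, b = x.chunk(2, dim=-1)
    return torch.cat([a, b + torch.tanh(a)], dim=-1)

def coupling_inverse(y):
    # Exact in one pass: the conditioning half arrives unchanged
    a, b = y.chunk(2, dim=-1)
    return torch.cat([a, b - torch.tanh(a)], dim=-1)

x = torch.randn(8, 6)
assert torch.allclose(coupling_inverse(coupling_forward(x)), x, atol=1e-6)
```

The price of this one-pass inverse is expressivity: in each coupling layer, half the dimensions are left untouched, while an autoregressive layer lets every dimension depend on all previous ones.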
References
Masked Autoregressive Flow (MAF): Papamakarios, G., Pavlakou, T., & Murray, I. (2017). Masked Autoregressive Flow for Density Estimation. https://arxiv.org/abs/1705.07057
Inverse Autoregressive Flow (IAF): Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016). Improved Variational Inference with Inverse Autoregressive Flow. https://arxiv.org/abs/1606.04934