The zuko.nn module provides specialized neural network layers and architectures used throughout Zuko, including standard MLPs, masked MLPs for autoregressive models, and monotonic MLPs for monotonic transformations.
MLP
Creates a multi-layer perceptron (MLP), also known as a fully connected feedforward network.
An MLP is a sequence of non-linear parametric functions:
h_{i+1} = a_{i+1}(h_i W_{i+1}^T + b_{i+1})
where h_i are feature vectors, x = h_0 is the input, y = h_L is the output, and a_i are activation functions.
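The recursion above can be sketched directly in plain PyTorch. This is an illustrative re-implementation, not Zuko's own code; the layer sizes (3 → 5 → 2) are arbitrary example values.

```python
import torch

# Sketch of the MLP recursion h_{i+1} = a_{i+1}(h_i W_{i+1}^T + b_{i+1})
torch.manual_seed(0)

sizes = [3, 5, 2]
weights = [torch.randn(n, m) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [torch.randn(n) for n in sizes[1:]]

h = torch.randn(3)  # x = h_0
for i, (W, b) in enumerate(zip(weights, biases)):
    h = h @ W.T + b
    if i < len(weights) - 1:  # no activation after the final layer
        h = torch.relu(h)

y = h  # y = h_L, shape: (2,)
```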
in_features
int
The number of input features
out_features
int
The number of output features
hidden_features
Sequence[int]
default:"(64, 64)"
The numbers of hidden features for each hidden layer
activation
Callable[[], nn.Module] | None
default:"None"
The activation function constructor. If None, uses torch.nn.ReLU
normalize
bool
default:"False"
Whether features are normalized between layers
kwargs
Keyword arguments passed to the Linear layers
Example
import torch
import torch.nn as nn
from zuko.nn import MLP
# Create an MLP with custom architecture
net = MLP(64, 1, [32, 16], activation=nn.ELU)
print(net)
# MLP(
# (0): Linear(in_features=64, out_features=32, bias=True)
# (1): ELU(alpha=1.0)
# (2): Linear(in_features=32, out_features=16, bias=True)
# (3): ELU(alpha=1.0)
# (4): Linear(in_features=16, out_features=1, bias=True)
# )
x = torch.randn(8, 64)
y = net(x) # shape: (8, 1)
Linear
Creates a linear layer with optional stacking support.
Performs the transformation y = x W^T + b.
If the stack argument is provided, creates a stack of independent linear operators applied to stacked input vectors.
in_features
int
The number of input features C
out_features
int
The number of output features C′
bias
bool
default:"True"
Whether the layer learns an additive bias b
stack
int | None
default:"None"
The number of stacked operators S
Example
import torch
from zuko.nn import Linear
# Standard linear layer
layer = Linear(64, 32)
x = torch.randn(8, 64)
y = layer(x) # shape: (8, 32)
# Stacked linear layers
stacked = Linear(64, 32, stack=5)
x = torch.randn(8, 5, 64)
y = stacked(x) # shape: (8, 5, 32)
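The stack semantics can be illustrated without relying on Zuko's internals: S independent weight matrices, each applied to its own slice of the input. The random weights below are stand-ins for illustration only.

```python
import torch

# Illustration of "stacked" linear semantics: y[n, s] = x[n, s] @ W[s].T + b[s]
torch.manual_seed(0)
S, C_in, C_out = 5, 64, 32

W = torch.randn(S, C_out, C_in)
b = torch.randn(S, C_out)
x = torch.randn(8, S, C_in)

# Batched form of the stacked transform
y = torch.einsum('nsc,sdc->nsd', x, W) + b

# Equivalent explicit loop over the stack dimension
y_loop = torch.stack([x[:, s] @ W[s].T + b[s] for s in range(S)], dim=1)
assert torch.allclose(y, y_loop, atol=1e-5)
```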
MaskedMLP
Creates a masked multi-layer perceptron where the Jacobian structure is controlled by an adjacency matrix.
The resulting MLP is a transformation y = f(x) whose Jacobian entries ∂y_i/∂x_j are zero wherever A_ij = 0. This is useful for implementing autoregressive models and coupling layers.
adjacency
BoolTensor
The adjacency matrix A ∈ {0, 1}^{M×N} controlling the Jacobian structure
hidden_features
Sequence[int]
default:"(64, 64)"
The numbers of hidden features for each hidden layer
activation
Callable[[], nn.Module] | None
default:"None"
The activation function constructor. If None, uses torch.nn.ReLU
residual
bool
default:"False"
Whether to use residual blocks
The adjacency matrix determines which output features can depend on which input features. An entry A_ij = 1 means output i can depend on input j, while A_ij = 0 enforces independence in the Jacobian.
Example
import torch
import torch.nn as nn
from zuko.nn import MaskedMLP
# Create an adjacency matrix
adjacency = torch.randn(4, 3) < 0
print(adjacency)
# tensor([[False, True, True],
# [False, True, True],
# [False, False, True],
# [ True, True, False]])
# Create masked MLP
net = MaskedMLP(adjacency, [16, 32], activation=nn.ELU)
# Forward pass
x = torch.randn(3)
y = net(x) # shape: (4,)
# Verify the Jacobian structure matches adjacency
jac = torch.autograd.functional.jacobian(net, x)
print(jac)
# tensor([[ 0.0000, -0.0065, 0.1158],
# [ 0.0000, -0.0089, 0.0072],
# [ 0.0000, 0.0000, 0.0089],
# [-0.0146, -0.0128, 0.0000]])
# Note: zeros match the False entries in adjacency
Understanding Masking
Masked MLPs are particularly useful for:
- Autoregressive models: Where y_i can only depend on x_1, …, x_{i−1}
- Coupling layers: Where some outputs depend on a subset of inputs
- Conditional independence: Enforcing specific dependency structures
import torch
from zuko.nn import MaskedMLP
# Autoregressive structure: y_i depends on x_0, ..., x_{i-1}
autoregressive_adj = torch.tril(torch.ones(4, 4), diagonal=-1).bool()
print(autoregressive_adj)
# tensor([[False, False, False, False],
# [ True, False, False, False],
# [ True, True, False, False],
# [ True, True, True, False]])
autoregressive_net = MaskedMLP(autoregressive_adj, [32, 32])
MonotonicMLP
Creates a monotonic multi-layer perceptron where all Jacobian entries ∂y_j/∂x_i are positive.
This is achieved by using absolute value weights and a special activation function (TwoWayELU) that preserves monotonicity.
in_features
int
The number of input features
out_features
int
The number of output features
hidden_features
Sequence[int]
default:"(64, 64)"
The numbers of hidden features for each hidden layer
kwargs
Keyword arguments passed to MLP
Monotonic MLPs use MonotonicLinear layers with absolute-value weights (y = x |W|^T + b) and TwoWayELU activations that apply ELU(x) to half of the features and −ELU(−x) to the other half.
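The TwoWayELU idea can be sketched in a few lines. This is an illustrative stand-in, not Zuko's implementation; note that both branches are increasing functions of their input, which is what preserves monotonicity.

```python
import torch
import torch.nn.functional as F

def two_way_elu_sketch(x: torch.Tensor) -> torch.Tensor:
    """Sketch of the TwoWayELU idea: ELU(x) on one half of the
    features, -ELU(-x) on the other half. Both branches have a
    non-negative derivative, so monotonicity is preserved."""
    a, b = x.chunk(2, dim=-1)
    return torch.cat([F.elu(a), -F.elu(-b)], dim=-1)

x = torch.randn(8, 16)
y = two_way_elu_sketch(x)  # shape: (8, 16)

# Increasing the input never decreases any output feature
assert torch.all(two_way_elu_sketch(x + 0.1) >= y)
```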
Example
import torch
from zuko.nn import MonotonicMLP
# Create a monotonic MLP
net = MonotonicMLP(3, 4, [16, 32])
print(net)
# MonotonicMLP(
# (0): MonotonicLinear(in_features=3, out_features=16, bias=True)
# (1): TwoWayELU(alpha=1.0)
# (2): MonotonicLinear(in_features=16, out_features=32, bias=True)
# (3): TwoWayELU(alpha=1.0)
# (4): MonotonicLinear(in_features=32, out_features=4, bias=True)
# )
# Forward pass
x = torch.randn(3)
y = net(x)
# Verify all Jacobian entries are positive
jac = torch.autograd.functional.jacobian(net, x)
print(jac)
# tensor([[1.0492, 1.3094, 1.1711],
# [1.1201, 1.3825, 1.2711],
# [0.9397, 1.1915, 1.0787],
# [1.1049, 1.3635, 1.2592]])
# All entries are positive!
Use Cases
Monotonic MLPs are essential for:
- Monotonic spline transformations: Neural spline flows require monotonic networks
- Quantile functions: Mapping uniform distributions to arbitrary distributions
- Order-preserving transformations: When you need f(x_1) < f(x_2) whenever x_1 < x_2
from zuko.flows import NSF
# Neural spline flows use monotonic MLPs internally
flow = NSF(features=3, context=2, transforms=5)