Overview
Continuous Normalizing Flow (CNF) uses ordinary differential equations (ODEs) to define continuous-time transformations. Instead of stacking discrete transformation layers, CNF learns a continuous dynamics function that transforms the base distribution into the target distribution.

References

- Neural Ordinary Differential Equations (Chen et al., 2018): https://arxiv.org/abs/1806.07366
- FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models (Grathwohl et al., 2018): https://arxiv.org/abs/1810.01367
Class Definition
Parameters

- `features` (int): The number of features in the data.
- `context` (int): The number of context features for conditional density estimation.
- `freqs` (int): The number of time-embedding frequencies. Higher values provide richer time representations.
- `atol` (float): The absolute integration tolerance for the ODE solver. Lower values increase accuracy but slow computation.
- `rtol` (float): The relative integration tolerance for the ODE solver.
- `exact` (bool): Whether to calculate the exact log-determinant of the Jacobian (`True`) or use an unbiased stochastic estimate (`False`). Exact is more accurate but slower.
- `**kwargs`: Additional keyword arguments passed to the MLP constructor:
  - `hidden_features`: Hidden layer sizes (default: `[64, 64]`)
  - `activation`: Activation function (default: `ELU`)
Usage Example
Conditional Flow
Fast Training with Stochastic Trace
Training Example
Methods
forward(c=None)
Returns a normalizing flow distribution.
Arguments:
- `c` (Tensor, optional): Context tensor of shape `(*, context)`.

Returns:

- `NormalizingFlow`: A distribution with:
  - `sample(shape)`: Sample from the distribution
  - `log_prob(x)`: Compute log probability of samples
  - `rsample(shape)`: Reparameterized sampling
When to Use CNF
Good for:
- Research and experimentation
- Theoretically unlimited expressivity
- Continuous-time modeling
- When discrete layers are limiting
- Irregular time series data
Not ideal when:

- You need fast training (use MAF or NSF)
- You need fast sampling (use RealNVP)
- You have limited compute resources
- You want simpler, more interpretable models
Tips
- Start with stochastic trace: Set `exact=False` for faster training, especially with high-dimensional data.
- Tune tolerances: Decrease `atol` and `rtol` for better accuracy, increase them for faster computation.
- More time frequencies: Use `freqs=5` or higher for complex temporal dynamics.
- Deep networks: CNF benefits from deeper networks (3-4 layers with 256+ hidden units).
- Use ELU activation: The default ELU works well for ODE dynamics.
Architecture Details
CNF models continuous dynamics:

- Base distribution: Diagonal Gaussian `N(0, I)`
- ODE: `dx/dt = f(x, t, c)`, where `f` is a neural network
- Time embedding: Sinusoidal embeddings of time `t`
- Integration: From `t=0` to `t=1` using adaptive ODE solvers
- Log-determinant: Computed via the instantaneous change of variables, `d log p(x(t))/dt = -tr(∂f/∂x)`, by integrating the trace of the Jacobian along the trajectory
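The pieces above can be checked on a toy linear ODE, `dx/dt = a*x` in one dimension, where both the flow map and the log-determinant are known in closed form (`x(1) = x(0)*e^a` and `log|det J| = a`). A self-contained sketch using fixed-step RK4; the dynamics, step count, and helper names here are illustrative, not part of the library:

```python
import math

def f(x, t, a=0.5):
    """Toy dynamics dx/dt = a*x; its Jacobian trace is the constant a."""
    return a * x

def trace_jac(x, t, a=0.5):
    """tr(df/dx) for the toy dynamics (a real CNF gets this via autograd)."""
    return a

def rk4_step(g, y, t, h):
    """One classical Runge-Kutta step of size h."""
    k1 = g(y, t)
    k2 = g(y + 0.5 * h * k1, t + 0.5 * h)
    k3 = g(y + 0.5 * h * k2, t + 0.5 * h)
    k4 = g(y + h * k3, t + h)
    return y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(x0, steps=100):
    """Jointly integrate the state and the accumulated log-determinant."""
    x, logdet, t = x0, 0.0, 0.0
    h = 1.0 / steps
    for _ in range(steps):
        x_new = rk4_step(f, x, t, h)
        # augmented system: d(logdet)/dt = tr(df/dx);
        # the trace is constant here, so this simple quadrature is exact
        logdet += h * trace_jac(x, t)
        x, t = x_new, t + h
    return x, logdet

x1, logdet = integrate(1.0)
# analytically: x1 = e^0.5 ~ 1.6487 and logdet = 0.5
```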
Continuous vs. Discrete Flows
| Property | CNF (Continuous) | MAF/NSF (Discrete) |
|---|---|---|
| Layers | Continuous dynamics | Discrete transformations |
| Expressivity | Theoretically unlimited | Limited by layers |
| Training speed | Slow | Fast to medium |
| Sampling speed | Slow | Slow (MAF/NSF) |
| Memory | Higher (ODE solver) | Lower |
| Interpretability | Dynamics | Transformations |
ODE Solver Details
CNF uses adaptive ODE solvers:

- Forward pass: Integrate from `t=0` to `t=1`
- Inverse pass: Integrate from `t=1` to `t=0` (reverse the ODE)
- Solver: Adaptive step-size Runge-Kutta methods
- Gradients: Computed via adjoint method (memory efficient)
Tolerances
- `atol=1e-6, rtol=1e-5`: High accuracy (default)
- `atol=1e-5, rtol=1e-4`: Balanced
- `atol=1e-4, rtol=1e-3`: Fast but less accurate
Trace Estimation
Exact (`exact=True`): Computes the trace `tr(∂f/∂x) = Σ_i ∂f_i/∂x_i` exactly, at the cost of one vector-Jacobian product per dimension.

Stochastic (`exact=False`): Uses Hutchinson's estimator, `tr(∂f/∂x) ≈ E[vᵀ (∂f/∂x) v]`, where `v ~ N(0, I)` is a random vector. The estimate is unbiased and needs only a single vector-Jacobian product per evaluation.
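Hutchinson's estimator can be illustrated on a plain matrix, standing in for the Jacobian `∂f/∂x`; the matrix entries and sample count below are arbitrary:

```python
import random

random.seed(0)

# A fixed 3x3 "Jacobian" with trace 1.0 + 2.0 - 0.5 = 2.5
A = [[1.0, 0.3, -0.2],
     [0.4, 2.0, 0.1],
     [-0.3, 0.2, -0.5]]
exact_trace = sum(A[i][i] for i in range(3))

def hutchinson(A, n_samples=50_000):
    """Estimate tr(A) as the average of v^T A v with v ~ N(0, I)."""
    total = 0.0
    for _ in range(n_samples):
        v = [random.gauss(0.0, 1.0) for _ in range(3)]
        Av = [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
        total += sum(v[i] * Av[i] for i in range(3))
    return total / n_samples

est = hutchinson(A)
# est is close to exact_trace = 2.5; the estimator is unbiased and its
# variance shrinks as 1/n_samples
```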
Time Embedding
CNF embeds time using sinusoidal features, e.g. `t ↦ (cos(kπt), sin(kπt))` for `k = 1, …, freqs`, giving the dynamics network a smooth representation of `t`.

Advanced Usage
Custom Network Architecture
High-Dimensional Data
Manual Construction
Computational Considerations
CNF is computationally expensive:

- Forward pass: Requires ODE integration
- Backward pass: Uses the adjoint method (memory efficient)
- Function evaluations: The adaptive solver makes multiple network calls per integration
- Memory: Stores intermediate states during integration

To reduce cost:

- Use `exact=False` for high-dimensional data
- Increase tolerances (`atol=1e-5, rtol=1e-4`)
- Use smaller networks
- Reduce time-embedding frequencies
- Use mixed-precision training
Applications
Time Series
CNF naturally handles irregular time series, since the dynamics are defined at every continuous time `t`.

Continuous Processes

Model continuous physical processes whose dynamics are naturally expressed as ODEs.

Research

Explore the theoretical expressivity limits of continuous-time flows.

Comparison with Other Flows
| Property | CNF | NAF | NSF | RealNVP |
|---|---|---|---|---|
| Type | Continuous | Neural | Spline | Coupling |
| Training | Very slow | Slow | Medium | Fast |
| Sampling | Slow | Slow | Slow | Fast |
| Expressivity | Unlimited | Very high | High | Medium |
| Memory | High | High | Medium | Low |
| Use case | Research | Complex | General | Production |
Related
- FFJTransform - The free-form Jacobian transformation
- Neural ODE - Foundational paper
- FFJORD - CNF for normalizing flows
