Overview
Gaussianization Flow (GF) uses element-wise transformations combined with rotations to transform data into a Gaussian distribution. Unlike autoregressive flows, GF transforms all features simultaneously using element-wise operations.

Reference

Gaussianization Flows (Meng et al., 2020): https://arxiv.org/abs/2003.01941
Class Definition
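The constructor signature implied by the parameter list below (parameter order is an assumption, reconstructed from this page rather than confirmed against the library):

```
zuko.flows.GF(features, context, transforms, components, **kwargs)
```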
Parameters

- features (int): The number of features in the data.
- context (int): The number of context features for conditional density estimation.
- transforms (int): The number of Gaussianization transformations to stack.
- components (int): The number of mixture components in each Gaussianization transformation. More components increase expressivity.
- **kwargs: Additional keyword arguments passed to ElementWiseTransform:
  - hidden_features: Hidden layer sizes (default: [64, 64])
  - activation: Activation function (default: ReLU)
Usage Example
Conditional Flow
Training Example
Methods
forward(c=None)
Returns a normalizing flow distribution.
Arguments:
- c (Tensor, optional): Context tensor of shape (*, context)

Returns:

- NormalizingFlow: A distribution with:
  - sample(shape): Sample from the distribution
  - log_prob(x): Compute log probability of samples
  - rsample(shape): Reparameterized sampling
When to Use GF
Good for:
- Tabular data
- When features have different marginal distributions
- Medium-dimensional problems (10-100 features)
- When you want rotation-invariant transformations
- Fast parallel transformations
Avoid when:

- You need maximum expressivity (use NSF or NAF)
- You have very high-dimensional data (> 100 features)
- Your data is outside [-10, 10] and can't be standardized
- You need to model complex feature dependencies (use MAF/NSF)
Tips
- Standardize your data: GF requires features in [-10, 10]. Always normalize inputs.
- More components: Use 12-16 components for complex marginal distributions.
- More transformations: Use 5-10 transformations since each only does element-wise operations.
- Rotation matrices: GF alternates element-wise transforms with random rotations for better mixing.
Architecture Details
GF alternates between element-wise and rotation transformations:

- Base distribution: Diagonal Gaussian N(0, I)
- Element-wise layer: Independent Gaussianization per feature
- Rotation layer: Random orthogonal matrix mixing features
- Neural network: MLP predicts mixture parameters per feature
Gaussianization Transform
Each element-wise transformation uses:

- components Gaussians per feature
- Locations and scales predicted by a neural network
- Conditional on context (if provided)
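The core of the per-feature transform is the probit trick: push each value through its mixture CDF, then through the inverse standard-normal CDF. A self-contained PyTorch sketch (the fixed two-component mixture here is illustrative; in GF the locations, scales, and weights come from the MLP):

```python
import torch
from torch.distributions import Normal

def gaussianize(x, locs, scales, weights):
    """Map x through a mixture-of-Gaussians CDF, then through the
    inverse standard-normal CDF (probit), element-wise."""
    std = Normal(0.0, 1.0)
    # Mixture CDF: weighted sum of component CDFs per element
    cdf = (weights * Normal(locs, scales).cdf(x.unsqueeze(-1))).sum(-1)
    cdf = cdf.clamp(1e-6, 1 - 1e-6)  # keep the probit finite
    return std.icdf(cdf)

# Illustrative fixed mixture parameters (hypothetical, not from GF's MLP)
x = torch.randn(64)
z = gaussianize(
    x,
    locs=torch.tensor([-1.0, 1.0]),
    scales=torch.ones(2),
    weights=torch.tensor([0.5, 0.5]),
)
```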
Rotation Transformations
Rotations mix features between Gaussianization layers as z = Rx, where R is a random orthogonal matrix initialized at creation.
Rotations:
- Enable features to interact
- Are fixed (not learned) in Zuko’s implementation
- Preserve distances (orthogonal)
- Have unit Jacobian determinant
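The properties above can be checked directly with a random orthogonal matrix built via QR decomposition (a standard construction; this sketch is independent of zuko):

```python
import torch

torch.manual_seed(0)
d = 5
# Random orthogonal matrix: QR decomposition of a Gaussian matrix
Q, _ = torch.linalg.qr(torch.randn(d, d))

x = torch.randn(64, d)
z = x @ Q.T                    # rotate features

# Orthogonality preserves distances (norms)
norms_equal = torch.allclose(x.norm(dim=-1), z.norm(dim=-1), atol=1e-5)

# Unit Jacobian determinant: log|det Q| is numerically zero
_, logabsdet = torch.linalg.slogdet(Q)
```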
Element-Wise vs. Autoregressive
| Property | GF (Element-wise) | MAF (Autoregressive) |
|---|---|---|
| Transformation | Parallel | Sequential |
| Speed | Fast | Slow (inverse) |
| Dependencies | Via rotations | Direct autoregressive |
| Expressivity | Medium | Medium-High |
| Feature mixing | Rotations | Masking |
Comparison with Other Flows
| Property | GF | MAF | NSF | RealNVP |
|---|---|---|---|---|
| Type | Element-wise + Rotation | Autoregressive | Autoregressive | Coupling |
| Forward | Fast | Fast | Fast | Fast |
| Inverse | Fast | Slow | Slow | Fast |
| Expressivity | Medium | Medium | High | Medium |
| Best for | Tabular | General | General | Images |
Advanced Usage
Custom Number of Components
High-Dimensional Data
Manual Construction
Computational Considerations
GF is computationally efficient:

- Forward pass: All features transformed in parallel
- Inverse pass: Also parallel (unlike autoregressive)
- Memory: Moderate (stores mixture parameters)
- Speed: Faster than autoregressive flows
Applications
Tabular Data Modeling
Anomaly Detection
Data Preprocessing
Interpretability
GF provides some interpretability: each element-wise layer Gaussianizes features independently, so the learned marginal transformations can be inspected per feature.

Limitations
- Fixed rotations: Rotation matrices are random, not learned
- Limited dependencies: Feature dependencies only via rotations
- Bounded domain: Requires data in [-10, 10]
- Medium expressivity: Less expressive than NSF or NAF
Tips for Best Results
- Feature engineering: GF works well when individual features have interesting distributions
- Standardization: Ensure each feature has similar scale
- Sufficient transformations: Use 5-10 layers for good mixing
- Component selection: Start with 8-12 components, increase if needed
- Learning rate: Use smaller learning rates (1e-4) for stability
Debugging
Related
- MAF - Autoregressive alternative
- NSF - More expressive autoregressive flow
- GaussianizationTransform - The element-wise transformation
- RotationTransform - The rotation transformation
