Overview
A Gaussian Mixture Model (GMM) represents a distribution as a weighted sum of Gaussian components. While technically not a normalizing flow, the GMM is included in Zuko as a simple and interpretable density estimation method. GMM is located in zuko.mixtures but is also available as zuko.flows.GMM for backwards compatibility.

Reference
Wikipedia: Gaussian Mixture Model
https://wikipedia.org/wiki/Mixture_model#Gaussian_mixture_model
Class Definition
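A hypothetical sketch of the constructor, reconstructed from the parameter descriptions in this section; the argument names and defaults here are assumptions, not the official zuko API:

```python
from typing import Any

class GMM:
    """Sketch of the GMM constructor; defaults here are assumptions."""

    def __init__(
        self,
        features: int,
        context: int = 0,
        components: int = 2,
        covariance_type: str = "full",   # "full" | "diagonal" | "spherical"
        tied: bool = False,              # share covariance across components
        epsilon: float = 1e-6,           # numerical stability term
        **kwargs: Any,                   # forwarded to the MLP (conditional case)
    ) -> None:
        self.features = features
        self.context = context
        self.components = components
        self.covariance_type = covariance_type
        self.tied = tied
        self.epsilon = epsilon
```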
Parameters
- features (int): The number of features in the data.
- context (int): The number of context features for conditional density estimation.
- components (int): The number of Gaussian components K in the mixture.
- covariance_type (str): The type of covariance matrix parameterization:
  - "full": full covariance matrix (most expressive)
  - "diagonal": diagonal covariance (axis-aligned)
  - "spherical": single variance parameter (isotropic)
- tied (bool): Whether to tie the covariance parameters across components. If True, all components share the same covariance structure.
- epsilon (float): A numerical stability term added to variances.
- **kwargs: Additional keyword arguments passed to the MLP constructor (for conditional GMMs):
  - hidden_features: hidden layer sizes (default: [64, 64])
  - activation: activation function
Usage Example
Conditional GMM
Training Example
Initialization with K-Means
Methods
forward(c=None)
Returns a Gaussian mixture distribution.
Arguments:
c (Tensor, optional): Context tensor of shape (*, context)

Returns:

Mixture: A mixture distribution with:
- sample(shape): sample from the mixture
- log_prob(x): compute log probability of samples
- component_distribution: the underlying Gaussian components
initialize(x, strategy)
Initializes the GMM components using clustering.
Arguments:
x (Tensor): Feature samples with shape (N, features)
strategy (str): Clustering strategy:
- "random": random initialization
- "kmeans": k-means clustering
- "kmeans++": k-means++ initialization
When to Use GMM
Good for:
- Simple, interpretable density estimation
- Clustering applications
- Low-dimensional data (< 20 features)
- When you know the number of modes
- Baseline comparisons
- Fast inference
Not good for:

- You need high expressivity (use NSF or NAF)
- You have high-dimensional data (use flows)
- You don’t know the number of components
- Data has complex, non-Gaussian structure
Tips
- Initialize properly: Use k-means initialization for better convergence.
- Choose components: Start with 3-10 components. Use cross-validation to select.
- Covariance type: Use “full” for small data, “diagonal” for medium, “spherical” for large/high-dim.
- Regularization: The epsilon parameter prevents singular covariances.
Model Equation
The GMM represents the distribution as

    p(x | c) = Σ_{i=1}^{K} w_i N(x; μ_i, Σ_i)

where:
- K is the number of components
- w_i are mixing weights (which sum to 1)
- N(μ_i, Σ_i) are Gaussian components
- c is optional context (in the conditional case, the weights, means, and covariances are functions of c)
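This density can be evaluated directly with torch.distributions; a torch-only illustration (independent of zuko) using diagonal-covariance components:

```python
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

K, d = 3, 2
w = torch.softmax(torch.randn(K), dim=0)  # mixing weights, sum to 1
mu = torch.randn(K, d)                    # component means
sigma = torch.rand(K, d) + 0.5            # diagonal standard deviations

gmm = MixtureSameFamily(
    Categorical(probs=w),
    Independent(Normal(mu, sigma), 1),    # diagonal-covariance Gaussians
)

x = torch.randn(5, d)
# log p(x) = logsumexp_i [ log w_i + log N(x; mu_i, Sigma_i) ]
log_p = gmm.log_prob(x)
```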
Covariance Types
Full Covariance
- Most expressive
- O(features^2) parameters per component
- Can model correlations
- Best for low-dimensional data
Diagonal Covariance
- Medium expressivity
- O(features) parameters per component
- Assumes features are independent
- Good for medium-dimensional data
Spherical Covariance
- Least expressive
- O(1) parameters per component
- Isotropic Gaussians (same variance in all directions)
- Good for high-dimensional data
Tied vs. Untied Covariances
Untied (default)
Tied
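To see what tying buys, count the mean and covariance parameters for a full-covariance GMM (illustrative arithmetic only; it ignores mixing weights and any context network):

```python
def gmm_param_count(K: int, d: int, tied: bool) -> int:
    """Mean + full-covariance parameter count for a K-component GMM in d dims."""
    means = K * d
    cov = d * (d + 1) // 2            # one full (symmetric) covariance
    covs = cov if tied else K * cov   # shared vs per-component
    return means + covs

print(gmm_param_count(K=5, d=10, tied=False))  # 325
print(gmm_param_count(K=5, d=10, tied=True))   # 105
```

Tying shrinks the covariance cost from K·d(d+1)/2 to d(d+1)/2, which helps when data per component is scarce.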
Initialization Strategies
Random
K-Means
K-Means++
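To illustrate what the "kmeans++" strategy computes, here is a plain-PyTorch sketch of k-means++ seeding (a standalone illustration, not zuko's implementation):

```python
import torch

def kmeanspp_centers(x: torch.Tensor, k: int) -> torch.Tensor:
    """Pick k initial centers via k-means++ seeding."""
    n = x.shape[0]
    centers = [x[torch.randint(n, (1,)).item()]]  # first center: uniform
    for _ in range(k - 1):
        # Squared distance of each point to its nearest chosen center.
        d2 = torch.cdist(x, torch.stack(centers)).pow(2).min(dim=1).values
        # Sample the next center with probability proportional to d2.
        idx = torch.multinomial(d2 / d2.sum(), 1).item()
        centers.append(x[idx])
    return torch.stack(centers)

x = torch.randn(500, 2)
centers = kmeanspp_centers(x, k=3)
```

The distance-weighted sampling spreads the initial centers out, which is why k-means++ typically converges faster than random initialization.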
Advanced Usage
Model Selection
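One common recipe for choosing K is the Bayesian Information Criterion (BIC). A sketch of the arithmetic; the log-likelihood values below are placeholders standing in for your trained models' scores:

```python
import math

def gmm_bic(log_likelihood: float, K: int, d: int, n: int) -> float:
    """BIC for a full-covariance GMM; lower is better.
    Parameters counted: (K-1) weights + K*d means + K*d(d+1)/2 covariances."""
    p = (K - 1) + K * d + K * d * (d + 1) // 2
    return p * math.log(n) - 2.0 * log_likelihood

# K -> total log-likelihood on n=500 points (placeholder values; in practice
# use data.shape[0] * -nll from each trained candidate model).
fits = {2: -1210.0, 3: -1105.0, 5: -1098.0}
best_K = min(fits, key=lambda K: gmm_bic(fits[K], K, d=2, n=500))
```

BIC penalizes the extra parameters of larger K, so a marginal likelihood gain (K=5 here) does not justify the added components.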
Extract Cluster Assignments
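Cluster assignments follow from the posterior responsibilities r_i(x) ∝ w_i N(x; μ_i, Σ_i). A torch-only sketch with hand-picked mixture parameters standing in for a trained model:

```python
import torch
from torch.distributions import Independent, Normal

# Hand-picked parameters for two well-separated components.
w = torch.tensor([0.5, 0.5])
mu = torch.tensor([[-3.0, -3.0], [3.0, 3.0]])
sigma = torch.ones(2, 2)

comp = Independent(Normal(mu, sigma), 1)
x = torch.tensor([[-2.9, -3.1], [2.8, 3.2]])

# Unnormalized log-responsibilities, shape (N, K); argmax gives the cluster.
log_r = torch.log(w) + comp.log_prob(x.unsqueeze(1))
assignments = log_r.argmax(dim=-1)
```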
Conditional GMM for Regression
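For regression with a conditional GMM p(y | c), a natural point prediction is the mixture mean E[y | c] = Σ_i w_i(c) μ_i(c). A sketch where the per-context weights and means are placeholders for a trained model's output:

```python
import torch

w = torch.tensor([0.25, 0.75])       # w_i(c) for one context c
mu = torch.tensor([[0.0, 0.0],
                   [4.0, 8.0]])      # mu_i(c), shape (K, d)

# Weighted average of component means, shape (d,).
y_hat = (w.unsqueeze(-1) * mu).sum(dim=0)
```

Unlike plain least-squares regression, the mixture also exposes the full conditional density, so you can report multimodal predictions instead of collapsing them to one mean.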
Visualization
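A 2D density can be visualized by evaluating log_prob on a grid; a torch-only sketch (the resulting grid can be passed to e.g. matplotlib's contourf):

```python
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

# A toy two-component mixture standing in for a trained model.
gmm = MixtureSameFamily(
    Categorical(probs=torch.tensor([0.5, 0.5])),
    Independent(Normal(torch.tensor([[-2.0, 0.0], [2.0, 0.0]]),
                       torch.ones(2, 2)), 1),
)

# Evaluate log-density on a 100x100 grid; plot with
# plt.contourf(xx, yy, z.exp()) if matplotlib is available.
xs = torch.linspace(-5, 5, 100)
xx, yy = torch.meshgrid(xs, xs, indexing="ij")
grid = torch.stack([xx, yy], dim=-1).reshape(-1, 2)
z = gmm.log_prob(grid).reshape(100, 100)
```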
Comparison with Flows
| Property | GMM | NSF/MAF (Flows) |
|---|---|---|
| Expressivity | Low-Medium | High |
| Interpretability | High | Low |
| Training speed | Fast | Medium-Slow |
| Inference speed | Fast | Medium |
| Scalability | Low (< 20D) | High (100s D) |
| Clustering | Yes | No |
| Flexibility | Limited | High |
Applications
Clustering
Anomaly Detection
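A density model gives anomaly detection almost for free: score points by log_prob and flag those below a threshold calibrated on in-distribution data. A torch-only sketch with a toy mixture in place of a trained model:

```python
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

gmm = MixtureSameFamily(
    Categorical(probs=torch.tensor([0.5, 0.5])),
    Independent(Normal(torch.tensor([[-3.0, 0.0], [3.0, 0.0]]),
                       torch.ones(2, 2)), 1),
)

# Calibrate the threshold as the 5th percentile of in-distribution scores.
train = gmm.sample((1000,))
threshold = torch.quantile(gmm.log_prob(train), 0.05)

x = torch.tensor([[-3.0, 0.0],     # on a mode: normal
                  [50.0, 50.0]])   # far from both modes: anomalous
is_anomaly = gmm.log_prob(x) < threshold
```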
Generative Modeling
Limitations
- Fixed components: Must specify number of components in advance
- Gaussian assumption: Each component is Gaussian
- Low capacity: Limited expressivity compared to flows
- Scalability: Not suitable for very high-dimensional data
- Local optima: EM-based training can get stuck
Related
- Normalizing Flows - More expressive alternatives
- GF - Flow with mixture-based transformations
- Mixture Distribution - The underlying distribution class
