Overview
Sum-of-Squares Polynomial Flow (SOSPF) uses polynomial transformations whose derivatives are constrained to be sums of squares. Because such a derivative is never negative, the transformation is monotonic while remaining smooth and expressive.
Reference
Sum-of-Squares Polynomial Flow (Jaini et al., 2019) https://arxiv.org/abs/1905.02325
Class Definition
Parameters
The number of features in the data.
The number of context features for conditional density estimation.
The degree L of the polynomials. Higher degrees allow more complex transformations.
The number of polynomials K to sum. More polynomials increase expressivity.
The number of autoregressive transformations to stack.
Whether features are randomly permuted between transformations.
Additional keyword arguments passed to MaskedAutoregressiveTransform:
- hidden_features: Hidden layer sizes (default: [64, 64])
- activation: Activation function (default: ReLU)
Usage Example
Conditional Flow
Training Example
Methods
forward(c=None)
Returns a normalizing flow distribution.
Arguments:
c (Tensor, optional): Context tensor of shape (*, context)
Returns:
NormalizingFlow: A distribution with:
- sample(shape): Sample from the distribution
- log_prob(x): Compute log probability of samples
- rsample(shape): Reparameterized sampling
When to Use SOSPF
Good for:
- Smooth, continuous distributions
- When polynomial structure is appropriate
- Lower-dimensional problems (< 50 features)
- Interpretable transformations
- When you want guaranteed monotonicity
Avoid when:
- You have high-dimensional data (use NSF)
- You need very complex transformations (use NAF/UNAF)
- Your data is outside [-10, 10] and can’t be standardized
- You want fast computation (use MAF or RealNVP)
Tips
- Standardize your data: SOSPF requires features in [-10, 10]. Normalize inputs before training.
- Tune polynomial degree: Start with a degree of 4-6. Higher degrees are more expressive but harder to train.
- Multiple polynomials: Use 3-5 polynomials. More polynomials = more expressivity but more parameters.
- Softclip layers: SOSPF automatically includes softclip transformations between layers to maintain bounds.
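The standardization tip can be sketched in plain Python. This is a minimal illustration, not part of the library: the `standardize` helper and the sample rows are hypothetical, and in practice the same arithmetic would usually be applied to tensors.

```python
from statistics import fmean, pstdev


def standardize(rows):
    """Scale each feature (column) to zero mean and unit variance.

    Hypothetical helper for illustration; returns the scaled rows plus the
    per-feature means and standard deviations needed to undo the scaling.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Guard against constant features, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


# Made-up data with very different feature scales.
rows = [[1200.0, 0.01], [1500.0, 0.03], [900.0, 0.02], [1800.0, 0.05]]
scaled, means, stds = standardize(rows)

# After scaling, every value should sit comfortably inside [-10, 10].
in_range = all(abs(v) <= 10.0 for row in scaled for v in row)
```

Keeping `means` and `stds` around makes it possible to map samples drawn from the flow back to the original units.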
Architecture Details
SOSPF uses sum-of-squares polynomials:
- Base distribution: Diagonal Gaussian N(0, I)
- Transformation: Sum of squared polynomials + shift
- Monotonicity: Guaranteed by SOS structure
- Neural network: Masked MLP predicts polynomial coefficients
- Softclip layers: Inserted between transformations
Here the p_k are the K polynomials of degree L whose squares are summed; their coefficients are predicted autoregressively.
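For reference, the transformation in the form given by Jaini et al. (2019) can be written as follows. The symbols c (the shift) and a_{k,l} (the polynomial coefficients) are notation chosen here for illustration; a given implementation may normalize or parameterize the sum differently.

```latex
f(x) = c + \int_0^x \sum_{k=1}^{K} p_k(u)^2 \, du,
\qquad
p_k(u) = \sum_{l=0}^{L} a_{k,l} \, u^l
```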
Sum-of-Squares Polynomials
Key properties:
- Non-negative: A sum of squares is always ≥ 0
- Non-negative derivative: The sum of squares is the derivative of the transformation, which ensures monotonicity
- Smooth: Polynomial smoothness
- Flexible: Can approximate many functions
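The properties above can be checked numerically with a small standalone sketch. It is not the library's implementation: the helper names and the example coefficients are hypothetical, and the integral is taken term by term on plain Python lists.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def sos_coefficients(polys):
    """Ascending coefficients of sum_k p_k(u)^2."""
    degree = max(len(p) for p in polys) - 1
    total = [0.0] * (2 * degree + 1)
    for p in polys:
        for i, c in enumerate(poly_mul(p, p)):
            total[i] += c
    return total


def poly_eval(coeffs, x):
    """Evaluate a polynomial with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def sos_transform(polys, x, shift=0.0):
    """f(x) = shift + integral from 0 to x of sum_k p_k(u)^2 du."""
    sos = sos_coefficients(polys)
    # Integrate term by term: c * u^i becomes c / (i + 1) * u^(i + 1).
    antiderivative = [0.0] + [c / (i + 1) for i, c in enumerate(sos)]
    return shift + poly_eval(antiderivative, x)


# Two made-up polynomials of degree 2 (K = 2, L = 2).
polys = [[1.0, -0.5, 0.2], [0.3, 0.8, -0.1]]
xs = [-2.0 + 0.5 * i for i in range(9)]
ys = [sos_transform(polys, x) for x in xs]
```

Because the integrand is a sum of squares, `ys` increases with `xs` whatever coefficients are chosen, which is why the network can predict them freely.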
Polynomial Degree Selection
| Degree | Expressivity | Training Difficulty | Use Case |
|---|---|---|---|
| 2-3 | Low | Easy | Simple, unimodal |
| 4-6 | Medium | Medium | General purpose |
| 7-10 | High | Hard | Complex, multimodal |
| 10+ | Very high | Very hard | Research only |
Comparison with Other Flows
| Property | SOSPF | NSF | NAF | MAF |
|---|---|---|---|---|
| Transformation | Polynomial | Spline | Neural | Affine |
| Smoothness | High | High | Medium | Low |
| Expressivity | Medium | High | Very high | Medium |
| Training speed | Medium | Medium | Slow | Fast |
| Interpretability | Medium | Low | Low | High |
| Domain | [-10, 10] | [-5, 5] | [-10, 10] | Unbounded |
Advanced Usage
High-Degree Polynomials
Coupling Transformations
Custom Architecture
Mathematical Details
Monotonicity
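The transformation integrates a sum of squared polynomials, so its derivative is the sum of squares itself: f'(x) = Σ_k p_k(x)^2 ≥ 0. A non-negative derivative means f never decreases.
Inversion
Inversion requires solving y = f(x) for x. This is done using numerical root-finding methods.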
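As an illustration of numerical inversion, the sketch below uses bisection on a toy monotonic transform. Bisection is an assumption made for the example; the library may use a different root finder. The `forward` and `invert` functions, the search interval and the tolerance are all hypothetical.

```python
def forward(x):
    """Toy transform: integral from 0 to x of (1 + u)^2 + (0.5 u)^2 du."""
    return x + x**2 + (1.25 / 3.0) * x**3


def invert(y, lo=-10.0, hi=10.0, tol=1e-10, max_iter=200):
    """Solve forward(x) = y for x by bisection on [lo, hi].

    Bisection only needs forward to be monotonically increasing,
    which the sum-of-squares structure provides.
    """
    if not forward(lo) <= y <= forward(hi):
        raise ValueError("y is outside the range reachable on [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if forward(mid) < y:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


x_true = 1.7
x_rec = invert(forward(x_true))
```

Each step halves the bracket, so the iteration count grows only logarithmically with the requested precision, but every step costs one forward evaluation. This is one reason the inverse direction is slower than the forward one.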
Numerical Stability
Challenges:
- High-degree polynomials can overflow
- Numerical issues near boundaries
- Root-finding can be unstable
Mitigations:
- Use softclip transformations (automatic)
- Limit polynomial degree (≤ 10)
- Proper initialization
- Gradient clipping during training
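To make the role of soft clipping concrete, here is one possible form of such a function, sketched in plain Python. The formula x / (1 + |x / B|) and the default bound are assumptions for illustration; the exact function and bound used by the library are not stated here.

```python
def softclip(x, bound=10.0):
    """Smoothly squash any real x into the open interval (-bound, bound)."""
    return x / (1.0 + abs(x / bound))


def softclip_inverse(y, bound=10.0):
    """Undo softclip; only valid for |y| < bound."""
    return y / (1.0 - abs(y / bound))


# Large inputs are pulled inside the bound instead of overflowing later.
clipped = softclip(1e6)

# The mapping is invertible, so the flow stays a bijection.
roundtrip = softclip_inverse(softclip(3.5))
```

Unlike a hard clamp, this mapping is strictly increasing and differentiable away from zero, so it can sit between transformations without breaking invertibility.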
Applications
Smooth Density Estimation
Low-Dimensional Data
Interpretable Models
Debugging
Related
- BPF - Bernstein polynomial alternative
- NSF - Spline-based alternative
- SOSPolynomialTransform - The underlying transformation
- MAF - Simpler baseline
