Overview
Bernstein Polynomial Flow (BPF) uses Bernstein polynomials to create monotonic, bounded transformations. Unlike other polynomial flows, BPF is explicitly bounded to the interval [-5, 5], making it suitable for data with known ranges.
References
- Deep transformation models: Tackling complex regression problems with neural network based transformation models (Sick et al., 2020) https://arxiv.org/abs/2004.00464
- Short-Term Density Forecasting of Low-Voltage Load using Bernstein-Polynomial Normalizing Flows (Arpogaus et al., 2022) https://arxiv.org/abs/2204.13939
Class Definition
Parameters
The number of features in the data.
The number of context features for conditional density estimation.
The degree M of the Bernstein polynomial. Higher degrees allow more complex transformations within the bounded domain.
The number of autoregressive transformations to stack.
Whether features are randomly permuted between transformations.
Additional keyword arguments passed to MaskedAutoregressiveTransform:
- hidden_features: Hidden layer sizes (default: [64, 64])
- activation: Activation function (default: ReLU)
Usage Example
Conditional Flow
Training Example
Methods
forward(c=None)
Returns a normalizing flow distribution.
Arguments:
c (Tensor, optional): Context tensor of shape (*, context)
Returns:
NormalizingFlow: A distribution with:
- sample(shape): Sample from the distribution
- log_prob(x): Compute the log-probability of samples
- rsample(shape): Reparameterized sampling
When to Use BPF
Good for:
- Bounded data with known range
- Smooth, continuous distributions
- When you want guaranteed monotonicity and boundedness
- Regression and distribution forecasting
- Low- to medium-dimensional problems
Consider alternatives when:
- You need unbounded transformations (use MAF)
- You have very high-dimensional data (use NSF)
- Your data extends significantly beyond [-5, 5]
- You need maximum expressivity (use NAF/UNAF)
Tips
- Standardize to [-5, 5]: Normalize your data so that most values fall within [-5, 5]; standardizing to zero mean and unit variance usually achieves this.
- Higher degrees: BPF typically needs higher degrees than SOSPF. Start with degree=16 to 32.
- Smooth data: BPF works best on smooth, continuous distributions without sharp transitions.
- Forecasting: BPF is particularly well suited to probabilistic forecasting tasks.
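The standardization tip can be checked before training. Below is a minimal sketch in plain Python using hypothetical helpers `standardize` and `fraction_inside`, which are not part of the BPF API. It computes z-scores with the population standard deviation and reports how much of the data lies in [-5, 5].

```python
import statistics


# Hypothetical helper for illustration; not part of the BPF API.
def standardize(values):
    """Return z-scores (zero mean, unit variance) of a list of floats."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical helper for illustration; not part of the BPF API.
def fraction_inside(values, bound=5.0):
    """Fraction of values lying in the closed interval [-bound, bound]."""
    return sum(-bound <= v <= bound for v in values) / len(values)


# Hypothetical raw measurements on an arbitrary scale, including one outlier.
raw = [120.0, 135.0, 128.0, 142.0, 131.0, 125.0, 138.0, 129.0, 133.0, 400.0]
z = standardize(raw)
print(f"inside [-5, 5] before: {fraction_inside(raw):.2f}")  # 0.00
print(f"inside [-5, 5] after:  {fraction_inside(z):.2f}")    # 1.00
```

Keep the mean and standard deviation so that samples drawn from the flow can be mapped back with `x = mean + std * z`. With n points and the population standard deviation, no z-score can exceed sqrt(n - 1) in magnitude (here 3), so even the outlier lands inside the domain. On large, heavy-tailed datasets this guarantee does not hold, and some values may still fall outside.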
Architecture Details
BPF uses Bernstein polynomials:
- Base distribution: Diagonal Gaussian N(0, I)
- Transformation: Bounded Bernstein polynomials
- Domain: [-5, 5] (hard boundary)
- Neural network: Masked MLP predicts Bernstein coefficients
- Monotonicity: Ensured by constraining coefficients
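The transformation applied to each feature is
$$f(x) = \sum_{i=0}^{M} w_i \, b_{i,M}(t),$$
where t ∈ [0, 1] is the input x mapped from [-5, 5] onto the unit interval, and the w_i are coefficients predicted autoregressively and constrained to ensure monotonicity.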
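The element-wise transformation, together with the log-derivative that enters `log_prob` through the change-of-variables formula, can be sketched in a few lines. This sketch assumes f(x) = Σ w_i b_{i,M}(t) with t = (x + 5) / 10, a linear map of [-5, 5] onto [0, 1]. The library's actual rescaling may differ. `transform` and `bernstein_basis` are hypothetical helpers, and the coefficients are made-up values rather than network outputs.

```python
import math

BOUND = 5.0  # domain [-BOUND, BOUND] as stated for BPF


# Hypothetical helper for illustration; not part of the BPF API.
def bernstein_basis(i, m, t):
    """b_{i,m}(t) = C(m, i) * t**i * (1 - t)**(m - i)."""
    return math.comb(m, i) * t**i * (1.0 - t) ** (m - i)


# Hypothetical helper for illustration; not part of the BPF API.
def transform(x, w):
    """Return (f(x), log|df/dx|) for f(x) = sum_i w[i] * b_{i,M}(t), M = len(w) - 1.

    Assumption: t = (x + BOUND) / (2 * BOUND), a linear map of [-BOUND, BOUND] onto [0, 1].
    The coefficients w must be strictly increasing so that df/dx > 0.
    """
    m = len(w) - 1
    t = (x + BOUND) / (2.0 * BOUND)
    y = sum(w_i * bernstein_basis(i, m, t) for i, w_i in enumerate(w))
    # Derivative of a Bernstein polynomial in t, then the chain rule dt/dx = 1 / (2 * BOUND).
    dy_dt = m * sum((w[i + 1] - w[i]) * bernstein_basis(i, m - 1, t) for i in range(m))
    return y, math.log(dy_dt / (2.0 * BOUND))


w = [-5.0, -3.0, 0.0, 3.0, 5.0]  # hypothetical increasing coefficients, degree M = 4
for x in (-5.0, -2.0, 0.0, 2.0, 5.0):
    y, log_det = transform(x, w)
    print(f"x={x:+.1f}  f(x)={y:+.4f}  log|f'(x)|={log_det:+.4f}")
```

In a flow, log|f'(x)| is added to the base log-density of f(x) to obtain the log-probability of x.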
Bernstein Polynomials
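Bernstein polynomials have special properties:
- Bounded: Maps [-5, 5] into [-5, 5] when the coefficients lie in [-5, 5]
- Smooth: Infinitely differentiable
- Monotonicity: Easy to enforce via coefficient constraints
- Basis: The M + 1 polynomials of degree M form a basis for all polynomials of degree at most M, and as M grows they can uniformly approximate any continuous function on a bounded interval
- Stability: Better conditioned than the monomial basis, so evaluation is numerically stable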
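In the transformation $f(x) = \sum_{i=0}^{M} w_i \, b_{i,M}(t)$, $b_{i,M}$ denotes the i-th Bernstein basis polynomial of degree M (defined under Bernstein Basis below).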
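The boundedness property follows from the convex hull property. Each basis polynomial is non-negative and they sum to 1, so f(t) is a weighted average of the coefficients. The minimal sketch below checks this numerically with random coefficients in [-5, 5]. `bernstein_poly` is a hypothetical helper, not a library function.

```python
import math
import random


# Hypothetical helper for illustration; not part of the BPF API.
def bernstein_poly(w, t):
    """Evaluate sum_i w[i] * C(M, i) * t**i * (1 - t)**(M - i) with M = len(w) - 1."""
    m = len(w) - 1
    return sum(w_i * math.comb(m, i) * t**i * (1.0 - t) ** (m - i) for i, w_i in enumerate(w))


random.seed(0)
w = [random.uniform(-5.0, 5.0) for _ in range(17)]  # arbitrary coefficients, degree M = 16
values = [bernstein_poly(w, k / 200) for k in range(201)]  # t on a grid over [0, 1]

# Convex hull property: each output is a weighted average of the coefficients,
# so it lies between min(w) and max(w), and therefore inside [-5, 5].
tol = 1e-12
print(min(w) - tol <= min(values) and max(values) <= max(w) + tol)  # True
```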
Degree Selection
| Degree | Expressivity | Parameters | Use Case |
|---|---|---|---|
| 8-12 | Low | Few | Simple, unimodal |
| 16-24 | Medium | Moderate | General purpose |
| 32-48 | High | Many | Complex distributions |
| 64+ | Very high | Very many | Research, benchmarks |
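The Parameters column grows with the degree because the conditioner must predict M + 1 coefficients per feature. Below is a rough sketch of that arithmetic. It assumes one coefficient vector of length `degree + 1` per feature and a final hidden layer of 64 units, matching the last entry of the default `hidden_features`. The function names are hypothetical, and the library may add further outputs.

```python
# Hypothetical helper for illustration; assumes degree + 1 coefficients per feature.
def coefficients_per_transform(features, degree):
    """Bernstein coefficients the conditioner outputs, assuming degree + 1 per feature."""
    return features * (degree + 1)


# Hypothetical helper for illustration; counts a dense layer, ignoring masks.
def output_layer_params(features, degree, hidden=64):
    """Weights plus biases of a dense final layer from `hidden` units to all coefficients.

    A masked MLP zeroes some of these weights; this counts the dense shape only.
    """
    out = coefficients_per_transform(features, degree)
    return hidden * out + out


for degree in (8, 16, 32, 64):
    print(degree, coefficients_per_transform(5, degree), output_layer_params(5, degree))
```

For 5 features, going from degree 8 to degree 64 multiplies the final-layer size by roughly 7 (from 2,925 to 21,125 weights and biases). This illustrates the progression from Few to Very many in the table.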
Comparison with Other Flows
| Property | BPF | SOSPF | NSF | MAF |
|---|---|---|---|---|
| Transformation | Bernstein poly | SOS poly | Spline | Affine |
| Domain | [-5, 5] | [-10, 10] | [-5, 5] | Unbounded |
| Boundedness | Hard boundary | Soft boundary | Soft boundary | None |
| Smoothness | Very high | High | High | Low |
| Typical degree | 16-32 | 4-10 | 8-16 bins | N/A |
| Training speed | Medium | Medium | Medium | Fast |
Advanced Usage
High-Degree Polynomials
Coupling Transformations
Custom Bounds
Mathematical Details
Bernstein Basis
The Bernstein basis polynomials of degree M are
$$b_{i,M}(t) = C(M, i)\, t^{i} (1 - t)^{M - i}, \qquad i = 0, \dots, M,$$
where C(M, i) is the binomial coefficient and t ∈ [0, 1].
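Below is a direct implementation of the basis definition. It is useful for checking two properties relied on elsewhere on this page: every b_{i,M}(t) is non-negative on [0, 1], and the M + 1 values sum to 1 (partition of unity). `bernstein_basis` is a hypothetical helper written for this sketch.

```python
import math


# Hypothetical helper for illustration; not part of the BPF API.
def bernstein_basis(i, m, t):
    """b_{i,m}(t) = C(m, i) * t**i * (1 - t)**(m - i) for t in [0, 1]."""
    return math.comb(m, i) * t**i * (1.0 - t) ** (m - i)


m = 4
for t in (0.0, 0.25, 0.5, 1.0):
    basis = [bernstein_basis(i, m, t) for i in range(m + 1)]
    # Each basis value is non-negative and together they sum to 1 (partition of unity).
    print(t, [round(b, 4) for b in basis], sum(basis))
```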
Monotonicity Constraint
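To ensure monotonicity, the coefficients must be strictly increasing, w_0 < w_1 < … < w_M, because the derivative
$$f'(t) = M \sum_{i=0}^{M-1} (w_{i+1} - w_i)\, b_{i,M-1}(t)$$
is then positive on [0, 1]. The coefficients are built from unconstrained values $\delta_j$ predicted by the network, which are turned into positive increments (for example, via softplus or exp) and accumulated.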
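Below is a minimal sketch of the monotonicity construction. It assumes softplus as the positive map; exp is another common choice, and the library's choice is not stated here. Unconstrained values play the role of the network outputs delta_j, and the helper names are hypothetical. Because every increment w_{i+1} - w_i is positive, the derivative M Σ (w_{i+1} - w_i) b_{i,M-1}(t) is positive, and the sampled outputs are strictly increasing.

```python
import math


# Hypothetical helper for illustration; softplus is an assumed choice of positive map.
def softplus(z):
    """Numerically stable log(1 + exp(z)), always > 0."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical helper for illustration; not part of the BPF API.
def monotone_coefficients(w0, deltas):
    """Build w_0 < w_1 < ... < w_M by accumulating positive increments."""
    w = [w0]
    for d in deltas:
        w.append(w[-1] + softplus(d))
    return w


# Hypothetical helper for illustration; not part of the BPF API.
def bernstein_poly(w, t):
    """Evaluate sum_i w[i] * C(M, i) * t**i * (1 - t)**(M - i) with M = len(w) - 1."""
    m = len(w) - 1
    return sum(w_i * math.comb(m, i) * t**i * (1.0 - t) ** (m - i) for i, w_i in enumerate(w))


# Stand-ins for unconstrained network outputs, including negative ones.
deltas = [-2.0, 0.5, -0.3, 1.2, -4.0, 0.0]
w = monotone_coefficients(-3.0, deltas)
ys = [bernstein_poly(w, k / 100) for k in range(101)]
print(all(a < b for a, b in zip(ys, ys[1:])))  # True: f is strictly increasing
```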
Inversion
Inversion of Bernstein polynomials is done via:
- Analytical solution for low degrees
- Numerical root-finding for higher degrees
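No general closed-form root formula exists for polynomials of degree 5 or more, so at the recommended degrees (16 and above) inversion is numerical. The minimal sketch below uses bisection, a simple root-finding method chosen here for illustration; the library may use a different one. Since f is strictly increasing, f(t) = y has exactly one solution in [0, 1] whenever y lies between w_0 and w_M. `invert` and `bernstein_poly` are hypothetical helpers.

```python
import math


# Hypothetical helper for illustration; not part of the BPF API.
def bernstein_poly(w, t):
    """Evaluate sum_i w[i] * C(M, i) * t**i * (1 - t)**(M - i) with M = len(w) - 1."""
    m = len(w) - 1
    return sum(w_i * math.comb(m, i) * t**i * (1.0 - t) ** (m - i) for i, w_i in enumerate(w))


# Hypothetical helper for illustration; bisection is an assumed inversion method.
def invert(w, y, tol=1e-12, max_iter=200):
    """Solve f(t) = y for t in [0, 1], assuming w is strictly increasing.

    Because f is monotone, the bracket [lo, hi] always contains the root,
    and each step halves it until it is narrower than tol.
    """
    if not w[0] <= y <= w[-1]:
        raise ValueError("y is outside [w_0, w_M], the range of f on [0, 1]")
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if bernstein_poly(w, mid) < y:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


w = [-5.0, -4.0, -1.0, 0.5, 2.0, 4.5, 5.0]  # increasing coefficients, degree M = 6
t = invert(w, 1.3)
print(abs(bernstein_poly(w, t) - 1.3) < 1e-9)  # True
```

Each bisection step halves the bracket, so about 40 iterations reach a 1e-12 tolerance regardless of the degree.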
Numerical Stability
BPF is numerically stable due to:
- Bounded domain: No overflow issues
- Convex hull property: Output lies in the convex hull of the coefficients (control points)
- Stable basis: The Bernstein basis is well-conditioned
- No extrapolation: All computation stays within [-5, 5]
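The well-conditioned basis is usually exploited through de Casteljau's algorithm, which evaluates a Bernstein polynomial by repeatedly interpolating linearly between neighboring coefficients. The minimal sketch below compares it with the direct sum from the basis definition. Whether the library evaluates its polynomials this way is an assumption, and both function names are hypothetical.

```python
import math


# Hypothetical helper for illustration; not part of the BPF API.
def de_casteljau(w, t):
    """Evaluate sum_i w_i b_{i,M}(t) by repeated linear interpolation of the coefficients."""
    b = list(w)
    n = len(b)
    for r in range(1, n):
        # Each level replaces b[i] with a convex combination of b[i] and b[i + 1].
        for i in range(n - r):
            b[i] = (1.0 - t) * b[i] + t * b[i + 1]
    return b[0]


# Hypothetical helper for illustration; not part of the BPF API.
def direct_sum(w, t):
    """Evaluate the same polynomial from the basis definition."""
    m = len(w) - 1
    return sum(w_i * math.comb(m, i) * t**i * (1.0 - t) ** (m - i) for i, w_i in enumerate(w))


w = [-5.0, -3.5, -1.0, 0.0, 2.5, 4.0, 5.0]
for t in (0.0, 0.3, 0.5, 0.9, 1.0):
    print(t, de_casteljau(w, t), direct_sum(w, t))
```

Every intermediate value is a convex combination of the coefficients, so the algorithm never leaves [min(w), max(w)]. It is the standard numerically stable way to evaluate Bernstein polynomials.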
Applications
Probabilistic Forecasting
Regression with Uncertainty
Bounded Domain Modeling
Comparison with NSF
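Both BPF and NSF use flexible monotonic element-wise transformations, but NSF splits the domain into bins, while BPF uses a single global polynomial:
| Property | BPF | NSF |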
|---|---|---|
| Basis | Bernstein polynomials | Rational quadratic splines |
| Smoothness | C^∞ | C^1 |
| Locality | Global | Local (per bin) |
| Typical size | degree=16-32 | bins=8-16 |
| Best for | Smooth distributions | General purpose |
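The Locality row can be made concrete. Every b_{i,M}(t) is positive on the open interval (0, 1), so changing a single coefficient moves f everywhere except at the endpoints. In a spline, moving one knot only affects the neighboring bins. Below is a minimal sketch with a hypothetical helper `bernstein_poly` and made-up coefficients.

```python
import math


# Hypothetical helper for illustration; not part of the BPF API.
def bernstein_poly(w, t):
    """Evaluate sum_i w[i] * C(M, i) * t**i * (1 - t)**(M - i) with M = len(w) - 1."""
    m = len(w) - 1
    return sum(w_i * math.comb(m, i) * t**i * (1.0 - t) ** (m - i) for i, w_i in enumerate(w))


w = [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0]  # degree M = 5
w_perturbed = list(w)
w_perturbed[2] += 0.5  # nudge a single coefficient

for t in (0.0, 0.05, 0.5, 0.95, 1.0):
    # The change equals 0.5 * b_{2,5}(t): nonzero everywhere inside (0, 1).
    change = bernstein_poly(w_perturbed, t) - bernstein_poly(w, t)
    print(f"t={t:.2f}  change={change:.2e}")
```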
Visualization
Related
- SOSPF - Sum-of-squares polynomial alternative
- NSF - Spline-based alternative
- BoundedBernsteinTransform - The underlying transformation
- MAF - Simpler baseline
