Overview
S5 (Simplified State Space) is a clean, easy-to-understand implementation of state space models. It provides a straightforward SSM architecture with support for multiple discretization methods and optional conjugate symmetry. S5 is ideal for learning SSM concepts and serves as a strong baseline for sequence modeling tasks.
Paper Reference
Simplified State Space Layers for Sequence Modeling: https://openreview.net/forum?id=Ai8Hw3AXqks
Installation
Parameters
Model dimension - size of input and output features.
State dimension (P in the original paper). Internal dimension of the SSM state space.
Discretization method to use:
- 'zoh': Zero-Order Hold (recommended)
- 'bilinear': Bilinear transform
- 'dirac': Dirac delta approximation
- 'no_discretization': Skip discretization step
If True, uses conjugate symmetry for the state space model. Currently not implemented.
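To make the discretization options more concrete, here is a minimal NumPy sketch of the two standard ones, 'zoh' and 'bilinear', for a diagonal state matrix. It is an illustration under assumptions, not the library's code: the function names `discretize_zoh` and `discretize_bilinear`, the argument names, and the array shapes are all hypothetical. The 'dirac' option is left out because its exact form is not given in this text.

```python
import numpy as np

def discretize_zoh(lam, B, dt):
    """Zero-order hold for a diagonal A (hypothetical helper).

    lam: (N,) complex diagonal entries of A
    B:   (N, H) input matrix
    dt:  scalar or (N,) step size
    """
    A_bar = np.exp(dt * lam)                     # exact state transition over one step
    B_bar = ((A_bar - 1.0) / lam)[:, None] * B   # integral of exp(A s) B over the step
    return A_bar, B_bar

def discretize_bilinear(lam, B, dt):
    """Bilinear (Tustin) transform for a diagonal A (hypothetical helper)."""
    denom = 1.0 - 0.5 * dt * lam
    A_bar = (1.0 + 0.5 * dt * lam) / denom
    B_bar = (dt / denom)[:, None] * B
    return A_bar, B_bar

# Example inputs (assumed shapes: N = 4 states, H = 2 features).
lam = -0.5 + 1j * np.pi * np.arange(4)
B = np.ones((4, 2))
A_zoh, B_zoh = discretize_zoh(lam, B, dt=1e-3)
A_bil, B_bil = discretize_bilinear(lam, B, dt=1e-3)
```

For small step sizes the two methods give nearly identical results; they diverge as `dt * lam` grows.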
Usage Example
Basic Usage
Different Discretization Methods
Autoregressive Inference
Language Modeling Setup
Key Features
Simple Architecture
S5 has a minimal, easy-to-understand structure.
Initialization
S5 uses well-motivated initialization:
- A matrix: Complex diagonal with:
  - Real part: log(-0.5) (via inverse softplus)
  - Imaginary part: π * [0, 1, 2, ..., N-1] (frequency spacing)
- B matrix: Ones scaled by 1/√H
- C matrix: Random Gaussian scaled by √(2/N)
- D matrix: Random Gaussian scaled by √(2/H)
- dt: Log-spaced from 0.001 to 0.1
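The initialization scheme listed above could look roughly like this in NumPy. This is a sketch under several assumptions that the text does not confirm: `init_s5_params_sketch` is a hypothetical name, N is taken to be the state dimension and H the model dimension, the real part of A is set directly to -0.5 (the stored log / inverse-softplus form is not reproduced), and dt is assumed to hold one value per state.

```python
import numpy as np

def init_s5_params_sketch(d_model, d_state, seed=0):
    """Illustrative initialization; names and shapes are assumptions."""
    H, N = d_model, d_state
    rng = np.random.default_rng(seed)

    # A: complex diagonal. Real part assumed to be -0.5 for every state,
    # imaginary part spaced by pi to spread the oscillation frequencies.
    lam = -0.5 * np.ones(N) + 1j * np.pi * np.arange(N)

    B = np.ones((N, H)) / np.sqrt(H)                    # ones scaled by 1/sqrt(H)
    C = rng.standard_normal((H, N)) * np.sqrt(2.0 / N)  # Gaussian scaled by sqrt(2/N)
    D = rng.standard_normal(H) * np.sqrt(2.0 / H)       # Gaussian scaled by sqrt(2/H)
    dt = np.geomspace(0.001, 0.1, N)                    # log-spaced step sizes

    return {"lam": lam, "B": B, "C": C, "D": D, "dt": dt}

params = init_s5_params_sketch(d_model=8, d_state=4)
```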
FFT Convolution
Like S4, S5 uses FFT for efficient parallel training.
Discretization Methods
Zero-Order Hold (ZOH)
The continuous input is held constant between timesteps.
Bilinear Transform
Tustin’s method, which uses trapezoidal integration.
Dirac Delta
The simplest discretization.
Architecture Details
Forward Pass
- Discretize: Convert continuous SSM to discrete-time
- Compute kernel: Generate convolution kernel from A_bar
- FFT convolution: Apply kernel to input efficiently
- Output projection: Add D skip connection
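The four steps above can be sketched end to end in NumPy. This is a hedged illustration rather than the library's implementation: the function name `s5_forward_sketch`, the tensor shapes, the choice of ZOH, and taking the real part of the kernel are all assumptions made for the example.

```python
import numpy as np

def s5_forward_sketch(u, lam, B, C, D, dt):
    """Illustrative forward pass (all names and shapes are assumptions).

    u: (L, H) real input      lam: (N,) complex diagonal of A
    B: (N, H)                 C:   (H, N)
    D: (H,)                   dt:  scalar or (N,) step size
    """
    L = u.shape[0]

    # 1. Discretize (zero-order hold for a diagonal A).
    A_bar = np.exp(dt * lam)
    B_bar = ((A_bar - 1.0) / lam)[:, None] * B

    # 2. Compute kernel: K[l] = Re(C diag(A_bar**l) B_bar), shape (L, H, H).
    powers = A_bar[None, :] ** np.arange(L)[:, None]
    K = np.einsum("hn,ln,nj->lhj", C, powers, B_bar).real

    # 3. FFT convolution, zero-padded to 2L so the result is causal
    #    (no circular wrap-around).
    K_f = np.fft.rfft(K, n=2 * L, axis=0)
    u_f = np.fft.rfft(u, n=2 * L, axis=0)
    y = np.fft.irfft(np.einsum("fhj,fj->fh", K_f, u_f), n=2 * L, axis=0)[:L]

    # 4. Output: add the D skip connection.
    return y + D * u
```

Materializing the full (L, H, H) kernel keeps the sketch short but is memory-hungry for large H; it is used here only to make the steps visible.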
State Representation
S5 maintains a complex-valued state of dimension d_state.
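As an illustration of what such a state looks like in recurrent (step-by-step) form, the sketch below advances a complex state vector by one timestep. The function name `s5_step_sketch` and all shapes are assumptions for the example, and `A_bar` and `B_bar` stand for already-discretized parameters.

```python
import numpy as np

def s5_step_sketch(x, u_t, A_bar, B_bar, C, D):
    """One recurrent step (hypothetical helper).

    x:     (d_state,) complex state
    u_t:   (d_model,) real input at this timestep
    A_bar: (d_state,) complex diagonal transition
    B_bar: (d_state, d_model)
    C:     (d_model, d_state)
    D:     (d_model,)
    """
    x_next = A_bar * x + B_bar @ u_t     # diagonal transition plus input injection
    y_t = (C @ x_next).real + D * u_t    # real-valued output with skip connection
    return x_next, y_t

# Example: the state starts at zero and is carried across steps.
d_state, d_model = 4, 3
A_bar = np.exp(0.05 * (-0.5 + 1j * np.pi * np.arange(d_state)))
B_bar = 0.05 * np.ones((d_state, d_model))
C = np.ones((d_model, d_state))
D = np.zeros(d_model)

x = np.zeros(d_state, dtype=np.complex128)
for u_t in np.ones((5, d_model)):
    x, y_t = s5_step_sketch(x, u_t, A_bar, B_bar, C, D)
```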
Performance Tips
S5 is simpler than S4/S4D but equally effective for many tasks. It’s a great starting point for understanding SSMs.
Comparison with Other Models
| Feature | S5 | S4 | S4D |
|---|---|---|---|
| Complexity | ⭐ Simple | ⭐⭐⭐ Complex | ⭐⭐ Medium |
| Speed | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Parameters | Fewer | More | Medium |
| Code clarity | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Performance | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
When to Use S5
✅ Use S5 when:
- Learning about SSMs
- You want clean, readable code
- You need a strong baseline
- Simplicity is valued
❌ Consider alternatives when:
- You need maximum performance → S4D
- You want input-dependent dynamics → Mamba
- You need minimal parameters → LRU
