What are linear RNNs?
Linear RNNs are a class of sequence models that combine the efficiency of recurrent architectures with the expressiveness needed for modern deep learning tasks. Unlike traditional RNNs, linear RNNs use structured state-space models (SSMs) that can be computed efficiently in both recurrent and convolutional forms. At their core, linear RNNs model sequences using a continuous-time state-space representation:

h'(t) = A h(t) + B x(t)
y(t) = C h(t) + D x(t)

where:
- h(t) is the hidden state
- x(t) is the input
- y(t) is the output
- A, B, C, D are learned parameter matrices
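After discretization (covered later), this becomes the recurrence h_t = Ā h_{t-1} + B̄ x_t, y_t = C h_t + D x_t. A minimal NumPy sketch of that recurrence, with illustrative randomly chosen matrices rather than anything from lrnnx:

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, seq_len = 4, 8

# Illustrative, already-discretized parameters (not learned here).
A_bar = np.diag(rng.uniform(0.5, 0.99, d_state))  # stable diagonal state matrix
B_bar = rng.normal(size=(d_state, 1))
C = rng.normal(size=(1, d_state))
D = rng.normal()

x = rng.normal(size=seq_len)                      # scalar input sequence
h = np.zeros((d_state, 1))
ys = []
for t in range(seq_len):
    h = A_bar @ h + B_bar * x[t]                  # h_t = A h_{t-1} + B x_t
    ys.append((C @ h).item() + D * x[t])          # y_t = C h_t + D x_t
```

Note the hidden state h has a fixed size d_state no matter how long the sequence is; this is what the efficiency claims below rest on.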
Why linear RNNs matter
Linear RNNs address fundamental challenges in sequence modeling:

Efficiency advantages
Training speed
For Linear Time-Invariant (LTI) models, training can use FFT-based convolutions instead of sequential recurrence. This enables parallelization across the sequence length, making training as fast as Transformers.
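For intuition, here is a self-contained NumPy sketch (not lrnnx code) showing why this works: for an LTI model, the recurrent output equals a causal convolution of the input with the kernel K_t = C Aᵗ B, and that convolution can be computed in parallel with FFTs.

```python
import numpy as np

rng = np.random.default_rng(1)
d, L = 3, 16
A = np.diag(rng.uniform(0.3, 0.9, d))   # illustrative LTI parameters
B = rng.normal(size=(d, 1))
C = rng.normal(size=(1, d))
x = rng.normal(size=L)

# Materialize the convolution kernel K_t = C A^t B for t = 0..L-1.
K = np.array([(C @ np.linalg.matrix_power(A, t) @ B).item() for t in range(L)])

# FFT-based causal convolution (zero-pad to avoid circular wrap-around).
n = 2 * L
y_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(K, n), n)[:L]

# Reference: the same output via the sequential recurrence.
h = np.zeros((d, 1))
y_rec = []
for t in range(L):
    h = A @ h + B * x[t]
    y_rec.append((C @ h).item())

assert np.allclose(y_fft, np.array(y_rec))  # both views give the same output
```

The FFT path touches every timestep at once, which is why LTI training parallelizes across the sequence length.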
Inference speed
During inference, linear RNNs process sequences recurrently with a fixed-size state. This means constant memory usage and linear-time complexity, regardless of sequence length - a major advantage over Transformers’ quadratic attention.
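A sketch of what constant-memory streaming looks like in practice (illustrative NumPy, not the lrnnx inference API): only the fixed-size state is carried between steps, however long the stream runs.

```python
import numpy as np

def ssm_step(h, x_t, A, B, C):
    """One recurrent step: returns (new_state, output)."""
    h = A @ h + B * x_t
    return h, (C @ h).item()

rng = np.random.default_rng(2)
d = 4
A = np.diag(rng.uniform(0.5, 0.95, d))  # illustrative parameters
B = rng.normal(size=(d, 1))
C = rng.normal(size=(1, d))

h = np.zeros((d, 1))                    # the only per-sequence memory
for x_t in rng.normal(size=1000):       # an arbitrarily long stream
    h, y_t = ssm_step(h, x_t, A, B, C)

assert h.shape == (d, 1)                # state size never grows with length
```

Contrast this with attention, where the key/value cache grows linearly with the number of tokens seen.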
Long-range dependencies
Through careful parameterization and initialization, linear RNNs can capture dependencies across thousands or even millions of timesteps without the vanishing gradient problems of traditional RNNs.
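A scalar toy calculation (an assumption for illustration, not any model's actual parameterization) shows why parameterization matters: an eigenvalue close to the unit circle decays slowly, so signal and gradients survive thousands of steps, while a smaller one vanishes almost immediately.

```python
# Signal retained after `steps` applications of a scalar linear recurrence
# h_t = a * h_{t-1} is simply a ** steps.
a_slow, a_fast = 0.9999, 0.5   # illustrative eigenvalue magnitudes
steps = 10_000

print(a_slow ** steps)  # ~0.37: information survives 10k steps
print(a_fast ** steps)  # underflows to 0: information is lost
```

Models like S4 and the LRU initialize and constrain their eigenvalues precisely so that a spectrum of such slow decay rates is available.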
Comparison with other architectures
| Feature | Linear RNNs | Traditional RNNs | Transformers |
|---|---|---|---|
| Training parallelization | Yes (LTI models) | No | Yes |
| Inference memory | O(1) | O(1) | O(L) |
| Long-range modeling | Excellent | Poor | Good |
| Computational cost | O(L) | O(L) | O(L²) |
The exact characteristics depend on whether you’re using an LTI or LTV model. See LTI vs LTV for details.
Key models in lrnnx
The library implements several state-of-the-art linear RNN architectures:

S4
Structured State Space (S4) - the foundational model using diagonal plus low-rank (DPLR) parameterization for efficient long-range modeling.
S4D
Diagonal State Space (S4D) - a simplified version of S4 using purely diagonal state matrices for improved efficiency.
S5
Simplified State Space (S5) - a MIMO (multi-input multi-output) extension that processes all channels together.
LRU
Linear Recurrent Unit - uses complex-valued diagonal parameterization for stability and expressiveness.
Centaurus
A family of efficient variants combining different architectural patterns (depthwise separable, neck connections, etc.).
Mamba
Selective State Space Model - input-dependent dynamics (LTV) that adapts its behavior based on the input content.
S6 & S7
Extensions of S5 with selective (input-dependent) mechanisms for greater expressiveness.
RG-LRU
Recurrent Gated LRU - adds gating mechanisms to the LRU architecture for improved selective processing.
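To make the LTI/LTV distinction behind these selective models concrete, here is a toy elementwise recurrence (purely illustrative; the sigmoid gating and the weight vector W are assumptions for this sketch, not Mamba's or RG-LRU's actual parameterization) in which the state decay depends on the current input:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
W = rng.normal(size=d)  # hypothetical projection from input to decay rates

def ltv_step(h, x_t):
    # Input-dependent decay in (0, 1): the transition changes per timestep,
    # which is what makes the system time-varying rather than time-invariant.
    a_t = 1.0 / (1.0 + np.exp(-(W * x_t)))
    return a_t * h + x_t          # elementwise (diagonal) dynamics

h = np.zeros(d)
for x_t in rng.normal(size=16):
    h = ltv_step(h, x_t)
```

Because a_t depends on x_t, the kernel trick above no longer applies; LTV models instead rely on parallel scans or hardware-aware recurrences for fast training.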
Choosing a model
For fixed patterns

Use LTI models (S4, S4D, S5, LRU, Centaurus) when:
- Your data has consistent temporal dynamics
- You want maximum training speed via FFT convolutions
- You need proven stability and long-range modeling
- Examples: audio processing, time series forecasting, genomics

For content-dependent processing

Use LTV models (Mamba, S6, S7, RG-LRU) when:
- Your task requires input-dependent dynamics that adapt to the sequence content
- You need selective processing that filters or retains information based on context
Next steps
LTI vs LTV
Learn the key differences between Linear Time-Invariant and Linear Time-Varying models
Discretization
Understand how continuous-time models are converted to discrete-time for implementation
Model Reference
Explore detailed API documentation for each model
Quick Start
Start building with lrnnx models
