
What are linear RNNs?

Linear RNNs are a class of sequence models that combine the efficiency of recurrent architectures with the expressiveness needed for modern deep learning tasks. Unlike traditional RNNs, linear RNNs use structured state-space models (SSMs) that can be computed efficiently in both recurrent and convolutional forms. At their core, linear RNNs model sequences using a continuous-time state-space representation:
h'(t) = A h(t) + B x(t)
y(t) = C h(t) + D x(t)
Where:
  • h(t) is the hidden state
  • x(t) is the input
  • y(t) is the output
  • A, B, C, D are learned parameter matrices
These continuous-time equations are then discretized for use on discrete (sampled) sequences (see Discretization).
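As a concrete illustration, here is a minimal NumPy sketch of one common discretization scheme (zero-order hold) for a diagonal state matrix, as used by S4D/LRU-style models. This is illustrative only, not the lrnnx API; the function name is invented for the example:

```python
import numpy as np

def discretize_zoh_diag(a, b, dt):
    """Zero-order-hold discretization of h'(t) = A h(t) + B x(t)
    for a diagonal A (illustrative helper, not part of lrnnx).

    a: (N,) diagonal entries of A (nonzero)
    b: (N,) entries of B
    Returns (a_bar, b_bar) for the recurrence
    h[k+1] = a_bar * h[k] + b_bar * x[k].
    """
    a_bar = np.exp(a * dt)
    # Exact ZOH formula for diagonal A: A^{-1} (e^{A dt} - I) B
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

# Toy example: two decaying modes sampled at dt = 0.1
a = np.array([-1.0, -2.0])
b = np.array([1.0, 1.0])
a_bar, b_bar = discretize_zoh_diag(a, b, dt=0.1)
```

Note that stability requires the real parts of the diagonal of A to be negative, so that |a_bar| < 1 after discretization.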

Why linear RNNs matter

Linear RNNs address fundamental challenges in sequence modeling:

Efficiency advantages

For Linear Time-Invariant (LTI) models, training can use FFT-based convolutions instead of sequential recurrence. This enables parallelization across the sequence length, making training as fast as Transformers.
During inference, linear RNNs process sequences recurrently with a fixed-size state. This means constant memory usage and linear-time complexity, regardless of sequence length - a major advantage over Transformers’ quadratic attention.
Through careful parameterization and initialization, linear RNNs can capture dependencies across thousands or even millions of timesteps without the vanishing gradient problems of traditional RNNs.
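The dual computation modes described above can be sketched for a toy diagonal SSM in plain NumPy (illustrative only, not the lrnnx implementation). The recurrent path updates a fixed-size state step by step, while the convolutional path materializes the kernel K[k] = C Ā^k B̄ and applies it with an FFT; both produce the same output:

```python
import numpy as np

def ssm_recurrent(a_bar, b_bar, c, x):
    """Run a diagonal discrete SSM step by step (fixed-size state)."""
    h = np.zeros_like(a_bar)
    ys = []
    for xt in x:
        h = a_bar * h + b_bar * xt   # state update
        ys.append(np.sum(c * h))     # readout y[t] = C h[t]
    return np.array(ys)

def ssm_convolutional(a_bar, b_bar, c, x):
    """Same map as an FFT-based convolution with kernel K[k] = C A_bar^k B_bar."""
    L = len(x)
    # Materialize the convolution kernel over the sequence length
    K = np.sum(c[:, None] * b_bar[:, None]
               * a_bar[:, None] ** np.arange(L), axis=0)
    n = 2 * L  # zero-pad so the circular FFT convolution is linear
    return np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(x, n), n)[:L]

rng = np.random.default_rng(0)
a_bar = np.exp(-rng.uniform(0.1, 1.0, size=4))  # stable decays in (0, 1)
b_bar = rng.normal(size=4)
c = rng.normal(size=4)
x = rng.normal(size=32)
y_rec = ssm_recurrent(a_bar, b_bar, c, x)
y_conv = ssm_convolutional(a_bar, b_bar, c, x)
```

The loop is O(L) with O(1) state, mirroring inference; the FFT path costs O(L log L) but parallelizes across the sequence, mirroring LTI training.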

Comparison with other architectures

Feature                  | Linear RNNs      | Traditional RNNs | Transformers
------------------------ | ---------------- | ---------------- | ------------
Training parallelization | Yes (LTI models) | No               | Yes
Inference memory         | O(1)             | O(1)             | O(L)
Long-range modeling      | Excellent        | Poor             | Good
Computational cost       | O(L)             | O(L)             | O(L²)
The exact characteristics depend on whether you’re using an LTI or LTV model. See LTI vs LTV for details.

Key models in lrnnx

The library implements several state-of-the-art linear RNN architectures:

S4

Structured State Space (S4) - the foundational model using diagonal plus low-rank (DPLR) parameterization for efficient long-range modeling.

S4D

Diagonal State Space (S4D) - a simplified version of S4 using purely diagonal state matrices for improved efficiency.

S5

Simplified State Space (S5) - a MIMO (multi-input multi-output) extension that processes all channels together.

LRU

Linear Recurrent Unit - uses complex-valued diagonal parameterization for stability and expressiveness.

Centaurus

A family of efficient variants combining different architectural patterns (depthwise separable, neck connections, etc.).

Mamba

Selective State Space Model - input-dependent dynamics (LTV) that adapts its behavior based on the input content.
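The selective idea can be illustrated with a conceptual NumPy sketch (not Mamba's actual implementation; `W_b` and `W_dt` are hypothetical input projections invented for this example). Because the discretized dynamics are recomputed from each input, the recurrence is time-varying and the model can modulate what it stores:

```python
import numpy as np

def selective_scan(a, x, W_b, W_dt):
    """Conceptual selective (LTV) recurrence: the discretized state
    matrices depend on the current input, so the model can gate
    what enters and persists in the state.

    a: (N,) continuous diagonal state matrix (negative for stability)
    x: (L,) input sequence
    W_b: (N,) hypothetical projection producing B_t from x_t
    W_dt: scalar hypothetical projection producing a step size from x_t
    """
    h = np.zeros_like(a)
    ys = []
    for xt in x:
        dt = np.log1p(np.exp(W_dt * xt))       # softplus: positive step size
        a_bar = np.exp(a * dt)                 # input-dependent decay
        b_bar = (a_bar - 1.0) / a * (W_b * xt) # input-dependent write
        h = a_bar * h + b_bar * xt
        ys.append(h.sum())
    return np.array(ys)

a = -np.linspace(0.5, 2.0, 3)
y = selective_scan(a, np.array([1.0, -0.5, 2.0]), W_b=np.ones(3), W_dt=0.5)
```

Because the dynamics vary per step, the FFT-convolution shortcut of LTI models no longer applies; efficient training instead relies on parallel scan algorithms.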

S6 & S7

Extensions of S5 with selective (input-dependent) mechanisms for greater expressiveness.

RG-LRU

Real-Gated LRU - adds gating mechanisms to the LRU architecture for improved selective processing.

Choosing a model

Use LTI models (S4, S4D, S5, LRU, Centaurus) when:
  • Your data has consistent temporal dynamics
  • You want maximum training speed via FFT convolutions
  • You need proven stability and long-range modeling
  • Examples: audio processing, time series forecasting, genomics

Next steps

LTI vs LTV

Learn the key differences between Linear Time-Invariant and Linear Time-Varying models

Discretization

Understand how continuous-time models are converted to discrete-time for implementation

Model Reference

Explore detailed API documentation for each model

Quick Start

Start building with lrnnx models
