
What are linear RNNs?

Linear RNNs are a class of sequence models that combine the efficiency of recurrent architectures with the expressiveness needed for modern deep learning tasks. Unlike traditional RNNs, linear RNNs use structured state-space models (SSMs) that can be computed efficiently in both recurrent and convolutional forms. At their core, linear RNNs model sequences using a continuous-time state-space representation:
h'(t) = A h(t) + B x(t)
y(t) = C h(t) + D x(t)
Where:
  • h(t) is the hidden state
  • x(t) is the input
  • y(t) is the output
  • A, B, C, D are learned parameter matrices
These continuous-time equations are then discretized for use on discrete (sampled) sequences (see Discretization).
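As a concrete illustration, here is a minimal NumPy sketch of one common discretization scheme (zero-order hold) for a diagonal state matrix, as used by S4D/LRU-style models. This is illustrative only, not the lrnnx API; the function name is invented for the example:

```python
import numpy as np

def discretize_zoh_diag(a, b, dt):
    """Zero-order-hold discretization of h'(t) = A h(t) + B x(t)
    for a diagonal A (illustrative helper, not part of lrnnx).

    a: (N,) diagonal entries of A (nonzero)
    b: (N,) entries of B
    Returns (a_bar, b_bar) for the recurrence
    h[k+1] = a_bar * h[k] + b_bar * x[k].
    """
    a_bar = np.exp(a * dt)
    # Exact ZOH formula for diagonal A: A^{-1} (e^{A dt} - I) B
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

# Toy example: two decaying modes sampled at dt = 0.1
a = np.array([-1.0, -2.0])
b = np.array([1.0, 1.0])
a_bar, b_bar = discretize_zoh_diag(a, b, dt=0.1)
```

Note that stability requires the real parts of the diagonal of A to be negative, so that |a_bar| < 1 after discretization.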

Why linear RNNs matter

Linear RNNs address fundamental challenges in sequence modeling:

Efficiency advantages

For Linear Time-Invariant (LTI) models, training can use FFT-based convolutions instead of sequential recurrence. This enables parallelization across the sequence length, making training as fast as Transformers.
During inference, linear RNNs process sequences recurrently with a fixed-size state. This means constant memory usage and linear-time complexity, regardless of sequence length - a major advantage over Transformers’ quadratic attention.
Through careful parameterization and initialization, linear RNNs can capture dependencies across thousands or even millions of timesteps without the vanishing gradient problems of traditional RNNs.
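The dual computation modes described above can be sketched for a toy diagonal SSM in plain NumPy (illustrative only, not the lrnnx implementation). The recurrent path updates a fixed-size state step by step, while the convolutional path materializes the kernel K[k] = C Ā^k B̄ and applies it with an FFT; both produce the same output:

```python
import numpy as np

def ssm_recurrent(a_bar, b_bar, c, x):
    """Run a diagonal discrete SSM step by step (fixed-size state)."""
    h = np.zeros_like(a_bar)
    ys = []
    for xt in x:
        h = a_bar * h + b_bar * xt   # state update
        ys.append(np.sum(c * h))     # readout y[t] = C h[t]
    return np.array(ys)

def ssm_convolutional(a_bar, b_bar, c, x):
    """Same map as an FFT-based convolution with kernel K[k] = C A_bar^k B_bar."""
    L = len(x)
    # Materialize the convolution kernel over the sequence length
    K = np.sum(c[:, None] * b_bar[:, None]
               * a_bar[:, None] ** np.arange(L), axis=0)
    n = 2 * L  # zero-pad so the circular FFT convolution is linear
    return np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(x, n), n)[:L]

rng = np.random.default_rng(0)
a_bar = np.exp(-rng.uniform(0.1, 1.0, size=4))  # stable decays in (0, 1)
b_bar = rng.normal(size=4)
c = rng.normal(size=4)
x = rng.normal(size=32)
y_rec = ssm_recurrent(a_bar, b_bar, c, x)
y_conv = ssm_convolutional(a_bar, b_bar, c, x)
```

The loop is O(L) with O(1) state, mirroring inference; the FFT path costs O(L log L) but parallelizes across the sequence, mirroring LTI training.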

Comparison with other architectures

Feature                  | Linear RNNs      | Traditional RNNs | Transformers
------------------------ | ---------------- | ---------------- | ------------
Training parallelization | Yes (LTI models) | No               | Yes
Inference memory         | O(1)             | O(1)             | O(L)
Long-range modeling      | Excellent        | Poor             | Good
Computational cost       | O(L)             | O(L)             | O(L²)
The exact characteristics depend on whether you’re using an LTI or LTV model. See LTI vs LTV for details.

Key models in lrnnx

The library implements several state-of-the-art linear RNN architectures:

S4

Structured State Space (S4) - the foundational model using diagonal plus low-rank (DPLR) parameterization for efficient long-range modeling.

S4D

Diagonal State Space (S4D) - a simplified version of S4 using purely diagonal state matrices for improved efficiency.

S5

Simplified State Space (S5) - a MIMO (multi-input multi-output) extension that processes all channels together.

LRU

Linear Recurrent Unit - uses complex-valued diagonal parameterization for stability and expressiveness.

Centaurus

A family of efficient variants combining different architectural patterns (depthwise separable, neck connections, etc.).

Mamba

Selective State Space Model - input-dependent dynamics (LTV) that adapts its behavior based on the input content.
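The selective idea can be illustrated with a conceptual NumPy sketch (not Mamba's actual implementation; `W_b` and `W_dt` are hypothetical input projections invented for this example). Because the discretized dynamics are recomputed from each input, the recurrence is time-varying and the model can modulate what it stores:

```python
import numpy as np

def selective_scan(a, x, W_b, W_dt):
    """Conceptual selective (LTV) recurrence: the discretized state
    matrices depend on the current input, so the model can gate
    what enters and persists in the state.

    a: (N,) continuous diagonal state matrix (negative for stability)
    x: (L,) input sequence
    W_b: (N,) hypothetical projection producing B_t from x_t
    W_dt: scalar hypothetical projection producing a step size from x_t
    """
    h = np.zeros_like(a)
    ys = []
    for xt in x:
        dt = np.log1p(np.exp(W_dt * xt))       # softplus: positive step size
        a_bar = np.exp(a * dt)                 # input-dependent decay
        b_bar = (a_bar - 1.0) / a * (W_b * xt) # input-dependent write
        h = a_bar * h + b_bar * xt
        ys.append(h.sum())
    return np.array(ys)

a = -np.linspace(0.5, 2.0, 3)
y = selective_scan(a, np.array([1.0, -0.5, 2.0]), W_b=np.ones(3), W_dt=0.5)
```

Because the dynamics vary per step, the FFT-convolution shortcut of LTI models no longer applies; efficient training instead relies on parallel scan algorithms.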

S6 & S7

Extensions of S5 with selective (input-dependent) mechanisms for greater expressiveness.

RG-LRU

Real-Gated LRU - adds gating mechanisms to the LRU architecture for improved selective processing.

Choosing a model

Use LTI models (S4, S4D, S5, LRU, Centaurus) when:
  • Your data has consistent temporal dynamics
  • You want maximum training speed via FFT convolutions
  • You need proven stability and long-range modeling
  • Examples: audio processing, time series forecasting, genomics

Next steps

LTI vs LTV

Learn the key differences between Linear Time-Invariant and Linear Time-Varying models

Discretization

Understand how continuous-time models are converted to discrete-time for implementation

Model Reference

Explore detailed API documentation for each model

Quick Start

Start building with lrnnx models
