Linear RNNs for PyTorch

A unified library providing state-of-the-art Linear RNN architectures including S4, S5, LRU, Mamba, and more — optimized for sequence modeling with custom CUDA kernels.

Quick start

Get up and running with lrnnx in minutes

1. Install the library

Install lrnnx from PyPI. We recommend installing PyTorch first to match your CUDA version.
pip install lrnnx
Installation includes compiling custom CUDA kernels, which may take up to 30 minutes depending on your system.
2. Import and instantiate a model

Choose from Linear Time-Invariant (LTI) or Linear Time-Varying (LTV) models.
from lrnnx.models.lti import LRU
from lrnnx.models.ltv import Mamba
import torch

# LTI model example
model_lti = LRU(d_model=64, d_state=64).cuda()

# LTV model example
model_ltv = Mamba(d_model=64, d_state=16).cuda()
3. Run a forward pass

Process sequences with your model in training or inference mode.
# Create sample input
batch_size, seq_len, d_model = 2, 128, 64
x = torch.randn(batch_size, seq_len, d_model, dtype=torch.float32, device="cuda")

# Forward pass
output = model_lti(x)
print(output.shape)  # torch.Size([2, 128, 64])
For autoregressive inference, use the optimized generation API built on CUDA Graphs for a 10x speedup; see the inference guide.
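To see why stepwise generation is cheap for linear RNNs, here is a conceptual scalar sketch (not the lrnnx API — use the inference guide for that): each new token updates a fixed-size hidden state, so per-token cost is constant regardless of how long the history is.

```python
# Conceptual sketch only: a scalar linear recurrence h_t = a*h_{t-1} + b*x_t.
# Real lrnnx models use learned matrices and CUDA Graphs; this just shows
# that one generation step is constant work per token.

def step(h, x, a=0.9, b=1.0):
    """One recurrent step: O(1) work, independent of sequence length."""
    return a * h + b * x

h = 0.0
outputs = []
for x in [1.0, 0.0, 0.0]:  # a short input stream
    h = step(h, x)
    outputs.append(round(h, 6))

print(outputs)  # [1.0, 0.9, 0.81]
```

Contrast this with attention, where generating each token touches the entire cached history.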

Explore by model type

Choose the architecture that fits your use case

Linear Time-Invariant (LTI)

Models with fixed dynamics: S4, S4D, S5, LRU, and Centaurus. Efficient for long sequences with stable patterns.

Linear Time-Varying (LTV)

Models with input-dependent dynamics: Mamba, S6, S7, RG-LRU, STREAM. Adaptive for complex temporal dependencies.
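The LTI/LTV distinction can be shown with a toy scalar recurrence (illustrative only, not the lrnnx API): an LTI model applies the same learned transition at every step, while an LTV model computes the transition from the current input. The gating function below is an assumption chosen for illustration.

```python
# Toy scalar sketch of LTI vs LTV dynamics (not the lrnnx API).
# LTI: the state transition a is a fixed learned parameter.
# LTV: a_t is computed from the current input x_t (input-dependent dynamics,
#      as in Mamba-style selection; the gate here is a made-up example).

def lti_scan(xs, a=0.5, b=1.0):
    h, out = 0.0, []
    for x in xs:
        h = a * h + b * x           # same dynamics at every step
        out.append(h)
    return out

def ltv_scan(xs, b=1.0):
    h, out = 0.0, []
    for x in xs:
        a_t = 1.0 / (1.0 + abs(x))  # toy input-dependent gate in (0, 1]
        h = a_t * h + b * x         # dynamics change with the input
        out.append(h)
    return out

print(lti_scan([1.0, 3.0]))  # [1.0, 3.5]
print(ltv_scan([1.0, 3.0]))  # [1.0, 3.25]
```

On the same input, the LTV scan forgets faster when the incoming value is large — that input-dependence is what "selective" models exploit.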

Language models

Pre-built architectures for autoregressive language modeling with replaceable LRNN and attention layers.

U-Net and classifiers

Domain-specific architectures for audio denoising, hierarchical classification, and sequence-to-sequence tasks.

Key features

Everything you need for modern sequence modeling

10+ architectures

Unified implementations of S4, S4D, S5, LRU, Centaurus, Mamba, RG-LRU, S7, and more.

Custom CUDA kernels

Optimized forward and backward kernels for selective scan, simplified scan, and S4 operations.
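A selective-scan kernel evaluates, in effect, a first-order linear recurrence with per-step coefficients. The pure-Python reference below is a per-channel sketch of that computation (illustrative only; the actual lrnnx kernels operate on batched CUDA tensors and also implement the backward pass):

```python
# Pure-Python reference for what a selective-scan kernel computes per channel:
#   h_t = a_t * h_{t-1} + b_t * x_t,   y_t = c_t * h_t
# with input-dependent coefficients a_t, b_t, c_t supplied per step.
# Illustrative only; the real CUDA kernels work on batched float tensors.

def selective_scan_ref(a, b, c, x, h0=0.0):
    assert len(a) == len(b) == len(c) == len(x)
    h, ys = h0, []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys

ys = selective_scan_ref(a=[0.5, 0.5], b=[1.0, 1.0], c=[2.0, 2.0], x=[1.0, 1.0])
print(ys)  # [2.0, 3.0]
```

A sequential reference like this is handy for unit-testing a fused kernel against known-good outputs.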

Multiple API levels

Access scan operations, recurrent steps, or full layer implementations matching the original papers.
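The layering idea can be sketched in plain Python (all names here are hypothetical, not the lrnnx API): a raw scan op at the bottom, a single recurrent step for decoding, and a layer object that wraps both.

```python
# Hypothetical sketch of three API levels (names are illustrative,
# NOT the actual lrnnx API): scan op -> recurrent step -> full layer.

def scan(a, x, h0=0.0):
    """Lowest level: run the recurrence over a whole sequence."""
    h, out = h0, []
    for x_t in x:
        h = a * h + x_t
        out.append(h)
    return out, h

def step(a, h, x_t):
    """Middle level: one recurrent update, for autoregressive decoding."""
    return a * h + x_t

class TinyLayer:
    """Highest level: a layer that owns its parameter and hidden state."""
    def __init__(self, a=0.5):
        self.a, self.h = a, 0.0

    def forward(self, x):        # training / prefill: full-sequence scan
        out, self.h = scan(self.a, x, self.h)
        return out

    def generate(self, x_t):     # inference: one token at a time
        self.h = step(self.a, self.h, x_t)
        return self.h

layer = TinyLayer()
print(layer.forward([1.0, 1.0]))  # [1.0, 1.5]
print(layer.generate(1.0))        # 1.75
```

Exposing all three levels lets you reproduce a paper's exact layer, or drop down to the raw scan when building a custom block.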

Fast inference

Autoregressive generation built on CUDA Graphs, with a 10x speedup over naive implementations.

Pre-built architectures

Language models, U-Nets, and hierarchical classifiers ready to use out of the box.

Research-backed

Accepted to EACL 2026 Student Research Workshop with comprehensive benchmarks and evaluation.

Ready to get started?

Install lrnnx and start building with state-of-the-art Linear RNN architectures. Check out the quickstart guide to run your first model in minutes.

View quickstart guide