
Overview

Linear RNNs in lrnnx come in two fundamental flavors that differ in how their dynamics evolve over time:

LTI: Linear Time-Invariant

Fixed dynamics - the state-space matrices (A, B, C) are constant across all timesteps

LTV: Linear Time-Varying

Input-dependent dynamics - the state-space matrices change based on the input at each timestep

Linear Time-Invariant (LTI) models

How they work

LTI models use fixed state-space parameters that don’t change during sequence processing:
# LTI: Parameters are constant
h[t] = A_bar @ h[t-1] + B_bar @ x[t]
y[t] = C @ h[t]

# A_bar, B_bar, C are the same for all timesteps
Because the dynamics are fixed, LTI models can be computed in two equivalent ways: as a global convolution (parallel, used during training) or as a step-by-step recurrence (sequential, used during inference):
import torch
from lrnnx.models.lti import S4

model = S4(d_model=64, d_state=64, l_max=1024)

# During training: process the full sequence in parallel
x = torch.randn(1, 1024, 64)
y = model(x)

# During inference: process step-by-step with constant memory
cache = model.allocate_inference_cache(batch_size=1)
for t in range(x.shape[1]):
    output, cache = model.step(x[:, t:t+1], cache)
Advantages: Constant memory, linear time complexity
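Why the two modes agree can be checked with a small NumPy sketch (toy shapes with a single scalar input/output channel, not the library's API): unrolling the recurrence gives y[t] = Σ_l C Aˡ B x[t−l], i.e. a causal convolution with kernel K[l] = C Aˡ B.

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, L = 4, 16

# Toy LTI system: stable diagonal A, scalar input and output channel
A = np.diag(rng.uniform(0.1, 0.9, d_state))
B = rng.standard_normal((d_state, 1))
C = rng.standard_normal((1, d_state))
x = rng.standard_normal(L)

# Mode 1: sequential recurrence, h[t] = A h[t-1] + B x[t], y[t] = C h[t]
h = np.zeros((d_state, 1))
y_rec = np.zeros(L)
for t in range(L):
    h = A @ h + B * x[t]
    y_rec[t] = (C @ h).item()

# Mode 2: causal convolution with the materialized kernel K[l] = C A^l B
K = np.array([(C @ np.linalg.matrix_power(A, l) @ B).item() for l in range(L)])
y_conv = np.array([sum(K[l] * x[t - l] for l in range(t + 1)) for t in range(L)])

assert np.allclose(y_rec, y_conv)  # both modes give the same output
```

The recurrence uses O(1) memory per step; the convolution exposes the whole computation at once, which is what makes parallel training possible.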

LTI models in lrnnx

The library provides several LTI architectures:
| Model | Description | Key Features |
| --- | --- | --- |
| S4 | Structured State Space | DPLR parameterization, excellent long-range modeling |
| S4D | Diagonal S4 | Simplified diagonal-only parameterization |
| S5 | Simplified State Space | MIMO design, processes all channels together |
| LRU | Linear Recurrent Unit | Complex-valued diagonal states |
| Centaurus | Efficient variants | Multiple architectural patterns (DWS, Neck, etc.) |
All LTI models in lrnnx extend the LTI_LRNN base class, which provides the compute_kernel() method for FFT-based training.
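The FFT trick itself can be sketched independently of the library. Assuming a materialized kernel of length L (here a random stand-in for the output of compute_kernel()), a causal convolution costs O(L²) directly but O(L log L) via zero-padded FFTs:

```python
import numpy as np

rng = np.random.default_rng(1)
L = 128
K = rng.standard_normal(L)  # stand-in for a kernel from compute_kernel()
u = rng.standard_normal(L)  # one input channel

# Direct causal convolution: O(L^2)
y_direct = np.array([sum(K[l] * u[t - l] for l in range(t + 1)) for t in range(L)])

# FFT convolution: O(L log L); zero-pad to 2L to avoid circular wrap-around
n = 2 * L
y_fft = np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(u, n), n)[:L]

assert np.allclose(y_direct, y_fft)
```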

When to use LTI models

Choose LTI models when your data has consistent dynamics that don't need to adapt based on content. Examples include:
  • Audio waveforms with fixed sampling rates
  • Regular time series (weather, sensor data)
  • Genomic sequences
LTI models can leverage FFT-based convolutions during training, making them extremely fast to train - comparable to or faster than Transformers on long sequences.
LTI models like S4 have well-understood stability guarantees and initialization schemes that ensure reliable long-range modeling out of the box.

Linear Time-Varying (LTV) models

How they work

LTV models compute input-dependent parameters at each timestep:
# LTV: Parameters vary based on input
A_bar[t], B_bar[t] = f(x[t])  # Computed from input!
h[t] = A_bar[t] @ h[t-1] + B_bar[t] @ x[t]
y[t] = C @ h[t]

# A_bar[t] and B_bar[t] are different at each timestep
This input-dependence is often called selectivity - the model can selectively filter or emphasize different information based on the input content.
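A toy illustration of selectivity (entirely hypothetical, not lrnnx code): make the input matrix B a function of a per-token flag, so the state admits only flagged tokens and ignores the rest.

```python
import numpy as np

# Toy selectivity: each token is a (value, flag) pair; an input-dependent
# input gate B[t] = flag lets only flagged values into the (scalar) state.
values = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
flags  = np.array([1.0, 0.0, 1.0, 0.0, 0.0])

h = 0.0
for v, f in zip(values, flags):
    A_bar = 1.0      # keep the state (no decay, for clarity)
    B_bar = f        # input-dependent: admit the token only if flagged
    h = A_bar * h + B_bar * v

assert h == values[flags == 1.0].sum()  # only flagged tokens were stored
```

An LTI model cannot do this: with fixed B, every token enters the state with the same weight regardless of content.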

Example: Mamba’s selective mechanism

import torch
from lrnnx.models.ltv import Mamba

model = Mamba(d_model=64, d_state=16, d_conv=4)

# The model adapts its dynamics based on input
x = torch.randn(2, 1024, 64)
y = model(x)  # Internally computes input-dependent Δ, B, C
Inside Mamba, the parameters are computed as:
# Simplified view of Mamba's selectivity
delta = softplus(linear_delta(x))  # Input-dependent timestep
B = linear_B(x)                     # Input-dependent B
C = linear_C(x)                     # Input-dependent C

# A is still fixed, but scaled by input-dependent delta
A_bar = exp(delta * A)
B_bar = delta * B
The exact mechanism varies by model. S6/S7 make different matrices input-dependent compared to Mamba.
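The simplified view above can be turned into a runnable NumPy sketch (toy shapes, random stand-ins for the learned projections of x; not Mamba's actual implementation):

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

rng = np.random.default_rng(2)
L, d_state = 8, 4

# Fixed negative diagonal A: stable continuous-time dynamics
A = -rng.uniform(0.5, 1.5, d_state)

# Random stand-ins for the learned projections of the input x[t]
delta = softplus(rng.standard_normal(L))   # input-dependent timestep, > 0
B = rng.standard_normal((L, d_state))      # input-dependent B[t]
C = rng.standard_normal((L, d_state))      # input-dependent C[t]
x = rng.standard_normal(L)                 # one input channel

h = np.zeros(d_state)
y = np.zeros(L)
for t in range(L):
    A_bar = np.exp(delta[t] * A)   # ZOH discretization with per-step delta
    B_bar = delta[t] * B[t]        # simplified (Euler) discretization of B
    h = A_bar * h + B_bar * x[t]   # elementwise, since A is diagonal
    y[t] = C[t] @ h
```

Note that because A is negative and delta is positive, every A_bar entry lies in (0, 1), so the state always decays and the recurrence stays stable.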

LTV models in lrnnx

| Model | Description | Selective Mechanism |
| --- | --- | --- |
| Mamba | Selective State Space | Input-dependent Δ, B, C (S6 variant) |
| S6 | Selective S5 | Input-dependent B, C (original) |
| S7 | Bidirectional S6 | S6 with bidirectional processing |
| RG-LRU | Real-Gated LRU | Gated variant of LRU |
| Event-based variants | Async processing | Support variable timesteps for event data |
All LTV models extend the LTV_LRNN base class and support the integration_timesteps parameter for event-based processing.

When to use LTV models

Choose LTV models when the model needs to adapt its behavior based on input content:
  • Language modeling (focus on important tokens)
  • Document understanding (selective information flow)
  • Tasks requiring filtering or gating mechanisms
LTV models excel at tasks that require selectively storing and retrieving information, such as:
  • Selective copying benchmarks
  • In-context learning
  • Associative recall tasks
LTV models support asynchronous discretization for irregular time series:
  • Neuromorphic event streams
  • Medical records with irregular timestamps
  • Financial tick data
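For a scalar toy state, asynchronous discretization amounts to using the actual elapsed time between events as the discretization step. A minimal sketch (the timestamps, values, and decay rate are made up; this is not the lrnnx integration_timesteps API):

```python
import numpy as np

# Irregularly sampled events: (timestamp, value) pairs
timestamps = np.array([0.0, 0.1, 0.5, 0.55, 2.0])
values = np.array([1.0, -0.5, 2.0, 0.3, 1.5])

a = -2.0  # scalar continuous-time decay rate (negative: stable)
b = 1.0

h = 0.0
for i in range(len(values)):
    # Use the elapsed time since the previous event as the step size
    dt = timestamps[i] - timestamps[i - 1] if i > 0 else 0.0
    h = np.exp(a * dt) * h + b * values[i]

# The state decays more across long gaps (0.55 -> 2.0) than short ones
```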

Key differences

| Aspect | LTI | LTV |
| --- | --- | --- |
| Parallelization | Full (FFT convolution) | Sequential (scan/recurrence) |
| Training speed | Very fast | Moderate (optimized with kernels) |
| GPU utilization | Excellent | Good (with custom kernels) |

Code comparison

import torch
from lrnnx.models.lti import S4

# Create LTI model with ZOH discretization
model = S4(
    d_model=64,
    d_state=64,
    l_max=1024,
    discretization="zoh"
)

# Training: uses FFT convolution internally
x = torch.randn(2, 1024, 64)
y = model(x)

# Inference: step-by-step recurrence (cache batch size must match x)
cache = model.allocate_inference_cache(batch_size=2)
for t in range(x.shape[1]):
    output, cache = model.step(x[:, t], cache)
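The structural difference between the two families can also be seen side by side in plain NumPy (toy shapes; the gate projection W is hypothetical and not part of lrnnx):

```python
import numpy as np

rng = np.random.default_rng(3)
L, d = 16, 4
x = rng.standard_normal((L, d))

# LTI: one fixed diagonal transition, reused at every step
a_fixed = rng.uniform(0.1, 0.9, d)
h = np.zeros(d)
for t in range(L):
    h = a_fixed * h + x[t]

# LTV: the transition is recomputed from the input at every step
W = 0.1 * rng.standard_normal((d, d))        # hypothetical gate projection
g = np.zeros(d)
for t in range(L):
    a_t = 1.0 / (1.0 + np.exp(-(W @ x[t])))  # input-dependent gate in (0, 1)
    g = a_t * g + x[t]
```

The only change is where the transition comes from: a constant in the LTI loop, a function of x[t] in the LTV loop. Everything downstream (caching, stepping, stability concerns) follows from that one difference.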

Choosing between LTI and LTV

Use these rules of thumb to guide your choice:
  • Consistent, content-independent dynamics (audio, regular time series, genomics) and maximum training speed → LTI
  • Behavior that must adapt to input content (language modeling, selective recall, gating) → LTV
  • Irregular or event-based timestamps → LTV with asynchronous discretization

Next steps

Discretization

Learn how discretization methods work and which to use

Model Reference

Detailed API documentation for all models

Linear RNNs

Learn the fundamentals of linear RNNs

Examples

See complete examples using LTI and LTV models
