
Why discretization?

Linear RNNs are based on continuous-time state-space models:
h'(t) = A h(t) + B x(t)
y(t) = C h(t) + D x(t)
However, neural networks operate on discrete sequences (text tokens, audio samples, video frames). Discretization is the process of converting these continuous-time equations into discrete-time recurrence relations that can be computed on digital hardware:
# After discretization
h[k+1] = A_bar @ h[k] + B_bar @ x[k]
y[k] = C @ h[k] + D @ x[k]
where A_bar and B_bar are the discretized state and input matrices derived from the continuous-time A and B.
The discretization method you choose affects the model’s stability, accuracy, and how it handles different temporal patterns.
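To make the recurrence concrete, here is a minimal pure-Python sketch (scalar state, arbitrary illustrative values) that runs the discretized system h[k+1] = A_bar·h[k] + B_bar·x[k] on a short impulse input:

```python
import math

# Scalar toy system: continuous-time a = -0.5, b = 1.0, step size dt = 0.1.
a, b, dt = -0.5, 1.0, 0.1

# ZOH-style discretization for a scalar (derived in the ZOH section):
a_bar = math.exp(dt * a)              # discretized state coefficient
b_bar = (a_bar - 1.0) / a * b         # discretized input coefficient

# Run the discrete recurrence over a short input sequence.
x = [1.0, 0.0, 0.0, 0.0]              # an impulse at k = 0
h = 0.0
ys = []
for xk in x:
    h = a_bar * h + b_bar * xk        # h[k+1] = A_bar h[k] + B_bar x[k]
    ys.append(h)                      # y[k] = C h[k] with C = 1, D = 0

# The impulse response decays geometrically with ratio a_bar < 1 (stable system).
print(ys)
```

Because a = -0.5 is a stable pole, a_bar lands inside the unit circle and the state decays by a factor of a_bar at every step after the impulse.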

Available methods

lrnnx implements several discretization methods from the literature. Each has different properties and use cases:

  • Zero-Order Hold (ZOH): most common; assumes the input is held constant between timesteps
  • Bilinear: the original S4 method, a trapezoidal approximation
  • Dirac: for event-based data; treats inputs as impulses
  • Async: for irregular event streams; allows a different timestep at each position (LTV models only)

Zero-Order Hold (ZOH)

Mathematical formulation

ZOH is the most widely used discretization method in modern linear RNNs. It assumes the input signal is piecewise constant (held) between timesteps:

A_bar = exp(Δ A)
γ_bar = A⁻¹ (A_bar − I)

where Δ is the discretization step size (learned or fixed), and γ_bar is used to compute B_bar = γ_bar · B.

Implementation

From lrnnx/core/discretization.py:
def zoh(
    A: Tensor, delta: Tensor, integration_timesteps: Optional[Tensor] = None
) -> tuple[Tensor, Tensor]:
    """
    Zero-Order Hold (ZOH) discretization method, used across most models.

    Args:
        A (torch.Tensor): The continuous-time state matrix.
        delta (torch.Tensor): The discretization step size.
        integration_timesteps (torch.Tensor, optional): Not used in ZOH.

    Returns:
        tuple[torch.Tensor, torch.Tensor]: A tuple containing:
            - A_bar : The discretized system matrix.
            - gamma_bar : The input normalizer.
    """
    Identity = torch.ones(A.shape[0], device=A.device)
    A_bar = torch.exp(delta * A)
    gamma_bar = (1 / A) * (A_bar - Identity)
    return A_bar, gamma_bar
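The defining property of ZOH (independent of lrnnx) is that it is exact when the input really is held constant over each step: one step of the discrete recurrence reproduces the solution of the ODE h' = a·h + b·x over an interval Δ. A scalar pure-Python check, with arbitrary values:

```python
import math

a, b, delta = -2.0, 3.0, 0.05
x = 1.5                                   # input held constant over the step

# ZOH discretization (scalar form of the zoh() function above)
a_bar = math.exp(delta * a)
gamma_bar = (a_bar - 1.0) / a             # (1/A) * (A_bar - I)
b_bar = gamma_bar * b

# Exact solution of h' = a h + b x from h(0) = h0 with constant input x:
#   h(delta) = e^{a*delta} * h0 + (e^{a*delta} - 1)/a * b * x
h0 = 0.7
h_exact = math.exp(a * delta) * h0 + (math.exp(a * delta) - 1.0) / a * b * x

# One step of the discrete recurrence
h_discrete = a_bar * h0 + b_bar * x

print(abs(h_exact - h_discrete))          # agreement to machine precision
```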

Usage

import torch
from lrnnx.models.lti import S4

# ZOH is the default for most models
model = S4(
    d_model=64,
    d_state=64,
    l_max=1024,
    discretization="zoh"  # Explicitly specify
)

x = torch.randn(2, 1024, 64)
y = model(x)

When to use ZOH

ZOH is the recommended default for most applications. It’s used in:
  • S4D, S5 (both LTI and LTV variants)
  • LRU
  • Mamba (default)
  • Most modern linear RNN papers
Ideal when your discrete sequence comes from sampling a continuous signal:
  • Audio (sampling continuous sound waves)
  • Time series (sampling continuous measurements)
  • Video (sampling continuous visual scenes)

Bilinear method

Mathematical formulation

The bilinear method (also called Tustin's method or the trapezoidal rule) was the original discretization used in S4:

A_bar = (I − 0.5 Δ A)⁻¹ (I + 0.5 Δ A)
γ_bar = (I − 0.5 Δ A)⁻¹ Δ

Implementation

From lrnnx/core/discretization.py:
def bilinear(
    A: Tensor, delta: Tensor, integration_timesteps: Optional[Tensor] = None
) -> tuple[Tensor, Tensor]:
    """
    Bilinear method first used in S4.

    Args:
        A (torch.Tensor): Continuous-time system matrix (diagonal elements only).
        delta (torch.Tensor): Time step for discretization.

    Returns:
        tuple[torch.Tensor, torch.Tensor]: A tuple containing:
            - A_bar : The discretized system matrix.
            - gamma_bar : The input normalizer.
    """
    Identity = torch.ones(A.shape[0], device=A.device)
    A_bar = (1 / (Identity - 0.5 * delta * A)) * (Identity + 0.5 * delta * A)
    gamma_bar = (1 / (Identity - 0.5 * delta * A)) * delta
    return A_bar, gamma_bar

Usage

import torch
from lrnnx.models.lti import S4

# Use bilinear discretization (original S4 method)
model = S4(
    d_model=64,
    d_state=64,
    l_max=1024,
    discretization="bilinear"
)

x = torch.randn(2, 1024, 64)
y = model(x)

When to use bilinear

Use bilinear if you need to exactly reproduce results from the original S4 paper or checkpoints.
Bilinear can provide better stability for certain system matrices, but most modern implementations (including S4D, S5, and Mamba) default to ZOH, which tends to perform better empirically.
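The stability point can be checked numerically. In scalar form, the standard Tustin map (1 − 0.5·Δ·a)⁻¹(1 + 0.5·Δ·a) keeps a stable continuous-time pole (Re(a) < 0) strictly inside the unit circle for every step size, which simpler rules such as forward Euler do not. A pure-Python sketch with arbitrary values:

```python
import math

def bilinear_scalar(a, delta):
    # Scalar Tustin rule: (1 - 0.5*d*a)^-1 * (1 + 0.5*d*a)
    return (1.0 + 0.5 * delta * a) / (1.0 - 0.5 * delta * a)

a = -1.0                                   # stable continuous-time pole
for delta in (0.01, 0.1, 1.0, 10.0, 100.0):
    a_bar = bilinear_scalar(a, delta)
    # The bilinear map keeps the discrete pole inside the unit circle
    # for every step size, even very large ones.
    assert abs(a_bar) < 1.0

# Forward Euler (1 + delta*a), by contrast, leaves the unit circle
# once delta > 2/|a|:
assert abs(1.0 + 10.0 * a) > 1.0

# For small delta, bilinear agrees closely with the ZOH pole exp(delta*a):
print(bilinear_scalar(a, 0.01), math.exp(0.01 * a))
```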

Dirac method

Mathematical formulation

The Dirac method treats inputs as instantaneous impulses (Dirac delta functions) rather than sustained signals:

A_bar = exp(Δ A)
γ_bar = 1.0

Note that γ_bar = 1.0 (constant), unlike ZOH where it depends on A.

Implementation

From lrnnx/core/discretization.py:
def dirac(
    A: Tensor, delta: Tensor, integration_timesteps: Optional[Tensor] = None
) -> tuple[Tensor, float]:
    """
    Dirac discretization method.

    Reference: https://github.com/Efficient-Scalable-Machine-Learning/event-ssm

    Args:
        A (torch.Tensor): Continuous-time system matrix.
        delta (torch.Tensor): Time step for discretization.

    Returns:
        tuple[torch.Tensor, float]: A tuple containing:
            - A_bar : The discretized system matrix.
            - gamma_bar : The input normalizer (1.0).
    """
    A_bar = torch.exp(delta * A)
    gamma_bar = 1.0
    return A_bar, gamma_bar
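Because γ_bar is the constant 1.0, a Dirac-discretized model adds each event's magnitude directly to the state and lets it decay by exp(Δ·a) between steps. A scalar pure-Python sketch with a hypothetical spike train:

```python
import math

a, delta = -1.0, 0.1
a_bar = math.exp(delta * a)          # state decay between events
gamma_bar = 1.0                      # Dirac: input enters unscaled

spikes = [1.0, 0.0, 0.0, 1.0, 0.0]   # binary event train
h = 0.0
trace = []
for s in spikes:
    h = a_bar * h + gamma_bar * s    # each impulse is added, then decays
    trace.append(h)

# Between the two spikes the state decays geometrically by a_bar per step.
print(trace)
```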

Usage

import torch
from lrnnx.models.lti import S5

# Use Dirac discretization for event-based data
model = S5(
    d_model=64,
    d_state=64,
    discretization="dirac"
)

x = torch.randn(2, 1024, 64)
y = model(x)

When to use Dirac

Dirac discretization is designed for event-based data where inputs represent discrete events rather than continuous signals:
  • Neuromorphic spike trains
  • Event cameras (DVS sensors)
  • Point process data
Use when modeling systems that respond to instantaneous inputs rather than sustained signals.

Asynchronous discretization

Mathematical formulation

Asynchronous discretization allows a different timestep at each sequence position, which is useful for irregular event streams:

A_bar[t] = exp(Δ · timesteps[t] · A)
γ_bar[t] = A⁻¹ (A_bar[t] − I)
Asynchronous discretization is only supported for LTV models. LTI models cannot use this method because it creates time-varying dynamics.

Implementation

From lrnnx/core/discretization.py:
def async_(
    A: Tensor, delta: Tensor, integration_timesteps: Optional[Tensor] = None
) -> tuple[Tensor, Tensor]:
    """
    Asynchronous discretization method.

    This method is only applicable for LTV models.

    Args:
        A (torch.Tensor): Continuous-time system matrix.
        delta (torch.Tensor): Time step for discretization.
        integration_timesteps (torch.Tensor): Timesteps for async discretization,
            shape (B, L), representing time differences between events.

    Returns:
        tuple[torch.Tensor, torch.Tensor]: Discretized matrices.
    """
    assert (
        integration_timesteps is not None
    ), "Integration timesteps must be provided for async discretization."
    Identity = torch.ones(A.shape[0], device=A.device)
    A_bar = torch.exp(delta * integration_timesteps * A)
    gamma_bar = (1 / A) * (A_bar - Identity)

    return A_bar, gamma_bar
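In scalar form, the async rule simply gives every position its own effective step size Δ · dt[t]. A pure-Python sketch with hypothetical, irregular gaps between events:

```python
import math

a, delta = -0.5, 1.0
dts = [0.1, 0.7, 0.05, 2.0]          # irregular time gaps between events

a_bars, gamma_bars = [], []
for dt in dts:
    a_bar = math.exp(delta * dt * a)  # A_bar[t] = exp(delta * dt[t] * a)
    gamma_bar = (a_bar - 1.0) / a     # (1/A) * (A_bar[t] - I)
    a_bars.append(a_bar)
    gamma_bars.append(gamma_bar)

# A long gap decays the state more than a short one.
print(a_bars)
```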

Usage

import torch
from lrnnx.models.ltv import Mamba

# Create model with async discretization
model = Mamba(
    d_model=64,
    d_state=16,
    d_conv=4,
    discretization="async"
)

# Provide timesteps (e.g., time differences between events)
x = torch.randn(2, 1024, 64)
timesteps = torch.abs(torch.randn(2, 1024))  # Variable time deltas

# Pass timesteps to forward
y = model(x, integration_timesteps=timesteps)

When to use async

When your data has non-uniform sampling intervals:
  • Medical records with irregular timestamps
  • Financial tick data
  • Sensor networks with asynchronous reporting
It is also well suited to neuromorphic event cameras and other asynchronous sensors where events occur at irregular intervals.

Model compatibility

Different models support different discretization methods:
Model     | zoh | bilinear | dirac | async
----------|-----|----------|-------|------
S4        |  ✓  |    ✓     |   ✓   |   ✗
S4D       |  ✓  |    ✓     |   ✓   |   ✗
S5        |  ✓  |    ✓     |   ✓   |   ✗
LRU       |  ✓  |    ✓     |   ✓   |   ✗
Centaurus |  ✓  |    ✓     |   ✓   |   ✗
LTI models cannot use async discretization because it creates time-varying dynamics.

Advanced: Custom discretization

You can implement custom discretization methods by following the function signature:
from typing import Optional, Tuple, Union
import torch
from torch import Tensor

def my_custom_discretization(
    A: Tensor,
    delta: Tensor,
    integration_timesteps: Optional[Tensor] = None
) -> Tuple[Tensor, Union[Tensor, float]]:
    """
    Custom discretization method.

    Args:
        A: Continuous-time state matrix
        delta: Discretization step size
        integration_timesteps: Optional timesteps for async processing

    Returns:
        (A_bar, gamma_bar): Discretized matrices
    """
    # Your discretization logic here
    A_bar = ...  # Compute discretized A
    gamma_bar = ...  # Compute input normalizer
    
    return A_bar, gamma_bar
Then register it:
from lrnnx.core.discretization import DISCRETIZE_FNS

DISCRETIZE_FNS["my_custom"] = my_custom_discretization

# Now you can use it
from lrnnx.models.lti import S5
model = S5(d_model=64, d_state=64, discretization="my_custom")
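As a concrete (hypothetical) example, a forward-Euler rule, A_bar = 1 + Δ·A and γ_bar = Δ, is the first-order Taylor approximation of ZOH. The sketch below uses plain floats for a single diagonal entry to show the math only; an actual lrnnx method would operate on tensors as in the skeleton above:

```python
import math

def euler_scalar(a, delta, integration_timesteps=None):
    # Forward Euler: first-order Taylor approximation of exp(delta * a)
    a_bar = 1.0 + delta * a
    gamma_bar = delta          # the input normalizer reduces to the step size
    return a_bar, gamma_bar

# For small delta this agrees with the ZOH pole exp(delta * a) ...
a_bar, gamma_bar = euler_scalar(-1.0, 0.01)
print(a_bar, math.exp(-0.01))

# ... but unlike ZOH or bilinear, it goes unstable once delta > 2/|a|:
a_bar_big, _ = euler_scalar(-1.0, 3.0)
print(abs(a_bar_big) > 1.0)
```

This instability for large steps is exactly why forward Euler is rarely used in practice, and why ZOH and bilinear dominate the linear RNN literature.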

Quick reference

ZOH

Default choice
Use for: most applications, sampled continuous signals
Models: all LTI and LTV models

Bilinear

Original S4 method
Use for: S4 compatibility, specific stability requirements
Models: all LTI and LTV models

Dirac

Event-based
Use for: neuromorphic data, event cameras, impulse responses
Models: all LTI and LTV models

Async

Irregular timesteps
Use for: asynchronous events, irregular time series
Models: LTV models only

Next steps

Linear RNNs

Learn the fundamentals of linear RNNs

LTI vs LTV

Understand when to use each model type

Model Reference

Detailed API documentation for all models

Examples

See complete usage examples
