Overview

S4D (Structured State Space - Diagonal) is a simplified variant of S4 that uses a diagonal parameterization instead of the DPLR (Diagonal Plus Low-Rank) structure. This makes it faster and easier to implement while maintaining competitive performance. The key difference from S4 is that S4D uses a pure diagonal matrix for A, eliminating the low-rank correction term P.

Paper Reference

On the Parameterization and Initialization of Diagonal State Space Models

Original implementation: https://github.com/state-spaces/s4

Import

```python
from lrnnx.models.lti import S4D
```

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `d_model` | int | required | Model dimension; size of input and output features. |
| `d_state` | int | `64` | State dimension (N); internal dimension of the SSM state space. |
| `l_max` | int | `None` | Maximum sequence length for the kernel. Required for FFT convolution. |
| `channels` | int | `1` | Number of channels/heads for multi-headed processing. |
| `bottleneck` | int | `None` | Reduces the dimension of the inner layer. If specified, adds an input projection. |
| `gate` | int | `None` | Adds multiplicative gating for enhanced expressiveness. |
| `final_act` | str | `'glu'` | Activation after the final linear layer: `'glu'`, `'id'`, or `None`. |
| `dropout` | float | `0.0` | Dropout probability. |
| `tie_dropout` | bool | `False` | Tie the dropout mask across sequence length. |
| `transposed` | bool | `True` | Input format: `(B, H, L)` if `True`, `(B, L, H)` if `False`. |

SSM Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `dt_min` | float | `0.001` | Minimum timestep value. |
| `dt_max` | float | `0.1` | Maximum timestep value. |
| `dt_tie` | bool | `True` | Tie dt across all channels. |
| `dt_transform` | str | `'exp'` | Transform applied to the dt parameter. |
| `rank` | int | `1` | Rank parameter (kept for compatibility; unused in diagonal mode). |
| `n_ssm` | int | `None` | Number of independent SSMs. Defaults to `d_model`. |
| `init` | str | `'legs'` | Initialization method for the A matrix: `'legs'`, `'hippo'`, etc. |
| `deterministic` | bool | `False` | Use deterministic initialization. |
| `real_transform` | str | `'exp'` | Transform for the real part of A. |
| `imag_transform` | str | `'none'` | Transform for the imaginary part of A. |
| `is_real` | bool | `False` | Use a real-valued (instead of complex) SSM. |
| `disc` | str | `'zoh'` | S4D-specific: discretization method. Options: `'zoh'`, `'bilinear'`. |
| `lr` | float | `None` | Learning rate for SSM parameters. |
| `wd` | float | `0.0` | Weight decay for SSM parameters. |
| `verbose` | bool | `True` | Print initialization information. |

Usage Example

Basic Usage

```python
import torch
from lrnnx.models.lti import S4D

# Create S4D model
model = S4D(d_model=64, d_state=64, l_max=1024)

# Forward pass
x = torch.randn(2, 1024, 64)  # (batch, length, features)
y, state = model(x)

print(y.shape)  # torch.Size([2, 1024, 64])
```

With Custom Discretization

```python
model = S4D(
    d_model=128,
    d_state=64,
    l_max=2048,
    disc="bilinear",  # Use bilinear discretization
    dt_min=0.0001,
    dt_max=0.1,
)

x = torch.randn(4, 2048, 128)
y, state = model(x)
```

Autoregressive Inference

```python
import torch
from lrnnx.models.lti import S4D

model = S4D(d_model=64, d_state=64, l_max=1024)
batch_size = 2

# Allocate inference cache
cache = model.allocate_inference_cache(batch_size=batch_size)

# Generate sequence autoregressively
for t in range(100):
    x_t = torch.randn(batch_size, 64)
    y_t, cache = model.step(x_t, cache)
    # y_t.shape: (batch_size, 64)
```

Multi-Channel Configuration

```python
model = S4D(
    d_model=256,
    d_state=64,
    l_max=4096,
    channels=8,        # 8-headed SSM
    dropout=0.1,
    disc="zoh",
)

x = torch.randn(2, 4096, 256)
y, state = model(x)
```

Key Differences from S4

Diagonal Parameterization

S4D uses a pure diagonal matrix:
A = Λ  (diagonal only, no low-rank term)
Versus S4’s DPLR:
A = Λ - PP*  (diagonal plus low-rank)
This simplification:
  • ✅ Faster computation
  • ✅ Simpler implementation
  • ✅ Easier to tune
  • ⚠️ Slightly less expressive (but usually negligible)
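The practical payoff of dropping the low-rank term is that the convolution kernel of a diagonal SSM reduces to a Vandermonde-style sum evaluated with elementwise powers, with no matrix powers or the inverse-FFT machinery S4's DPLR kernel requires. A rough numpy sketch (illustrative, not lrnnx internals):

```python
import numpy as np

N, L = 4, 8
dA = np.array([0.9, 0.8, 0.7 + 0.1j, 0.7 - 0.1j])  # discretized diagonal of A
dB = np.ones(N, dtype=complex)
C = np.ones(N, dtype=complex)

# K[l] = sum_n C_n * dA_n^l * dB_n  -- only elementwise powers needed
powers = dA[None, :] ** np.arange(L)[:, None]      # (L, N) Vandermonde matrix
K = (powers * (C * dB)[None, :]).sum(axis=1).real  # (L,) convolution kernel

# Check against the naive definition with the dense matrix A = diag(dA)
A = np.diag(dA)
K_naive = np.array([(C @ np.linalg.matrix_power(A, l) @ dB).real for l in range(L)])
assert np.allclose(K, K_naive)
```

The `(L, N)` Vandermonde matrix costs O(NL) to materialize, versus the repeated dense matvecs of the naive formulation.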

Discretization Options

S4D explicitly supports multiple discretization methods via the disc parameter:
  • ZOH (Zero-Order Hold): Default, good general-purpose discretization
  • Bilinear: Better frequency response preservation
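For a diagonal A, both discretizations are simple elementwise formulas. As a hedged illustration (toy values, not the library's internal code):

```python
import numpy as np

# Toy diagonal SSM: A is stored as its diagonal entries (complex), dt is the timestep
A = np.array([-0.5 + 3.0j, -1.0 + 1.0j])  # diagonal of A
B = np.array([1.0 + 0.0j, 1.0 + 0.0j])
dt = 0.01

# ZOH (zero-order hold): exact for piecewise-constant inputs
dA_zoh = np.exp(dt * A)
dB_zoh = (dA_zoh - 1.0) / A * B

# Bilinear (Tustin): maps the continuous imaginary axis onto the unit circle
dA_bil = (1 + dt * A / 2) / (1 - dt * A / 2)
dB_bil = dt * B / (1 - dt * A / 2)
```

For small `dt` both reduce to `1 + dt * A` to first order; they differ in how they treat high frequencies relative to the sampling rate.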

Architecture Details

Forward Pass

Same structure as S4:
  1. Optional bottleneck/gating
  2. FFT-based SSM convolution (training)
  3. D skip connection
  4. GELU activation
  5. Output projection
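Steps 2-4 can be sketched in a few lines for a single channel (a simplified illustration with a toy kernel, not the actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
L = 128
x = rng.normal(size=L)      # input sequence
K = 0.9 ** np.arange(L)     # toy precomputed SSM kernel
D = 0.5                     # skip-connection parameter

# Step 2: FFT-based causal convolution (pad to 2L to avoid circular wraparound)
n = 2 * L
y_conv = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(K, n), n)[:L]

# Step 3: D skip connection
y = y_conv + D * x

# Step 4: GELU activation (tanh approximation)
y = 0.5 * y * (1 + np.tanh(np.sqrt(2 / np.pi) * (y + 0.044715 * y**3)))
```

Padding to length 2L before the FFT is what turns circular convolution into the causal linear convolution the SSM defines.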

Diagonal SSM

The diagonal structure enables efficient computation:
```python
# Elementwise operations instead of dense matrix multiplication
h_t = A * h_prev + B * x_t   # A, B are vectors (the diagonal); O(N) per step
y_t = C @ h_t + D * x_t      # readout is a dot product; D is an elementwise skip
```
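A minimal runnable sketch of this recurrence (illustrative, not lrnnx internals), confirming that the elementwise update matches the equivalent dense formulation with `A = diag(a)`:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16                         # state dimension
a = rng.uniform(0.5, 0.9, N)   # diagonal of the (discretized) A
b = rng.normal(size=N)         # input vector B
c = rng.normal(size=N)         # readout vector C
d = 0.5                        # skip connection D
x = rng.normal(size=32)        # scalar input sequence

# Diagonal recurrence: elementwise state update, O(N) per step
h = np.zeros(N)
ys = []
for x_t in x:
    h = a * h + b * x_t
    ys.append(c @ h + d * x_t)
ys = np.array(ys)

# Dense formulation with A = diag(a) produces identical outputs, at O(N^2) per step
A = np.diag(a)
h2 = np.zeros(N)
ys_dense = []
for x_t in x:
    h2 = A @ h2 + b * x_t
    ys_dense.append(c @ h2 + d * x_t)
assert np.allclose(ys, np.array(ys_dense))
```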

Initialization

  • A: Diagonal complex matrix with HiPPO/LEGS initialization
  • B, C: Random Gaussian scaled by dimensions
  • dt: Log-spaced between dt_min and dt_max
  • D: Random initialization
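The dt initialization can be sketched as follows (a hypothetical standalone helper mirroring the `dt_min`/`dt_max` parameters above, with the `'exp'` transform of `dt_transform` applied):

```python
import numpy as np

def init_dt(d_model, dt_min=0.001, dt_max=0.1, seed=0):
    """Sample timesteps log-uniformly in [dt_min, dt_max]."""
    rng = np.random.default_rng(seed)
    # Sample uniformly in log space, then exponentiate back
    log_dt = rng.uniform(np.log(dt_min), np.log(dt_max), size=d_model)
    return np.exp(log_dt)

dt = init_dt(64)
```

Sampling in log space gives each timescale decade roughly equal coverage, so the model starts with a spread of both fast and slow channels.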

Performance Comparison

| Model | Speed | Memory | Performance |
|-------|-------|--------|-------------|
| S4 | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| S4D | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |

S4D is typically 10-20% faster than S4 with similar performance.

Performance Tips

  • S4D is a great default choice: simpler than S4, with similar performance and faster training.
  • The disc parameter selects the discretization method. ZOH is recommended for most tasks, but bilinear can be better for frequency-domain applications.
  • Like all LTI models, S4D does not support variable timesteps. For event-driven data, use Mamba or S7.

When to Use S4D

Use S4D when:
  • You want a simple, efficient SSM
  • Training speed is important
  • You don’t need the extra expressiveness of DPLR
  • You’re working with regular, fixed-interval sequences
Consider alternatives when:
  • You need input-dependent dynamics → Use Mamba
  • You need ultra-minimal parameters → Use LRU
  • You want to experiment with DPLR → Use S4

See Also

  • S4 - Original DPLR variant
  • S5 - Even simpler implementation
  • LRU - Minimal diagonal SSM
