Overview
The HierarchicalClassifier (also called Classifier) is a flexible architecture for sequence classification and regression built from Linear Recurrent Neural Networks (LRNNs). It supports variable-length sequences, hierarchical pooling across layers, and mixing different LRNN types.
This architecture is inspired by the Event-SSM paper and is designed for tasks like:
- Text classification
- Sentiment analysis
- Time series classification
- Event-based vision tasks
- Regression on sequential data
Architecture
The model consists of:
- Input projection/embedding: Maps raw features or tokens to the model dimension
- Stack of LRNN blocks: Each block contains:
  - LRNN layer (LRU, S5, Centaurus, etc.)
  - Residual connection
  - Layer normalization
  - Dropout
  - Optional intermediate pooling to reduce sequence length
- Final pooling: Aggregates the sequence to a single vector (last/mean/max)
- Output head: Linear projection to class logits or regression values
Class Signature
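A sketch of the constructor, reconstructed from the Parameters section below. Keyword names that do not appear elsewhere on this page (num_layers, lrnn_type, pooling, dropout, embed_dim) are illustrative assumptions, not the library's confirmed API:

```python
# Hypothetical signature sketch -- only input_dim, num_classes, output_dim,
# d_model, d_state, and vocab_size are named in the surrounding docs;
# the remaining keyword names are assumed.
class HierarchicalClassifier(nn.Module):
    def __init__(
        self,
        input_dim: int,
        num_classes: int,
        output_dim: int = 1,
        d_model: int = 128,
        d_state: int = 64,
        num_layers: int = 4,          # assumed name: number of LRNN layers
        lrnn_type="LRU",              # assumed name: class, string, or list per layer
        pooling: str = "last",        # assumed name: "last" | "mean" | "max"
        dropout: float = 0.0,         # assumed name
        vocab_size: int | None = None,
        **lrnn_kwargs,                # forwarded to the LRNN constructors
    ): ...
```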
Parameters
- input_dim: Number of input features (ignored when vocab_size is provided for token embeddings)
- num_classes: Number of output classes for classification. Set to 0 for regression tasks.
- output_dim: Number of regression outputs (only used when num_classes=0)
- d_model: Hidden dimension of the model
- d_state: State dimension for the LRNN layers
- Number of LRNN layers
- LRNN class or list of classes (one per layer) to use. Can be:
  - A single class: LRU, S5, Centaurus
  - A string: "LRU", "S5", "Centaurus"
  - A list of classes/strings for heterogeneous layers: ["LRU", "S5", LRU, Centaurus]
- Final pooling strategy for aggregating the sequence to a single vector:
  - "last": Use the last timestep (respects variable lengths)
  - "mean": Average pooling over the sequence
  - "max": Max pooling over the sequence
- Dropout probability for regularization
- Pooling strategy for reducing sequence length within layers:
  - "none": No intermediate pooling
  - "stride": Strided selection (every k-th element)
  - "mean": Average pooling
  - "max": Max pooling
- Factor by which to reduce the sequence length at each layer with intermediate pooling. Can be a single int or a list (one per layer).
- vocab_size: Size of the vocabulary for token embeddings. When provided, the model expects token IDs as input instead of continuous features.
- Dimension of token embeddings (defaults to d_model if not specified)
- Maximum sequence length for positional embeddings
- Index of the padding token for the embedding layer
- Additional parameters passed to the LRNN constructors. Example: {"d_model": 128, "d_state": 64}

Usage Examples
Text Classification
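A minimal sketch, assuming HierarchicalClassifier is importable from the package and that keyword names match the Parameters section above (the import path and module layout are assumptions):

```python
import torch
from lrnn.models import HierarchicalClassifier  # assumed import path

# Providing vocab_size switches the model to a token-embedding front end.
model = HierarchicalClassifier(
    input_dim=0,        # ignored when vocab_size is given
    vocab_size=10_000,
    num_classes=2,      # e.g. binary sentiment
    d_model=128,
    d_state=64,
)

tokens = torch.randint(0, 10_000, (8, 256))  # (B, L) token IDs
logits = model(tokens)                       # (B, num_classes)
```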
Time Series Classification with Variable Lengths
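A sketch of padded, variable-length input, again assuming the import path and keyword names from the Parameters section; passing lengths lets "last" pooling pick each sequence's true final timestep:

```python
import torch
from lrnn.models import HierarchicalClassifier  # assumed import path

model = HierarchicalClassifier(
    input_dim=6,        # e.g. 6 sensor channels
    num_classes=5,
    d_model=128,
    d_state=64,
)

x = torch.randn(4, 500, 6)                   # padded batch, (B, L, input_dim)
lengths = torch.tensor([500, 320, 144, 72])  # true lengths, shape (B,)
logits = model(x, lengths=lengths)           # (B, num_classes)
```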
Hierarchical Pooling for Long Sequences
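With intermediate pooling, each block shortens the sequence before the next block runs, so deep stacks can handle long inputs cheaply. The length bookkeeping can be sketched in plain Python (the helper below is illustrative, not part of the library):

```python
def pooled_lengths(seq_len: int, factors: list[int]) -> list[int]:
    """Sequence length after each LRNN block when every block pools
    by its factor (strided/mean/max pooling shrink the length alike)."""
    lengths = []
    for k in factors:
        seq_len = max(1, seq_len // k)
        lengths.append(seq_len)
    return lengths

# A 16384-step input pooled by a factor of 4 at each of three blocks:
print(pooled_lengths(16384, [4, 4, 4]))  # [4096, 1024, 256]
```

The final LRNN layers therefore operate on sequences orders of magnitude shorter than the raw input.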
Regression Task
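A sketch of the regression configuration, assuming the same hypothetical import path; per the Parameters section, num_classes=0 switches the head to output_dim continuous outputs:

```python
import torch
from lrnn.models import HierarchicalClassifier  # assumed import path

model = HierarchicalClassifier(
    input_dim=3,
    num_classes=0,      # 0 selects the regression head
    output_dim=1,       # single regression target
    d_model=64,
    d_state=32,
)

x = torch.randn(16, 200, 3)  # (B, L, input_dim)
y = model(x)                 # (B, output_dim) continuous values
```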
Heterogeneous LRNN Layers
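A sketch of mixing LRNN types, one entry per layer, as described under Parameters; the import paths and the lrnn_type keyword name are assumptions:

```python
from lrnn.models import HierarchicalClassifier  # assumed import path
from lrnn.layers import LRU, Centaurus          # assumed import path

# Classes and strings can be mixed freely, one entry per layer.
model = HierarchicalClassifier(
    input_dim=8,
    num_classes=10,
    d_model=128,
    d_state=64,
    lrnn_type=["LRU", "S5", LRU, Centaurus],  # "lrnn_type" keyword is assumed
)
```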
Methods
forward
Parameters:
- x (torch.Tensor): Input tensor
  - Token IDs of shape (B, L) when using embeddings
  - Continuous features of shape (B, L, input_dim) otherwise
- lengths (torch.Tensor, optional): Actual sequence lengths of shape (B,) for variable-length sequences
- integration_timesteps (torch.Tensor, optional): Timesteps of shape (B, L) for LTV models

Returns:
- torch.Tensor:
  - Classification logits of shape (B, num_classes) when num_classes > 0
  - Regression values of shape (B, output_dim) when num_classes = 0
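With "last" pooling and a lengths argument, the model reads each sequence's state at index lengths[i] - 1 rather than at the padded final position. The selection logic can be sketched in plain Python (illustrative, not library code):

```python
def last_pool(batch: list[list[float]], lengths: list[int]) -> list[float]:
    """Pick the last valid element of each padded sequence."""
    return [seq[n - 1] for seq, n in zip(batch, lengths)]

padded = [
    [0.1, 0.2, 0.3, 0.0, 0.0],  # true length 3
    [0.5, 0.6, 0.0, 0.0, 0.0],  # true length 2
]
print(last_pool(padded, [3, 2]))  # [0.3, 0.6]
```

Without lengths, the zero padding at the tail would otherwise leak into the pooled representation.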
Use Cases
- Text classification: Sentiment analysis, topic classification, spam detection
- Event-based vision: Event camera classification tasks
- Time series classification: Activity recognition, anomaly detection
- Sequence regression: Predicting continuous values from sequential data
- Audio classification: Speaker recognition, audio event detection
References
- Implementation reference: Event-SSM
- LRU: Linear Recurrent Unit
- S5: Simplified State Space Model
See Also
- Event Classification Tutorial - Complete guide to using the classifier for event-based tasks
- LRU - Linear Recurrent Unit layer
- S5 - Simplified S4 layer
- Centaurus - Centaurus layer
