Overview
Classifier is a flexible architecture for sequence classification and regression tasks. It supports multiple LRNN layer types, variable-length sequences, token embeddings, and hierarchical pooling strategies.
Class Definition
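The class definition itself is not reproduced on this page. A plausible constructor signature, reconstructed from the parameter list below, is sketched here; every keyword not referenced by name elsewhere on this page (`lrnn`, `pooling`, `dropout`, `intermediate_pooling`, `pool_factor`, `embed_dim`, `max_seq_len`, `padding_idx`) is an assumption, not the library's verified API:

```python
# Sketch only: keyword names marked "assumed" are guesses, not verified API.
class Classifier:
    def __init__(
        self,
        input_dim=1,                  # ignored when vocab_size is provided
        num_classes=2,                # set to 0 for regression
        output_dim=1,                 # regression outputs (when num_classes == 0)
        d_model=256,                  # hidden dimension
        d_state=64,                   # LRNN state dimension
        n_layers=4,                   # number of LRNN blocks
        lrnn="LRU",                   # assumed name: class/string, or list (one per layer)
        pooling="last",               # assumed name: "last" | "mean" | "max"
        dropout=0.0,                  # assumed name
        intermediate_pooling="none",  # assumed name: "none" | "stride" | "mean" | "max"
        pool_factor=2,                # assumed name: int or list of length n_layers
        vocab_size=None,              # switches to token-ID inputs when provided
        embed_dim=None,               # assumed name: defaults to d_model
        max_seq_len=None,             # assumed name: for learned positional embeddings
        padding_idx=0,                # assumed name: padding token index
        lrnn_params=None,             # forwarded to the LRNN constructors
    ):
        # The real implementation would build the embedding, LRNN blocks,
        # pooling, and output head here; this stub just records config.
        self.num_classes = num_classes
        self.d_model = d_model
        self.lrnn_params = lrnn_params or {}
```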
Parameters
- `input_dim`: Number of input features. Ignored when `vocab_size` is provided (for token-based inputs).
- `num_classes`: Number of output classes for classification. Set to 0 for regression tasks.
- `output_dim`: Number of regression outputs (used when `num_classes=0`).
- `d_model`: Hidden dimension of the model.
- `d_state`: State dimension for the LRNN layers.
- `n_layers`: Number of LRNN processing blocks.
- LRNN type: LRNN class or list of classes (one per layer). Can be a class object or string name. Available options:
  - `LRU` or `"LRU"`: Linear Recurrent Unit
  - `S5` or `"S5"`: Simplified State Space
  - `Centaurus` or `"Centaurus"`: Centaurus mixer
  A per-layer list such as `[LRU, S5, LRU, S5]` or `["LRU", "S5", "LRU", "S5"]` is also accepted.
- Final pooling: Pooling strategy applied to sequence outputs. Options:
  - `"last"`: Use the last timestep (respects `lengths` if provided)
  - `"mean"`: Average over all timesteps
  - `"max"`: Max pooling over all timesteps
- Dropout: Dropout probability applied after each LRNN layer.
- Intermediate pooling: Pooling strategy for each layer to reduce sequence length. Can be a single strategy or a list of length `n_layers`. Options:
  - `"none"`: No intermediate pooling
  - `"stride"`: Strided sampling
  - `"mean"`: Average pooling
  - `"max"`: Max pooling
- Pooling factor: Factor by which to reduce sequence length at each layer (when using intermediate pooling). Can be a single integer (same for all layers) or a list of length `n_layers`.
- `vocab_size`: Size of the vocabulary for token embeddings. When provided, the model expects token IDs as input instead of continuous features.
- Embedding dimension: Dimension of token embeddings. Defaults to `d_model` if not specified.
- Maximum sequence length: Maximum length for learned positional embeddings. Only used if positional embeddings are enabled in the embedding layer.
- Padding index: Index of the padding token in the vocabulary (when using token embeddings).
- `lrnn_params`: Additional parameters passed to the LRNN module constructors. Must include all required constructor arguments for the LRNN class. Examples:
  - LRU: `{"d_model": 256, "d_state": 64}`
  - S5: `{"d_model": 256, "d_state": 64, "discretization": "zoh"}`
Methods
forward
- Input tensor:
  - Token IDs of shape `(B, L)` when using embeddings
  - Continuous features of shape `(B, L, input_dim)` otherwise
- `lengths`: Actual sequence lengths of shape `(B,)` for variable-length sequences.
- Timesteps of shape `(B, L)` for LTV models (e.g., Mamba).

Returns:
- Classification: Logits of shape `(B, num_classes)`
- Regression: Values of shape `(B, output_dim)`
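For intuition, the three final pooling modes can be sketched in a few lines of NumPy. This mirrors the behavior described above (only `"last"` uses `lengths`) and is not the library's actual implementation:

```python
import numpy as np

def pool(x, lengths=None, mode="last"):
    """Illustrative sketch of the documented final pooling modes."""
    B, L, D = x.shape
    if mode == "last":
        if lengths is None:
            return x[:, -1]                      # plain last timestep
        return x[np.arange(B), lengths - 1]      # last *valid* timestep per sequence
    if mode == "mean":
        return x.mean(axis=1)                    # average over all timesteps
    if mode == "max":
        return x.max(axis=1)                     # max over all timesteps
    raise ValueError(f"unknown pooling mode: {mode}")

x = np.arange(24, dtype=float).reshape(2, 3, 4)  # (B=2, L=3, D=4)
last = pool(x, lengths=np.array([2, 3]))         # first sequence stops at timestep 1
```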
Example Usage
Classification with Continuous Features
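A minimal sketch of this use case, assuming the parameters documented above; the keyword names `lrnn` and `pooling` and the commented-out construction are assumptions, not verified against the library:

```python
# Configuration for classifying sequences of continuous features (sketch).
config = dict(
    input_dim=20,       # 20 features per timestep
    num_classes=10,     # 10-way classification
    d_model=256,
    d_state=64,
    n_layers=4,
    lrnn="LRU",         # assumed keyword: LRNN type by string name
    pooling="last",     # assumed keyword: classify from the last timestep
    lrnn_params={"d_model": 256, "d_state": 64},
)
# model = Classifier(**config)   # hypothetical construction
# x: float tensor of shape (B, L, 20); model(x) -> logits of shape (B, 10)
```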
Classification with Token Embeddings
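A token-based sketch; `embed_dim` and `padding_idx` are assumed keyword names for the embedding parameters described above:

```python
# Configuration for token-ID inputs (sketch): providing vocab_size makes
# the model expect integer token IDs of shape (B, L) instead of features.
config = dict(
    vocab_size=30000,
    num_classes=2,
    d_model=256,
    d_state=64,
    n_layers=4,
    lrnn="S5",          # assumed keyword
    embed_dim=128,      # assumed keyword: defaults to d_model when omitted
    padding_idx=0,      # assumed keyword: padding token index
    lrnn_params={"d_model": 256, "d_state": 64, "discretization": "zoh"},
)
# model = Classifier(**config)   # hypothetical construction
# tokens: int tensor of shape (B, L); model(tokens, lengths=lengths) -> (B, 2)
```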
Regression Task
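Regression is selected by `num_classes=0`, with `output_dim` giving the number of targets; keyword names other than those two follow the same assumptions as in the previous example:

```python
# Regression configuration (sketch): num_classes=0 switches the head
# from classification logits to output_dim regression values.
config = dict(
    input_dim=8,
    num_classes=0,      # 0 selects regression
    output_dim=3,       # three regression targets per sequence
    d_model=256,
    d_state=64,
    n_layers=4,
    lrnn="LRU",         # assumed keyword
    pooling="mean",     # assumed keyword: average over all timesteps
    lrnn_params={"d_model": 256, "d_state": 64},
)
# model = Classifier(**config)   # hypothetical construction
# model(x) -> values of shape (B, 3) == (B, output_dim)
```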
Hierarchical Pooling
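Intermediate pooling shortens the sequence at each layer. The sketch below uses the assumed keyword names `intermediate_pooling` and `pool_factor`, and checks the resulting length arithmetic in plain Python:

```python
# Hierarchical pooling (sketch): halve the sequence at the first three
# layers, leave it unchanged at the last one.
pool_factors = [2, 2, 2, 1]
config = dict(
    input_dim=20,
    num_classes=10,
    d_model=256,
    d_state=64,
    n_layers=4,
    lrnn="LRU",                   # assumed keyword
    intermediate_pooling="mean",  # assumed keyword: average pooling per layer
    pool_factor=pool_factors,     # assumed keyword: per-layer reduction factors
    lrnn_params={"d_model": 256, "d_state": 64},
)

# A length-1024 input reaches the final pooling stage at length 128:
L = 1024
for f in pool_factors:
    L //= f
```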
Mixed LRNN Types
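Different LRNN classes can be mixed per layer by passing a list, as in the per-layer list example from the parameter documentation above; `lrnn_params` presumably then needs to satisfy the constructor requirements of every class in the list:

```python
# Mixed layer types (sketch): alternate LRU and S5 blocks.
config = dict(
    input_dim=20,
    num_classes=5,
    d_model=256,
    d_state=64,
    n_layers=4,
    lrnn=["LRU", "S5", "LRU", "S5"],  # one entry per layer (assumed keyword)
    # Covers required args for both LRU and S5 (see lrnn_params examples above):
    lrnn_params={"d_model": 256, "d_state": 64, "discretization": "zoh"},
)
# model = Classifier(**config)   # hypothetical construction
```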
Notes
- Set `num_classes > 0` for classification tasks, `num_classes=0` for regression
- The `lengths` parameter enables proper handling of variable-length sequences
- Intermediate pooling can significantly reduce computation for long sequences
- When using token embeddings, the model automatically handles the embedding and projection layers
- The `lrnn_params` dict must contain all required parameters for the chosen LRNN class
