## Overview

`LRU_UNet` is a U-Net architecture built from Linear Recurrent Unit (LRU) layers for sequence-to-sequence tasks. It has an encoder-decoder structure with skip connections, using LRU layers for temporal processing and convolutional layers for downsampling and upsampling.
## Class Definition
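The class definition itself is not preserved on this page. Below is a plausible constructor sketch inferred from the Parameters section: the names `d_model`, `n_layers`, and `downsample_factor` appear in the documentation, while `d_hidden` (the hidden-dimension argument) and the default values are guesses, not the real signature.

```python
import torch.nn as nn

class LRU_UNet(nn.Module):
    """Sketch of the constructor only; not the real implementation."""

    def __init__(self, d_model: int, d_hidden: int,
                 n_layers: int = 3, downsample_factor: int = 2):
        super().__init__()
        self.d_model = d_model
        self.d_hidden = d_hidden          # hidden state size of each LRU layer
        self.n_layers = n_layers
        self.downsample_factor = downsample_factor
        # Input lengths must be padded to a multiple of this value:
        self.total_downsample = downsample_factor ** n_layers
```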
## Parameters

- `d_model`: Input feature dimension (number of channels).
- Hidden state dimension for the LRU layers (the argument name is not preserved on this page).
- `n_layers`: Number of downsampling/upsampling stages in the U-Net.
- `downsample_factor`: Downsampling/upsampling factor for each stage. The total downsampling factor is `downsample_factor ** n_layers`.

## Methods

### forward
**Input:** sequence of shape `(B, C_in, T)`, where:

- `B` is the batch size
- `C_in` is the number of input channels (must equal `d_model`)
- `T` is the sequence length

**Output:** processed sequence of shape `(B, C_in, T)`, with the same dimensions as the input.

## Architecture Details
The U-Net consists of three main components: an encoder, a bottleneck, and a decoder.

### Encoder (Downsampling Path)
- Each stage contains:
  - an LRU layer for temporal processing
  - a strided Conv1d for downsampling (doubles the channel count, reduces the length by a factor of `downsample_factor`)
- Skip connections are saved at each stage.
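One encoder stage can be sketched as follows. This is an illustrative stand-in, not the real source: the LRU block is stubbed with an identity so only the documented shape behavior is shown.

```python
import torch
import torch.nn as nn

class EncoderStage(nn.Module):
    """One downsampling stage: LRU (stubbed), then strided Conv1d."""

    def __init__(self, channels: int, downsample_factor: int):
        super().__init__()
        self.lru = nn.Identity()  # stand-in for the LRU temporal block
        # Strided conv: doubles channels, divides length by downsample_factor.
        self.down = nn.Conv1d(channels, 2 * channels,
                              kernel_size=downsample_factor,
                              stride=downsample_factor)

    def forward(self, x):          # x: (B, C, T)
        skip = self.lru(x)         # saved for the decoder's skip connection
        return self.down(skip), skip

x = torch.randn(2, 8, 64)
y, skip = EncoderStage(8, 2)(x)
# y: (2, 16, 32) -- channels doubled, length halved; skip keeps (2, 8, 64)
```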
### Bottleneck
- Single LRU layer at the lowest resolution
### Decoder (Upsampling Path)
- Each stage contains:
  - a ConvTranspose1d for upsampling (halves the channel count, increases the length by a factor of `downsample_factor`)
  - an LRU layer for temporal processing
  - skip-connection addition from the corresponding encoder stage
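A matching decoder stage can be sketched in the same hedged style, again with the LRU block stubbed out so only the documented order (upsample, LRU, additive skip) and shapes are illustrated:

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """One upsampling stage: ConvTranspose1d, LRU (stubbed), skip addition."""

    def __init__(self, channels: int, downsample_factor: int):
        super().__init__()
        # Transposed conv: halves channels, multiplies length by downsample_factor.
        self.up = nn.ConvTranspose1d(channels, channels // 2,
                                     kernel_size=downsample_factor,
                                     stride=downsample_factor)
        self.lru = nn.Identity()   # stand-in for the LRU temporal block

    def forward(self, x, skip):    # x: (B, C, T); skip: (B, C//2, T*factor)
        x = self.up(x)
        x = self.lru(x)
        return x + skip            # additive skip connection from the encoder

x = torch.randn(2, 16, 32)
skip = torch.randn(2, 8, 64)
out = DecoderStage(16, 2)(x, skip)
# out: (2, 8, 64) -- channels halved, length doubled, skip added
```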
## Padding Handling
The model automatically pads input sequences so their length is divisible by the total downsampling factor, and crops the output back to the original length.

## Example Usage
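The original example is not preserved on this page. As a substitute, here is a minimal, runnable skeleton of the whole architecture (pad, encode with saved skips, decode with additive skips, crop) with every LRU layer replaced by an identity; `d_hidden` is accepted but unused because of that stubbing. Treat it as a shape demonstration under those assumptions, not the real implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LRU_UNet(nn.Module):
    """Minimal stand-in: LRU layers stubbed, but padding/shapes as documented."""

    def __init__(self, d_model, d_hidden=None, n_layers=2, downsample_factor=2):
        super().__init__()
        self.total_downsample = downsample_factor ** n_layers
        ch = d_model
        self.downs = nn.ModuleList()
        self.ups = nn.ModuleList()
        for _ in range(n_layers):
            # Strided conv: doubles channels, shrinks length by the factor.
            self.downs.append(nn.Conv1d(ch, 2 * ch, downsample_factor,
                                        stride=downsample_factor))
            ch *= 2
        for _ in range(n_layers):
            # Transposed conv: halves channels, grows length by the factor.
            self.ups.append(nn.ConvTranspose1d(ch, ch // 2, downsample_factor,
                                               stride=downsample_factor))
            ch //= 2

    def forward(self, x):                              # x: (B, C_in, T)
        T = x.shape[-1]
        x = F.pad(x, (0, (-T) % self.total_downsample))  # pad to a multiple
        skips = []
        for down in self.downs:
            skips.append(x)                            # an LRU would run here
            x = down(x)
        for up in self.ups:                            # bottleneck LRU stubbed
            x = up(x) + skips.pop()                    # additive skip connection
        return x[..., :T]                              # crop to original length

model = LRU_UNet(d_model=8, n_layers=2, downsample_factor=2)
x = torch.randn(3, 8, 101)    # 101 is not divisible by 2 ** 2 = 4
y = model(x)
print(tuple(y.shape))         # (3, 8, 101) -- same shape as the input
```

Note how the awkward length 101 is padded to 104 internally and cropped back, matching the padding behavior described above.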
## Use Cases
- Sequence denoising
- Audio source separation
- Time series forecasting with multi-scale features
- Signal enhancement tasks
- Any sequence-to-sequence task requiring hierarchical feature extraction
## Notes
- Input sequences are automatically padded if their length is not divisible by `downsample_factor ** n_layers`.
- The model maintains the original sequence length by cropping padded outputs.
- Skip connections help preserve fine-grained temporal information from the encoder
- LRU layers operate on `(B, T, C)` format internally, while the model interface uses `(B, C, T)` format.
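The layout conversion in the last note amounts to a pair of transposes, sketched here for clarity (this is the standard PyTorch idiom, not code from the model itself):

```python
import torch

x = torch.randn(2, 8, 16)        # model interface layout: (B, C, T)
x_lru = x.transpose(1, 2)        # LRU-internal layout: (B, T, C) -> (2, 16, 8)
x_back = x_lru.transpose(1, 2)   # back to (B, C, T); round-trip is lossless
```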
