RNN
Simple Recurrent Neural Network layer.

Constructor
- `inputSize` - Number of input features
- `hiddenSize` - Number of hidden features
- `options.numLayers` - Number of stacked RNN layers (default: 1)
- `options.nonlinearity` - Activation function: `'tanh'` or `'relu'` (default: `'tanh'`)
- `options.bias` - If true, adds bias terms (default: true)
- `options.batchFirst` - If true, input is (batch, seq, features) instead of (seq, batch, features) (default: true)
Formula
h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)

where:
- `h_t` is the hidden state at time t
- `x_t` is the input at time t
- `W_ih`, `W_hh` are weight matrices
- `b_ih`, `b_hh` are bias vectors

With `nonlinearity: 'relu'`, ReLU replaces tanh.
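To make the recurrence concrete, here is a minimal sketch of a single time step on plain arrays. The `rnnStep` helper and its signature are illustrative only, not part of the library's API.

```typescript
// One RNN time step: h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
function rnnStep(
  xT: number[],     // input at time t (length inputSize)
  hPrev: number[],  // hidden state at time t-1 (length hiddenSize)
  wIh: number[][],  // hiddenSize x inputSize weight matrix
  wHh: number[][],  // hiddenSize x hiddenSize weight matrix
  bIh: number[],    // input-to-hidden bias (length hiddenSize)
  bHh: number[]     // hidden-to-hidden bias (length hiddenSize)
): number[] {
  const hT: number[] = [];
  for (let i = 0; i < hPrev.length; i++) {
    let sum = bIh[i] + bHh[i];
    for (let j = 0; j < xT.length; j++) sum += wIh[i][j] * xT[j];
    for (let j = 0; j < hPrev.length; j++) sum += wHh[i][j] * hPrev[j];
    // With nonlinearity: 'relu', this would be Math.max(0, sum).
    hT.push(Math.tanh(sum));
  }
  return hT;
}
```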
Shape
- Input: `(batch, seq_len, input_size)` if `batchFirst=true`,
  or `(seq_len, batch, input_size)` if `batchFirst=false`
- Output: `(batch, seq_len, hidden_size)` if `batchFirst=true`,
  or `(seq_len, batch, hidden_size)` if `batchFirst=false`
Methods
forward
- `input` - Input sequence tensor
- `hx` - Initial hidden state (optional)
forwardWithState

Like `forward`, but also returns the final hidden state.
Example
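A minimal usage sketch. The import path and the `randn` tensor helper are assumptions for illustration; only the constructor and method signatures above come from the documentation.

```typescript
import { RNN, randn } from 'mylib'; // hypothetical import path and helper

const rnn = new RNN(10, 20, { numLayers: 2, nonlinearity: 'tanh' });

// batchFirst defaults to true: (batch, seq_len, input_size)
const input = randn([32, 5, 10]);

const output = rnn.forward(input); // (32, 5, 20)

// forwardWithState is assumed to return the output and the final hidden state.
const [out, hN] = rnn.forwardWithState(input);
```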
LSTM
Long Short-Term Memory layer.

Constructor
- `inputSize` - Number of input features
- `hiddenSize` - Number of hidden features
- `options.numLayers` - Number of stacked LSTM layers (default: 1)
- `options.bias` - If true, adds bias terms (default: true)
- `options.batchFirst` - If true, input is (batch, seq, features) (default: true)
Formulas
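Assuming the common LSTM parameterization (σ is the sigmoid function, ⊙ is element-wise multiplication):

```
i_t = σ(W_ii x_t + b_ii + W_hi h_{t-1} + b_hi)    // input gate
f_t = σ(W_if x_t + b_if + W_hf h_{t-1} + b_hf)    // forget gate
g_t = tanh(W_ig x_t + b_ig + W_hg h_{t-1} + b_hg) // cell candidate
o_t = σ(W_io x_t + b_io + W_ho h_{t-1} + b_ho)    // output gate
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)
```

where `c_t` is the cell state and `h_t` the hidden state at time t.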
Shape
- Input: `(batch, seq_len, input_size)` if `batchFirst=true`
- Output: `(batch, seq_len, hidden_size)` if `batchFirst=true`
Methods
forward
- `input` - Input sequence tensor
- `hx` - Initial hidden state (optional)
- `cx` - Initial cell state (optional)
forwardWithState

Like `forward`, but also returns the final hidden and cell states.
Example
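A minimal usage sketch; the import path and `randn` helper are assumptions.

```typescript
import { LSTM, randn } from 'mylib'; // hypothetical import path and helper

const lstm = new LSTM(10, 20, { numLayers: 1 });

const input = randn([32, 5, 10]); // (batch, seq_len, input_size)

const output = lstm.forward(input); // (32, 5, 20)

// forwardWithState is assumed to also return the final hidden and cell states.
const [out, hN, cN] = lstm.forwardWithState(input);
```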
Properties
- Mitigates the vanishing gradient problem
- Remembers long-term dependencies
- More complex than simple RNN
- Gates control information flow
GRU
Gated Recurrent Unit layer.

Constructor
- `inputSize` - Number of input features
- `hiddenSize` - Number of hidden features
- `options.numLayers` - Number of stacked GRU layers (default: 1)
- `options.bias` - If true, adds bias terms (default: true)
- `options.batchFirst` - If true, input is (batch, seq, features) (default: true)
Formulas
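Assuming the common GRU parameterization (σ is the sigmoid function, ⊙ is element-wise multiplication):

```
r_t = σ(W_ir x_t + b_ir + W_hr h_{t-1} + b_hr)            // reset gate
z_t = σ(W_iz x_t + b_iz + W_hz h_{t-1} + b_hz)            // update gate
n_t = tanh(W_in x_t + b_in + r_t ⊙ (W_hn h_{t-1} + b_hn)) // candidate state
h_t = (1 - z_t) ⊙ n_t + z_t ⊙ h_{t-1}
```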
Shape
- Input: `(batch, seq_len, input_size)` if `batchFirst=true`
- Output: `(batch, seq_len, hidden_size)` if `batchFirst=true`
Methods
forward
- `input` - Input sequence tensor
- `hx` - Initial hidden state (optional)
forwardWithState

Like `forward`, but also returns the final hidden state.
Example
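A minimal usage sketch; the import path and `randn` helper are assumptions.

```typescript
import { GRU, randn } from 'mylib'; // hypothetical import path and helper

const gru = new GRU(10, 20);

const input = randn([32, 5, 10]); // (batch, seq_len, input_size)

const output = gru.forward(input); // (32, 5, 20)
const [out, hN] = gru.forwardWithState(input);
```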
Properties
- Simpler than LSTM (fewer parameters)
- Comparable performance to LSTM
- Faster training than LSTM
- Good for shorter sequences
Comparison
| Feature | RNN | LSTM | GRU |
|---|---|---|---|
| Parameters | Fewest | Most | Medium |
| Training Speed | Fastest | Slowest | Fast |
| Long-term Memory | Poor | Excellent | Good |
| Complexity | Simple | Complex | Medium |
| Use Case | Short sequences | Long sequences | General purpose |
Complete Examples
Sequence Classification
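A sketch of sequence classification: encode the sequence with an LSTM, then classify from the final hidden state using a Linear layer (see "See Also"). The import path, `randn`, and the shape of the returned hidden state are assumptions.

```typescript
import { LSTM, Linear, randn } from 'mylib'; // hypothetical import path

const numClasses = 4;
const encoder = new LSTM(10, 20);
const classifier = new Linear(20, numClasses);

const input = randn([32, 5, 10]); // (batch, seq_len, features)

// Take the final hidden state, assumed shaped (batch, hidden_size)
// for a single-layer LSTM, and project it to class logits.
const [output, hN, cN] = encoder.forwardWithState(input);
const logits = classifier.forward(hN); // (32, numClasses)
```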
Sequence-to-Sequence
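A sketch of a sequence-to-sequence setup with teacher forcing: the encoder's final hidden state seeds the decoder via the documented `hx` argument. Imports and helpers are assumptions.

```typescript
import { GRU, Linear, randn } from 'mylib'; // hypothetical import path

const encoder = new GRU(10, 32);
const decoder = new GRU(8, 32);
const proj = new Linear(32, 8); // map decoder states to output features

const src = randn([16, 12, 10]); // (batch, src_len, src_features)
const tgt = randn([16, 9, 8]);   // (batch, tgt_len, tgt_features), teacher forcing

// Encode; keep only the final hidden state.
const [, hEnc] = encoder.forwardWithState(src);

// Decode the target sequence, seeded with the encoder's state.
const decOut = decoder.forward(tgt, hEnc); // (16, 9, 32)
const predictions = proj.forward(decOut);  // (16, 9, 8)
```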
Bidirectional RNN
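No `bidirectional` constructor option is documented, so this sketch builds one manually: run one GRU forward and one over the time-reversed input, then concatenate the features. `reverse` and `concat` are hypothetical tensor helpers.

```typescript
import { GRU, randn, reverse, concat } from 'mylib'; // hypothetical imports

const fwd = new GRU(10, 20);
const bwd = new GRU(10, 20);

const input = randn([32, 5, 10]); // (batch, seq_len, features)

const outFwd = fwd.forward(input); // (32, 5, 20)

// Reverse along the time axis (axis 1 with batchFirst=true), run the
// backward GRU, then reverse again so time steps line up with outFwd.
const outBwd = reverse(bwd.forward(reverse(input, 1)), 1); // (32, 5, 20)

// Concatenate along the feature axis: (32, 5, 40)
const outputs = concat([outFwd, outBwd], 2);
```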
Tips
- Use LSTM/GRU for sequences with long-term dependencies
- Use GRU when training speed is important and sequences are of moderate length
- Stack layers with `numLayers > 1` for more capacity
- Gradient clipping helps prevent exploding gradients (see the sketch after this list)
- Batch first is more intuitive: `(batch, seq, features)`
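A minimal sketch of global-norm gradient clipping on plain arrays; a real training loop would operate on the library's parameter gradients instead.

```typescript
// Scale all gradients down so their global L2 norm is at most maxNorm.
function clipGradNorm(grads: number[][], maxNorm: number): void {
  let sumSq = 0;
  for (const g of grads) for (const v of g) sumSq += v * v;
  const norm = Math.sqrt(sumSq);
  if (norm > maxNorm) {
    const scale = maxNorm / norm;
    for (const g of grads) {
      for (let i = 0; i < g.length; i++) g[i] *= scale;
    }
  }
}
```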
See Also
- Attention Layers - For transformer models
- Linear Layer - For output projections
- Activation Functions - tanh, sigmoid