Linear
Applies a linear transformation to the incoming data: y = xW^T + b
Also known as a fully connected layer or dense layer.
Constructor
Parameters:
- inFeatures - Size of each input sample
- outFeatures - Size of each output sample
- options.bias - If true, adds a learnable bias (default: true)
- options.dtype - Data type for the weights (default: 'float32')
- options.device - Device to place tensors on (default: 'cpu')
Throws:
- InvalidParameterError - If dimensions are invalid
Mathematical Formulation
y = xW^T + b, where:
- x is the input tensor of shape (*, in_features)
- W is the weight matrix of shape (out_features, in_features)
- b is the bias vector of shape (out_features,)
- y is the output tensor of shape (*, out_features)
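As a concrete check of the formula, the sketch below computes y = xW^T + b by hand with plain arrays for a single 2-feature input and a 3×2 weight matrix (all values are made up for illustration; this is not the library's implementation):

```typescript
// Worked example of y = xW^T + b with plain arrays (illustrative values).
const x: number[] = [1, 2];                      // shape (in_features = 2)
const W: number[][] = [[1, 0], [0, 1], [1, 1]];  // shape (out_features = 3, in_features = 2)
const b: number[] = [0.5, 0.5, 0.5];             // shape (out_features = 3)

// y_j = sum_k x_k * W[j][k] + b[j]
const y: number[] = W.map((row, j) =>
  row.reduce((acc, w, k) => acc + w * x[k], b[j])
);
// y is [1.5, 2.5, 3.5]
```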
Shape Conventions
Input: (*, in_features), where * means any number of leading dimensions
- 1D: (in_features) → Output: (out_features)
- 2D: (batch, in_features) → Output: (batch, out_features)
- 3D: (batch, seq_len, in_features) → Output: (batch, seq_len, out_features)
Output: (*, out_features) - all leading dimensions are preserved
Attributes
- weight - Learnable weights of shape (out_features, in_features)
- bias - Learnable bias of shape (out_features,), present if bias=true
Initialization
- Weights are initialized using Kaiming/He initialization: weights ~ N(0, sqrt(2/in_features))
- Biases are initialized to zeros
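A minimal sketch of this scheme (function names are hypothetical; a standard Box-Muller draw stands in for whatever RNG the library actually uses):

```typescript
// Draw one sample from N(0, 1) via the Box-Muller transform.
function randn(): number {
  const u = 1 - Math.random(); // avoid log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Kaiming/He-normal weights with std = sqrt(2 / in_features); zero biases.
function initLinear(outFeatures: number, inFeatures: number) {
  const std = Math.sqrt(2 / inFeatures);
  const weight = Array.from({ length: outFeatures }, () =>
    Array.from({ length: inFeatures }, () => std * randn())
  );
  const bias = new Array(outFeatures).fill(0);
  return { weight, bias };
}
```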
Properties
- inputSize: number - Number of input features
- outputSize: number - Number of output features
Methods
forward
Computes the forward pass: y = xW^T + b.
Parameters:
- input - Input tensor of shape (*, in_features)
Returns: Output tensor of shape (*, out_features)
Throws:
- ShapeError - If input shape is invalid
- DTypeError - If input dtype is unsupported
getWeight
Returns: Weight tensor of shape (out_features, in_features)
getBias
Returns: Bias tensor of shape (out_features,), or undefined if no bias
Examples
Basic Usage
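The library's import path isn't shown here, so this sketch uses a minimal stand-in class that mirrors the documented constructor shape (inFeatures, outFeatures, options.bias) to show the call pattern; a real program would import the library's own Linear:

```typescript
// Minimal stand-in for the documented Linear layer (plain arrays, 2D input only).
class Linear {
  weight: number[][]; // (out_features, in_features)
  bias?: number[];    // (out_features,)
  constructor(public inFeatures: number, public outFeatures: number,
              options: { bias?: boolean } = {}) {
    this.weight = Array.from({ length: outFeatures }, () =>
      Array.from({ length: inFeatures }, () => Math.random() - 0.5));
    if (options.bias !== false) this.bias = new Array(outFeatures).fill(0);
  }
  // y = xW^T + b for input of shape (batch, in_features)
  forward(input: number[][]): number[][] {
    return input.map((row) =>
      this.weight.map((wRow, j) =>
        wRow.reduce((acc, w, k) => acc + w * row[k], this.bias ? this.bias[j] : 0)));
  }
}

const layer = new Linear(4, 2);                            // 4 features in, 2 out
const out = layer.forward([[1, 2, 3, 4], [5, 6, 7, 8]]);   // input shape (2, 4)
// out has shape (2, 2): the batch dimension is preserved
```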
Without Bias
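With bias: false the bias term is skipped entirely, so the layer computes y = xW^T. A plain-array sketch of that case (values illustrative):

```typescript
// Without bias, the layer is a pure matrix product: y = xW^T.
function forwardNoBias(input: number[][], weight: number[][]): number[][] {
  return input.map((row) =>
    weight.map((wRow) => wRow.reduce((acc, w, k) => acc + w * row[k], 0)));
}

const W = [[1, 1], [2, 0]];            // (out_features = 2, in_features = 2)
const y = forwardNoBias([[3, 4]], W);  // [[3 + 4, 6 + 0]] = [[7, 6]]
```

With the library layer itself, getBias() would return undefined in this configuration.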
With Sequence Data
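For (batch, seq_len, in_features) inputs, the same transform is applied at every position and the leading dimensions pass through unchanged. A plain-array sketch (not the library's internals):

```typescript
// Apply y = xW^T independently at each (batch, seq) position.
function forwardSeq(input: number[][][], weight: number[][]): number[][][] {
  return input.map((seq) =>
    seq.map((row) =>
      weight.map((wRow) => wRow.reduce((acc, w, k) => acc + w * row[k], 0))));
}

const W = [[1, 0, 0], [0, 1, 1]];   // (out_features = 2, in_features = 3)
const x = [                          // (batch = 1, seq_len = 2, in_features = 3)
  [[1, 2, 3], [4, 5, 6]],
];
const y = forwardSeq(x, W);          // shape (1, 2, 2)
```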
Building a Multi-Layer Perceptron
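An MLP chains linear transforms with a non-linearity in between. A self-contained sketch with plain arrays and ReLU (weights are hand-picked for illustration; a real model would stack Linear layers, e.g. in a Sequential container):

```typescript
// 3 -> 4 -> 2 MLP: hidden layer with ReLU, then an output projection.
const relu = (v: number[]) => v.map((x) => Math.max(0, x));
const linear = (x: number[], W: number[][], b: number[]) =>
  W.map((row, j) => row.reduce((acc, w, k) => acc + w * x[k], b[j]));

const W1 = [
  [1, 0, 0],
  [0, 1, 0],
  [0, 0, 1],
  [-1, -1, -1],
];                                           // (4, 3)
const b1 = [0, 0, 0, 0];
const W2 = [
  [1, 1, 1, 1],
  [1, 0, 0, 0],
];                                           // (2, 4)
const b2 = [0.5, 0];

const h = relu(linear([1, 2, 3], W1, b1));   // [1, 2, 3, 0] (last unit clipped by ReLU)
const out = linear(h, W2, b2);               // [6.5, 1]
```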
Training Example
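A full training loop would use the library's optimizers; as a self-contained illustration, the sketch below fits a 1 → 1 linear layer to data generated by y = 2x + 1, with hand-written gradient descent on the mean squared error:

```typescript
// Fit y = w*x + b to targets from y = 2x + 1 by gradient descent on MSE.
const xs = [0, 1, 2, 3];
const ts = xs.map((x) => 2 * x + 1);

let w = 0;
let b = 0;
const lr = 0.05;
for (let step = 0; step < 2000; step++) {
  const preds = xs.map((x) => w * x + b);
  const errs = preds.map((p, i) => p - ts[i]);
  // Gradients of MSE: dL/dw = (2/N) * sum(err * x), dL/db = (2/N) * sum(err)
  const gw = (2 / xs.length) * errs.reduce((a, e, i) => a + e * xs[i], 0);
  const gb = (2 / xs.length) * errs.reduce((a, e) => a + e, 0);
  w -= lr * gw;
  b -= lr * gb;
}
// w converges toward 2 and b toward 1
```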
Performance Considerations
- Batch Processing: Process multiple samples together for better performance.
- Memory Layout: Inputs are reshaped internally for efficient matrix multiplication. Contiguous tensors perform better.
- Data Type: Use float32 (default) unless you need the precision of float64. Float32 is faster and uses less memory.
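Batching changes only how the work is scheduled, not the result: one pass over a stacked (batch, in_features) matrix yields the same values as calling the layer once per sample. A quick plain-array check of that equivalence:

```typescript
// In a tensor library, the batched forward runs as a single matrix multiply
// instead of many small ones; numerically it matches per-sample calls.
const W = [[1, 2], [3, 4], [5, 6]];  // (out_features = 3, in_features = 2)
const forward = (x: number[]) =>
  W.map((row) => row.reduce((acc, w, k) => acc + w * x[k], 0));

const batch = [[1, 0], [0, 1], [2, 2]];
const batchedOut = batch.map(forward);                              // one "batched" pass
const perSample = [forward([1, 0]), forward([0, 1]), forward([2, 2])]; // three calls
// batchedOut and perSample are element-for-element equal
```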
Common Patterns
Residual Connections
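A residual connection adds the layer's input back to its output, which requires in_features to equal out_features. Sketch with a plain-array forward (weights illustrative):

```typescript
// Residual block: y = x + Linear(x); needs matching in/out sizes.
const W = [[0.1, 0], [0, 0.1]];  // (out_features = 2, in_features = 2)
const linear = (x: number[]) =>
  W.map((row) => row.reduce((acc, w, k) => acc + w * x[k], 0));

const residual = (x: number[]) => linear(x).map((v, i) => v + x[i]);
const y = residual([1, 2]);  // [1 + 0.1, 2 + 0.2]
```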
Projection Layers
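Projection layers change dimensionality, commonly shrinking a wide representation before a cheaper downstream step. Sketch with plain arrays (sizes and weights illustrative; the weight here just selects the first three coordinates):

```typescript
// Down-project 8-dim features to 3 dims: weight shape (3, 8).
const inFeatures = 8;
const outFeatures = 3;
const W = Array.from({ length: outFeatures }, (_, j) =>
  Array.from({ length: inFeatures }, (_, k) => (j === k ? 1 : 0)));

const project = (x: number[]) =>
  W.map((row) => row.reduce((acc, w, k) => acc + w * x[k], 0));

const x = [1, 2, 3, 4, 5, 6, 7, 8];
const z = project(x);  // with this W, picks out the first 3 coordinates: [1, 2, 3]
```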
See Also
- Module - Base class for all neural network modules
- Sequential - Container for stacking layers
- Activation Functions - Non-linear activation functions
- Optimizers - For training linear layers