Linear Layers
Linear layers apply affine transformations to input data.

Linear
y = xA^T + b
- in_features: Size of each input sample
- out_features: Size of each output sample
- bias: If False, the layer will not learn an additive bias
- device: Device to create parameters on
- dtype: Data type for parameters
- Input: (*, H_in) where H_in = in_features
- Output: (*, H_out) where H_out = out_features
- weight: Learnable weights of shape (out_features, in_features)
- bias: Learnable bias of shape (out_features), present if bias=True
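As a minimal sketch (the sizes here are illustrative), nn.Linear maps the last dimension of its input from in_features to out_features, and its stored weight matches the y = xA^T + b formula above:

```python
import torch
import torch.nn as nn

# A linear layer mapping 20 input features to 30 output features
layer = nn.Linear(in_features=20, out_features=30)

# Any number of leading batch dimensions is allowed; only the
# last dimension must equal in_features
x = torch.randn(128, 20)
y = layer(x)  # shape: (128, 30)

# weight has shape (out_features, in_features), so y = x @ A^T + b
manual = x @ layer.weight.T + layer.bias
```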
Bilinear
y = x1^T A x2 + b
- in1_features: Size of each first input sample
- in2_features: Size of each second input sample
- out_features: Size of each output sample
- bias: If False, the layer will not learn an additive bias

Identity

A placeholder operator that returns its input unchanged, ignoring any constructor arguments.
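A short sketch of both layers above, with illustrative sizes; Bilinear takes two inputs and Identity passes its input through untouched:

```python
import torch
import torch.nn as nn

# Bilinear combines two inputs: y = x1^T A x2 + b
bilinear = nn.Bilinear(in1_features=20, in2_features=30, out_features=40)
x1 = torch.randn(128, 20)
x2 = torch.randn(128, 30)
y = bilinear(x1, x2)  # shape: (128, 40)

# Identity simply returns its input, useful as a placeholder
# when swapping layers in and out of an architecture
identity = nn.Identity()
same = identity(x1)
```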
Convolutional Layers
Convolutional layers apply convolution operations over input signals.

Conv2d
- in_channels: Number of channels in the input image
- out_channels: Number of channels produced by the convolution
- kernel_size: Size of the convolving kernel
- stride: Stride of the convolution
- padding: Padding added to both sides of the input. Can be 'valid', 'same', or an integer
- dilation: Spacing between kernel elements
- groups: Number of blocked connections from input channels to output channels
- bias: If True, adds a learnable bias to the output
- padding_mode: Padding mode: 'zeros', 'reflect', 'replicate', or 'circular'
- Input: (N, C_in, H_in, W_in)
- Output: (N, C_out, H_out, W_out)
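A minimal sketch with illustrative sizes; the output spatial size follows floor((H_in + 2·padding − dilation·(kernel_size − 1) − 1) / stride + 1):

```python
import torch
import torch.nn as nn

# 3 input channels, 16 output channels, 3x3 kernel, stride 2, padding 1
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3,
                 stride=2, padding=1)

x = torch.randn(8, 3, 32, 32)  # (N, C_in, H_in, W_in)
y = conv(x)

# H_out = floor((32 + 2*1 - 1*(3 - 1) - 1) / 2 + 1) = floor(16.5) = 16
# so y has shape (8, 16, 16, 16)
```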
Conv1d
- Input: (N, C_in, L_in)
- Output: (N, C_out, L_out)
Conv3d
- Input: (N, C_in, D_in, H_in, W_in)
- Output: (N, C_out, D_out, H_out, W_out)
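The 1D and 3D variants follow the same constructor pattern as Conv2d; a sketch with illustrative sizes (with kernel_size=3 and no padding, each spatial dimension shrinks by 2):

```python
import torch
import torch.nn as nn

# Conv1d operates over a single length dimension (e.g. audio, sequences)
conv1d = nn.Conv1d(in_channels=16, out_channels=33, kernel_size=3)
y1 = conv1d(torch.randn(20, 16, 50))   # (N, C_in, L_in) -> (20, 33, 48)

# Conv3d adds a depth dimension (e.g. video, volumetric data)
conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3)
y3 = conv3d(torch.randn(2, 3, 10, 32, 32))  # -> (2, 8, 8, 30, 30)
```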
Activation Functions
Activation functions introduce non-linearity into neural networks.

ReLU
ReLU(x) = max(0, x)
- inplace: If True, performs the operation in-place

Sigmoid
σ(x) = 1 / (1 + exp(-x))
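A quick sketch of the two formulas above: ReLU clamps negatives to zero, and the sigmoid maps any real input into (0, 1) with σ(0) = 0.5:

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, 0.0, 3.0])

# ReLU(x) = max(0, x): negatives become zero, positives pass through
relu = nn.ReLU()
r = relu(x)  # tensor([0., 0., 3.])

# sigma(x) = 1 / (1 + exp(-x)): squashes inputs into (0, 1)
sigmoid = nn.Sigmoid()
s = sigmoid(torch.tensor([0.0]))  # sigma(0) = 0.5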
Tanh

Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
GELU
- approximate: Approximation type: 'none' or 'tanh'

LeakyReLU
LeakyReLU(x) = max(0, x) + negative_slope * min(0, x)
- negative_slope: Controls the angle of the negative slope
- inplace: If True, performs the operation in-place

Softmax

Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)

- dim: Dimension along which Softmax will be computed
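A minimal sketch: applied along the last dimension, Softmax turns a row of logits into probabilities that sum to 1 while preserving their ordering:

```python
import torch
import torch.nn as nn

softmax = nn.Softmax(dim=-1)  # normalize over the last dimension
logits = torch.tensor([[1.0, 2.0, 3.0]])
probs = softmax(logits)

# Each row sums to 1, and the largest logit keeps the largest probability
total = probs.sum()
```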
Pooling Layers
MaxPool2d
- kernel_size: Size of the window to take a max over
- stride: Stride of the window
- padding: Implicit zero padding to be added on both sides
- dilation: Parameter that controls the stride of elements in the window
- return_indices: If True, returns the max indices along with the outputs
- ceil_mode: If True, uses ceil instead of floor to compute the output shape

AvgPool2d
- count_include_pad: If True, includes zero-padding in the averaging calculation
- divisor_override: If specified, used as divisor instead of the pooling region size
AdaptiveAvgPool2d
- output_size: Target output size (H_out, W_out)
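A sketch of the two pooling behaviors above, with illustrative sizes: MaxPool2d with kernel_size=2 (and its default stride equal to kernel_size) halves the spatial dimensions, while AdaptiveAvgPool2d produces a fixed output size regardless of input size:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

# kernel_size=2 implies stride=2 by default: 32x32 -> 16x16
pool = nn.MaxPool2d(kernel_size=2)
y = pool(x)  # (1, 3, 16, 16)

# Adaptive pooling targets a fixed (H_out, W_out), here 7x7
adaptive = nn.AdaptiveAvgPool2d((7, 7))
z = adaptive(x)  # (1, 3, 7, 7)
```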
Normalization Layers
BatchNorm2d
- num_features: Number of features or channels C from input of size (N, C, H, W)
- eps: Value added to the denominator for numerical stability
- momentum: Value used for the running mean and variance computation
- affine: If True, the module has learnable affine parameters
- track_running_stats: If True, tracks running mean and variance

LayerNorm

- normalized_shape: Input shape from an expected input of size [* x normalized_shape[0] x normalized_shape[1] x ...]
- elementwise_affine: If True, the module has learnable per-element affine parameters

Dropout Layers
Dropout
- p: Probability of an element to be zeroed
- inplace: If True, performs the operation in-place
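The normalization and dropout layers above both change behavior between training and evaluation; a sketch with illustrative sizes: in train() mode BatchNorm2d normalizes with batch statistics and Dropout zeroes elements, while in eval() mode BatchNorm2d switches to its tracked running statistics and Dropout becomes a no-op:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=3)
drop = nn.Dropout(p=0.5)
x = torch.randn(4, 3, 8, 8)

# Training mode: batch statistics are used and running stats are updated;
# Dropout zeroes elements with probability p and rescales the rest
bn.train()
drop.train()
_ = bn(x)
_ = drop(x)

# Eval mode: BatchNorm2d uses running_mean/running_var; Dropout is the identity
bn.eval()
drop.eval()
out = drop(x)  # identical to x in eval mode
```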