
Linear Layers

Linear layers apply affine transformations to input data.

Linear

torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)
Applies an affine linear transformation: y = xA^T + b
Parameters:
  • in_features (int, required): Size of each input sample
  • out_features (int, required): Size of each output sample
  • bias (bool, default: True): If False, the layer will not learn an additive bias
  • device (torch.device, default: None): Device to create the parameters on
  • dtype (torch.dtype, default: None): Data type of the parameters
Shape:
  • Input: (*, H_in) where H_in = in_features
  • Output: (*, H_out) where H_out = out_features
Attributes:
  • weight: Learnable weights of shape (out_features, in_features)
  • bias: Learnable bias of shape (out_features) if bias=True
Example:
import torch
import torch.nn as nn

m = nn.Linear(20, 30)
input = torch.randn(128, 20)
output = m(input)
print(output.size())  # torch.Size([128, 30])

Bilinear

torch.nn.Bilinear(in1_features, in2_features, out_features, bias=True, device=None, dtype=None)
Applies a bilinear transformation: y = x1^T A x2 + b
Parameters:
  • in1_features (int, required): Size of each first input sample
  • in2_features (int, required): Size of each second input sample
  • out_features (int, required): Size of each output sample
  • bias (bool, default: True): If False, the layer will not learn an additive bias
Example:
m = nn.Bilinear(20, 30, 40)
input1 = torch.randn(128, 20)
input2 = torch.randn(128, 30)
output = m(input1, input2)
print(output.size())  # torch.Size([128, 40])

Identity

torch.nn.Identity(*args, **kwargs)
A placeholder identity operator that returns input unchanged. Example:
m = nn.Identity()
input = torch.randn(128, 20)
output = m(input)
# output is identical to input

Convolutional Layers

Convolutional layers apply convolution operations over input signals.

Conv2d

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, 
                padding=0, dilation=1, groups=1, bias=True, 
                padding_mode='zeros', device=None, dtype=None)
Applies a 2D convolution over an input signal.
Parameters:
  • in_channels (int, required): Number of channels in the input image
  • out_channels (int, required): Number of channels produced by the convolution
  • kernel_size (int | tuple, required): Size of the convolving kernel
  • stride (int | tuple, default: 1): Stride of the convolution
  • padding (int | tuple | str, default: 0): Padding added to all four sides of the input. Can be 'valid', 'same', or an integer
  • dilation (int | tuple, default: 1): Spacing between kernel elements
  • groups (int, default: 1): Number of blocked connections from input channels to output channels
  • bias (bool, default: True): If True, adds a learnable bias to the output
  • padding_mode (str, default: 'zeros'): Padding mode: 'zeros', 'reflect', 'replicate', or 'circular'
Shape:
  • Input: (N, C_in, H_in, W_in)
  • Output: (N, C_out, H_out, W_out)
Example:
# With square kernels and equal stride
m = nn.Conv2d(16, 33, 3, stride=2)
input = torch.randn(20, 16, 50, 100)
output = m(input)

# Non-square kernels and different stride/padding
m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
output = m(input)
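The output spatial sizes in the examples above follow the standard convolution arithmetic, H_out = floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1), applied per dimension. A quick sketch that checks the non-square case:

```python
import math
import torch
import torch.nn as nn

def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    # floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride + 1)
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
x = torch.randn(20, 16, 50, 100)
h = conv_out(50, 3, stride=2, padding=4)   # 28
w = conv_out(100, 5, stride=1, padding=2)  # 100
assert m(x).shape == (20, 33, h, w)
```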

Conv1d

torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, 
                padding=0, dilation=1, groups=1, bias=True, 
                padding_mode='zeros', device=None, dtype=None)
Applies a 1D convolution over an input signal. Shape:
  • Input: (N, C_in, L_in)
  • Output: (N, C_out, L_out)
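A minimal example, analogous to the Conv2d one above but over a single length dimension:

```python
import torch
import torch.nn as nn

m = nn.Conv1d(16, 33, 3, stride=2)
x = torch.randn(20, 16, 50)  # (N, C_in, L_in)
out = m(x)
print(out.size())  # torch.Size([20, 33, 24])
```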

Conv3d

torch.nn.Conv3d(in_channels, out_channels, kernel_size, stride=1, 
                padding=0, dilation=1, groups=1, bias=True, 
                padding_mode='zeros', device=None, dtype=None)
Applies a 3D convolution over an input signal. Shape:
  • Input: (N, C_in, D_in, H_in, W_in)
  • Output: (N, C_out, D_out, H_out, W_out)
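A minimal example with a square kernel, as a sketch of the 5D input layout:

```python
import torch
import torch.nn as nn

m = nn.Conv3d(16, 33, 3, stride=2)
x = torch.randn(20, 16, 10, 50, 100)  # (N, C_in, D_in, H_in, W_in)
out = m(x)
print(out.size())  # torch.Size([20, 33, 4, 24, 49])
```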

Activation Functions

Activation functions introduce non-linearity into neural networks.

ReLU

torch.nn.ReLU(inplace=False)
Applies the rectified linear unit function: ReLU(x) = max(0, x)
Parameters:
  • inplace (bool, default: False): If True, performs the operation in-place
Example:
m = nn.ReLU()
input = torch.randn(2)
output = m(input)

Sigmoid

torch.nn.Sigmoid()
Applies the element-wise sigmoid function: σ(x) = 1 / (1 + exp(-x))
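A minimal example; every output value is squashed into (0, 1):

```python
import torch
import torch.nn as nn

m = nn.Sigmoid()
x = torch.randn(2)
out = m(x)  # elementwise 1 / (1 + exp(-x)), values in (0, 1)
```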

Tanh

torch.nn.Tanh()
Applies the hyperbolic tangent function element-wise.
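A minimal example; outputs lie in (-1, 1), with tanh(0) = 0:

```python
import torch
import torch.nn as nn

m = nn.Tanh()
x = torch.randn(2)
out = m(x)  # elementwise tanh, values in (-1, 1)
```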

GELU

torch.nn.GELU(approximate='none')
Applies the Gaussian Error Linear Units function.
Parameters:
  • approximate (str, default: 'none'): Approximation type: 'none' or 'tanh'
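A minimal sketch showing both variants; the 'tanh' form is a cheaper approximation of the exact erf-based GELU, and GELU(0) = 0 in either case:

```python
import torch
import torch.nn as nn

m = nn.GELU()                        # exact (erf-based) form
m_tanh = nn.GELU(approximate='tanh') # tanh approximation
x = torch.randn(3)
out = m(x)
out_tanh = m_tanh(x)  # close to out, not bitwise identical
```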

LeakyReLU

torch.nn.LeakyReLU(negative_slope=0.01, inplace=False)
Applies element-wise: LeakyReLU(x) = max(0, x) + negative_slope * min(0, x)
Parameters:
  • negative_slope (float, default: 0.01): Controls the angle of the negative slope
  • inplace (bool, default: False): If True, performs the operation in-place
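A minimal example; negative inputs are scaled by negative_slope rather than zeroed:

```python
import torch
import torch.nn as nn

m = nn.LeakyReLU(0.1)
x = torch.tensor([-1.0, 0.0, 2.0])
out = m(x)  # tensor([-0.1000, 0.0000, 2.0000])
```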

Softmax

torch.nn.Softmax(dim=None)
Applies the Softmax function to an n-dimensional input Tensor, rescaling the elements along dim so that they lie in the range [0, 1] and sum to 1.
Parameters:
  • dim (int, required): Dimension along which Softmax will be computed
Example:
m = nn.Softmax(dim=1)
input = torch.randn(2, 3)
output = m(input)

Pooling Layers

MaxPool2d

torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, 
                   return_indices=False, ceil_mode=False)
Applies 2D max pooling over an input signal.
Parameters:
  • kernel_size (int | tuple, required): Size of the window to take a max over
  • stride (int | tuple, default: kernel_size): Stride of the window
  • padding (int | tuple, default: 0): Implicit negative-infinity padding to be added on both sides
  • dilation (int | tuple, default: 1): Parameter that controls the stride of elements in the window
  • return_indices (bool, default: False): If True, returns the max indices along with the outputs
  • ceil_mode (bool, default: False): If True, uses ceil instead of floor to compute the output shape
Example:
m = nn.MaxPool2d(3, stride=2)
input = torch.randn(20, 16, 50, 32)
output = m(input)

AvgPool2d

torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, 
                   count_include_pad=True, divisor_override=None)
Applies 2D average pooling over an input signal.
Parameters:
  • count_include_pad (bool, default: True): If True, includes zero-padding in the averaging calculation
  • divisor_override (int, default: None): If specified, used as the divisor instead of the pooling region size
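A minimal example, mirroring the MaxPool2d one above; the output size follows the same pooling arithmetic:

```python
import torch
import torch.nn as nn

m = nn.AvgPool2d(3, stride=2)
x = torch.randn(20, 16, 50, 32)
out = m(x)
print(out.size())  # torch.Size([20, 16, 24, 15])
```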

AdaptiveAvgPool2d

torch.nn.AdaptiveAvgPool2d(output_size)
Applies 2D adaptive average pooling to produce output of specified size.
Parameters:
  • output_size (int | tuple, required): Target output size (H_out, W_out)
Example:
m = nn.AdaptiveAvgPool2d((5, 7))
input = torch.randn(1, 64, 8, 9)
output = m(input)
print(output.size())  # torch.Size([1, 64, 5, 7])

Normalization Layers

BatchNorm2d

torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, 
                     track_running_stats=True, device=None, dtype=None)
Applies Batch Normalization over a 4D input.
Parameters:
  • num_features (int, required): Number of features or channels C from an input of size (N, C, H, W)
  • eps (float, default: 1e-05): Value added to the denominator for numerical stability
  • momentum (float, default: 0.1): Value used for the running mean and variance computation
  • affine (bool, default: True): If True, the module has learnable affine parameters
  • track_running_stats (bool, default: True): If True, tracks the running mean and variance
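A minimal example; in training mode (the default for a freshly constructed module), each channel is normalized using statistics computed over the (N, H, W) dimensions of the batch:

```python
import torch
import torch.nn as nn

m = nn.BatchNorm2d(100)
x = torch.randn(20, 100, 35, 45)
out = m(x)  # same shape as input; per-channel mean ~0 after normalization
```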

LayerNorm

torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True, 
                   bias=True, device=None, dtype=None)
Applies Layer Normalization over a mini-batch of inputs.
Parameters:
  • normalized_shape (int | tuple, required): Input shape from an expected input of size [* x normalized_shape[0] x normalized_shape[1] x ...]
  • elementwise_affine (bool, default: True): If True, the module has learnable per-element affine parameters
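A minimal example normalizing over the last dimension only; with the default affine initialization (weight = 1, bias = 0), each normalized slice has mean approximately 0:

```python
import torch
import torch.nn as nn

x = torch.randn(20, 5, 10)
ln = nn.LayerNorm(10)  # normalized_shape=10 -> normalize each length-10 vector
out = ln(x)            # same shape as input
```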

Dropout Layers

Dropout

torch.nn.Dropout(p=0.5, inplace=False)
During training, randomly zeroes some elements of the input tensor with probability p, using samples from a Bernoulli distribution; the remaining outputs are scaled by a factor of 1/(1-p).
Parameters:
  • p (float, default: 0.5): Probability of an element to be zeroed
  • inplace (bool, default: False): If True, performs the operation in-place
Example:
m = nn.Dropout(p=0.2)
input = torch.randn(20, 16)
output = m(input)

Container Modules

Sequential

torch.nn.Sequential(*args)
A sequential container for stacking modules. Example:
model = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU()
)

ModuleList

torch.nn.ModuleList(modules=None)
Holds submodules in a list. Example:
class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])
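ModuleList registers each submodule (and its parameters) with the parent, but defines no computation of its own; a typical forward iterates over it. A minimal sketch extending the class above:

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for _ in range(10)])

    def forward(self, x):
        # apply each registered layer in order
        for layer in self.linears:
            x = layer(x)
        return x

m = MyModule()
out = m(torch.randn(4, 10))  # torch.Size([4, 10])
```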

ModuleDict

torch.nn.ModuleDict(modules=None)
Holds submodules in a dictionary. Example:
class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.choices = nn.ModuleDict({
            'conv': nn.Conv2d(10, 10, 3),
            'pool': nn.MaxPool2d(3)
        })
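Like ModuleList, ModuleDict only registers submodules; a forward pass typically selects one by key. A minimal sketch extending the class above:

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.choices = nn.ModuleDict({
            'conv': nn.Conv2d(10, 10, 3),
            'pool': nn.MaxPool2d(3),
        })

    def forward(self, x, choice):
        # dispatch to the submodule named by `choice`
        return self.choices[choice](x)

m = MyModule()
x = torch.randn(1, 10, 8, 8)
print(m(x, 'conv').size())  # torch.Size([1, 10, 6, 6])
print(m(x, 'pool').size())  # torch.Size([1, 10, 2, 2])
```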
