Linear Layers
Linear layers apply affine transformations to input data.

Linear
y = xA^T + b
- in_features: Size of each input sample
- out_features: Size of each output sample
- bias: If False, the layer will not learn an additive bias
- device: Device to create parameters on
- dtype: Data type for parameters
- Input: (*, H_in) where H_in = in_features
- Output: (*, H_out) where H_out = out_features
- weight: Learnable weights of shape (out_features, in_features)
- bias: Learnable bias of shape (out_features), present if bias=True
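As a minimal sketch (the sizes here are illustrative), nn.Linear maps the last dimension of its input from in_features to out_features, and its stored weight matches the y = xA^T + b formula above:

```python
import torch
import torch.nn as nn

# A linear layer mapping 20 input features to 30 output features
layer = nn.Linear(in_features=20, out_features=30)

# Any number of leading batch dimensions is allowed; only the
# last dimension must equal in_features
x = torch.randn(128, 20)
y = layer(x)  # shape: (128, 30)

# weight has shape (out_features, in_features), so y = x @ A^T + b
manual = x @ layer.weight.T + layer.bias
```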
Bilinear
y = x1^T A x2 + b
- in1_features: Size of each first input sample
- in2_features: Size of each second input sample
- out_features: Size of each output sample
- bias: If False, the layer will not learn an additive bias

Identity

A placeholder operator that returns its input unchanged, ignoring any constructor arguments.
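A short sketch of both layers above, with illustrative sizes; Bilinear takes two inputs and Identity passes its input through untouched:

```python
import torch
import torch.nn as nn

# Bilinear combines two inputs: y = x1^T A x2 + b
bilinear = nn.Bilinear(in1_features=20, in2_features=30, out_features=40)
x1 = torch.randn(128, 20)
x2 = torch.randn(128, 30)
y = bilinear(x1, x2)  # shape: (128, 40)

# Identity simply returns its input, useful as a placeholder
# when swapping layers in and out of an architecture
identity = nn.Identity()
same = identity(x1)
```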
Convolutional Layers
Convolutional layers apply convolution operations over input signals.

Conv2d
- in_channels: Number of channels in the input image
- out_channels: Number of channels produced by the convolution
- kernel_size: Size of the convolving kernel
- stride: Stride of the convolution
- padding: Padding added to both sides of the input. Can be 'valid', 'same', or an integer
- dilation: Spacing between kernel elements
- groups: Number of blocked connections from input channels to output channels
- bias: If True, adds a learnable bias to the output
- padding_mode: Padding mode: 'zeros', 'reflect', 'replicate', or 'circular'
- Input: (N, C_in, H_in, W_in)
- Output: (N, C_out, H_out, W_out)
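A minimal sketch with illustrative sizes; the output spatial size follows floor((H_in + 2·padding − dilation·(kernel_size − 1) − 1) / stride + 1):

```python
import torch
import torch.nn as nn

# 3 input channels, 16 output channels, 3x3 kernel, stride 2, padding 1
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3,
                 stride=2, padding=1)

x = torch.randn(8, 3, 32, 32)  # (N, C_in, H_in, W_in)
y = conv(x)

# H_out = floor((32 + 2*1 - 1*(3 - 1) - 1) / 2 + 1) = floor(16.5) = 16
# so y has shape (8, 16, 16, 16)
```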
Conv1d
- Input: (N, C_in, L_in)
- Output: (N, C_out, L_out)
Conv3d
- Input: (N, C_in, D_in, H_in, W_in)
- Output: (N, C_out, D_out, H_out, W_out)
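The 1D and 3D variants follow the same constructor pattern as Conv2d; a sketch with illustrative sizes (with kernel_size=3 and no padding, each spatial dimension shrinks by 2):

```python
import torch
import torch.nn as nn

# Conv1d operates over a single length dimension (e.g. audio, sequences)
conv1d = nn.Conv1d(in_channels=16, out_channels=33, kernel_size=3)
y1 = conv1d(torch.randn(20, 16, 50))   # (N, C_in, L_in) -> (20, 33, 48)

# Conv3d adds a depth dimension (e.g. video, volumetric data)
conv3d = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3)
y3 = conv3d(torch.randn(2, 3, 10, 32, 32))  # -> (2, 8, 8, 30, 30)
```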
Activation Functions
Activation functions introduce non-linearity into neural networks.

ReLU
ReLU(x) = max(0, x)
- inplace: If True, performs the operation in-place

Sigmoid
σ(x) = 1 / (1 + exp(-x))
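A quick sketch of the two formulas above: ReLU clamps negatives to zero, and the sigmoid maps any real input into (0, 1) with σ(0) = 0.5:

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, 0.0, 3.0])

# ReLU(x) = max(0, x): negatives become zero, positives pass through
relu = nn.ReLU()
r = relu(x)  # tensor([0., 0., 3.])

# sigma(x) = 1 / (1 + exp(-x)): squashes inputs into (0, 1)
sigmoid = nn.Sigmoid()
s = sigmoid(torch.tensor([0.0]))  # sigma(0) = 0.5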
Tanh

Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
GELU
- approximate: Approximation type: 'none' or 'tanh'

LeakyReLU
LeakyReLU(x) = max(0, x) + negative_slope * min(0, x)
- negative_slope: Controls the angle of the negative slope
- inplace: If True, performs the operation in-place

Softmax

Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)

- dim: Dimension along which Softmax will be computed
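A minimal sketch: applied along the last dimension, Softmax turns a row of logits into probabilities that sum to 1 while preserving their ordering:

```python
import torch
import torch.nn as nn

softmax = nn.Softmax(dim=-1)  # normalize over the last dimension
logits = torch.tensor([[1.0, 2.0, 3.0]])
probs = softmax(logits)

# Each row sums to 1, and the largest logit keeps the largest probability
total = probs.sum()
```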
Pooling Layers
MaxPool2d
- kernel_size: Size of the window to take a max over
- stride: Stride of the window
- padding: Implicit zero padding to be added on both sides
- dilation: Parameter that controls the stride of elements in the window
- return_indices: If True, returns the max indices along with the outputs
- ceil_mode: If True, uses ceil instead of floor to compute the output shape

AvgPool2d
- count_include_pad: If True, includes zero-padding in the averaging calculation
- divisor_override: If specified, used as divisor instead of the pooling region size
AdaptiveAvgPool2d
- output_size: Target output size (H_out, W_out)
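A sketch of the two pooling behaviors above, with illustrative sizes: MaxPool2d with kernel_size=2 (and its default stride equal to kernel_size) halves the spatial dimensions, while AdaptiveAvgPool2d produces a fixed output size regardless of input size:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

# kernel_size=2 implies stride=2 by default: 32x32 -> 16x16
pool = nn.MaxPool2d(kernel_size=2)
y = pool(x)  # (1, 3, 16, 16)

# Adaptive pooling targets a fixed (H_out, W_out), here 7x7
adaptive = nn.AdaptiveAvgPool2d((7, 7))
z = adaptive(x)  # (1, 3, 7, 7)
```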
Normalization Layers
BatchNorm2d
- num_features: Number of features or channels C from input of size (N, C, H, W)
- eps: Value added to the denominator for numerical stability
- momentum: Value used for the running mean and variance computation
- affine: If True, the module has learnable affine parameters
- track_running_stats: If True, tracks running mean and variance

LayerNorm

- normalized_shape: Input shape from an expected input of size [* x normalized_shape[0] x normalized_shape[1] x ...]
- elementwise_affine: If True, the module has learnable per-element affine parameters

Dropout Layers
Dropout
- p: Probability of an element to be zeroed
- inplace: If True, performs the operation in-place
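The normalization and dropout layers above both change behavior between training and evaluation; a sketch with illustrative sizes: in train() mode BatchNorm2d normalizes with batch statistics and Dropout zeroes elements, while in eval() mode BatchNorm2d switches to its tracked running statistics and Dropout becomes a no-op:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=3)
drop = nn.Dropout(p=0.5)
x = torch.randn(4, 3, 8, 8)

# Training mode: batch statistics are used and running stats are updated;
# Dropout zeroes elements with probability p and rescales the rest
bn.train()
drop.train()
_ = bn(x)
_ = drop(x)

# Eval mode: BatchNorm2d uses running_mean/running_var; Dropout is the identity
bn.eval()
drop.eval()
out = drop(x)  # identical to x in eval mode
```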