Overview

The DiffusionProcess class implements the complete diffusion process for image generation, including noise scheduling, forward diffusion (adding noise), reverse diffusion (sampling), and training. It uses a cosine beta schedule and supports both DDPM and DDIM sampling strategies.

Constructor

DiffusionProcess(
    image_size,
    channels,
    hidden_dims=[32, 64, 128],
    beta_start=1e-4,
    beta_end=0.02,
    noise_steps=1000,
    device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')
)

Parameters

image_size
int
required
Height and width of the square input images.
channels
int
required
Number of image channels (e.g., 1 for grayscale, 3 for RGB).
hidden_dims
list[int]
default:"[32, 64, 128]"
List of hidden dimensions for each level of the U-Net encoder/decoder. The length determines the number of downsampling/upsampling blocks.
beta_start
float
default:"1e-4"
Initial noise variance in the noise schedule. Lower values mean less noise at the beginning of the diffusion process.
beta_end
float
default:"0.02"
Final noise variance in the noise schedule. This determines the maximum noise level at the final diffusion step.
noise_steps
int
default:"1000"
Total number of diffusion timesteps. More steps yield smoother transitions at the cost of slower sampling.
device
torch.device
default:"torch.device('cuda' if torch.cuda.is_available() else 'cpu')"
Device to run computations on (CPU or CUDA GPU). Defaults to CUDA when available, otherwise CPU.

Attributes

After initialization, the following attributes are available:
beta_schedule
torch.Tensor
Cosine beta schedule tensor of shape [noise_steps] defining noise variance at each timestep.
alpha_schedule
torch.Tensor
Alpha values computed as 1.0 - beta_schedule.
alpha_cumprod
torch.Tensor
Cumulative product of alpha values, used in the forward diffusion equation.
sqrt_alpha_cumprod
torch.Tensor
Square root of alpha_cumprod, precomputed for efficiency.
sqrt_one_minus_alpha_cumprod
torch.Tensor
Square root of 1 - alpha_cumprod, precomputed for efficiency.
model
DiffusionModel
The U-Net model used for noise prediction.
optimizer
torch.optim.Adam
Adam optimizer with learning rate 1e-4.
grad_scaler
torch.amp.GradScaler
Gradient scaler for mixed precision training (CUDA only).
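The schedule attributes above can be illustrated with a minimal sketch. The `cosine_beta_schedule` helper below is an assumption about how a cosine schedule might be derived (following the standard `alpha_bar(t) = cos^2(((t/T) + s) / (1 + s) * pi/2)` construction), not the class's actual implementation:

```python
import torch

def cosine_beta_schedule(noise_steps, s=0.008):
    # Standard cosine schedule: betas are derived from the ratio of
    # consecutive cumulative-alpha values, then clamped for stability.
    steps = torch.arange(noise_steps + 1, dtype=torch.float64)
    alpha_bar = torch.cos(((steps / noise_steps) + s) / (1 + s) * torch.pi / 2) ** 2
    alpha_bar = alpha_bar / alpha_bar[0]  # normalize so alpha_bar(0) = 1
    betas = 1 - (alpha_bar[1:] / alpha_bar[:-1])
    return betas.clamp(1e-4, 0.999).float()

noise_steps = 1000
beta_schedule = cosine_beta_schedule(noise_steps)          # shape [noise_steps]
alpha_schedule = 1.0 - beta_schedule
alpha_cumprod = torch.cumprod(alpha_schedule, dim=0)       # decreasing toward 0
sqrt_alpha_cumprod = alpha_cumprod.sqrt()                  # precomputed coefficients
sqrt_one_minus_alpha_cumprod = (1.0 - alpha_cumprod).sqrt()
```

Precomputing the square-root tensors avoids recomputing them in every `add_noise` and sampling call.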

Methods

add_noise

Add noise to clean images according to the forward diffusion process.
def add_noise(self, x, t)

Parameters

x
torch.Tensor
required
Clean images tensor of shape [batch_size, channels, height, width].
t
torch.Tensor
required
Timesteps tensor of shape [batch_size] containing integer timestep indices.

Returns

noisy_images
torch.Tensor
Noisy images at timestep t, shape [batch_size, channels, height, width].
noise
torch.Tensor
The Gaussian noise that was added, shape [batch_size, channels, height, width].

Implementation

Uses the forward diffusion equation:
x_t = sqrt(alpha_cumprod_t) * x + sqrt(1 - alpha_cumprod_t) * noise
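A minimal sketch of this equation, assuming the precomputed schedule tensors described under Attributes (the standalone `add_noise` function and the toy linear schedule here are illustrative, not the class's code):

```python
import torch

def add_noise(x, t, sqrt_alpha_cumprod, sqrt_one_minus_alpha_cumprod):
    # Gather the per-sample coefficients for each timestep in t and
    # reshape so they broadcast over [batch, channels, height, width].
    sqrt_ac = sqrt_alpha_cumprod[t].view(-1, 1, 1, 1)
    sqrt_omac = sqrt_one_minus_alpha_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x)
    noisy = sqrt_ac * x + sqrt_omac * noise
    return noisy, noise

# Example with a toy linear schedule
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_cumprod = torch.cumprod(1.0 - betas, dim=0)
x = torch.randn(4, 1, 28, 28)
t = torch.randint(0, T, (4,))
noisy, noise = add_noise(x, t, alpha_cumprod.sqrt(), (1 - alpha_cumprod).sqrt())
```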

sample

Generate new samples using DDPM reverse diffusion.
def sample(self, num_samples=16)

Parameters

num_samples
int
default:"16"
Number of images to generate.

Returns

samples
torch.Tensor
Generated images tensor of shape [num_samples, channels, image_size, image_size], values clamped to [-1, 1].

Implementation

Starts with random Gaussian noise and iteratively denoises over noise_steps timesteps in reverse order. Uses the DDPM sampling algorithm with predicted noise to compute the mean and variance at each step.
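The DDPM loop can be sketched as below, assuming a noise-prediction model with signature `model(x, t) -> eps`. This is the standard ancestral-sampling update, not the class's exact code; the zero-noise toy model only exercises the loop:

```python
import torch

@torch.no_grad()
def ddpm_sample(model, shape, betas, device="cpu"):
    # Start from pure Gaussian noise and denoise in reverse timestep order.
    alphas = 1.0 - betas
    alpha_cumprod = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)
    for i in reversed(range(len(betas))):
        t = torch.full((shape[0],), i, device=device, dtype=torch.long)
        eps = model(x, t)  # predicted noise
        # Posterior mean of x_{t-1} given x_t and the predicted noise
        coef = betas[i] / torch.sqrt(1.0 - alpha_cumprod[i])
        mean = (x - coef * eps) / torch.sqrt(alphas[i])
        if i > 0:
            x = mean + torch.sqrt(betas[i]) * torch.randn_like(x)
        else:
            x = mean  # no noise is added at the final step
    return x.clamp(-1, 1)

# Toy model that predicts zero noise, just to run the loop end to end
model = lambda x, t: torch.zeros_like(x)
betas = torch.linspace(1e-4, 0.02, 50)
samples = ddpm_sample(model, (2, 1, 8, 8), betas)
```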

sample_ddim

Generate samples using DDIM (Denoising Diffusion Implicit Models) for faster sampling.
def sample_ddim(self, num_samples=16, ddim_steps=50, eta=0.0)

Parameters

num_samples
int
default:"16"
Number of images to generate.
ddim_steps
int
default:"50"
Number of denoising steps. Fewer steps mean faster sampling. Must be in range (0, noise_steps].
eta
float
default:"0.0"
Stochasticity parameter. eta=0 produces deterministic DDIM sampling; eta=1 recovers DDPM behavior.

Returns

samples
torch.Tensor
Generated images tensor of shape [num_samples, channels, image_size, image_size], values clamped to [-1, 1].

Raises

  • ValueError: If ddim_steps is not in the valid range.

Implementation

Based on “Denoising Diffusion Implicit Models” (Song et al., 2020). Allows faster sampling by skipping timesteps while maintaining quality. The deterministic variant (eta=0) produces consistent outputs for the same noise input.
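The DDIM update can be sketched as follows, again assuming a noise-prediction model `model(x, t) -> eps`. The strided timestep subsequence and the `sigma` formula follow the paper; this is an illustrative sketch, not the class's exact implementation:

```python
import torch

@torch.no_grad()
def ddim_sample(model, shape, alpha_cumprod, ddim_steps=10, eta=0.0):
    # Denoise over a strided subsequence of the full timestep range.
    T = len(alpha_cumprod)
    times = torch.linspace(T - 1, 0, ddim_steps).long()
    x = torch.randn(shape)
    for i in range(ddim_steps):
        t = times[i]
        t_prev = times[i + 1] if i + 1 < ddim_steps else None
        a_t = alpha_cumprod[t]
        a_prev = alpha_cumprod[t_prev] if t_prev is not None else torch.tensor(1.0)
        eps = model(x, torch.full((shape[0],), int(t)))
        # Predict x_0, then step to the previous (possibly distant) timestep.
        x0 = (x - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)
        sigma = eta * torch.sqrt((1 - a_prev) / (1 - a_t) * (1 - a_t / a_prev))
        dir_xt = torch.sqrt((1 - a_prev - sigma**2).clamp(min=0)) * eps
        x = torch.sqrt(a_prev) * x0 + dir_xt + sigma * torch.randn_like(x)
    return x.clamp(-1, 1)

# Toy model predicting zero noise, just to run the loop
model = lambda x, t: torch.zeros_like(x)
betas = torch.linspace(1e-4, 0.02, 100)
ac = torch.cumprod(1.0 - betas, dim=0)
samples = ddim_sample(model, (2, 1, 8, 8), ac, ddim_steps=10, eta=0.0)
```

With eta=0 the `sigma * randn` term vanishes, which is what makes the trajectory deterministic for a fixed starting noise.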

train_step

Perform one training step for the diffusion model.
def train_step(self, x)

Parameters

x
torch.Tensor
required
Clean images tensor of shape [batch_size, channels, height, width].

Returns

loss
float
MSE loss value between predicted noise and actual noise.

Implementation

  1. Samples random timesteps for each image in the batch
  2. Adds noise to images using add_noise()
  3. Predicts noise using the U-Net model
  4. Computes MSE loss between predicted and actual noise
  5. Performs backpropagation with optional mixed precision (AMP)
  6. Updates model parameters via the optimizer
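The six steps above can be sketched as a standalone function (shown without the optional AMP path for brevity; the `TinyNet` stand-in and the free-function signature are illustrative assumptions, not the class's actual code):

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, x, sqrt_ac, sqrt_omac, noise_steps):
    # 1. Sample a random timestep per image; 2. add noise.
    t = torch.randint(0, noise_steps, (x.shape[0],), device=x.device)
    noise = torch.randn_like(x)
    noisy = sqrt_ac[t].view(-1, 1, 1, 1) * x + sqrt_omac[t].view(-1, 1, 1, 1) * noise
    # 3. Predict the injected noise; 4. MSE between prediction and truth.
    pred = model(noisy, t)
    loss = nn.functional.mse_loss(pred, noise)
    # 5-6. Backpropagate and update parameters.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Tiny stand-in model (a real U-Net would also condition on t)
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 3, padding=1)
    def forward(self, x, t):
        return self.conv(x)

T = 100
betas = torch.linspace(1e-4, 0.02, T)
ac = torch.cumprod(1.0 - betas, dim=0)
net = TinyNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
loss = train_step(net, opt, torch.randn(4, 1, 28, 28), ac.sqrt(), (1 - ac).sqrt(), T)
```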

Usage example

import torch
from models.diffusion import DiffusionProcess

# Initialize diffusion process for 28x28 grayscale images
diffusion = DiffusionProcess(
    image_size=28,
    channels=1,
    hidden_dims=[32, 64, 128],
    beta_start=1e-4,
    beta_end=0.02,
    noise_steps=1000
)

# Training loop (num_epochs and dataloader are assumed to be defined elsewhere)
for epoch in range(num_epochs):
    for batch in dataloader:
        images = batch[0]  # Shape: [batch_size, 1, 28, 28]
        loss = diffusion.train_step(images)
        print(f"Loss: {loss:.4f}")

# Generate samples using DDPM
samples = diffusion.sample(num_samples=16)

# Generate samples using DDIM (faster)
samples_ddim = diffusion.sample_ddim(
    num_samples=16,
    ddim_steps=50,
    eta=0.0
)
