DDIM (Denoising Diffusion Implicit Models) enables faster sampling by skipping timesteps while maintaining high sample quality. Unlike DDPM, DDIM can produce deterministic samples when $\eta = 0$.
## Why DDIM?
Standard DDPM sampling requires iterating through all T timesteps (e.g., 1000 steps), making generation slow. DDIM addresses this by:

- **Fewer steps**: use only 50-100 steps instead of 1000
- **Deterministic**: the same initial noise produces the same output when $\eta = 0$
- **Quality preservation**: maintains sample quality with proper step selection
DDIM uses a non-Markovian forward process that allows skipping timesteps. The reverse update is:

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \cdot \text{pred}_{x_0} + \sqrt{1 - \bar{\alpha}_{t-1} - \sigma_t^2} \cdot \epsilon_\theta(x_t, t) + \sigma_t \epsilon$$

where:

- $\text{pred}_{x_0} = \dfrac{x_t - \sqrt{1-\bar{\alpha}_t} \cdot \epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}$ is the predicted clean image
- $\sigma_t = \eta \cdot \sqrt{\dfrac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t} \left(1 - \dfrac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}\right)}$ controls stochasticity
- $\epsilon \sim \mathcal{N}(0, I)$ is random noise (only drawn if $\eta > 0$)

When $\eta = 0$, DDIM is fully deterministic. When $\eta = 1$, DDIM recovers the DDPM sampling process.
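The role of $\eta$ can be sanity-checked numerically. The sketch below builds a toy linear-beta schedule (illustrative values only, not necessarily the schedule this repo uses) and evaluates $\sigma_t$ for one timestep jump:

```python
import math

# Toy linear-beta schedule (illustrative, not necessarily the repo's schedule)
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]

alpha_cumprod = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_cumprod.append(prod)

def sigma(t, t_prev, eta):
    """sigma_t from the formula above, for a jump t -> t_prev."""
    a_t, a_prev = alpha_cumprod[t], alpha_cumprod[t_prev]
    return eta * math.sqrt((1 - a_prev) / (1 - a_t) * (1 - a_t / a_prev))

print(sigma(500, 480, eta=0.0))  # 0.0 -> the noise term vanishes entirely
print(sigma(500, 480, eta=1.0))  # positive: DDPM-like noise injection
```

With $\eta = 0$ the noise term vanishes for every step, which is exactly what makes the sampler deterministic.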
## Implementation
Here’s the complete DDIM implementation from `src/models/diffusion.py:122`:
```python
def sample_ddim(self, num_samples=16, ddim_steps=50, eta=0.0):
    """
    Generate samples using DDIM (Denoising Diffusion Implicit Models).

    DDIM allows faster sampling by skipping timesteps while maintaining quality.
    Based on "Denoising Diffusion Implicit Models" (Song et al., 2020).

    Args:
        num_samples: Number of samples to generate
        ddim_steps: Number of denoising steps (fewer = faster, original uses noise_steps)
        eta: Stochasticity parameter. eta=0 is deterministic DDIM, eta=1 recovers DDPM

    Returns:
        Generated images tensor

    Pre: ddim_steps > 0 and ddim_steps <= noise_steps
    Post: returns tensor of shape (num_samples, channels, image_size, image_size)
    """
    if ddim_steps <= 0 or ddim_steps > self.noise_steps:
        raise ValueError(f"ddim_steps must be in (0, {self.noise_steps}], got {ddim_steps}")

    self.model.eval()
    with torch.no_grad():
        # Create uniform timestep schedule
        step_size = self.noise_steps // ddim_steps
        timesteps = list(range(0, self.noise_steps, step_size))
        if timesteps[-1] != self.noise_steps - 1:
            timesteps.append(self.noise_steps - 1)
        timesteps = sorted(timesteps, reverse=True)

        # Start with random noise
        x_t = torch.randn(num_samples, self.model.channels,
                          self.model.image_size, self.model.image_size,
                          device=self.device)

        for i, t in enumerate(timesteps):
            t_batch = torch.full((num_samples,), t, device=self.device, dtype=torch.long)

            # Predict noise
            predicted_noise = self.model(x_t, t_batch)

            # Get schedule values
            alpha_cumprod_t = self.alpha_cumprod[t]
            sqrt_alpha_cumprod_t = self.sqrt_alpha_cumprod[t]
            sqrt_one_minus_alpha_cumprod_t = self.sqrt_one_minus_alpha_cumprod[t]

            # Predict x_0
            pred_x0 = (x_t - sqrt_one_minus_alpha_cumprod_t * predicted_noise) / sqrt_alpha_cumprod_t
            pred_x0 = torch.clamp(pred_x0, -1.0, 1.0)

            if i < len(timesteps) - 1:
                t_prev = timesteps[i + 1]
                alpha_cumprod_t_prev = self.alpha_cumprod[t_prev]
                sqrt_alpha_cumprod_t_prev = self.sqrt_alpha_cumprod[t_prev]
                sqrt_one_minus_alpha_cumprod_t_prev = self.sqrt_one_minus_alpha_cumprod[t_prev]

                # Compute variance
                sigma_t = eta * torch.sqrt(
                    (1 - alpha_cumprod_t_prev) / (1 - alpha_cumprod_t) *
                    (1 - alpha_cumprod_t / alpha_cumprod_t_prev)
                )

                # Direction pointing to x_t
                dir_xt = torch.sqrt(1 - alpha_cumprod_t_prev - sigma_t ** 2) * predicted_noise

                # Compute x_{t-1}
                x_t = sqrt_alpha_cumprod_t_prev * pred_x0 + dir_xt

                # Add stochastic noise if eta > 0
                if eta > 0:
                    noise = torch.randn_like(x_t)
                    x_t = x_t + sigma_t * noise
            else:
                x_t = pred_x0

    # Clamp only at the end
    result = torch.clamp(x_t, -1.0, 1.0)
    self.model.train()
    return result
```
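The schedule construction at the top of `sample_ddim` can be exercised on its own. For the defaults `noise_steps=1000` and `ddim_steps=50`, the integer-division grid stops at t = 980, so the final timestep t = 999 is appended, giving 51 steps in total:

```python
# Standalone replay of the schedule logic above (no model required)
noise_steps, ddim_steps = 1000, 50

step_size = noise_steps // ddim_steps               # 20
timesteps = list(range(0, noise_steps, step_size))  # 0, 20, ..., 980
if timesteps[-1] != noise_steps - 1:
    timesteps.append(noise_steps - 1)               # ensure t = 999 is included
timesteps = sorted(timesteps, reverse=True)

print(len(timesteps))   # 51 -- one more than requested, due to the appended endpoint
print(timesteps[:4])    # [999, 980, 960, 940]
print(timesteps[-3:])   # [40, 20, 0]
```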
## Usage example
### Set up diffusion model

Load a trained model (same as DDPM):

```python
import torch
from src.models.diffusion import DiffusionProcess

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

diffusion = DiffusionProcess(
    image_size=28,
    channels=1,
    hidden_dims=[128, 256, 512],
    noise_steps=1000,
    device=device,
)
diffusion.model.load_state_dict(torch.load('best_model.pt'))
```
### Generate samples with DDIM

Use fewer steps for faster generation:

```python
# Deterministic sampling with 50 steps (20x faster than DDPM)
samples = diffusion.sample_ddim(
    num_samples=16,
    ddim_steps=50,
    eta=0.0,  # Fully deterministic
)
```
### Experiment with stochasticity

Adjust the `eta` parameter to control randomness:

```python
# More stochastic (closer to DDPM)
samples = diffusion.sample_ddim(
    num_samples=16,
    ddim_steps=50,
    eta=0.5,  # Partially stochastic
)
```
## CIFAR-10 DDIM implementation

The CIFAR-10 variant uses uniform timestep spacing for better coverage. From `src/models/diffusion_cifar.py:375`:
```python
def sample_ddim(self, num_samples=16, ddim_steps=50, eta=0.0):
    """
    Generate samples using DDIM with EMA parameters.

    DDIM chooses a sparse subsequence of timesteps t_0 > … > t_{S-1}
    and follows a deterministic trajectory when eta = 0.
    """
    if ddim_steps <= 0 or ddim_steps > self.noise_steps:
        raise ValueError(f"ddim_steps must be in (0, {self.noise_steps}], got {ddim_steps}")

    model = self.ema_model  # Use EMA weights
    was_training = model.training
    model.eval()

    with torch.no_grad():
        # Uniform grid of timesteps in [0, T-1], highest to lowest
        step_indices = torch.linspace(
            0,
            self.noise_steps - 1,
            steps=ddim_steps,
            dtype=torch.long,
            device=self.device,
        )
        timesteps = list(reversed(step_indices.tolist()))

        x_t = torch.randn(
            num_samples,
            self.model.channels,
            self.model.image_size,
            self.model.image_size,
            device=self.device,
        )

        for i, t in enumerate(timesteps):
            t_batch = torch.full((num_samples,), t, device=self.device, dtype=torch.long)
            eps_pred = model(x_t, t_batch)

            alpha_cumprod_t = self.alpha_cumprod[t]
            sqrt_alpha_cumprod_t = self.sqrt_alpha_cumprod[t]
            sqrt_one_minus_alpha_cumprod_t = self.sqrt_one_minus_alpha_cumprod[t]

            pred_x0 = (x_t - sqrt_one_minus_alpha_cumprod_t * eps_pred) / sqrt_alpha_cumprod_t

            if i < len(timesteps) - 1:
                t_prev = timesteps[i + 1]
                alpha_cumprod_t_prev = self.alpha_cumprod[t_prev]
                sqrt_alpha_cumprod_t_prev = self.sqrt_alpha_cumprod[t_prev]
                sqrt_one_minus_alpha_cumprod_t_prev = self.sqrt_one_minus_alpha_cumprod[t_prev]

                sigma_t = eta * torch.sqrt(
                    (1 - alpha_cumprod_t_prev) / (1 - alpha_cumprod_t)
                    * (1 - alpha_cumprod_t / alpha_cumprod_t_prev)
                )

                # Direction term along the predicted noise
                dir_xt = torch.sqrt(
                    1 - alpha_cumprod_t_prev - sigma_t ** 2
                ) * eps_pred

                x_t = sqrt_alpha_cumprod_t_prev * pred_x0 + dir_xt
                if eta > 0:
                    noise = torch.randn_like(x_t)
                    x_t = x_t + sigma_t * noise
            else:
                x_t = pred_x0

    # Final clamp to the valid image range
    x_t = torch.clamp(x_t, -1.0, 1.0)

    if was_training:
        model.train()
    return x_t
```
The CIFAR-10 implementation uses `torch.linspace` for uniform timestep spacing, while the base implementation uses integer division with `step_size`.
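The difference between the two schedules is easy to see by building both grids side by side; this standalone sketch uses the same defaults as the examples above:

```python
import torch

noise_steps, ddim_steps = 1000, 50

# Base version: integer division plus an appended endpoint
step_size = noise_steps // ddim_steps
base = list(range(0, noise_steps, step_size))
if base[-1] != noise_steps - 1:
    base.append(noise_steps - 1)
base = sorted(base, reverse=True)

# CIFAR-10 version: torch.linspace over [0, T-1]
grid = torch.linspace(0, noise_steps - 1, steps=ddim_steps, dtype=torch.long)
cifar = list(reversed(grid.tolist()))

print(len(base), len(cifar))  # 51 50 -- linspace yields exactly ddim_steps timesteps
print(base[:3])               # [999, 980, 960]
print(cifar[:3])              # starts at 999 with slightly wider spacing
```

Both grids run from t = 999 down to t = 0; `torch.linspace` spreads the steps evenly without the off-by-one endpoint append.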
| Method | Steps | Time | Quality |
| --- | --- | --- | --- |
| DDPM | 1000 | ~10s | Excellent |
| DDIM (50 steps) | 50 | ~0.5s | Very good |
| DDIM (100 steps) | 100 | ~1s | Excellent |
Start with `ddim_steps=50` and `eta=0.0` for a good balance of speed and quality. Increase steps if you need higher quality, or use `eta=0.3` for slightly more diverse samples.
## Key advantages
- **Speed**: 10-20x faster than DDPM with 50-100 steps
- **Deterministic**: reproducible results when $\eta = 0$
- **Flexible**: trade off speed vs. quality by adjusting steps
- **Interpolation**: deterministic trajectories enable meaningful latent space interpolation
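Because $\eta = 0$ makes sampling a deterministic map from initial noise to image, interpolating between two noise tensors traces a smooth path in image space. A common choice is spherical interpolation (slerp), which keeps interpolants near the Gaussian shell. The helper below is a hypothetical sketch, not part of the repo; feeding its output through the sampler would require a `sample_ddim` variant that accepts initial noise:

```python
import torch

def slerp(z1, z2, alpha):
    """Spherical interpolation between two noise tensors (hypothetical helper)."""
    z1f, z2f = z1.flatten(), z2.flatten()
    cos = torch.clamp(torch.dot(z1f, z2f) / (z1f.norm() * z2f.norm()), -1.0, 1.0)
    theta = torch.acos(cos)
    return (torch.sin((1 - alpha) * theta) * z1
            + torch.sin(alpha * theta) * z2) / torch.sin(theta)

z1 = torch.randn(1, 1, 28, 28)
z2 = torch.randn(1, 1, 28, 28)
frames = [slerp(z1, z2, a) for a in torch.linspace(0, 1, 8).tolist()]
# Each frame would then be denoised with eta=0; identical noise always
# yields the identical image, so the image sequence varies smoothly.
```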