
Welcome to MaxDiffusion

MaxDiffusion is a collection of reference implementations of latent diffusion models, written in pure Python/JAX, that run on XLA devices including Cloud TPUs and GPUs. MaxDiffusion aims to be a launching point for ambitious diffusion projects in both research and production.

Quickstart

Generate your first image in minutes

Training

Train and fine-tune diffusion models

Inference

Generate images and videos at scale

API Reference

Explore the complete API

Supported models

MaxDiffusion provides production-ready implementations for:
  • Stable Diffusion 1.x, 2.x, and XL - Training and inference
  • Flux Dev and Schnell - Training and inference with LoRA support
  • Wan 2.1/2.2 - Text-to-video and image-to-video generation
  • LTX-Video - Text-to-video and image-to-video generation
  • ControlNet - Spatial conditioning for SD 1.4 and SDXL
  • Dreambooth - Personalized fine-tuning for SD 1.x and 2.x

Key features

Multi-LoRA loading

Load and blend multiple LoRA adapters for inference
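Blending LoRA adapters amounts to summing scaled low-rank updates onto the base weights. A minimal NumPy sketch of the idea (the function name, shapes, and scales are illustrative, not MaxDiffusion's API):

```python
import numpy as np

def merge_loras(base_weight, adapters):
    """Merge multiple LoRA adapters into a base weight matrix.

    Each adapter contributes scale * (B @ A), a low-rank update:
    A has shape (rank, in_dim), B has shape (out_dim, rank).
    """
    merged = base_weight.copy()
    for A, B, scale in adapters:
        merged += scale * (B @ A)
    return merged

rng = np.random.default_rng(0)
out_dim, in_dim, rank = 8, 4, 2
W = rng.standard_normal((out_dim, in_dim))
# Two hypothetical adapters blended at different strengths.
style = (rng.standard_normal((rank, in_dim)),
         rng.standard_normal((out_dim, rank)), 0.7)
detail = (rng.standard_normal((rank, in_dim)),
          rng.standard_normal((out_dim, rank)), 0.3)
W_merged = merge_loras(W, [style, detail])
```

Because each update is rank-`rank`, storing adapters is far cheaper than storing full weight copies, which is what makes loading several at once practical.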

Flash attention

Optimized attention kernels for TPU and GPU

Distributed training

FSDP, data, and tensor parallelism for TPU Pods
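These parallelism modes are built on JAX's sharding machinery: a device mesh plus per-array partition specs, with XLA partitioning the computation automatically. A generic sketch of the underlying mechanism (not MaxDiffusion's actual training loop; on a TPU Pod slice the mesh would span many chips, on a single host it degenerates to 1x1):

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange the available devices into a 2D (data, model) mesh.
devices = np.array(jax.devices()).reshape(-1, 1)
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard the batch along "data" (data parallelism) and the weight
# matrix along "model" (tensor parallelism).
batch = jax.device_put(jnp.ones((8, 16)),
                       NamedSharding(mesh, P("data", None)))
weight = jax.device_put(jnp.ones((16, 32)),
                        NamedSharding(mesh, P(None, "model")))

# XLA partitions the matmul across the mesh and inserts the
# necessary collectives automatically.
out = batch @ weight
```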

Mixed precision

bfloat16 training with configurable precision
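The usual mixed-precision pattern keeps master parameters in float32, casts to bfloat16 for the forward/backward compute, and accumulates gradients back in float32. A toy sketch of that pattern in plain JAX (the loss and shapes are illustrative):

```python
import jax
import jax.numpy as jnp
from jax.tree_util import tree_map

def loss_fn(params_f32, x):
    # Cast float32 master params to bfloat16 for the compute.
    params = tree_map(lambda p: p.astype(jnp.bfloat16), params_f32)
    y = x.astype(jnp.bfloat16) @ params["w"]
    # Reduce in float32 to keep the loss numerically stable.
    return jnp.mean(y.astype(jnp.float32) ** 2)

params = {"w": jnp.ones((4, 4), dtype=jnp.float32)}
x = jnp.ones((2, 4))
# Gradients come back in float32, matching the master params.
grads = jax.grad(loss_fn)(params, x)
```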

Why MaxDiffusion?

Built for XLA devices

MaxDiffusion is designed from the ground up for Google Cloud TPUs and GPUs, with extensive optimizations for:
  • TPU v5p and v6e (Trillium) - Optimized flash attention block sizes and LIBTPU flags
  • Multi-host training - Scale to hundreds of TPU chips with XPK
  • Efficient memory usage - Gradient checkpointing and offloading strategies
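Gradient checkpointing in JAX is provided by `jax.checkpoint` (a.k.a. `jax.remat`), which drops intermediate activations in the forward pass and recomputes them during the backward pass, trading compute for memory. A minimal sketch of the primitive itself (MaxDiffusion selects checkpointing policies through its config, not this decorator directly):

```python
import jax
import jax.numpy as jnp

@jax.checkpoint
def block(x):
    # Intermediate activations of this block are not stored;
    # they are recomputed when gradients are taken.
    for _ in range(4):
        x = jnp.sin(x)
    return x

loss = lambda x: jnp.sum(block(x))
g = jax.grad(loss)(jnp.ones((8,)))
```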

Production ready

  • Pure JAX implementation - Full XLA compilation for maximum performance
  • HuggingFace compatibility - Load and save models in Diffusers format
  • Orbax checkpointing - Efficient distributed checkpointing
  • Comprehensive configuration - YAML-based config system
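Runs are driven by a YAML file whose values can be overridden on the command line. The fragment below is purely illustrative of the shape of such a config; the key names are hypothetical and not MaxDiffusion's actual schema:

```yaml
# Illustrative only -- key names are hypothetical, not the real schema.
run_name: sdxl-finetune-demo
output_dir: gs://my-bucket/checkpoints
per_device_batch_size: 4
learning_rate: 1.0e-5
max_train_steps: 10000
```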

Research friendly

  • Modular architecture - Easy to fork and modify
  • Latest models - Flux, Wan 2.1/2.2, LTX-Video support
  • Advanced features - LoRA, quantization, custom schedulers

MaxDiffusion started as a fork of HuggingFace Diffusers and maintains compatibility with HuggingFace models and pipelines.

Hardware support

Recommended for production
  • TPU v4, v5p, v6e (Trillium)
  • Single host and multi-host configurations
  • Flash attention optimized for TPU architecture
  • Async collectives for distributed training

What’s new?

  • 2026/01/29: Wan LoRA for inference is now supported
  • 2026/01/15: Wan2.1 and Wan2.2 img2vid generation is now supported
  • 2025/11/11: Wan2.2 txt2vid generation is now supported
  • 2025/10/14: NVIDIA DGX Spark Flux support
  • 2025/10/10: Wan2.1 txt2vid training and generation is now supported
  • 2025/08/14: LTX-Video img2vid generation is now supported
  • 2025/07/29: LTX-Video txt2vid generation is now supported
  • 2025/04/17: Flux fine-tuning
  • 2025/02/12: Flux LoRA for inference
  • 2025/02/08: Flux Schnell & Dev inference

Next steps

Installation

Set up MaxDiffusion on TPU or GPU

Quickstart

Generate your first image

Training guide

Learn how to train models

Deployment

Deploy at scale with XPK
