Welcome to MaxDiffusion
MaxDiffusion is a collection of reference implementations of various latent diffusion models, written in pure Python/JAX, that run on XLA devices including Cloud TPUs and GPUs. MaxDiffusion aims to be a launching point for ambitious diffusion projects in both research and production.
Quickstart
Generate your first image in minutes
Training
Train and fine-tune diffusion models
Inference
Generate images and videos at scale
API Reference
Explore the complete API
Supported models
MaxDiffusion provides production-ready implementations for:
- Stable Diffusion 1.x, 2.x, and XL - Training and inference
- Flux Dev and Schnell - Training and inference with LoRA support
- Wan 2.1/2.2 - Text-to-video and image-to-video generation
- LTX-Video - Text-to-video and image-to-video generation
- ControlNet - Spatial conditioning for SD 1.4 and SDXL
- Dreambooth - Personalized fine-tuning for SD 1.x and 2.x
Key features
Multi-LoRA loading
Load and blend multiple LoRA adapters for inference
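The idea behind blending multiple LoRA adapters can be sketched in a few lines: the merged weight is the frozen base weight plus a scaled sum of low-rank products. This is an illustrative sketch only (the shapes, values, and blend scales below are made up, and this is not MaxDiffusion's loading API):

```python
import jax.numpy as jnp

# Hypothetical shapes: one frozen base weight, two low-rank adapters.
d_in, d_out, rank = 8, 8, 2

W = jnp.ones((d_in, d_out))  # frozen base weight
loras = [
    (jnp.full((d_in, rank), 0.1), jnp.full((rank, d_out), 0.1)),  # adapter A
    (jnp.full((d_in, rank), 0.2), jnp.full((rank, d_out), 0.2)),  # adapter B
]
scales = [0.75, 0.25]  # per-adapter blend weights

# Merged weight: W' = W + sum_i scale_i * (A_i @ B_i)
W_merged = W + sum(s * (A @ B) for s, (A, B) in zip(scales, loras))
print(W_merged.shape)  # (8, 8)
```

Because the adapters are merged into the base weight once, inference afterwards costs the same as running the base model alone.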
Flash attention
Optimized attention kernels for TPU and GPU
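For reference, the quantity a flash-attention kernel computes is plain scaled dot-product attention; the optimized kernels produce the same result while processing the scores in tiles instead of materializing the full seq x seq matrix. A minimal, non-fused reference version (not MaxDiffusion's kernel):

```python
import jax
import jax.numpy as jnp

def attention(q, k, v):
    # Reference scaled dot-product attention over a single head.
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ v

q = jnp.ones((2, 3))
out = attention(q, q, q)
print(out.shape)  # (2, 3)
```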
Distributed training
FSDP, data, and tensor parallelism for TPU Pods
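Data and tensor parallelism in JAX are expressed through a named device mesh. The sketch below builds a one-dimensional mesh and shards a batch along it; on a single host this runs over local devices, while on a TPU Pod the same code spans all chips. This is generic JAX sharding, shown for illustration rather than MaxDiffusion's exact mesh configuration:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Build a 1-D mesh over all available devices.
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))

# Shard the batch dimension across the "data" axis (data parallelism).
batch = jnp.ones((8, 4))
sharding = NamedSharding(mesh, PartitionSpec("data"))
batch = jax.device_put(batch, sharding)
print(batch.shape)  # (8, 4)
```

FSDP extends the same idea by also partitioning parameters and optimizer state along a mesh axis instead of replicating them.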
Mixed precision
bfloat16 training with configurable precision
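The usual mixed-precision recipe on TPU keeps weights and activations in bfloat16 while accumulating matmuls in float32. A minimal sketch in plain JAX (illustrative, not MaxDiffusion's config system):

```python
import jax.numpy as jnp

# Inputs and weights stored in bfloat16.
x = jnp.ones((4, 4), dtype=jnp.bfloat16)
w = jnp.full((4, 4), 0.5, dtype=jnp.bfloat16)

# Accumulate the matmul in float32 to avoid precision loss in the sum.
y = jnp.matmul(x, w, preferred_element_type=jnp.float32)
print(y.dtype)  # float32
```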
Why MaxDiffusion?
Built for XLA devices
MaxDiffusion is designed from the ground up for Google Cloud TPUs and GPUs, with extensive optimizations for:
- TPU v5p and v6e (Trillium) - Optimized flash attention block sizes and LIBTPU flags
- Multi-host training - Scale to hundreds of TPU chips with XPK
- Efficient memory usage - Gradient checkpointing and offloading strategies
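Gradient checkpointing in JAX is a one-line transform: `jax.checkpoint` (also known as remat) discards intermediates during the forward pass and recomputes them during the backward pass, trading compute for memory. An illustrative toy example (not a MaxDiffusion model block):

```python
import jax
import jax.numpy as jnp

@jax.checkpoint
def block(x):
    # Intermediates of this block are recomputed, not stored.
    return jnp.tanh(x) * 2.0

def loss(x):
    return jnp.sum(block(block(x)))

g = jax.grad(loss)(jnp.ones((4,)))
print(g.shape)  # (4,)
```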
Production ready
- Pure JAX implementation - Full XLA compilation for maximum performance
- HuggingFace compatibility - Load and save models in Diffusers format
- Orbax checkpointing - Efficient distributed checkpointing
- Comprehensive configuration - YAML-based config system
Research friendly
- Modular architecture - Easy to fork and modify
- Latest models - Flux, Wan 2.1/2.2, LTX-Video support
- Advanced features - LoRA, quantization, custom schedulers
MaxDiffusion started as a fork of HuggingFace Diffusers and maintains compatibility with HuggingFace models and pipelines.
Hardware support
TPU (recommended for production)
- TPU v4, v5p, and v6e (Trillium)
- Single-host and multi-host configurations
- Flash attention optimized for the TPU architecture
- Async collectives for distributed training
GPU
- See the Installation guide for GPU setup
What’s new?
- 2026/01/29: Wan LoRA for inference is now supported
- 2026/01/15: Wan2.1 and Wan2.2 img2vid generation is now supported
- 2025/11/11: Wan2.2 txt2vid generation is now supported
- 2025/10/14: NVIDIA DGX Spark Flux support
- 2025/10/10: Wan2.1 txt2vid training and generation is now supported
- 2025/08/14: LTX-Video img2vid generation is now supported
- 2025/07/29: LTX-Video text2vid generation is now supported
- 2025/04/17: Flux Finetuning
- 2025/02/12: Flux LoRA for inference
- 2025/02/08: Flux schnell & dev inference
Next steps
Installation
Set up MaxDiffusion on TPU or GPU
Quickstart
Generate your first image
Training guide
Learn how to train models
Deployment
Deploy at scale with XPK