Overview
The `generate_wan.py` script provides inference for Wan models (Wan2.1 and Wan2.2), supporting both text-to-video (T2V) and image-to-video (I2V) generation. Wan models are state-of-the-art video diffusion models capable of generating high-quality, temporally consistent videos.
Usage
Wan2.1 text-to-video
Wan2.1 image-to-video
Wan2.2 text-to-video
With LoRA
Configuration parameters
Model parameters
Model path. Options:
- `Wan-AI/Wan2.1-T2V-14B-Diffusers` (Wan2.1 T2V)
- `Wan-AI/Wan2.1-I2V-14B-Diffusers` (Wan2.1 I2V)
- `Wan-AI/Wan2.2-T2V-27B-Diffusers` (Wan2.2 T2V)
- `Wan-AI/Wan2.2-I2V-27B-Diffusers` (Wan2.2 I2V)
Model variant. Options: `wan2.1`, `wan2.2`.
Pipeline type. Options: `T2V` (text-to-video), `I2V` (image-to-video).
Override path for transformer weights.
Generation parameters
Text prompt describing the video to generate.
Negative prompt to guide what should not appear in the video.
Path to input image for I2V generation. Required when `model_type=I2V`.
Video height in pixels. Common values: 480, 720, 1080.
Video width in pixels. Common values: 832, 1280, 1920.
Number of frames to generate. Must be compatible with temporal compression.
Number of denoising steps. More steps generally yield better quality at the cost of speed.
Frames per second for output video.
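The frame-count constraint above comes from the VAE's temporal compression. A minimal sketch, assuming a temporal stride of 4 (so valid counts have the form 4k + 1, e.g. 81); the stride value and helper name are assumptions, not read from the script:

```python
# Sketch: round a requested frame count to one compatible with the VAE's
# temporal compression. Assumes a temporal stride of 4 (Wan's 3D causal
# VAE), so valid counts have the form 4k + 1.
TEMPORAL_STRIDE = 4  # assumed compression factor

def nearest_valid_num_frames(requested: int) -> int:
    """Round `requested` to the nearest count of the form 4k + 1."""
    k = round((requested - 1) / TEMPORAL_STRIDE)
    return max(1, k * TEMPORAL_STRIDE + 1)

print(nearest_valid_num_frames(80))  # -> 81
print(nearest_valid_num_frames(81))  # -> 81
```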
Wan2.1 guidance parameters
Classifier-free guidance scale for Wan2.1 models.
Wan2.2 guidance parameters
Low-frequency guidance scale for Wan2.2 models (dual guidance system).
High-frequency guidance scale for Wan2.2 models (dual guidance system).
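The dual guidance system can be sketched as classifier-free guidance with a timestep-dependent scale. Everything below is illustrative: the function name, the `boundary` parameter, and the rule that the low-frequency scale applies at high noise levels are assumptions, not the script's actual implementation.

```python
import numpy as np

def dual_guidance_cfg(noise_uncond, noise_cond, t, *,
                      low_freq_scale, high_freq_scale, boundary=0.5):
    """Illustrative sketch of dual-guidance classifier-free guidance.

    Assumption: low-frequency structure is denoised early in sampling
    (high t), so the low-frequency scale applies for t >= boundary and
    the high-frequency scale afterwards. `boundary` is hypothetical.
    """
    scale = low_freq_scale if t >= boundary else high_freq_scale
    # Standard CFG combination of conditional / unconditional predictions.
    return noise_uncond + scale * (noise_cond - noise_uncond)

print(dual_guidance_cfg(np.zeros(1), np.ones(1), 0.9,
                        low_freq_scale=4.0, high_freq_scale=3.0))  # -> [4.]
```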
Flow parameters
Flow shift parameter for rectified flow. Controls the noise schedule.
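A common form of the rectified-flow time shift, used by several flow-matching schedulers, maps each noise level as shift·sigma / (1 + (shift − 1)·sigma); whether `generate_wan.py` uses exactly this formula is an assumption. A sketch:

```python
import numpy as np

def shift_sigmas(sigmas: np.ndarray, shift: float) -> np.ndarray:
    """Rectified-flow time shift: larger `shift` concentrates more of
    the schedule at high noise levels, while keeping the endpoints
    sigma=1 and sigma=0 fixed."""
    return shift * sigmas / (1.0 + (shift - 1.0) * sigmas)

sigmas = np.linspace(1.0, 0.0, 5)
print(shift_sigmas(sigmas, shift=3.0))
```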
Performance parameters
Data type for model weights. Use `bfloat16` for optimal performance.
Data type for activations.
Attention implementation. Options: `flash`, `cudnn_flash_te` (GPU), `ring`, `dot_product`.
Use `jax.lax.scan` for transformer layers to reduce memory.
Replicate VAE across devices instead of sharding.
Flash attention block sizes. Optimal values differ between TPU v5p and v6e (Trillium).
Minimum sequence length for flash attention.
Parallelism parameters
Data parallelism across ICI devices.
FSDP parallelism. Used for sequence parallelism in Wan2.1. Values of 2 or 4 work best. The sequence length is padded to be evenly divisible.
Context parallelism. Recommended for auto-sharding.
Tensor parallelism. Used for head parallelism in Wan2.1. Must evenly divide 40 (number of attention heads).
Batch size per device. Can be fractional (e.g., 0.25, 0.125) but must result in a whole number when multiplied by the device count.
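The two divisibility constraints above can be checked up front. The function and argument names here are illustrative; only the constraints themselves (whole global batch, 40 attention heads) come from the notes above:

```python
# Sketch: sanity-check parallelism settings before launching a run.
# The constraint values (40 attention heads, integer global batch) come
# from the parameter notes; the names are illustrative.
NUM_ATTENTION_HEADS = 40

def check_config(per_device_batch_size: float, num_devices: int,
                 ici_tensor_parallelism: int) -> float:
    """Return the global batch size, or raise if settings are invalid."""
    global_batch = per_device_batch_size * num_devices
    if not float(global_batch).is_integer():
        raise ValueError(
            f"per_device_batch_size * num_devices = {global_batch} "
            "must be a whole number")
    if NUM_ATTENTION_HEADS % ici_tensor_parallelism != 0:
        raise ValueError(
            f"ici_tensor_parallelism={ici_tensor_parallelism} must "
            f"evenly divide {NUM_ATTENTION_HEADS} attention heads")
    return global_batch

print(check_config(0.25, 8, ici_tensor_parallelism=4))  # -> 2.0
```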
LoRA parameters
Enable LoRA loading for inference.
LoRA configuration. Supports ComfyUI and AI Toolkit formats.
Wan2.1 properties:
- `rank` (array): LoRA rank values
- `lora_model_name_or_path` (array): Paths to LoRA models
- `weight_name` (array): Weight filenames
- `adapter_name` (array): Adapter names
- `scale` (array): Scale factors
Wan2.2 properties:
- `rank` (array): LoRA rank values
- `high_noise_weight_name` (array): High-noise transformer weights
- `low_noise_weight_name` (array): Low-noise transformer weights
- `scale` (array): Scale factors
System parameters
Unique run identifier.
Output directory for videos. Supports GCS paths (gs://).
Random seed for reproducibility.
JAX compilation cache directory.
Enable performance profiling.
Output
Videos are saved as MP4 files with the naming pattern `wan_output_{seed}_{i}.mp4`. Files can be automatically uploaded to GCS if `output_dir` starts with `gs://`.
Implementation details
Wan models use:
- NNX-based transformer architecture
- Rectified flow for video generation
- 3D VAE for video compression
- Dual guidance system (Wan2.2 only)
- Sequence and head parallelism for efficient training/inference
- Optional LoRA support for customization
Performance notes
- Fractional batch sizes allow fine-grained memory optimization
- Sequence parallelism (ici_fsdp_parallelism) shards the sequence dimension
- Head parallelism (ici_tensor_parallelism) must divide 40 evenly
- Use external disk for model weights (models are large)
- GPU users: cudnn_flash_te attention is recommended with ici_fsdp_batch_parallelism