## Quick start

## Architecture
SDXL uses an improved architecture:

- UNet: Larger model with more cross-attention layers
- Text encoders: Dual encoders (OpenCLIP ViT-G/14 + CLIP ViT-L/14)
- Pooled embeddings: Additional conditioning from text encoder pooled output
- Time conditioning: Resolution and crop coordinates for better control
- VAE: Higher quality decoder for 1024×1024 generation
## Configuration

Customize generation parameters.

### Parameters
| Parameter | Description | Default |
|---|---|---|
| `prompt` | Text description of the desired image | Required |
| `negative_prompt` | Concepts to avoid | Empty |
| `num_inference_steps` | Number of denoising steps | 30 |
| `guidance_scale` | CFG strength | 7.5 |
| `guidance_rescale` | Noise rescale factor | 0.0 |
| `do_classifier_free_guidance` | Enable CFG | True |
| `resolution` | Image height and width | 1024 |
| `per_device_batch_size` | Images generated per device | 1 |
| `seed` | Random seed | 0 |
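How `guidance_scale` and `guidance_rescale` interact can be sketched in a few lines of numpy. This is generic classifier-free guidance math with the standard deviation rescale correction, shown for illustration; it is not MaxDiffusion's exact implementation:

```python
import numpy as np

def apply_cfg(noise_uncond, noise_text, guidance_scale=7.5, guidance_rescale=0.0):
    """Classifier-free guidance with an optional std-rescale correction."""
    noise = noise_uncond + guidance_scale * (noise_text - noise_uncond)
    if guidance_rescale > 0.0:
        # Pull the guided prediction's std back toward the text-conditioned
        # prediction's std, which counteracts over-saturation at high scales.
        rescaled = noise * (noise_text.std() / noise.std())
        noise = guidance_rescale * rescaled + (1.0 - guidance_rescale) * noise
    return noise

rng = np.random.default_rng(0)
noise_uncond = rng.normal(size=(1, 4, 8, 8))
noise_text = rng.normal(size=(1, 4, 8, 8))
guided = apply_cfg(noise_uncond, noise_text, guidance_scale=7.5, guidance_rescale=0.7)
```

With `guidance_scale=1.0` the unconditional branch cancels out and the function returns the text-conditioned prediction unchanged.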
## SDXL Lightning

SDXL Lightning enables few-step generation (2-8 steps) with minimal quality loss.

### 4-step generation

### 2-step generation

For ultra-fast generation, use 2-step Lightning with classifier-free guidance disabled.
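Few-step distilled models like Lightning are sensitive to timestep spacing and are commonly sampled with "trailing" timesteps, i.e. evenly spaced steps that end at the final training step. The helper below is a generic sketch of that selection, assuming 1000 training timesteps; MaxDiffusion's scheduler configuration may differ:

```python
import numpy as np

def trailing_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Evenly spaced timesteps that end at the last training step."""
    step = num_train_timesteps / num_inference_steps
    return np.round(np.arange(num_train_timesteps, 0, -step)).astype(int) - 1

print(trailing_timesteps(4))  # [999 749 499 249]
print(trailing_timesteps(2))  # [999 499]
```

Note that with `do_classifier_free_guidance=False`, only the text-conditioned branch runs, so each of the two steps costs a single UNet forward pass per image instead of two.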
## LoRA support

### Hyper-SDXL LoRA

Hyper-SDXL enables 2-step generation with LoRA.

### Multiple LoRA loading

Load multiple LoRA adapters simultaneously. Each adapter entry supports:

- `lora_model_name_or_path`: HuggingFace repo or local path
- `weight_name`: Safetensors filename
- `adapter_name`: Unique identifier for the adapter
- `scale`: LoRA influence strength (0.0-1.0)
- `from_pt`: Whether to convert from PyTorch format
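The `scale` parameter controls how strongly each adapter's low-rank update is merged into a base weight matrix. A minimal numpy sketch of the generic LoRA merge (not MaxDiffusion internals):

```python
import numpy as np

def merge_loras(w_base, adapters):
    """Apply W' = W + sum_i scale_i * (B_i @ A_i) over the loaded adapters."""
    w = w_base.copy()
    for adapter in adapters:
        w = w + adapter["scale"] * (adapter["B"] @ adapter["A"])
    return w

rng = np.random.default_rng(0)
dim, rank = 16, 4
w_base = rng.normal(size=(dim, dim))
adapters = [
    {"adapter_name": "hyper_sdxl", "scale": 0.7,
     "B": rng.normal(size=(dim, rank)), "A": rng.normal(size=(rank, dim))},
    {"adapter_name": "style", "scale": 0.3,
     "B": rng.normal(size=(dim, rank)), "A": rng.normal(size=(rank, dim))},
]
w_merged = merge_loras(w_base, adapters)
```

Setting an adapter's `scale` to 0.0 removes its influence entirely without unloading it.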
## Sharding strategies

### Data parallelism (default)

SDXL supports single- and multi-host inference with sharding annotations. Data parallelism replicates the model on every device.

### FSDP

Fully shard model parameters to fit larger models.
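At the JAX level, the difference between the two strategies comes down to the `PartitionSpec` attached to parameter tensors: an empty spec replicates, while naming a mesh axis shards along it. A toy `jax.sharding` sketch (MaxDiffusion's real sharding annotations live in its configs and model code):

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis. Data parallelism shards only the batch across it, while
# FSDP additionally shards parameter tensors along the same axis.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

params = np.zeros((8, 256), dtype=np.float32)  # toy parameter tensor

replicated = jax.device_put(params, NamedSharding(mesh, P()))       # data parallel
fsdp = jax.device_put(params, NamedSharding(mesh, P("data")))       # shard dim 0
```

Under FSDP, each device holds only its slice of `params` and the full tensor is gathered just-in-time for each layer's computation.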
## Implementation details

The SDXL pipeline (`src/maxdiffusion/generate_sdxl.py`) implements:
### Dual text encoding

SDXL uses two text encoders (`generate_sdxl.py:93-103`):
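The shapes involved can be sketched with plain numpy (illustrative dimensions taken from SDXL's published architecture, not from the cited lines): token-level hidden states from both encoders are concatenated along the feature axis, while the larger encoder's pooled output is kept for the added-conditioning path.

```python
import numpy as np

batch, seq_len = 2, 77

clip_l_hidden = np.zeros((batch, seq_len, 768))       # CLIP ViT-L/14 hidden states
openclip_g_hidden = np.zeros((batch, seq_len, 1280))  # OpenCLIP ViT-G/14 hidden states
openclip_g_pooled = np.zeros((batch, 1280))           # pooled output, used separately

# Cross-attention context for the UNet: concatenate along the feature axis.
prompt_embeds = np.concatenate([clip_l_hidden, openclip_g_hidden], axis=-1)
print(prompt_embeds.shape)  # (2, 77, 2048)
```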
### Additional conditioning

SDXL adds time embeddings for resolution and crop coordinates (`generate_sdxl.py:137-150`):
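The standard SDXL convention packs six values per image (original size, crop top-left, target size), which are then embedded alongside the pooled text embedding. A minimal sketch of that packing (generic SDXL convention, not necessarily the exact cited code):

```python
import numpy as np

def make_time_ids(original_size, crop_coords_top_left, target_size, batch_size):
    """Pack (orig_h, orig_w, crop_top, crop_left, target_h, target_w) per image."""
    ids = np.array(original_size + crop_coords_top_left + target_size,
                   dtype=np.float32)
    return np.tile(ids, (batch_size, 1))

time_ids = make_time_ids((1024, 1024), (0, 0), (1024, 1024), batch_size=2)
print(time_ids.shape)  # (2, 6)
```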
## Custom models

Load custom SDXL checkpoints from HuggingFace:
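MaxDiffusion reads model sources from its YAML config. The key names below are an assumption modeled on common diffusers-style naming; check the SDXL config file in your checkout for the exact schema:

```yaml
# Hypothetical override keys -- verify against your config file.
pretrained_model_name_or_path: "your-org/your-sdxl-finetune"  # HF repo or local path
revision: "main"
```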
## Output

Generated images are saved as `image_sdxl_{i}.png`. The pipeline reports:
- Compile time: Initial compilation duration
- Inference time: Generation time after compilation
## Next steps

- ControlNet SDXL: Conditional generation with edge detection
- SDXL training: Fine-tune SDXL on custom datasets
- Flux inference: Next-generation image synthesis
- LoRA training: Train custom LoRA adapters