FlaxStableDiffusionXLPipeline extends Stable Diffusion with higher-resolution generation and dual text encoders.
FlaxStableDiffusionXLPipeline
Located in src/maxdiffusion/pipelines/stable_diffusion_xl/pipeline_flax_stable_diffusion_xl.py:43
Components
First CLIP text encoder
Second CLIP text encoder for improved text understanding
Variational Auto-Encoder (VAE) model to encode and decode images
First tokenizer
Second tokenizer corresponding to text_encoder_2
UNet to denoise the encoded image latents with additional conditioning
A scheduler to be used in combination with unet to denoise the encoded image latents
Methods
prepare_inputs
Tokenizes text prompts with both tokenizers.
Parameters:
The prompt or prompts to guide image generation
Returns:
Stacked tokenized input IDs from both tokenizers with shape (batch_size, 2, sequence_length)
get_embeddings
Encodes prompts using dual text encoders.
Parameters:
Tokenized prompt IDs from both tokenizers
Model parameters
Returns:
Concatenated embeddings from both text encoders
Pooled text embeddings from the second encoder
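The shapes involved in the two methods above can be illustrated with a minimal sketch. It uses NumPy stand-ins rather than the real tokenizers and encoders (jax.numpy exposes the same stack and concatenate calls). The sequence length of 77 and the hidden sizes of 768 and 1280 are assumptions chosen for illustration, not values taken from this pipeline's source.

```python
import numpy as np

batch_size, seq_len = 2, 77  # assumed values for illustration

# Stand-ins for the token IDs produced by each tokenizer.
ids_1 = np.ones((batch_size, seq_len), dtype=np.int32)
ids_2 = np.full((batch_size, seq_len), 2, dtype=np.int32)

# prepare_inputs-style stacking: one slot per tokenizer on axis 1.
prompt_ids = np.stack([ids_1, ids_2], axis=1)

# Stand-ins for the per-token hidden states of each text encoder
# (hidden sizes are assumed, not read from a model config).
hidden_1 = np.zeros((batch_size, seq_len, 768), dtype=np.float32)
hidden_2 = np.zeros((batch_size, seq_len, 1280), dtype=np.float32)

# get_embeddings-style concatenation along the feature axis.
prompt_embeds = np.concatenate([hidden_1, hidden_2], axis=-1)

print(prompt_ids.shape)     # (2, 2, 77)
print(prompt_embeds.shape)  # (2, 77, 2048)
```

Indexing prompt_ids[:, 0] and prompt_ids[:, 1] recovers the IDs for the first and second tokenizer respectively, which is how a stacked array of this shape would be split before each encoder runs.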
__call__
Generate images from text prompts.
Parameters:
Tokenized prompt IDs from both tokenizers
Model parameters for all pipeline components
Random seed for generation
The number of denoising steps
Guidance scale for classifier-free guidance
The height in pixels of the generated image. Defaults to unet.config.sample_size * vae_scale_factor
The width in pixels of the generated image. Defaults to unet.config.sample_size * vae_scale_factor
Pre-generated noisy latents
Tokenized negative prompt IDs
Output type - set to “latent” to return latents instead of decoded images
Whether to run pmap versions of the generation functions
Returns:
Generated images, or latents when the output type is set to “latent”
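Two of the parameters above reduce to simple arithmetic, sketched here under stated assumptions. The guidance formula is the standard classifier-free guidance combination, and it is assumed rather than confirmed that this pipeline applies it in exactly this form. The helper names apply_guidance and default_size are hypothetical, and the sample size of 128 and scale factor of 8 are illustrative values, not read from a real config.

```python
import numpy as np

def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Standard classifier-free guidance: push the prediction away from
    the unconditional branch, towards the text-conditioned one."""
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)

def default_size(sample_size, vae_scale_factor):
    """Default height or width in pixels, following the formula above."""
    return sample_size * vae_scale_factor

# Dummy noise predictions in latent space (shapes are illustrative).
noise_uncond = np.zeros((1, 4, 128, 128), dtype=np.float32)
noise_text = np.ones((1, 4, 128, 128), dtype=np.float32)

guided = apply_guidance(noise_uncond, noise_text, guidance_scale=5.0)
print(float(guided.mean()))   # 5.0
print(default_size(128, 8))   # 1024
```

A guidance scale of 1.0 returns the text-conditioned prediction unchanged in this formulation; larger values weight the prompt more heavily.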
Key differences from Stable Diffusion
- Dual text encoders: Uses two CLIP text encoders for improved text understanding
- Higher resolution: Optimized for generating 1024x1024 images
- Additional conditioning: Uses pooled text embeddings and time IDs for micro-conditioning
- Concatenated embeddings: Text embeddings from both encoders are concatenated
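The time IDs mentioned in the list above can be sketched as follows. This assumes the common SDXL micro-conditioning layout of original size, crop offset, and target size, which may not match this pipeline's internals exactly. The helper name make_time_ids is hypothetical, and NumPy stands in for jax.numpy.

```python
import numpy as np

def make_time_ids(original_size, crops_top_left, target_size, batch_size):
    """Build a (batch_size, 6) array of micro-conditioning values:
    original (h, w), crop offset (top, left), and target (h, w)."""
    ids = list(original_size) + list(crops_top_left) + list(target_size)
    return np.array([ids] * batch_size, dtype=np.float32)

time_ids = make_time_ids((1024, 1024), (0, 0), (1024, 1024), batch_size=2)
print(time_ids.shape)  # (2, 6)
```

In this layout every sample in the batch carries the same six values, and they are passed to the UNet together with the pooled text embeddings as additional conditioning.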