FluxPipeline
A text-to-image generation pipeline built on the Flux transformer architecture.
Located in src/maxdiffusion/pipelines/flux/flux_pipeline.py:42
Components
- T5 text encoder for processing prompts
- CLIP text encoder for pooled embeddings
- Variational autoencoder (VAE) for latent encoding/decoding
- Tokenizer for the T5 encoder
- Tokenizer for the CLIP encoder
- Flux transformer model for denoising
- Euler discrete scheduler
- JAX mesh for distributed computation
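A minimal sketch of how the components above might be grouped in a pipeline object. The field names here are illustrative only and do not reflect the actual maxdiffusion attribute names:

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical container for the pipeline components listed above;
# field names are illustrative, not the actual maxdiffusion API.
@dataclass
class FluxPipelineSketch:
    t5_encoder: Any       # T5 text encoder (per-token embeddings)
    clip_encoder: Any     # CLIP text encoder (pooled embeddings)
    vae: Any              # variational autoencoder for latents
    t5_tokenizer: Any     # tokenizer for the T5 encoder
    clip_tokenizer: Any   # tokenizer for the CLIP encoder
    transformer: Any      # Flux denoising transformer
    scheduler: Any        # Euler discrete scheduler
    mesh: Any             # JAX device mesh for distributed computation
```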
Methods
encode_prompt
Encodes text prompts using the T5 and CLIP encoders.
Parameters:
- The prompt or prompts to guide image generation
- Optional second prompt (defaults to the primary prompt if not provided)
- Number of images to generate per prompt
- Maximum sequence length for the T5 encoder
Returns:
- T5 text embeddings
- CLIP pooled embeddings
- Text position IDs
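A hedged sketch of the dual-encoder pattern: T5 supplies a per-token embedding sequence, CLIP supplies one pooled vector per prompt, and both are tiled per generated image. The function and encoder names are hypothetical stand-ins, though the all-zero 3-component text position IDs match how Flux treats text tokens:

```python
import numpy as np

def encode_prompt_sketch(t5_tokens, clip_tokens,
                         t5_encode, clip_encode, num_images_per_prompt=1):
    """Illustrative sketch: T5 yields per-token embeddings, CLIP yields one
    pooled vector per prompt; both are repeated per generated image."""
    prompt_embeds = t5_encode(t5_tokens)        # (batch, seq_len, d_t5)
    pooled_embeds = clip_encode(clip_tokens)    # (batch, d_clip)
    # Repeat along the batch axis so each image copy gets its own embedding.
    prompt_embeds = np.repeat(prompt_embeds, num_images_per_prompt, axis=0)
    pooled_embeds = np.repeat(pooled_embeds, num_images_per_prompt, axis=0)
    # Text position IDs: Flux uses all-zero 3-component IDs for text tokens.
    text_ids = np.zeros((prompt_embeds.shape[1], 3))
    return prompt_embeds, pooled_embeds, text_ids
```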
prepare_latents
Prepares the initial latent tensors for generation.
Parameters:
- Batch size
- Number of channels in latent space
- Height of generated images
- Width of generated images
- Data type for latents
- Random key for initialization
Returns:
- Packed latent tensors
- Position IDs for latents
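The "packed" latents follow the Flux convention of folding each 2x2 spatial patch into the channel axis, turning the latent image grid into a token sequence; the matching position IDs carry the row and column of each patch. A NumPy sketch (function names are illustrative):

```python
import numpy as np

def pack_latents_sketch(latents):
    """Sketch of Flux-style latent packing: fold each 2x2 spatial patch
    into the channel axis, (B, C, H, W) -> (B, H//2 * W//2, C * 4)."""
    b, c, h, w = latents.shape
    x = latents.reshape(b, c, h // 2, 2, w // 2, 2)
    x = x.transpose(0, 2, 4, 1, 3, 5)               # (B, H/2, W/2, C, 2, 2)
    return x.reshape(b, (h // 2) * (w // 2), c * 4)

def latent_image_ids_sketch(h, w):
    """Per-token position IDs: component 0 is zero, components 1 and 2
    hold the row and column of each packed 2x2 patch."""
    ids = np.zeros((h // 2, w // 2, 3))
    ids[..., 1] = np.arange(h // 2)[:, None]
    ids[..., 2] = np.arange(w // 2)[None, :]
    return ids.reshape(-1, 3)
```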
time_shift
Applies resolution-dependent time shifting to timesteps based on the latent sequence length.
Parameters:
- Latent tensors
- Original timesteps
Returns:
- Shifted timesteps
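A sketch of the shift commonly used with Flux-style flow matching: the shift parameter mu is linearly interpolated from the latent sequence length, then an exponential shift is applied to each timestep. The constants below are typical defaults for Flux-family pipelines, not values confirmed from this file:

```python
import math

def time_shift_sketch(timesteps, image_seq_len,
                      base_seq_len=256, max_seq_len=4096,
                      base_shift=0.5, max_shift=1.16):
    """Sketch of resolution-dependent time shifting: longer latent
    sequences (higher resolutions) push the schedule toward noisier
    timesteps. Constants are common defaults, assumed here."""
    # Linearly interpolate the shift parameter mu from the sequence length.
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    mu = base_shift + m * (image_seq_len - base_seq_len)
    # Apply the exponential shift to each timestep t in (0, 1].
    return [math.exp(mu) / (math.exp(mu) + (1.0 / t - 1.0)) for t in timesteps]
```

Note that t = 1 maps to 1 regardless of mu, and larger mu (higher resolution) shifts intermediate timesteps upward, toward the noisy end of the schedule.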
__call__
Generates images from the pipeline configuration.
Parameters:
- Number of denoising steps
- Flux transformer parameters
- VAE parameters
Returns:
- Generated images
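The heart of __call__ is the denoising loop: at each scheduler step the Flux transformer predicts a velocity field and an Euler step moves the latents along it, after which the VAE decodes the final latents. A minimal sketch of the loop, with `velocity_fn` standing in for the transformer (names are illustrative):

```python
import numpy as np

def denoise_sketch(latents, timesteps, velocity_fn):
    """Sketch of a flow-matching denoising loop: step the latents along
    the predicted velocity field with explicit Euler updates."""
    for t_curr, t_next in zip(timesteps[:-1], timesteps[1:]):
        v = velocity_fn(latents, t_curr)            # predicted velocity
        latents = latents + (t_next - t_curr) * v   # Euler step
    return latents
```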
Key features
- Dual text encoders: Combines T5 and CLIP for rich text understanding
- Rotary position embeddings: Uses RoPE for better positional encoding
- Flow matching: Uses flow matching scheduler for efficient sampling
- Packed latents: Packs latent dimensions for efficient processing
- Time shifting: Dynamically adjusts timesteps based on resolution
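To make the RoPE feature concrete, here is a minimal rotary position embedding sketch: pairs of channels are rotated by an angle that grows with position, so attention scores come to depend on relative offsets. This is a generic RoPE illustration, not the exact Flux variant (which applies separate rotations per position-ID component):

```python
import numpy as np

def rope_sketch(x, positions, theta=10000.0):
    """Minimal rotary position embedding: rotate each (even, odd) channel
    pair of x by position-dependent angles. x is (seq, dim), dim even."""
    seq, dim = x.shape
    freqs = 1.0 / (theta ** (np.arange(0, dim, 2) / dim))   # (dim/2,)
    angles = positions[:, None] * freqs[None, :]            # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each update is a pure rotation, vectors at position 0 pass through unchanged and vector norms are preserved at every position.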