FlaxStableDiffusionPipeline is a Flax-based pipeline for text-to-image generation using Stable Diffusion.
FlaxStableDiffusionPipeline
Located in src/maxdiffusion/pipelines/stable_diffusion/pipeline_flax_stable_diffusion.py:77
Components
A Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations
A frozen CLIP text encoder (clip-vit-large-patch14)
A CLIPTokenizer to tokenize text
A FlaxUNet2DConditionModel to denoise the encoded image latents
A scheduler used in combination with the UNet to denoise the encoded image latents. Can be one of FlaxDDIMScheduler, FlaxLMSDiscreteScheduler, FlaxPNDMScheduler, or FlaxDPMSolverMultistepScheduler
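The interplay between the UNet and the scheduler can be sketched as a loop that repeatedly predicts and removes noise from the latents. This is a minimal toy version, assuming a hypothetical `fake_unet` stand-in for FlaxUNet2DConditionModel and a deliberately simplified scheduler step; the real pipeline calls the configured Flax scheduler's step function instead.

```python
import numpy as np

rng = np.random.default_rng(0)
# start from Gaussian noise in latent space (shape is illustrative: batch, channels, h, w)
latents = rng.standard_normal((1, 4, 64, 64)).astype(np.float32)

def fake_unet(latents, timestep):
    # stand-in for FlaxUNet2DConditionModel: predicts the noise residual in the latents
    return 0.2 * latents

num_inference_steps = 10
for t in reversed(range(num_inference_steps)):
    noise_pred = fake_unet(latents, t)
    # simplified scheduler step: remove the predicted noise from the latents
    latents = latents - noise_pred
# after the loop the latents are (ideally) close to a clean image latent,
# ready to be decoded by the VAE
```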
Methods
prepare_inputs
Tokenizes text prompts.
Parameters:
prompt: The prompt or prompts to guide image generation
Returns:
Tokenized input IDs
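prepare_inputs pads or truncates every prompt to the CLIP tokenizer's fixed maximum length (77 tokens) so the results can be stacked into one batch array. A minimal sketch of that shaping step, using hypothetical raw token IDs and assuming CLIP's end-of-text token (49407) as padding:

```python
import numpy as np

MAX_LENGTH = 77       # CLIP tokenizer's model_max_length
PAD_TOKEN_ID = 49407  # CLIP's end-of-text token, reused for padding (assumption)

def pad_prompt_ids(batch_of_ids):
    # pad (or truncate) each prompt's token IDs to a fixed length,
    # then stack into a single (batch, MAX_LENGTH) int32 array
    padded = []
    for ids in batch_of_ids:
        ids = list(ids)[:MAX_LENGTH]
        ids += [PAD_TOKEN_ID] * (MAX_LENGTH - len(ids))
        padded.append(ids)
    return np.asarray(padded, dtype=np.int32)

# hypothetical token IDs for two prompts of different lengths
prompt_ids = pad_prompt_ids([[49406, 320, 1125, 49407], [49406, 1237, 49407]])
```

Fixed-length batches like this are what allow the downstream generation functions to be jit/pmap-compiled once and reused across prompts.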
__call__
Generate images from text prompts.
Parameters:
prompt_ids: Tokenized prompt IDs
params: Model parameters (weights) for all pipeline components
prng_seed: A JAX PRNG key used as the random seed for generation
num_inference_steps: The number of denoising steps. More steps usually lead to higher-quality images at the expense of slower inference
height: The height in pixels of the generated image. Defaults to unet.config.sample_size * vae_scale_factor
width: The width in pixels of the generated image. Defaults to unet.config.sample_size * vae_scale_factor
guidance_scale: A higher guidance scale value encourages the model to generate images closely linked to the text prompt, at the expense of lower image quality. Guidance is enabled when guidance_scale > 1
latents: Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation
neg_prompt_ids: Tokenized negative prompt IDs for classifier-free guidance
jit: Whether to run pmap versions of the generation functions
Returns:
Generated images as NumPy arrays
Whether NSFW content was detected (always False in the current implementation)
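The guidance_scale and neg_prompt_ids parameters implement classifier-free guidance: the UNet predicts noise for both the unconditional (or negative) prompt and the text prompt, and the two predictions are blended. A sketch of that blending step, with NumPy arrays standing in for the two UNet outputs:

```python
import numpy as np

def apply_guidance(noise_uncond, noise_text, guidance_scale):
    # classifier-free guidance: move the prediction away from the
    # unconditional output and toward the text-conditioned output
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)

# stand-ins for the two noise predictions from the UNet
noise_uncond = np.zeros((1, 4, 64, 64), dtype=np.float32)
noise_text = np.ones((1, 4, 64, 64), dtype=np.float32)
guided = apply_guidance(noise_uncond, noise_text, guidance_scale=7.5)
```

With guidance_scale = 1 the result reduces to the text-conditioned prediction alone, which is why guidance only takes effect when guidance_scale > 1.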