
Quickstart

This guide will help you generate your first image with MaxDiffusion using Stable Diffusion XL on a Cloud TPU.

Prerequisites

  • Cloud TPU VM (v4-8, v5p-8, or v6e-8 recommended)
  • MaxDiffusion installed (see Installation)

Generate your first image

Step 1: Activate your environment

Make sure MaxDiffusion is installed and your virtual environment is activated:
source ~/maxdiffusion_venv/bin/activate
cd maxdiffusion

Step 2: Run SDXL inference

Generate an image with Stable Diffusion XL:
python -m src.maxdiffusion.generate_sdxl \
  src/maxdiffusion/configs/base_xl.yml \
  run_name="my_first_image" \
  prompt="A magical castle in the middle of a forest, artistic drawing" \
  output_dir=/tmp/
The generated image will be saved to /tmp/my_first_image/.

Step 3: View your image

The output includes:
  • Generated image: /tmp/my_first_image/image_0.png
  • Generation metadata and parameters
The first run will download model weights from HuggingFace (~13GB for SDXL). Subsequent runs will use cached weights.

Try different models

Fast inference with SD 2.1:
python -m src.maxdiffusion.generate \
  src/maxdiffusion/configs/base21.yml \
  run_name="sd21_test" \
  prompt="A serene mountain landscape at sunset" \
  output_dir=/tmp/

Customize generation parameters

Control the output by modifying these key parameters:

Prompt and guidance

python -m src.maxdiffusion.generate_sdxl \
  src/maxdiffusion/configs/base_xl.yml \
  run_name="custom" \
  prompt="A futuristic city with flying cars" \
  negative_prompt="blurry, low quality, distorted" \
  guidance_scale=9.0 \
  output_dir=/tmp/
  • prompt: Main text description of the image
  • negative_prompt: What to avoid in the image
  • guidance_scale: How closely to follow the prompt (7.0-15.0, higher = more adherence)
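
If you want to compare several guidance_scale values side by side, it helps to script the invocation rather than retype it. The helper below is a hypothetical convenience (not part of MaxDiffusion) that builds the exact command shown above for each value in a sweep:

```python
import shlex

# Hypothetical helper (not part of MaxDiffusion): build the CLI invocation
# above for several guidance_scale values, so you can sweep prompt adherence
# without retyping the command.
def sdxl_command(run_name, prompt, guidance_scale):
    return [
        "python", "-m", "src.maxdiffusion.generate_sdxl",
        "src/maxdiffusion/configs/base_xl.yml",
        f"run_name={run_name}",
        f"prompt={prompt}",
        f"guidance_scale={guidance_scale}",
        "output_dir=/tmp/",
    ]

commands = [sdxl_command(f"sweep_g{g}", "A futuristic city with flying cars", g)
            for g in (7.0, 9.0, 12.0)]
for cmd in commands:
    print(shlex.join(cmd))  # inspect first; run with subprocess.run(cmd)
```

Each run_name gets its own output directory under /tmp/, so the sweep results do not overwrite each other.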

Quality settings

python -m src.maxdiffusion.generate_sdxl \
  src/maxdiffusion/configs/base_xl.yml \
  run_name="high_quality" \
  prompt="Portrait of a wise old wizard" \
  num_inference_steps=50 \
  resolution=1024 \
  output_dir=/tmp/
  • num_inference_steps: More steps = higher quality but slower (20-50)
  • resolution: Output image size (512, 1024 for SDXL)
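
Because the diffusion model runs once per sampling step, generation time grows roughly linearly with num_inference_steps. A back-of-envelope estimate from one measured run (ignoring fixed costs such as text encoding and VAE decoding):

```python
# Rough estimate: sampling runs the model once per step, so wall-clock time
# scales approximately linearly with num_inference_steps (ignoring fixed
# costs such as text encoding and VAE decoding).
def estimate_seconds(measured_seconds, measured_steps, target_steps):
    return measured_seconds / measured_steps * target_steps

# If a 20-step run took 6 s, a 50-step run should take about:
print(round(estimate_seconds(6.0, 20, 50), 1))  # -> 15.0
```

The 6-second figure is just an illustration; substitute a timing from your own hardware.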

Performance settings

python -m src.maxdiffusion.generate_sdxl \
  src/maxdiffusion/configs/base_xl.yml \
  run_name="batch" \
  prompt="A scenic landscape" \
  per_device_batch_size=4 \
  attention=flash \
  output_dir=/tmp/
  • per_device_batch_size: Generate multiple images in parallel
  • attention: Use flash for faster inference on TPU
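
With data-parallel generation, each device produces per_device_batch_size images, so one call yields devices × batch images. A quick sketch (device counts here are examples; on a real TPU VM you would read the count from jax.device_count()):

```python
# Back-of-envelope: with data-parallel generation, each device produces
# per_device_batch_size images, so one call yields devices * batch images.
# (On a real TPU VM, read the device count from jax.device_count().)
def images_per_call(num_devices, per_device_batch_size):
    return num_devices * per_device_batch_size

print(images_per_call(4, 4))  # e.g. 4 devices at batch 4 -> 16 images
```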

Performance benchmarks

Expected generation times on different hardware:
Model         Hardware  Steps  Batch Size  Time
Flux Schnell  v6e-4     4      4           0.8s
Flux Dev      v6e-4     28     4           5.5s
Flux Schnell  v4-8      4      4           2.2s
Flux Dev      v4-8      28     4           23s
SDXL          v5p-8     20     2           ~15s
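
Since these runs use different batch sizes, per-image throughput (batch size divided by wall-clock time) is the fairer comparison. Computed from the table above:

```python
# Per-image throughput from the benchmark table above:
# images per second = batch size / wall-clock time.
benchmarks = {
    "Flux Schnell / v6e-4": (4, 0.8),   # (batch size, seconds)
    "Flux Schnell / v4-8":  (4, 2.2),
    "Flux Dev / v6e-4":     (4, 5.5),
    "Flux Dev / v4-8":      (4, 23.0),
    "SDXL / v5p-8":         (2, 15.0),
}
throughput = {name: batch / secs for name, (batch, secs) in benchmarks.items()}
for name, ips in sorted(throughput.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {ips:.2f} images/s")
```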

Use with LoRA adapters

Load LoRA adapters for style transfer:
python src/maxdiffusion/generate_flux.py \
  src/maxdiffusion/configs/base_flux_dev.yml \
  jax_cache_dir=/tmp/cache_dir \
  run_name=flux_lora \
  output_dir=/tmp/ \
  prompt="A cute corgi in a house made of sushi, anime" \
  num_inference_steps=28 \
  ici_data_parallelism=1 \
  ici_fsdp_parallelism=-1 \
  split_head_dim=True \
  lora_config='{"lora_model_name_or_path" : ["/path/to/anime_lora.safetensors"], "weight_name" : ["anime_lora.safetensors"], "adapter_name" : ["anime"], "scale": [0.8], "from_pt": ["true"]}'
Learn more about LoRA adapters in the LoRA guide.
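
The lora_config flag takes a JSON object of parallel lists, one entry per adapter. Building the string with json.dumps avoids shell-quoting mistakes when you add or combine adapters:

```python
import json

# lora_config is a JSON object of parallel lists (one entry per adapter).
# Generating it with json.dumps avoids hand-editing quotes in the shell.
lora_config = {
    "lora_model_name_or_path": ["/path/to/anime_lora.safetensors"],
    "weight_name": ["anime_lora.safetensors"],
    "adapter_name": ["anime"],
    "scale": [0.8],
    "from_pt": ["true"],
}
flag = f"lora_config='{json.dumps(lora_config)}'"
print(flag)  # paste into the command line above
```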

Common issues

Out of memory errors
Reduce batch size or resolution:
per_device_batch_size=1
resolution=512
Or enable gradient checkpointing and offloading in the config file.
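
These two knobs help because activation memory grows linearly with batch size and quadratically with resolution. A rough scaling sketch (ignoring model weights and other fixed overheads):

```python
# Rough scaling: activation memory grows linearly with batch size and
# quadratically with resolution (images and latents are 2-D), ignoring
# model weights and other fixed overheads.
def relative_activation_memory(batch, resolution, base_batch=4, base_res=1024):
    return (batch / base_batch) * (resolution / base_res) ** 2

# Dropping from batch 4 @ 1024 to batch 1 @ 512:
print(relative_activation_memory(1, 512))  # -> 0.0625 (16x less)
```
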

Authentication errors
Set up HuggingFace authentication:
pip install huggingface_hub
huggingface-cli login
Some models (like Flux) require accepting license terms on HuggingFace.

Slow first run
The first run includes:
  • Model weight download (~5-15GB)
  • JAX compilation (1-3 minutes)
Subsequent runs are much faster because weights and compiled code are cached.

Next steps

  • Training guide: fine-tune models on your own data
  • LoRA adapters: use LoRA for style transfer
  • Video generation: generate videos with Wan models
  • Scale to multi-host: deploy at scale with XPK
