
Quickstart

This guide will help you generate your first image with MaxDiffusion using Stable Diffusion XL on a Cloud TPU.

Prerequisites

  • Cloud TPU VM (v4-8, v5p-8, or v6e-8 recommended)
  • MaxDiffusion installed (see Installation)

Generate your first image

Step 1: Activate your environment

Make sure MaxDiffusion is installed and your virtual environment is activated:
source ~/maxdiffusion_venv/bin/activate
cd maxdiffusion

Step 2: Run SDXL inference

Generate an image with Stable Diffusion XL:
python -m src.maxdiffusion.generate_sdxl \
  src/maxdiffusion/configs/base_xl.yml \
  run_name="my_first_image" \
  prompt="A magical castle in the middle of a forest, artistic drawing" \
  output_dir=/tmp/
The generated image will be saved to /tmp/my_first_image/.

Step 3: View your image

The output includes:
  • Generated image: /tmp/my_first_image/image_0.png
  • Generation metadata and parameters
The first run will download model weights from HuggingFace (~13GB for SDXL). Subsequent runs will use cached weights.

Try different models

Fast inference with SD 2.1:
python -m src.maxdiffusion.generate \
  src/maxdiffusion/configs/base21.yml \
  run_name="sd21_test" \
  prompt="A serene mountain landscape at sunset" \
  output_dir=/tmp/

Customize generation parameters

Control the output by modifying these key parameters:

Prompt and guidance

python -m src.maxdiffusion.generate_sdxl \
  src/maxdiffusion/configs/base_xl.yml \
  run_name="custom" \
  prompt="A futuristic city with flying cars" \
  negative_prompt="blurry, low quality, distorted" \
  guidance_scale=9.0 \
  output_dir=/tmp/
  • prompt: Main text description of the image
  • negative_prompt: What to avoid in the image
  • guidance_scale: How closely to follow the prompt (7.0-15.0, higher = more adherence)
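
If you want to compare several guidance_scale values side by side, it helps to script the invocation rather than retype it. The helper below is a hypothetical convenience (not part of MaxDiffusion) that builds the exact command shown above for each value in a sweep:

```python
import shlex

# Hypothetical helper (not part of MaxDiffusion): build the CLI invocation
# above for several guidance_scale values, so you can sweep prompt adherence
# without retyping the command.
def sdxl_command(run_name, prompt, guidance_scale):
    return [
        "python", "-m", "src.maxdiffusion.generate_sdxl",
        "src/maxdiffusion/configs/base_xl.yml",
        f"run_name={run_name}",
        f"prompt={prompt}",
        f"guidance_scale={guidance_scale}",
        "output_dir=/tmp/",
    ]

commands = [sdxl_command(f"sweep_g{g}", "A futuristic city with flying cars", g)
            for g in (7.0, 9.0, 12.0)]
for cmd in commands:
    print(shlex.join(cmd))  # inspect first; run with subprocess.run(cmd)
```

Each run_name gets its own output directory under /tmp/, so the sweep results do not overwrite each other.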

Quality settings

python -m src.maxdiffusion.generate_sdxl \
  src/maxdiffusion/configs/base_xl.yml \
  run_name="high_quality" \
  prompt="Portrait of a wise old wizard" \
  num_inference_steps=50 \
  resolution=1024 \
  output_dir=/tmp/
  • num_inference_steps: More steps = higher quality but slower (20-50)
  • resolution: Output image size (512, 1024 for SDXL)
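
Because the diffusion model runs once per sampling step, generation time grows roughly linearly with num_inference_steps. A back-of-envelope estimate from one measured run (ignoring fixed costs such as text encoding and VAE decoding):

```python
# Rough estimate: sampling runs the model once per step, so wall-clock time
# scales approximately linearly with num_inference_steps (ignoring fixed
# costs such as text encoding and VAE decoding).
def estimate_seconds(measured_seconds, measured_steps, target_steps):
    return measured_seconds / measured_steps * target_steps

# If a 20-step run took 6 s, a 50-step run should take about:
print(round(estimate_seconds(6.0, 20, 50), 1))  # -> 15.0
```

The 6-second figure is just an illustration; substitute a timing from your own hardware.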

Performance settings

python -m src.maxdiffusion.generate_sdxl \
  src/maxdiffusion/configs/base_xl.yml \
  run_name="batch" \
  prompt="A scenic landscape" \
  per_device_batch_size=4 \
  attention=flash \
  output_dir=/tmp/
  • per_device_batch_size: Generate multiple images in parallel
  • attention: Use flash for faster inference on TPU
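
With data-parallel generation, each device produces per_device_batch_size images, so one call yields devices × batch images. A quick sketch (device counts here are examples; on a real TPU VM you would read the count from jax.device_count()):

```python
# Back-of-envelope: with data-parallel generation, each device produces
# per_device_batch_size images, so one call yields devices * batch images.
# (On a real TPU VM, read the device count from jax.device_count().)
def images_per_call(num_devices, per_device_batch_size):
    return num_devices * per_device_batch_size

print(images_per_call(4, 4))  # e.g. 4 devices at batch 4 -> 16 images
```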

Performance benchmarks

Expected generation times on different hardware:
Model         Hardware  Steps  Batch Size  Time
Flux Schnell  v6e-4     4      4           0.8s
Flux Dev      v6e-4     28     4           5.5s
Flux Schnell  v4-8      4      4           2.2s
Flux Dev      v4-8      28     4           23s
SDXL          v5p-8     20     2           ~15s
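
Since these runs use different batch sizes, per-image throughput (batch size divided by wall-clock time) is the fairer comparison. Computed from the table above:

```python
# Per-image throughput from the benchmark table above:
# images per second = batch size / wall-clock time.
benchmarks = {
    "Flux Schnell / v6e-4": (4, 0.8),   # (batch size, seconds)
    "Flux Schnell / v4-8":  (4, 2.2),
    "Flux Dev / v6e-4":     (4, 5.5),
    "Flux Dev / v4-8":      (4, 23.0),
    "SDXL / v5p-8":         (2, 15.0),
}
throughput = {name: batch / secs for name, (batch, secs) in benchmarks.items()}
for name, ips in sorted(throughput.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {ips:.2f} images/s")
```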

Use with LoRA adapters

Load LoRA adapters for style transfer:
python src/maxdiffusion/generate_flux.py \
  src/maxdiffusion/configs/base_flux_dev.yml \
  jax_cache_dir=/tmp/cache_dir \
  run_name=flux_lora \
  output_dir=/tmp/ \
  prompt="A cute corgi in a house made of sushi, anime" \
  num_inference_steps=28 \
  ici_data_parallelism=1 \
  ici_fsdp_parallelism=-1 \
  split_head_dim=True \
  lora_config='{"lora_model_name_or_path" : ["/path/to/anime_lora.safetensors"], "weight_name" : ["anime_lora.safetensors"], "adapter_name" : ["anime"], "scale": [0.8], "from_pt": ["true"]}'
Learn more about LoRA adapters in the LoRA guide.
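
The lora_config flag takes a JSON object of parallel lists, one entry per adapter. Building the string with json.dumps avoids shell-quoting mistakes when you add or combine adapters:

```python
import json

# lora_config is a JSON object of parallel lists (one entry per adapter).
# Generating it with json.dumps avoids hand-editing quotes in the shell.
lora_config = {
    "lora_model_name_or_path": ["/path/to/anime_lora.safetensors"],
    "weight_name": ["anime_lora.safetensors"],
    "adapter_name": ["anime"],
    "scale": [0.8],
    "from_pt": ["true"],
}
flag = f"lora_config='{json.dumps(lora_config)}'"
print(flag)  # paste into the command line above
```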

Common issues

Out of memory errors
Reduce batch size or resolution:
per_device_batch_size=1
resolution=512
Or enable gradient checkpointing and offloading in the config file.
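
These two knobs help because activation memory grows linearly with batch size and quadratically with resolution. A rough scaling sketch (ignoring model weights and other fixed overheads):

```python
# Rough scaling: activation memory grows linearly with batch size and
# quadratically with resolution (images and latents are 2-D), ignoring
# model weights and other fixed overheads.
def relative_activation_memory(batch, resolution, base_batch=4, base_res=1024):
    return (batch / base_batch) * (resolution / base_res) ** 2

# Dropping from batch 4 @ 1024 to batch 1 @ 512:
print(relative_activation_memory(1, 512))  # -> 0.0625 (16x less)
```
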

Authentication errors
Set up HuggingFace authentication:
pip install huggingface_hub
huggingface-cli login
Some models (like Flux) require accepting license terms on HuggingFace.

Slow first run
The first run includes:
  • Model weight download (~5-15GB)
  • JAX compilation (1-3 minutes)
Subsequent runs are much faster because weights and compiled code are cached.

Next steps

  • Training guide: fine-tune models on your own data
  • LoRA adapters: use LoRA for style transfer
  • Video generation: generate videos with Wan models
  • Scale to multi-host: deploy at scale with XPK
