
Overview

sd-scripts ships two layers of inference tooling:
  • gen_img.py — a full-featured CLI for SD 1.x, SD 2.x, and SDXL. It supports interactive prompting, file-driven batch generation, img2img, inpainting, ControlNet, LoRA, Highres fix, and more.
  • Minimal inference scripts — lightweight, model-specific scripts for FLUX, SD3, Lumina, and SDXL that load the model in its native format without requiring a Diffusers pipeline.

gen_img.py

Modes of operation

gen_img.py supports three modes depending on how you supply prompts: interactive (--interactive), a single prompt on the command line (--prompt), and batch generation from a text file (--from_file).

Interactive mode

Type prompts one at a time. The script generates an image after each entry and displays a preview window.
python gen_img.py \
  --ckpt model.safetensors \
  --outdir outputs \
  --xformers --fp16 \
  --interactive
When Type prompt: appears, enter your prompt and press Enter. Select the preview window and press any key to close it and enter the next prompt. Press Ctrl+Z then Enter to exit.
If the preview window fails with an OpenCV error, install a display-capable build with pip install opencv-python, or suppress the window entirely with --no_preview.

Required and common options

Model specification

  • --ckpt <path>: Path to a .safetensors / .ckpt checkpoint, a Diffusers model folder, or a Hugging Face model ID. Required.
  • --v1: SD 1.x model (default).
  • --v2: SD 2.x model.
  • --sdxl: Stable Diffusion XL model.
  • --v_parameterization: Add alongside --v2 for v-parameterization models such as 768-v-ema.ckpt.
  • --vae <path>: Load an external VAE instead of the one embedded in the checkpoint.

Output

  • --outdir <path>: Folder where images are saved. Required.
  • --W <width> (default 512): Image width in pixels.
  • --H <height> (default 512): Image height in pixels.
  • --steps <n> (default 50): Denoising steps.
  • --scale <f> (default 7.5): Classifier-free guidance scale.
  • --sampler <name> (default ddim): Sampler to use (see list below).
  • --images_per_prompt <n> (default 1): Images generated per prompt.
  • --batch_size <n> (default 1): GPU batch size.
  • --seed <n> (default random): Global seed for reproducibility.
Available samplers: ddim, pndm, lms, euler, euler_a, heun, dpm_2, dpm_2_a, dpmsolver, dpmsolver++, dpmsingle, k_lms, k_euler, k_euler_a, k_dpm_2, k_dpm_2_a

Precision and memory

  • --fp16: Inference in float16. Recommended for most GPUs.
  • --bf16: Inference in bfloat16. RTX 30-series and newer; less prone to NaN outputs.
  • --xformers: Enable xformers memory-efficient attention.
  • --sdpa: Use PyTorch 2 scaled dot-product attention instead of xformers.
  • --vae_batch_size <n>: Separate batch size for VAE decode (reduce if you get OOM after denoising).
  • --vae_slices <n>: Slice images during VAE decode (e.g. 16 or 32) to lower peak VRAM.
  • --no_half_vae: Force fp32 VAE decode to avoid VAE artifacts.
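As an illustration of why slicing helps, here is a minimal Python sketch of chunked decoding (decode_in_slices and toy_decode are hypothetical names; the real script slices inside its VAE, possibly spatially rather than along the batch axis as shown here):

```python
import numpy as np

def decode_in_slices(latents, decode_fn, n_slices):
    """Split the work into n_slices chunks and decode each separately,
    so peak memory scales with the chunk size rather than the full batch."""
    chunks = np.array_split(latents, n_slices, axis=0)
    return np.concatenate([decode_fn(c) for c in chunks], axis=0)

def toy_decode(z):
    # Stand-in "decoder": upscales 8x per side, like a VAE decode shape-wise.
    return np.repeat(np.repeat(z, 8, axis=1), 8, axis=2)

latents = np.zeros((4, 64, 64))
images = decode_in_slices(latents, toy_decode, n_slices=2)
print(images.shape)  # (4, 512, 512)
```

Only one chunk's decoded output is live at a time inside the list comprehension's current iteration, which is the whole point of the option.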

Prompt syntax

Write negative prompts inline using --n:
beautiful flowers --n monochrome, blurry
Weight tokens with parentheses (A1111-compatible syntax):
(masterpiece:1.4), 1girl, (detailed eyes:1.2) --n (worst quality:1.3)
Extend beyond 75 tokens with --max_embeddings_multiples. For example, --max_embeddings_multiples 3 allows prompts up to 225 tokens.
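To see how the parenthesis weights are read, here is a simplified Python sketch (extract_weights is a hypothetical helper; a real A1111-compatible parser also handles nesting and bare parentheses, which apply a default boost of about 1.1x per level):

```python
import re

def extract_weights(prompt):
    """Pull out (token:weight) spans from an A1111-style prompt.
    Simplified: handles only the explicit (text:1.4) form."""
    return {m.group(1): float(m.group(2))
            for m in re.finditer(r"\(([^():]+):([\d.]+)\)", prompt)}

w = extract_weights("(masterpiece:1.4), 1girl, (detailed eyes:1.2)")
print(w)  # {'masterpiece': 1.4, 'detailed eyes': 1.2}
```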

Dynamic prompts (wildcards)

  • {A|B|C}: Randomly pick one of A, B, or C.
  • {e$$A|B|C}: Generate one image for every option, in order.
  • {2$$A|B|C}: Randomly pick 2 items and combine them.
  • {2$$ and $$A|B|C}: Same, but joined with " and " instead of ", ".
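The table above can be sketched in a few lines of Python (expand_wildcard is a hypothetical helper that handles one {...} span per prompt; the script's actual parser is more general):

```python
import random

def expand_wildcard(template, rng=random):
    """Expand one {...} dynamic-prompt span into a list of prompts."""
    start, end = template.index("{"), template.index("}")
    body = template[start + 1:end]
    pre, post = template[:start], template[end + 1:]
    if "$$" in body:
        spec, options = body.split("$$", 1)
        joiner = ", "
        if "$$" in options:                  # {2$$ and $$A|B|C} form
            joiner, options = options.split("$$", 1)
        items = options.split("|")
        if spec == "e":                      # enumerate every option in order
            return [pre + item + post for item in items]
        picked = rng.sample(items, int(spec))  # pick N at random
        return [pre + joiner.join(picked) + post]
    return [pre + rng.choice(body.split("|")) + post]

print(expand_wildcard("a {e$$red|blue|green} flower"))
# ['a red flower', 'a blue flower', 'a green flower']
```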

Prompt options

You can override per-image settings directly inside the prompt text using --x flags. These work in interactive mode, command-line prompts, and prompt files.
  • --n <text>: Negative prompt.
  • --w <n>: Image width for this prompt.
  • --h <n>: Image height for this prompt.
  • --s <n>: Step count for this prompt.
  • --d <seed>: Seed (comma-separated list for multiple images).
  • --l <f>: Guidance scale for this prompt.
  • --t <f>: img2img strength for this prompt.
  • --am <f>[,<f>…]: LoRA multipliers (one per network, comma-separated).
  • --f <name>: Output file name for this image.
Example from a prompt file:
(masterpiece), 1girl, cherry blossoms --n lowres, bad anatomy --w 960 --h 640 --s 28 --d 1
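A minimal sketch of how such a line splits into prompt text and per-image options (split_prompt_options is a hypothetical helper; the actual parsing in gen_img.py may differ in edge cases):

```python
def split_prompt_options(line):
    """Split a prompt line into the prompt text and its inline options.
    Assumes options are ' --key value' pairs appended after the prompt."""
    parts = line.split(" --")
    prompt, opts = parts[0].strip(), {}
    for part in parts[1:]:
        key, _, value = part.partition(" ")
        opts[key] = value.strip()
    return prompt, opts

prompt, opts = split_prompt_options(
    "1girl, cherry blossoms --w 960 --h 640 --s 28 --d 1")
print(prompt)                 # 1girl, cherry blossoms
print(opts["w"], opts["s"])   # 960 28
```

Splitting on " --" (rather than every space) is what lets a multi-word negative prompt like --n lowres, bad anatomy survive as a single value.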

LoRA support

Apply one or more LoRA networks at inference time. The number of --network_module, --network_weights, and --network_mul entries must match.
python gen_img.py \
  --ckpt model.safetensors \
  --outdir outputs --xformers --fp16 \
  --W 512 --H 768 --sampler k_euler_a --steps 48 \
  --network_module networks.lora networks.lora \
  --network_weights style_lora.safetensors char_lora.safetensors \
  --network_mul 0.8 0.5 \
  --interactive
  • --network_module: Network type. Use networks.lora for LoRA. Repeat for multiple networks.
  • --network_weights <path>: Weight file(s). One per --network_module.
  • --network_mul <f>: Multiplier(s) for each network. Default is 1.0.
  • --network_merge: Pre-merge LoRA weights into the base model before generation (faster, but disables per-prompt --am overrides).
  • --network_pre_calc: Pre-compute LoRA offsets for each generation step (fast, with --am support).
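Conceptually, merging a LoRA into the base model (what --network_merge does ahead of time) adds a scaled low-rank delta to each affected weight matrix. A toy numpy sketch with made-up shapes (merge_lora is hypothetical; real modules apply this per attention/projection layer):

```python
import numpy as np

def merge_lora(base, down, up, multiplier, alpha, rank):
    """W' = W + multiplier * (alpha / rank) * (up @ down) --
    the standard LoRA merge formula, shown on a single matrix."""
    return base + multiplier * (alpha / rank) * (up @ down)

rank, din, dout = 4, 8, 8
base = np.zeros((dout, din))
down = np.ones((rank, din))   # (rank, in_features)
up = np.ones((dout, rank))    # (out_features, rank)
merged = merge_lora(base, down, up, multiplier=0.5, alpha=4, rank=rank)
print(merged[0, 0])  # 0.5 * (4/4) * 4 = 2.0
```

The multiplier here is exactly what --network_mul (or a per-prompt --am) scales, which is why a pre-merged model can no longer honor --am overrides.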

img2img

Pass an image with --image_path and a denoise strength with --strength.
python gen_img.py \
  --ckpt model.safetensors \
  --outdir outputs --xformers --fp16 \
  --scale 12.5 --sampler k_euler --steps 32 \
  --image_path template.png --strength 0.8 \
  --batch_size 8 --images_per_prompt 32 \
  --prompt "1girl, sailor school uniform, outdoors --n lowres, bad anatomy"
If you point --image_path at a folder, images are processed in filename order. Pad filenames with leading zeros (001.jpg, 002.jpg) to ensure the correct sort order.
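The padding advice matters because filenames sort as strings, not numbers:

```python
# Unpadded numeric names sort lexicographically: "10" < "2".
print(sorted(["1.jpg", "2.jpg", "10.jpg"]))
# ['1.jpg', '10.jpg', '2.jpg']

# Zero-padding restores the intended order.
print(sorted(["001.jpg", "002.jpg", "010.jpg"]))
# ['001.jpg', '002.jpg', '010.jpg']
```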

Inpainting

Supply a grayscale mask image with --mask_image. White regions are repainted; black regions are preserved. Gradients in the mask produce soft transitions.
python gen_img.py --ckpt model.safetensors --outdir outputs \
  --image_path source.png --mask_image mask.png --strength 0.9 \
  --prompt "new background"
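The mask semantics amount to a per-pixel blend. An illustrative pixel-space sketch (composite is a hypothetical helper, not the script's internal code):

```python
import numpy as np

def composite(source, generated, mask):
    """White (255) regions take the generated image, black (0) keeps
    the source, and gray values blend proportionally."""
    m = mask.astype(np.float32) / 255.0
    return m * generated + (1.0 - m) * source

src = np.full((2, 2), 100.0)
gen = np.full((2, 2), 200.0)
mask = np.array([[0, 255], [128, 255]], dtype=np.uint8)
out = composite(src, gen, mask)
print(out[0, 0], out[0, 1])  # 100.0 200.0  (black keeps source, white repaints)
```

The gray pixel (128) lands roughly halfway between the two images, which is where the soft transitions come from.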

Highres fix

Generate a small image first, then upscale and refine it with img2img to produce a high-resolution output without the compositional problems that come from denoising at large latent sizes.
python gen_img.py \
  --ckpt model.safetensors \
  --outdir outputs --xformers --fp16 \
  --W 1024 --H 1024 --steps 48 --sampler ddim \
  --highres_fix_scale 0.5 --highres_fix_steps 28 --strength 0.5 \
  --interactive
--highres_fix_scale 0.5 means the first stage generates at 512×512 (half of 1024×1024). The second stage upscales and refines to the full size.
  • --highres_fix_scale <f>: Ratio of first-stage size to final size. 0.5 → generate at 50% resolution first.
  • --highres_fix_steps <n>: Steps for the first stage. Default is 28.
  • --highres_fix_save_1st: Also save the first-stage image.
  • --highres_fix_latents_upscaling: Upscale in latent space (bilinear) instead of pixel space (LANCZOS4).
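The first-stage resolution is simply the final size times the scale. A small sketch (first_stage_size is hypothetical, and the snap-to-64 rounding is an assumption about latent-size constraints, not confirmed behavior of the script):

```python
def first_stage_size(width, height, scale):
    """Compute the first-stage resolution for --highres_fix_scale,
    rounding down to a multiple of 64 (a common latent-size constraint)."""
    def snap(x):
        return int(x * scale) // 64 * 64
    return snap(width), snap(height)

print(first_stage_size(1024, 1024, 0.5))  # (512, 512)
```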

ControlNet

ControlNet v1.0 and v1.1 models are supported. Only Canny preprocessing is built in; pre-process other condition types (depth, pose, etc.) before passing them as guide images.
python gen_img.py \
  --ckpt model.safetensors --outdir outputs \
  --xformers --bf16 --W 512 --H 768 --sampler k_euler_a --steps 48 \
  --control_net_models diff_control_sd15_canny.safetensors \
  --control_net_weights 1.0 \
  --control_net_ratios 1.0 \
  --guide_image_path guide.png \
  --interactive
  • --control_net_models <path>: ControlNet model file(s).
  • --guide_image_path <path>: Hint image or folder of hint images.
  • --control_net_preps <name>: Preprocessing per model. Use none to skip, or canny_63_191 for Canny with thresholds 63 and 191.
  • --control_net_weights <f>: Influence weight per model (e.g. 1.0).
  • --control_net_ratios <f>: Fraction of steps to apply ControlNet (e.g. 0.5 → first half only).
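The canny_63_191 naming convention reads as a preprocessor name followed by underscore-separated parameters. A hypothetical parsing sketch (parse_prep is not part of the script):

```python
def parse_prep(spec):
    """Parse a --control_net_preps entry like 'canny_63_191' into a
    name plus integer thresholds; 'none' disables preprocessing."""
    if spec == "none":
        return None
    name, *params = spec.split("_")
    return name, [int(p) for p in params]

print(parse_prep("canny_63_191"))  # ('canny', [63, 191])
print(parse_prep("none"))          # None
```

The two numbers map to the low and high thresholds of the Canny edge detector, in that order.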
