
Overview

sd-scripts ships two layers of inference tooling:
  • gen_img.py — a full-featured CLI for SD 1.x, SD 2.x, and SDXL. It supports interactive prompting, file-driven batch generation, img2img, inpainting, ControlNet, LoRA, Highres fix, and more.
  • Minimal inference scripts — lightweight, model-specific scripts for FLUX, SD3, Lumina, and SDXL that load the model in its native format without requiring a Diffusers pipeline.

gen_img.py

Modes of operation

gen_img.py supports three modes depending on how you supply prompts: interactive (--interactive), a single prompt on the command line (--prompt), and batch generation from a text file (--from_file).

Interactive mode

Type prompts one at a time. The script generates an image after each entry and displays a preview window.
python gen_img.py \
  --ckpt model.safetensors \
  --outdir outputs \
  --xformers --fp16 \
  --interactive
When Type prompt: appears, enter your prompt and press Enter. Select the preview window and press any key to close it and enter the next prompt. Press Ctrl+Z then Enter to exit.
If the preview window fails with an OpenCV error, install a display-capable build with pip install opencv-python, or suppress the window entirely with --no_preview.

Required and common options

Model specification

  • --ckpt <path>: Path to a .safetensors / .ckpt checkpoint, a Diffusers model folder, or a Hugging Face model ID. Required.
  • --v1: SD 1.x model (default).
  • --v2: SD 2.x model.
  • --sdxl: Stable Diffusion XL model.
  • --v_parameterization: Add alongside --v2 for v-parameterization models such as 768-v-ema.ckpt.
  • --vae <path>: Load an external VAE instead of the one embedded in the checkpoint.

Output

  • --outdir <path>: Folder where images are saved. Required.
  • --W <width> (default 512): Image width in pixels.
  • --H <height> (default 512): Image height in pixels.
  • --steps <n> (default 50): Denoising steps.
  • --scale <f> (default 7.5): Classifier-free guidance scale.
  • --sampler <name> (default ddim): Sampler to use (see list below).
  • --images_per_prompt <n> (default 1): Images generated per prompt.
  • --batch_size <n> (default 1): GPU batch size.
  • --seed <n> (default random): Global seed for reproducibility.
Available samplers: ddim, pndm, lms, euler, euler_a, heun, dpm_2, dpm_2_a, dpmsolver, dpmsolver++, dpmsingle, k_lms, k_euler, k_euler_a, k_dpm_2, k_dpm_2_a

Precision and memory

  • --fp16: Inference in float16. Recommended for most GPUs.
  • --bf16: Inference in bfloat16. RTX 30-series and newer; less prone to NaN outputs.
  • --xformers: Enable xformers memory-efficient attention.
  • --sdpa: Use PyTorch 2 scaled dot-product attention instead of xformers.
  • --vae_batch_size <n>: Separate batch size for VAE decode (reduce if you get OOM after denoising).
  • --vae_slices <n>: Slice images during VAE decode (e.g. 16 or 32) to lower peak VRAM.
  • --no_half_vae: Force fp32 VAE decode to avoid VAE artifacts.
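As an illustration of why slicing helps, here is a minimal Python sketch of chunked decoding (decode_in_slices and toy_decode are hypothetical names; the real script slices inside its VAE, possibly spatially rather than along the batch axis as shown here):

```python
import numpy as np

def decode_in_slices(latents, decode_fn, n_slices):
    """Split the work into n_slices chunks and decode each separately,
    so peak memory scales with the chunk size rather than the full batch."""
    chunks = np.array_split(latents, n_slices, axis=0)
    return np.concatenate([decode_fn(c) for c in chunks], axis=0)

def toy_decode(z):
    # Stand-in "decoder": upscales 8x per side, like a VAE decode shape-wise.
    return np.repeat(np.repeat(z, 8, axis=1), 8, axis=2)

latents = np.zeros((4, 64, 64))
images = decode_in_slices(latents, toy_decode, n_slices=2)
print(images.shape)  # (4, 512, 512)
```

Only one chunk's decoded output is live at a time inside the list comprehension's current iteration, which is the whole point of the option.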

Prompt syntax

Write negative prompts inline using --n:
beautiful flowers --n monochrome, blurry
Weight tokens with parentheses (A1111-compatible syntax):
(masterpiece:1.4), 1girl, (detailed eyes:1.2) --n (worst quality:1.3)
Extend beyond 75 tokens with --max_embeddings_multiples. For example, --max_embeddings_multiples 3 allows prompts up to 225 tokens.
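To see how the parenthesis weights are read, here is a simplified Python sketch (extract_weights is a hypothetical helper; a real A1111-compatible parser also handles nesting and bare parentheses, which apply a default boost of about 1.1x per level):

```python
import re

def extract_weights(prompt):
    """Pull out (token:weight) spans from an A1111-style prompt.
    Simplified: handles only the explicit (text:1.4) form."""
    return {m.group(1): float(m.group(2))
            for m in re.finditer(r"\(([^():]+):([\d.]+)\)", prompt)}

w = extract_weights("(masterpiece:1.4), 1girl, (detailed eyes:1.2)")
print(w)  # {'masterpiece': 1.4, 'detailed eyes': 1.2}
```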

Dynamic prompts (wildcards)

  • {A|B|C}: Randomly pick one of A, B, or C.
  • {e$$A|B|C}: Generate one image for every option, in order.
  • {2$$A|B|C}: Randomly pick 2 items and combine them.
  • {2$$ and $$A|B|C}: Same, but joined with " and " instead of ", ".
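The table above can be sketched in a few lines of Python (expand_wildcard is a hypothetical helper that handles one {...} span per prompt; the script's actual parser is more general):

```python
import random

def expand_wildcard(template, rng=random):
    """Expand one {...} dynamic-prompt span into a list of prompts."""
    start, end = template.index("{"), template.index("}")
    body = template[start + 1:end]
    pre, post = template[:start], template[end + 1:]
    if "$$" in body:
        spec, options = body.split("$$", 1)
        joiner = ", "
        if "$$" in options:                  # {2$$ and $$A|B|C} form
            joiner, options = options.split("$$", 1)
        items = options.split("|")
        if spec == "e":                      # enumerate every option in order
            return [pre + item + post for item in items]
        picked = rng.sample(items, int(spec))  # pick N at random
        return [pre + joiner.join(picked) + post]
    return [pre + rng.choice(body.split("|")) + post]

print(expand_wildcard("a {e$$red|blue|green} flower"))
# ['a red flower', 'a blue flower', 'a green flower']
```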

Prompt options

You can override per-image settings directly inside the prompt text using --x flags. These work in interactive mode, command-line prompts, and prompt files.
  • --n <text>: Negative prompt.
  • --w <n>: Image width for this prompt.
  • --h <n>: Image height for this prompt.
  • --s <n>: Step count for this prompt.
  • --d <seed>: Seed (comma-separated list for multiple images).
  • --l <f>: Guidance scale for this prompt.
  • --t <f>: img2img strength for this prompt.
  • --am <f>[,<f>…]: LoRA multipliers (one per network, comma-separated).
  • --f <name>: Output file name for this image.
Example from a prompt file:
(masterpiece), 1girl, cherry blossoms --n lowres, bad anatomy --w 960 --h 640 --s 28 --d 1
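A minimal sketch of how such a line splits into prompt text and per-image options (split_prompt_options is a hypothetical helper; the actual parsing in gen_img.py may differ in edge cases):

```python
def split_prompt_options(line):
    """Split a prompt line into the prompt text and its inline options.
    Assumes options are ' --key value' pairs appended after the prompt."""
    parts = line.split(" --")
    prompt, opts = parts[0].strip(), {}
    for part in parts[1:]:
        key, _, value = part.partition(" ")
        opts[key] = value.strip()
    return prompt, opts

prompt, opts = split_prompt_options(
    "1girl, cherry blossoms --w 960 --h 640 --s 28 --d 1")
print(prompt)                 # 1girl, cherry blossoms
print(opts["w"], opts["s"])   # 960 28
```

Splitting on " --" (rather than every space) is what lets a multi-word negative prompt like --n lowres, bad anatomy survive as a single value.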

LoRA support

Apply one or more LoRA networks at inference time. The number of --network_module, --network_weights, and --network_mul entries must match.
python gen_img.py \
  --ckpt model.safetensors \
  --outdir outputs --xformers --fp16 \
  --W 512 --H 768 --sampler k_euler_a --steps 48 \
  --network_module networks.lora networks.lora \
  --network_weights style_lora.safetensors char_lora.safetensors \
  --network_mul 0.8 0.5 \
  --interactive
  • --network_module: Network type. Use networks.lora for LoRA. Repeat for multiple networks.
  • --network_weights <path>: Weight file(s). One per --network_module.
  • --network_mul <f>: Multiplier(s) for each network. Default is 1.0.
  • --network_merge: Pre-merge LoRA weights into the base model before generation (faster, but disables per-prompt --am overrides).
  • --network_pre_calc: Pre-compute LoRA offsets for each generation step (fast, with --am support).
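Conceptually, merging a LoRA into the base model (what --network_merge does ahead of time) adds a scaled low-rank delta to each affected weight matrix. A toy numpy sketch with made-up shapes (merge_lora is hypothetical; real modules apply this per attention/projection layer):

```python
import numpy as np

def merge_lora(base, down, up, multiplier, alpha, rank):
    """W' = W + multiplier * (alpha / rank) * (up @ down) --
    the standard LoRA merge formula, shown on a single matrix."""
    return base + multiplier * (alpha / rank) * (up @ down)

rank, din, dout = 4, 8, 8
base = np.zeros((dout, din))
down = np.ones((rank, din))   # (rank, in_features)
up = np.ones((dout, rank))    # (out_features, rank)
merged = merge_lora(base, down, up, multiplier=0.5, alpha=4, rank=rank)
print(merged[0, 0])  # 0.5 * (4/4) * 4 = 2.0
```

The multiplier here is exactly what --network_mul (or a per-prompt --am) scales, which is why a pre-merged model can no longer honor --am overrides.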

img2img

Pass an image with --image_path and a denoise strength with --strength.
python gen_img.py \
  --ckpt model.safetensors \
  --outdir outputs --xformers --fp16 \
  --scale 12.5 --sampler k_euler --steps 32 \
  --image_path template.png --strength 0.8 \
  --batch_size 8 --images_per_prompt 32 \
  --prompt "1girl, sailor school uniform, outdoors --n lowres, bad anatomy"
If you point --image_path at a folder, images are processed in filename order. Pad filenames with leading zeros (001.jpg, 002.jpg) to ensure the correct sort order.
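The padding advice matters because filenames sort as strings, not numbers:

```python
# Unpadded numeric names sort lexicographically: "10" < "2".
print(sorted(["1.jpg", "2.jpg", "10.jpg"]))
# ['1.jpg', '10.jpg', '2.jpg']

# Zero-padding restores the intended order.
print(sorted(["001.jpg", "002.jpg", "010.jpg"]))
# ['001.jpg', '002.jpg', '010.jpg']
```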

Inpainting

Supply a grayscale mask image with --mask_image. White regions are repainted; black regions are preserved. Gradients in the mask produce soft transitions.
python gen_img.py --ckpt model.safetensors --outdir outputs \
  --image_path source.png --mask_image mask.png --strength 0.9 \
  --prompt "new background"
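The mask semantics amount to a per-pixel blend. An illustrative pixel-space sketch (composite is a hypothetical helper, not the script's internal code):

```python
import numpy as np

def composite(source, generated, mask):
    """White (255) regions take the generated image, black (0) keeps
    the source, and gray values blend proportionally."""
    m = mask.astype(np.float32) / 255.0
    return m * generated + (1.0 - m) * source

src = np.full((2, 2), 100.0)
gen = np.full((2, 2), 200.0)
mask = np.array([[0, 255], [128, 255]], dtype=np.uint8)
out = composite(src, gen, mask)
print(out[0, 0], out[0, 1])  # 100.0 200.0  (black keeps source, white repaints)
```

The gray pixel (128) lands roughly halfway between the two images, which is where the soft transitions come from.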

Highres fix

Generate a small image first, then upscale and refine it with img2img to produce a high-resolution output without the compositional problems that come from denoising at large latent sizes.
python gen_img.py \
  --ckpt model.safetensors \
  --outdir outputs --xformers --fp16 \
  --W 1024 --H 1024 --steps 48 --sampler ddim \
  --highres_fix_scale 0.5 --highres_fix_steps 28 --strength 0.5 \
  --interactive
--highres_fix_scale 0.5 means the first stage generates at 512×512 (half of 1024×1024). The second stage upscales and refines to the full size.
  • --highres_fix_scale <f>: Ratio of first-stage size to final size. 0.5 → generate at 50% resolution first.
  • --highres_fix_steps <n>: Steps for the first stage. Default is 28.
  • --highres_fix_save_1st: Also save the first-stage image.
  • --highres_fix_latents_upscaling: Upscale in latent space (bilinear) instead of pixel space (LANCZOS4).
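The first-stage resolution is simply the final size times the scale. A small sketch (first_stage_size is hypothetical, and the snap-to-64 rounding is an assumption about latent-size constraints, not confirmed behavior of the script):

```python
def first_stage_size(width, height, scale):
    """Compute the first-stage resolution for --highres_fix_scale,
    rounding down to a multiple of 64 (a common latent-size constraint)."""
    def snap(x):
        return int(x * scale) // 64 * 64
    return snap(width), snap(height)

print(first_stage_size(1024, 1024, 0.5))  # (512, 512)
```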

ControlNet

ControlNet v1.0 and v1.1 models are supported. Only Canny preprocessing is built in; pre-process other condition types (depth, pose, etc.) before passing them as guide images.
python gen_img.py \
  --ckpt model.safetensors --outdir outputs \
  --xformers --bf16 --W 512 --H 768 --sampler k_euler_a --steps 48 \
  --control_net_models diff_control_sd15_canny.safetensors \
  --control_net_weights 1.0 \
  --control_net_ratios 1.0 \
  --guide_image_path guide.png \
  --interactive
  • --control_net_models <path>: ControlNet model file(s).
  • --guide_image_path <path>: Hint image or folder of hint images.
  • --control_net_preps <name>: Preprocessing per model. Use none to skip, or canny_63_191 for Canny with thresholds 63 and 191.
  • --control_net_weights <f>: Influence weight per model (e.g. 1.0).
  • --control_net_ratios <f>: Fraction of steps to apply ControlNet (e.g. 0.5 → first half only).
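The canny_63_191 naming convention reads as a preprocessor name followed by underscore-separated parameters. A hypothetical parsing sketch (parse_prep is not part of the script):

```python
def parse_prep(spec):
    """Parse a --control_net_preps entry like 'canny_63_191' into a
    name plus integer thresholds; 'none' disables preprocessing."""
    if spec == "none":
        return None
    name, *params = spec.split("_")
    return name, [int(p) for p in params]

print(parse_prep("canny_63_191"))  # ('canny', [63, 191])
print(parse_prep("none"))          # None
```

The two numbers map to the low and high thresholds of the Canny edge detector, in that order.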
