Overview
sd-scripts ships two layers of inference tooling:

- gen_img.py — a full-featured CLI for SD 1.x, SD 2.x, and SDXL. It supports interactive prompting, file-driven batch generation, img2img, inpainting, ControlNet, LoRA, Highres fix, and more.
- Minimal inference scripts — lightweight, model-specific scripts for FLUX, SD3, Lumina, and SDXL that load the model in its native format without requiring a Diffusers pipeline.

Supported model families:
- SD 1.x / 2.x / SDXL
- FLUX
- SD3
- Lumina
gen_img.py
Modes of operation
gen_img.py supports three modes depending on how you supply prompts: interactive, a single prompt on the command line, and a prompt file for batch generation.

Interactive mode

Type prompts one at a time. The script generates an image after each entry and displays a preview window.

When Type prompt: appears, enter your prompt and press Enter. Select the preview window and press any key to close it and enter the next prompt. Press Ctrl+Z then Enter to exit.

If the preview window fails with an OpenCV error, install a display-capable build with pip install opencv-python, or suppress the window entirely with --no_preview.

Required and common options
Model specification
| Option | Description |
|---|---|
| --ckpt <path> | Path to a .safetensors / .ckpt checkpoint, a Diffusers model folder, or a Hugging Face model ID. Required. |
| --v1 | SD 1.x model (default). |
| --v2 | SD 2.x model. |
| --sdxl | Stable Diffusion XL model. |
| --v_parameterization | Add alongside --v2 for v-parameterization models such as 768-v-ema.ckpt. |
| --vae <path> | Load an external VAE instead of the one embedded in the checkpoint. |
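A minimal invocation might look like the sketch below. The checkpoint and output paths are placeholders, and --prompt is assumed here to be the script's single-prompt flag:

```bash
# SD 1.x checkpoint (the default), single prompt, images saved to ./outputs
python gen_img.py \
  --ckpt models/sd15.safetensors \
  --outdir outputs \
  --prompt "a watercolor painting of a lighthouse"
```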
Output
| Option | Default | Description |
|---|---|---|
| --outdir <path> | — | Folder where images are saved. Required. |
| --W <width> | 512 | Image width in pixels. |
| --H <height> | 512 | Image height in pixels. |
| --steps <n> | 50 | Denoising steps. |
| --scale <f> | 7.5 | Classifier-free guidance scale. |
| --sampler <name> | ddim | Sampler to use (see list below). |
| --images_per_prompt <n> | 1 | Images generated per prompt. |
| --batch_size <n> | 1 | GPU batch size. |
| --seed <n> | random | Global seed for reproducibility. |
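Combining the output options, a sketch (the model path and prompt are placeholders; --prompt is assumed to be the single-prompt flag):

```bash
# 768×768, 30 steps, Euler a sampler, fixed seed, 4 images per prompt in batches of 2
python gen_img.py --ckpt models/sd15.safetensors --outdir outputs \
  --W 768 --H 768 --steps 30 --scale 7.0 --sampler k_euler_a \
  --images_per_prompt 4 --batch_size 2 --seed 1234 \
  --prompt "an isometric cutaway of a submarine"
```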
Available samplers: ddim, pndm, lms, euler, euler_a, heun, dpm_2, dpm_2_a, dpmsolver, dpmsolver++, dpmsingle, k_lms, k_euler, k_euler_a, k_dpm_2, k_dpm_2_a

Precision and memory
| Option | Description |
|---|---|
| --fp16 | Inference in float16. Recommended for most GPUs. |
| --bf16 | Inference in bfloat16. RTX 30-series and newer; less prone to NaN outputs. |
| --xformers | Enable xformers memory-efficient attention. |
| --sdpa | Use PyTorch 2 scaled dot-product attention instead of xformers. |
| --vae_batch_size <n> | Separate batch size for VAE decode (reduce if you get OOM after denoising). |
| --vae_slices <n> | Slice images during VAE decode (e.g. 16 or 32) to lower peak VRAM. |
| --no_half_vae | Force fp32 VAE decode to avoid VAE artifacts. |
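A memory-conscious setup might combine fp16 inference, xformers attention, and a reduced VAE decode batch. This is a sketch with placeholder paths, not a tuned recommendation:

```bash
# fp16 weights + memory-efficient attention; decode one image at a time in fp32
python gen_img.py --ckpt models/sd15.safetensors --outdir outputs \
  --fp16 --xformers \
  --vae_batch_size 1 --no_half_vae \
  --prompt "a macro photo of frost on a window"
```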
Prompt syntax
Write negative prompts inline using --n (see Prompt options below). The default token limit is 75; raise it with --max_embeddings_multiples. For example, --max_embeddings_multiples 3 allows prompts up to 225 tokens.

Dynamic prompts (wildcards)
| Syntax | Behavior |
|---|---|
| {A\|B\|C} | Randomly pick one of A, B, or C. |
| {e$$A\|B\|C} | Generate one image for every option in order. |
| {2$$A\|B\|C} | Randomly pick 2 items and combine them. |
| {2$$ and $$A\|B\|C} | Same, joined with " and " instead of ", ". |
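The wildcard forms can be mixed in one prompt. A hypothetical prompt-file line:

```
a {watercolor|oil|charcoal} portrait of a {e$$fox|owl|badger}, {2$$ and $$moody lighting|film grain|soft focus}
```

The e$$ group enumerates, so this line produces three images (fox, owl, badger in order); for each one, the first group picks a medium at random and the last group combines two random style tags joined by " and ".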
Prompt options
You can override per-image settings directly inside the prompt text using the two-letter flags below. These work in interactive mode, command-line prompts, and prompt files.

| Flag | Description |
|---|---|
| --n <text> | Negative prompt. |
| --w <n> | Image width for this prompt. |
| --h <n> | Image height for this prompt. |
| --s <n> | Step count for this prompt. |
| --d <seed> | Seed (comma-separated list for multiple images). |
| --l <f> | Guidance scale for this prompt. |
| --t <f> | img2img strength for this prompt. |
| --am <f>[,<f>…] | LoRA multipliers (one per network, comma-separated). |
| --f <name> | Output file name for this image. |
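Prompt options are appended to the prompt text itself. A hypothetical prompt-file line that overrides the negative prompt, size, steps, seed, and guidance scale for just this image:

```
a red bicycle leaning against a brick wall --n blurry, low quality --w 768 --h 512 --s 28 --d 42 --l 8.0
```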
LoRA support
Apply one or more LoRA networks at inference time. The number of --network_module, --network_weights, and --network_mul entries must match.

| Option | Description |
|---|---|
| --network_module | Network type. Use networks.lora for LoRA. Repeat for multiple networks. |
| --network_weights <path> | Weight file(s). One per --network_module. |
| --network_mul <f> | Multiplier(s) for each network. Default is 1.0. |
| --network_merge | Pre-merge LoRA weights into the base model before generation (faster, but disables per-prompt --am overrides). |
| --network_pre_calc | Pre-compute LoRA weights before each generation (speeds up generation while still allowing per-prompt --am overrides). |
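Applying two LoRA networks at different strengths might look like the sketch below. The weight file names are placeholders, --prompt is assumed to be the single-prompt flag, and the multiple values are shown space-separated; depending on the script version they may instead need to be repeated flags:

```bash
# two LoRA networks; the --am prompt option can still override the multipliers
python gen_img.py --ckpt models/sd15.safetensors --outdir outputs \
  --network_module networks.lora networks.lora \
  --network_weights style.safetensors character.safetensors \
  --network_mul 0.8 0.6 \
  --prompt "a knight in a field of sunflowers --am 0.8,0.6"
```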
img2img
Pass an input image with --image_path and a denoising strength with --strength. If you point --image_path at a folder, images are processed in filename order. Pad filenames with leading zeros (001.jpg, 002.jpg) to ensure the correct sort order.

Inpainting
Supply a grayscale mask image with --mask_image. White regions are repainted; black regions are preserved. Gradients in the mask produce soft transitions.

Highres fix
Generate a small image first, then upscale and refine it with img2img to produce a high-resolution output without the compositional problems that come from denoising at large latent sizes. For a 1024×1024 final image, --highres_fix_scale 0.5 means the first stage generates at 512×512; the second stage upscales and refines to the full size.

| Option | Description |
|---|---|
| --highres_fix_scale <f> | Ratio of first-stage size to final size. 0.5 → generate at 50% resolution first. |
| --highres_fix_steps <n> | Steps for the first stage. Default is 28. |
| --highres_fix_save_1st | Also save the first-stage image. |
| --highres_fix_latents_upscaling | Upscale in latent space (bilinear) instead of pixel space (LANCZOS4). |
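A Highres-fix run targeting 1024×1024 with a 512×512 first stage might look like this sketch (paths are placeholders; --prompt is assumed to be the single-prompt flag):

```bash
# first stage at 512×512 (scale 0.5), then upscale and refine to 1024×1024
python gen_img.py --ckpt models/sd15.safetensors --outdir outputs \
  --W 1024 --H 1024 \
  --highres_fix_scale 0.5 --highres_fix_steps 28 --highres_fix_save_1st \
  --prompt "a panoramic view of a mountain monastery"
```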
ControlNet
ControlNet v1.0 and v1.1 models are supported. Only Canny preprocessing is built in; pre-process other condition types (depth, pose, etc.) before passing them as guide images.

| Option | Description |
|---|---|
| --control_net_models <path> | ControlNet model file(s). |
| --guide_image_path <path> | Hint image or folder of hint images. |
| --control_net_preps <name> | Preprocessing per model. Use none to skip, or canny_63_191 for Canny with thresholds 63 and 191. |
| --control_net_weights <f> | Influence weight per model (e.g. 1.0). |
| --control_net_ratios <f> | Fraction of steps to apply ControlNet (e.g. 0.5 → first half only). |
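A ControlNet run using the built-in Canny preprocessing for the first half of the steps might look like this sketch (the ControlNet model and guide-image paths are placeholders; --prompt is assumed to be the single-prompt flag):

```bash
# Canny edges extracted from the guide image steer the first 50% of denoising
python gen_img.py --ckpt models/sd15.safetensors --outdir outputs \
  --control_net_models control_v11p_sd15_canny.safetensors \
  --guide_image_path guides/street.png \
  --control_net_preps canny_63_191 \
  --control_net_weights 1.0 --control_net_ratios 0.5 \
  --prompt "a cyberpunk street at night"
```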
