Overview

sd-scripts includes a collection of utility scripts for working with models and LoRA networks outside of training. The tools cover every phase of a typical workflow: merging or resizing LoRAs, combining base models, pre-caching encoded representations to speed up training, converting between file formats, and inspecting model metadata.

LoRA Merging

Combine multiple LoRAs or bake a LoRA into its base model.

LoRA Resizing

Reduce a LoRA’s rank to shrink the file and lower inference overhead.

Model Merging

Blend two or more full base models together with configurable ratios.

Latent Caching

Pre-encode training images to latents to skip the VAE at each step.

Text Encoder Caching

Pre-encode captions when the text encoder is frozen.

Format Conversion

Convert between safetensors, Diffusers, and FLUX checkpoint formats.

LoRA Extraction

Extract a LoRA from the difference between two model checkpoints.

Metadata Inspection

View training metadata stored inside a .safetensors file.

LoRA merging

networks/merge_lora.py handles two distinct operations:
  • Merge multiple LoRAs together into a single LoRA file. Omit --sd_model and provide multiple --models.
  • Bake a LoRA into a base model by also supplying --sd_model. The LoRA weights are folded directly into the model weights and saved as a new checkpoint.
Use sdxl_merge_lora.py for SDXL models and flux_merge_lora.py for FLUX models.

Merge multiple LoRAs into one

python networks/merge_lora.py \
  --save_to merged_lora.safetensors \
  --models lora_a.safetensors lora_b.safetensors \
  --ratios 0.6 0.4 \
  --precision float \
  --save_precision fp16

Bake a LoRA into a base model

python networks/merge_lora.py \
  --sd_model base_model.safetensors \
  --save_to model_with_lora.safetensors \
  --models my_lora.safetensors \
  --ratios 0.8 \
  --precision float \
  --save_precision fp16

Key arguments

Argument            Description
--save_to <path>    Output file path (.safetensors or .ckpt).
--models <path> …   One or more LoRA files to merge.
--ratios <f> …      Mixing weight per model. Must match the number of --models entries.
--sd_model <path>   Base model to bake into. Omit to merge LoRA-to-LoRA.
--precision         Working precision: float (recommended), fp16, or bf16.
--save_precision    Saved file precision: float, fp16, or bf16.
--v2                Load as an SD 2.x model.
--concat            Concatenate LoRA matrices instead of adding them (output rank equals the sum of input ranks).
For SDXL, use networks/sdxl_merge_lora.py. For FLUX, use networks/flux_merge_lora.py. The arguments are the same.
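Conceptually, baking applies the standard LoRA update to each affected weight matrix. The numpy sketch below shows only the underlying math under the usual LoRA convention (up/down factors scaled by alpha/rank); the shapes and names are illustrative, not the script's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes for one linear layer: out=8, in=6, LoRA rank=2.
W = rng.standard_normal((8, 6))     # base model weight
down = rng.standard_normal((2, 6))  # lora_down (A): rank x in
up = rng.standard_normal((8, 2))    # lora_up (B): out x rank
alpha, rank, ratio = 16.0, 2, 0.8   # ratio corresponds to --ratios 0.8

# Baking folds the scaled LoRA delta directly into the base weight:
W_merged = W + ratio * (alpha / rank) * (up @ down)
print(W_merged.shape)  # (8, 6)
```

LoRA-to-LoRA merging instead combines the low-rank factors themselves, which is why --concat (stacking ranks) exists as an alternative to plain addition.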

LoRA resizing

networks/resize_lora.py reduces the rank of an existing LoRA by projecting its weight matrices to a lower-rank approximation using singular value decomposition. Use it to shrink an oversized LoRA without retraining.
python networks/resize_lora.py \
  --model original_lora.safetensors \
  --save_to smaller_lora.safetensors \
  --new_rank 8 \
  --save_precision fp16 \
  --device cuda

Key arguments

Argument                 Default             Description
--model <path>           —                   Input LoRA to resize.
--save_to <path>         —                   Output file path.
--new_rank <n>           4                   Target rank for linear layers.
--new_conv_rank <n>      same as --new_rank  Target rank for Conv2d 3×3 layers.
--save_precision         float               Precision of the output file.
--device                 cpu                 cuda for GPU-accelerated SVD.
--dynamic_method         —                   Dynamic rank selection: sv_ratio, sv_fro, or sv_cumulative. Set --new_rank as an upper bound.
--dynamic_param <f>      —                   Target parameter for the selected dynamic method.
--svd_lowrank_niter <n>  2                   Iterations for torch.svd_lowrank on matrices larger than 2048. Set to 0 to use full SVD.
--verbose                false               Print per-layer resize statistics.
torch.svd_lowrank makes resizing large SDXL or FLUX LoRAs significantly faster than full SVD. The --svd_lowrank_niter option controls the accuracy-speed trade-off; 2 iterations is a good starting point.

Model merging

tools/merge_models.py blends two or more safetensors base model files together. Each model contributes to the output according to its specified ratio. When ratios are omitted, models are weighted equally and the total contribution sums to 1.0.
python tools/merge_models.py \
  --models model_a.safetensors model_b.safetensors \
  --ratios 0.7 0.3 \
  --output merged_model.safetensors \
  --precision float \
  --saving_precision fp16

Key arguments

Argument            Default      Description
--models <path> …   —            Models to merge. All must be .safetensors.
--output <path>     —            Output file path (.safetensors extension added automatically).
--ratios <f> …      equal split  Per-model weight. Must match the number of --models entries.
--precision         float        Working precision during the merge.
--saving_precision  float        Precision of the saved output.
--unet_only         false        Merge only UNet weights; copy VAE and text encoder from the first model.
--device            cpu          Device for tensor operations.
--show_skipped      false        Print keys that appear in the first model but not in subsequent ones.
All model files must be in safetensors format. .ckpt files are not supported by this script.
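Per weight key, the merge itself is just a ratio-weighted sum of matching tensors. A minimal numpy sketch of that step (illustrative only; the script additionally handles key matching, precision casting, and safetensors I/O):

```python
import numpy as np

def merge_tensors(tensors, ratios):
    """Weighted sum of the same weight tensor taken from each model."""
    assert len(tensors) == len(ratios)
    out = np.zeros_like(tensors[0], dtype=np.float64)  # float working precision
    for t, r in zip(tensors, ratios):
        out += r * t
    return out

a = np.full((4, 4), 2.0)   # one key from model_a
b = np.full((4, 4), 10.0)  # the same key from model_b
merged = merge_tensors([a, b], [0.7, 0.3])
print(merged[0, 0])  # 4.4
```

With ratios 0.7 and 0.3 summing to 1.0, every output weight stays on the same scale as the inputs, which is why equal split is a safe default.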

Latent caching

tools/cache_latents.py pre-encodes your training images into VAE latents and saves them to disk. During training the data loader reads the cached latents directly, skipping the VAE encode at every step. This can meaningfully reduce VRAM usage and speed up training, especially when the VAE is large (as with SDXL or FLUX).
python tools/cache_latents.py \
  --pretrained_model_name_or_path base_model.safetensors \
  --dataset_config dataset_config.toml \
  --sdxl
Pass --flux instead of --sdxl when caching for a FLUX training run.

Why use latent caching?
  • Removes the VAE from the training-step compute graph entirely.
  • Lets you run a larger UNet/DiT batch size on the same GPU because the VAE is not loaded during training.
  • Required for FLUX training runs where the VAE is too large to keep resident alongside the transformer.
Latent caches are stored next to your dataset images with a .npz extension. If you change the VAE or training resolution, delete the cached files and re-run caching.
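The cache-or-encode pattern behind this can be sketched with numpy's .npz files. Everything here is illustrative: the stand-in encoder, the cache file naming, and the archive key are assumptions, not the script's actual layout.

```python
import tempfile
from pathlib import Path

import numpy as np

def fake_vae_encode(image: np.ndarray) -> np.ndarray:
    """Stand-in for a VAE: 8x spatial downsample into 4 latent channels."""
    h, w, _ = image.shape
    return np.zeros((4, h // 8, w // 8), dtype=np.float32)

def load_or_encode(image_path: Path, image: np.ndarray) -> np.ndarray:
    cache_path = image_path.with_suffix(".npz")
    if cache_path.exists():            # cache hit: skip the VAE entirely
        return np.load(cache_path)["latents"]
    latents = fake_vae_encode(image)   # cache miss: encode once, save to disk
    np.savez(cache_path, latents=latents)
    return latents

with tempfile.TemporaryDirectory() as d:
    img_path = Path(d) / "photo.png"
    img = np.zeros((512, 512, 3), dtype=np.uint8)
    first = load_or_encode(img_path, img)   # encodes and writes the cache
    second = load_or_encode(img_path, img)  # reads the cache back
    print(first.shape)  # (4, 64, 64)
```

This also makes clear why stale caches are dangerous: the second call never re-runs the encoder, so changing the VAE or resolution without deleting the .npz files silently trains on old latents.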

Text encoder output caching

tools/cache_text_encoder_outputs.py pre-encodes your captions through the text encoder(s) and saves the embeddings to disk. This is useful when you freeze the text encoder during training — the encoder runs once per caption instead of once per batch.
python tools/cache_text_encoder_outputs.py \
  --pretrained_model_name_or_path base_model.safetensors \
  --dataset_config dataset_config.toml \
  --sdxl
Pass --flux when caching for FLUX training.

Why use text encoder output caching?
  • Eliminates the text encoder from the GPU memory footprint during training when the encoder weights are frozen.
  • Particularly valuable for FLUX and SD3 training, where T5-XXL alone can exceed 10 GB.
Like latent caches, text encoder caches are stored as .npz files. Re-run caching if you change the text encoder or its precision.

Format conversion

safetensors ↔ Diffusers (SD 1.x / 2.x)

tools/convert_diffusers20_original_sd.py converts between the original Stable Diffusion checkpoint format and the Diffusers model-folder layout.
# Convert a .safetensors checkpoint to a Diffusers folder
python tools/convert_diffusers20_original_sd.py \
  --model_path model.safetensors \
  --checkpoint_path diffusers_model/ \
  --from_safetensors

# Convert a Diffusers folder back to a .safetensors checkpoint
python tools/convert_diffusers20_original_sd.py \
  --model_path diffusers_model/ \
  --checkpoint_path converted.safetensors \
  --to_safetensors

Diffusers ↔ FLUX

tools/convert_diffusers_to_flux.py converts between Diffusers FLUX format and the native FLUX safetensors format expected by the training scripts.
python tools/convert_diffusers_to_flux.py \
  --input_dir diffusers_flux_model/ \
  --output_path flux_native.safetensors

LoRA extraction

networks/extract_lora_from_models.py computes the difference between two model checkpoints and approximates it as a low-rank LoRA. This is useful for capturing the changes introduced by a fine-tuned model as a portable LoRA file.
python networks/extract_lora_from_models.py \
  --model_org base_model.safetensors \
  --model_tuned finetuned_model.safetensors \
  --save_to extracted_lora.safetensors \
  --dim 16 \
  --device cuda

Key arguments

Argument              Description
--model_org <path>    The original (unmodified) base model.
--model_tuned <path>  The fine-tuned model to extract changes from.
--save_to <path>      Output LoRA file path.
--dim <n>             Rank of the extracted LoRA. Higher rank captures more of the difference. Default is 4.
--device              cuda for GPU-accelerated SVD; defaults to CPU.
--v2                  Use SD 2.x key layout.
For FLUX models, use networks/flux_extract_lora.py instead.
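Extraction is the inverse of baking: subtract the two checkpoints, then factor the difference into a rank-dim pair. The numpy sketch below shows this per-layer idea with hypothetical shapes; splitting the singular values evenly across both factors is one common convention, not necessarily the script's exact choice.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # corresponds to --dim: rank of the extracted LoRA

# One linear layer from each checkpoint (hypothetical 128x64 weight).
W_org = rng.standard_normal((128, 64))
W_tuned = W_org + 0.01 * rng.standard_normal((128, 64))

# Approximate the fine-tuning delta with a rank-`dim` factorization.
delta = W_tuned - W_org
U, S, Vt = np.linalg.svd(delta, full_matrices=False)
sqrt_s = np.sqrt(S[:dim])
lora_up = U[:, :dim] * sqrt_s            # out x dim
lora_down = sqrt_s[:, None] * Vt[:dim, :]  # dim x in

print(lora_up.shape, lora_down.shape)  # (128, 16) (16, 64)
```

Since a real fine-tuning delta is usually far from low-rank, the extracted LoRA is an approximation; raising --dim trades file size for fidelity to the original difference.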

Metadata inspection

tools/show_metadata.py reads the metadata block embedded in a .safetensors file and prints it as formatted JSON. Training scripts store hyperparameters, dataset hashes, model type, and other information in this block automatically.
python tools/show_metadata.py --model my_lora.safetensors
Example output:
{
    "ss_base_model_version": "sdxl_base_v1-0",
    "ss_learning_rate": "0.0001",
    "ss_network_alpha": "16",
    "ss_network_dim": "32",
    "ss_network_module": "networks.lora",
    "ss_num_epochs": "10",
    "ss_optimizer": "bitsandbytes.optim.adamw.AdamW8bit",
    "sshs_legacy_hash": "a1b2c3d4",
    "sshs_model_hash": "e5f6a7b8"
}
This is the quickest way to recall the training settings used to produce a model, or to verify that a file is a valid safetensors checkpoint.
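If you want to see where this metadata lives, the safetensors format makes it easy to read with only the standard library: the file starts with a little-endian u64 giving the length of a JSON header, and string metadata sits under the header's "__metadata__" key. A minimal reader sketch based on that published layout (not the script's actual implementation):

```python
import json
import struct

def read_safetensors_metadata(path: str) -> dict:
    """Return the __metadata__ block from a .safetensors file's JSON header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # first 8 bytes: header size
        header = json.loads(f.read(header_len))         # then the JSON header
    return header.get("__metadata__", {})
```

Because only the header is read, this works quickly even on multi-gigabyte checkpoints; a file that fails to parse here is not a valid safetensors file.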