Overview

sd-scripts includes a collection of utility scripts for working with models and LoRA networks outside of training. The tools cover every phase of a typical workflow: merging or resizing LoRAs, combining base models, pre-caching encoded representations to speed up training, converting between file formats, and inspecting model metadata.

LoRA Merging

Combine multiple LoRAs or bake a LoRA into its base model.

LoRA Resizing

Reduce a LoRA’s rank to shrink the file and lower inference overhead.

Model Merging

Blend two or more full base models together with configurable ratios.

Latent Caching

Pre-encode training images to latents to skip the VAE at each step.

Text Encoder Caching

Pre-encode captions when the text encoder is frozen.

Format Conversion

Convert between safetensors, Diffusers, and FLUX checkpoint formats.

LoRA Extraction

Extract a LoRA from the difference between two model checkpoints.

Metadata Inspection

View training metadata stored inside a .safetensors file.

LoRA merging

networks/merge_lora.py handles two distinct operations:
  • Merge multiple LoRAs together into a single LoRA file. Omit --sd_model and provide multiple --models.
  • Bake a LoRA into a base model by also supplying --sd_model. The LoRA weights are folded directly into the model weights and saved as a new checkpoint.
Use sdxl_merge_lora.py for SDXL models and flux_merge_lora.py for FLUX models.

Merge multiple LoRAs into one

python networks/merge_lora.py \
  --save_to merged_lora.safetensors \
  --models lora_a.safetensors lora_b.safetensors \
  --ratios 0.6 0.4 \
  --precision float \
  --save_precision fp16

Bake a LoRA into a base model

python networks/merge_lora.py \
  --sd_model base_model.safetensors \
  --save_to model_with_lora.safetensors \
  --models my_lora.safetensors \
  --ratios 0.8 \
  --precision float \
  --save_precision fp16

Key arguments

Argument            Description
--save_to <path>    Output file path (.safetensors or .ckpt).
--models <path> …   One or more LoRA files to merge.
--ratios <f> …      Mixing weight per model. Must match the number of --models entries.
--sd_model <path>   Base model to bake into. Omit to merge LoRA-to-LoRA.
--precision         Working precision: float (recommended), fp16, or bf16.
--save_precision    Saved file precision: float, fp16, or bf16.
--v2                Load as an SD 2.x model.
--concat            Concatenate LoRA matrices instead of adding them (output rank equals the sum of input ranks).
For SDXL, use networks/sdxl_merge_lora.py. For FLUX, use networks/flux_merge_lora.py. The arguments are the same.
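Conceptually, baking applies the standard LoRA update to each affected weight matrix. The numpy sketch below shows only the underlying math under the usual LoRA convention (up/down factors scaled by alpha/rank); the shapes and names are illustrative, not the script's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes for one linear layer: out=8, in=6, LoRA rank=2.
W = rng.standard_normal((8, 6))     # base model weight
down = rng.standard_normal((2, 6))  # lora_down (A): rank x in
up = rng.standard_normal((8, 2))    # lora_up (B): out x rank
alpha, rank, ratio = 16.0, 2, 0.8   # ratio corresponds to --ratios 0.8

# Baking folds the scaled LoRA delta directly into the base weight:
W_merged = W + ratio * (alpha / rank) * (up @ down)
print(W_merged.shape)  # (8, 6)
```

LoRA-to-LoRA merging instead combines the low-rank factors themselves, which is why --concat (stacking ranks) exists as an alternative to plain addition.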

LoRA resizing

networks/resize_lora.py reduces the rank of an existing LoRA by projecting its weight matrices to a lower-rank approximation using singular value decomposition. Use it to shrink an oversized LoRA without retraining.
python networks/resize_lora.py \
  --model original_lora.safetensors \
  --save_to smaller_lora.safetensors \
  --new_rank 8 \
  --save_precision fp16 \
  --device cuda

Key arguments

Argument                 Default             Description
--model <path>           —                   Input LoRA to resize.
--save_to <path>         —                   Output file path.
--new_rank <n>           4                   Target rank for linear layers.
--new_conv_rank <n>      same as --new_rank  Target rank for Conv2d 3×3 layers.
--save_precision         float               Precision of the output file.
--device                 cpu                 cuda for GPU-accelerated SVD.
--dynamic_method         —                   Dynamic rank selection: sv_ratio, sv_fro, or sv_cumulative. Set --new_rank as an upper bound.
--dynamic_param <f>      —                   Target parameter for the selected dynamic method.
--svd_lowrank_niter <n>  2                   Iterations for torch.svd_lowrank on matrices larger than 2048. Set to 0 to use full SVD.
--verbose                false               Print per-layer resize statistics.
torch.svd_lowrank makes resizing large SDXL or FLUX LoRAs significantly faster than full SVD. The --svd_lowrank_niter option controls the accuracy-speed trade-off; 2 iterations is a good starting point.

Model merging

tools/merge_models.py blends two or more safetensors base model files together. Each model contributes to the output according to its specified ratio. When ratios are omitted, models are weighted equally and the total contribution sums to 1.0.
python tools/merge_models.py \
  --models model_a.safetensors model_b.safetensors \
  --ratios 0.7 0.3 \
  --output merged_model.safetensors \
  --precision float \
  --saving_precision fp16

Key arguments

Argument            Default      Description
--models <path> …   —            Models to merge. All must be .safetensors.
--output <path>     —            Output file path (.safetensors extension added automatically).
--ratios <f> …      equal split  Per-model weight. Must match the number of --models entries.
--precision         float        Working precision during the merge.
--saving_precision  float        Precision of the saved output.
--unet_only         false        Merge only UNet weights; copy VAE and text encoder from the first model.
--device            cpu          Device for tensor operations.
--show_skipped      false        Print keys that appear in the first model but not in subsequent ones.
All model files must be in safetensors format. .ckpt files are not supported by this script.
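Per weight key, the merge itself is just a ratio-weighted sum of matching tensors. A minimal numpy sketch of that step (illustrative only; the script additionally handles key matching, precision casting, and safetensors I/O):

```python
import numpy as np

def merge_tensors(tensors, ratios):
    """Weighted sum of the same weight tensor taken from each model."""
    assert len(tensors) == len(ratios)
    out = np.zeros_like(tensors[0], dtype=np.float64)  # float working precision
    for t, r in zip(tensors, ratios):
        out += r * t
    return out

a = np.full((4, 4), 2.0)   # one key from model_a
b = np.full((4, 4), 10.0)  # the same key from model_b
merged = merge_tensors([a, b], [0.7, 0.3])
print(merged[0, 0])  # 4.4
```

With ratios 0.7 and 0.3 summing to 1.0, every output weight stays on the same scale as the inputs, which is why equal split is a safe default.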

Latent caching

tools/cache_latents.py pre-encodes your training images into VAE latents and saves them to disk. During training the data loader reads the cached latents directly, skipping the VAE encode at every step. This can meaningfully reduce VRAM usage and speed up training, especially when the VAE is large (as with SDXL or FLUX).
python tools/cache_latents.py \
  --pretrained_model_name_or_path base_model.safetensors \
  --dataset_config dataset_config.toml \
  --sdxl
Pass --flux instead of --sdxl when caching for a FLUX training run.

Why use latent caching?
  • Removes the VAE from the training-step compute graph entirely.
  • Lets you run a larger UNet/DiT batch size on the same GPU because the VAE is not loaded during training.
  • Required for FLUX training runs where the VAE is too large to keep resident alongside the transformer.
Latent caches are stored next to your dataset images with a .npz extension. If you change the VAE or training resolution, delete the cached files and re-run caching.
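The cache-or-encode pattern behind this can be sketched with numpy's .npz files. Everything here is illustrative: the stand-in encoder, the cache file naming, and the archive key are assumptions, not the script's actual layout.

```python
import tempfile
from pathlib import Path

import numpy as np

def fake_vae_encode(image: np.ndarray) -> np.ndarray:
    """Stand-in for a VAE: 8x spatial downsample into 4 latent channels."""
    h, w, _ = image.shape
    return np.zeros((4, h // 8, w // 8), dtype=np.float32)

def load_or_encode(image_path: Path, image: np.ndarray) -> np.ndarray:
    cache_path = image_path.with_suffix(".npz")
    if cache_path.exists():            # cache hit: skip the VAE entirely
        return np.load(cache_path)["latents"]
    latents = fake_vae_encode(image)   # cache miss: encode once, save to disk
    np.savez(cache_path, latents=latents)
    return latents

with tempfile.TemporaryDirectory() as d:
    img_path = Path(d) / "photo.png"
    img = np.zeros((512, 512, 3), dtype=np.uint8)
    first = load_or_encode(img_path, img)   # encodes and writes the cache
    second = load_or_encode(img_path, img)  # reads the cache back
    print(first.shape)  # (4, 64, 64)
```

This also makes clear why stale caches are dangerous: the second call never re-runs the encoder, so changing the VAE or resolution without deleting the .npz files silently trains on old latents.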

Text encoder output caching

tools/cache_text_encoder_outputs.py pre-encodes your captions through the text encoder(s) and saves the embeddings to disk. This is useful when you freeze the text encoder during training — the encoder runs once per caption instead of once per batch.
python tools/cache_text_encoder_outputs.py \
  --pretrained_model_name_or_path base_model.safetensors \
  --dataset_config dataset_config.toml \
  --sdxl
Pass --flux when caching for FLUX training.

Why use text encoder output caching?
  • Eliminates the text encoder from the GPU memory footprint during training when the encoder weights are frozen.
  • Particularly valuable for FLUX and SD3 training, where T5-XXL alone can exceed 10 GB.
Like latent caches, text encoder caches are stored as .npz files. Re-run caching if you change the text encoder or its precision.

Format conversion

safetensors ↔ Diffusers (SD 1.x / 2.x)

tools/convert_diffusers20_original_sd.py converts between the original Stable Diffusion checkpoint format and the Diffusers model-folder layout.
# Convert a .safetensors checkpoint to a Diffusers folder
python tools/convert_diffusers20_original_sd.py \
  --model_path model.safetensors \
  --checkpoint_path diffusers_model/ \
  --from_safetensors

# Convert a Diffusers folder back to a .safetensors checkpoint
python tools/convert_diffusers20_original_sd.py \
  --model_path diffusers_model/ \
  --checkpoint_path converted.safetensors \
  --to_safetensors

Diffusers ↔ FLUX

tools/convert_diffusers_to_flux.py converts between Diffusers FLUX format and the native FLUX safetensors format expected by the training scripts.
python tools/convert_diffusers_to_flux.py \
  --input_dir diffusers_flux_model/ \
  --output_path flux_native.safetensors

LoRA extraction

networks/extract_lora_from_models.py computes the difference between two model checkpoints and approximates it as a low-rank LoRA. This is useful for capturing the changes introduced by a fine-tuned model as a portable LoRA file.
python networks/extract_lora_from_models.py \
  --model_org base_model.safetensors \
  --model_tuned finetuned_model.safetensors \
  --save_to extracted_lora.safetensors \
  --dim 16 \
  --device cuda

Key arguments

Argument              Description
--model_org <path>    The original (unmodified) base model.
--model_tuned <path>  The fine-tuned model to extract changes from.
--save_to <path>      Output LoRA file path.
--dim <n>             Rank of the extracted LoRA. Higher rank captures more of the difference. Default is 4.
--device              cuda for GPU-accelerated SVD; defaults to CPU.
--v2                  Use SD 2.x key layout.
For FLUX models, use networks/flux_extract_lora.py instead.
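Extraction is the inverse of baking: subtract the two checkpoints, then factor the difference into a rank-dim pair. The numpy sketch below shows this per-layer idea with hypothetical shapes; splitting the singular values evenly across both factors is one common convention, not necessarily the script's exact choice.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # corresponds to --dim: rank of the extracted LoRA

# One linear layer from each checkpoint (hypothetical 128x64 weight).
W_org = rng.standard_normal((128, 64))
W_tuned = W_org + 0.01 * rng.standard_normal((128, 64))

# Approximate the fine-tuning delta with a rank-`dim` factorization.
delta = W_tuned - W_org
U, S, Vt = np.linalg.svd(delta, full_matrices=False)
sqrt_s = np.sqrt(S[:dim])
lora_up = U[:, :dim] * sqrt_s            # out x dim
lora_down = sqrt_s[:, None] * Vt[:dim, :]  # dim x in

print(lora_up.shape, lora_down.shape)  # (128, 16) (16, 64)
```

Since a real fine-tuning delta is usually far from low-rank, the extracted LoRA is an approximation; raising --dim trades file size for fidelity to the original difference.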

Metadata inspection

tools/show_metadata.py reads the metadata block embedded in a .safetensors file and prints it as formatted JSON. Training scripts store hyperparameters, dataset hashes, model type, and other information in this block automatically.
python tools/show_metadata.py --model my_lora.safetensors
Example output:
{
    "ss_base_model_version": "sdxl_base_v1-0",
    "ss_learning_rate": "0.0001",
    "ss_network_alpha": "16",
    "ss_network_dim": "32",
    "ss_network_module": "networks.lora",
    "ss_num_epochs": "10",
    "ss_optimizer": "bitsandbytes.optim.adamw.AdamW8bit",
    "sshs_legacy_hash": "a1b2c3d4",
    "sshs_model_hash": "e5f6a7b8"
}
This is the quickest way to recall the training settings used to produce a model, or to verify that a file is a valid safetensors checkpoint.
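If you want to see where this metadata lives, the safetensors format makes it easy to read with only the standard library: the file starts with a little-endian u64 giving the length of a JSON header, and string metadata sits under the header's "__metadata__" key. A minimal reader sketch based on that published layout (not the script's actual implementation):

```python
import json
import struct

def read_safetensors_metadata(path: str) -> dict:
    """Return the __metadata__ block from a .safetensors file's JSON header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # first 8 bytes: header size
        header = json.loads(f.read(header_len))         # then the JSON header
    return header.get("__metadata__", {})
```

Because only the header is read, this works quickly even on multi-gigabyte checkpoints; a file that fails to parse here is not a valid safetensors file.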