Overview
sd-scripts includes a collection of utility scripts for working with models and LoRA networks outside of training. The tools cover every phase of a typical workflow: merging or resizing LoRAs, combining base models, pre-caching encoded representations to speed up training, converting between file formats, and inspecting model metadata.
LoRA Merging
Combine multiple LoRAs or bake a LoRA into its base model.
LoRA Resizing
Reduce a LoRA’s rank to shrink the file and lower inference overhead.
Model Merging
Blend two or more full base models together with configurable ratios.
Latent Caching
Pre-encode training images to latents to skip the VAE at each step.
Text Encoder Caching
Pre-encode captions when the text encoder is frozen.
Format Conversion
Convert between safetensors, Diffusers, and FLUX checkpoint formats.
LoRA Extraction
Extract a LoRA from the difference between two model checkpoints.
Metadata Inspection
View training metadata stored inside a .safetensors file.
LoRA merging
networks/merge_lora.py handles two distinct operations:
- Merge multiple LoRAs together into a single LoRA file. Omit --sd_model and provide multiple --models.
- Bake a LoRA into a base model by also supplying --sd_model. The LoRA weights are folded directly into the model weights and saved as a new checkpoint.
Use sdxl_merge_lora.py for SDXL models and flux_merge_lora.py for FLUX models.
Merge multiple LoRAs into one
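A representative invocation, using the arguments listed below (file names and ratios are placeholders; adjust them to your setup):

```bash
python networks/merge_lora.py \
  --save_to merged_lora.safetensors \
  --models style.safetensors character.safetensors \
  --ratios 0.6 0.4 \
  --precision float --save_precision fp16
```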
Bake a LoRA into a base model
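For example (placeholder paths; --ratios still applies and scales how strongly the LoRA is folded into the base weights):

```bash
python networks/merge_lora.py \
  --sd_model base_model.safetensors \
  --save_to baked_model.safetensors \
  --models my_lora.safetensors \
  --ratios 1.0 \
  --save_precision fp16
```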
Key arguments
| Argument | Description |
|---|---|
| --save_to <path> | Output file path (.safetensors or .ckpt). |
| --models <path> … | One or more LoRA files to merge. |
| --ratios <f> … | Mixing weight per model. Must match the number of --models entries. |
| --sd_model <path> | Base model to bake into. Omit to merge LoRA-to-LoRA. |
| --precision | Working precision: float (recommended), fp16, or bf16. |
| --save_precision | Saved file precision: float, fp16, or bf16. |
| --v2 | Load as an SD 2.x model. |
| --concat | Concatenate LoRA matrices instead of adding them (output rank equals the sum of input ranks). |
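To see why --concat changes the output rank, here is a minimal numpy sketch (shapes chosen arbitrarily, not the script's actual code): stacking the two factor pairs represents the sum of the two delta weights exactly, at the cost of a rank equal to the sum of the input ranks.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two LoRA factor pairs for the same layer; each LoRA's delta weight is up @ down.
up1, down1 = rng.normal(size=(64, 4)), rng.normal(size=(4, 32))   # rank 4
up2, down2 = rng.normal(size=(64, 8)), rng.normal(size=(8, 32))   # rank 8

target = up1 @ down1 + up2 @ down2  # combined delta weight of both LoRAs

# --concat stacks the factors, so the merged LoRA reproduces the combined
# delta weight exactly, with rank 4 + 8 = 12.
up_cat = np.concatenate([up1, up2], axis=1)        # (64, 12)
down_cat = np.concatenate([down1, down2], axis=0)  # (12, 32)
print(np.allclose(up_cat @ down_cat, target))  # True
```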
For SDXL, use networks/sdxl_merge_lora.py. For FLUX, use networks/flux_merge_lora.py. The arguments are the same.
LoRA resizing
networks/resize_lora.py reduces the rank of an existing LoRA by projecting its weight matrices to a lower-rank approximation using singular value decomposition. Use this to shrink an oversize LoRA without retraining.
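For instance (placeholder paths), resizing a LoRA down to rank 8 with GPU-accelerated SVD:

```bash
python networks/resize_lora.py \
  --model big_lora.safetensors \
  --save_to small_lora.safetensors \
  --new_rank 8 \
  --device cuda --save_precision fp16
```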
Key arguments
| Argument | Default | Description |
|---|---|---|
| --model <path> | — | Input LoRA to resize. |
| --save_to <path> | — | Output file path. |
| --new_rank <n> | 4 | Target rank for linear layers. |
| --new_conv_rank <n> | same as --new_rank | Target rank for Conv2d 3×3 layers. |
| --save_precision | float | Precision of the output file. |
| --device | CPU | cuda for GPU-accelerated SVD. |
| --dynamic_method | — | Dynamic rank selection: sv_ratio, sv_fro, or sv_cumulative. Set --new_rank as an upper bound. |
| --dynamic_param <f> | — | Target parameter for the selected dynamic method. |
| --svd_lowrank_niter <n> | 2 | Iterations for torch.svd_lowrank on matrices larger than 2048. Set to 0 to use full SVD. |
| --verbose | false | Print per-layer resize statistics. |
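The core of the rank reduction can be sketched in a few lines of numpy (illustrative, not the script's actual code): the delta weight up @ down is decomposed with SVD, truncated at the target rank, and re-split into a new factor pair.

```python
import numpy as np

def resize_rank(up, down, new_rank):
    # Truncate the SVD of the delta weight (up @ down) at new_rank, then
    # split the singular values evenly between the two new factors.
    U, S, Vt = np.linalg.svd(up @ down, full_matrices=False)
    U, S, Vt = U[:, :new_rank], S[:new_rank], Vt[:new_rank, :]
    sqrt_s = np.sqrt(S)
    return U * sqrt_s, sqrt_s[:, None] * Vt  # new up (out, r), new down (r, in)

rng = np.random.default_rng(0)
up, down = rng.normal(size=(64, 16)), rng.normal(size=(16, 128))
new_up, new_down = resize_rank(up, down, new_rank=4)
print(new_up.shape, new_down.shape)  # (64, 4) (4, 128)
```

Because SVD truncation is the best low-rank approximation in the least-squares sense, this keeps as much of the original delta weight as a rank-4 factorization can express.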
Model merging
tools/merge_models.py blends two or more safetensors base model files together. Each model contributes to the output according to its specified ratio. When ratios are omitted, models are weighted equally and the total contribution sums to 1.0.
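A representative call with placeholder names, blending two checkpoints 70/30 (arguments are those documented below):

```bash
python tools/merge_models.py \
  --models model_a.safetensors model_b.safetensors \
  --ratios 0.7 0.3 \
  --output merged_model \
  --device cuda
```

Note that --output takes a bare name; the .safetensors extension is appended automatically.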
Key arguments
| Argument | Default | Description |
|---|---|---|
| --models <path> … | — | Models to merge. All must be .safetensors. |
| --output <path> | — | Output file path (.safetensors extension added automatically). |
| --ratios <f> … | equal split | Per-model weight. Must match number of --models. |
| --precision | float | Working precision during the merge. |
| --saving_precision | float | Precision of the saved output. |
| --unet_only | false | Merge only UNet weights; copy VAE and text encoder from the first model. |
| --device | cpu | Device for tensor operations. |
| --show_skipped | false | Print keys that appear in the first model but not in subsequent ones. |
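The per-tensor arithmetic amounts to a weighted sum. A minimal sketch of the idea (illustrative, not the script's code):

```python
import numpy as np

def merge_tensors(tensors, ratios=None):
    # With no ratios given, weight the models equally so the total
    # contribution sums to 1.0, mirroring the documented default.
    if ratios is None:
        ratios = [1.0 / len(tensors)] * len(tensors)
    assert len(ratios) == len(tensors), "--ratios must match --models"
    return sum(r * t for r, t in zip(ratios, tensors))

a = np.full((2, 2), 1.0)
b = np.full((2, 2), 3.0)
print(merge_tensors([a, b])[0, 0])                # 2.0 (equal split)
print(merge_tensors([a, b], [0.75, 0.25])[0, 0])  # 1.5
```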
Latent caching
tools/cache_latents.py pre-encodes your training images into VAE latents and saves them to disk. During training the data loader reads the cached latents directly, skipping the VAE encode at every step. This can meaningfully reduce VRAM usage and speed up training, especially when the VAE is large (as with SDXL or FLUX).
Pass --flux instead of --sdxl when caching for a FLUX training run.
Why use latent caching?
- Removes the VAE from the training-step compute graph entirely.
- Lets you run a larger UNet/DiT batch size on the same GPU because the VAE is not loaded during training.
- Required for FLUX training runs where the VAE is too large to keep resident alongside the transformer.
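The cache round-trip looks roughly like this (the actual file layout and key names are implementation details of sd-scripts; the names below are illustrative only):

```python
import os
import tempfile
import numpy as np

# Stand-in for one image's VAE-encoded latent (shape is illustrative).
latents = np.zeros((4, 64, 64), dtype=np.float32)
cache_path = os.path.join(tempfile.gettempdir(), "image0001.npz")

np.savez(cache_path, latents=latents)     # written once, next to the image
loaded = np.load(cache_path)["latents"]   # read by the data loader each step
print(loaded.shape)  # (4, 64, 64)
```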
Latent caches are stored next to your dataset images with a .npz extension. If you change the VAE or training resolution, delete the cached files and re-run caching.
Text encoder output caching
tools/cache_text_encoder_outputs.py pre-encodes your captions through the text encoder(s) and saves the embeddings to disk. This is useful when you freeze the text encoder during training — the encoder runs once per caption instead of once per batch.
Pass --flux when caching for FLUX training.
Why use text encoder output caching?
- Eliminates the text encoder from the GPU memory footprint during training when the encoder weights are frozen.
- Particularly valuable for FLUX and SD3 training, where T5-XXL alone can exceed 10 GB.
Like latent caches, text encoder caches are stored as .npz files. Re-run caching if you change the text encoder or its precision.
Format conversion
safetensors ↔ Diffusers (SD 1.x / 2.x)
tools/convert_diffusers20_original_sd.py converts between the original Stable Diffusion checkpoint format and the Diffusers model-folder layout.
Diffusers ↔ FLUX
tools/convert_diffusers_to_flux.py converts between Diffusers FLUX format and the native FLUX safetensors format expected by the training scripts.
LoRA extraction
networks/extract_lora_from_models.py computes the difference between two model checkpoints and approximates it as a low-rank LoRA. This is useful for capturing the changes introduced by a fine-tuned model as a portable LoRA file.
| Argument | Description |
|---|---|
| --model_org <path> | The original (unmodified) base model. |
| --model_tuned <path> | The fine-tuned model to extract changes from. |
| --save_to <path> | Output LoRA file path. |
| --dim <n> | Rank of the extracted LoRA. Higher rank captures more of the difference. Default is 4. |
| --device | cuda for GPU-accelerated SVD, otherwise CPU. |
| --v2 | Use SD 2.x key layout. |
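A typical invocation with placeholder paths, extracting a rank-32 LoRA from a fine-tune:

```bash
python networks/extract_lora_from_models.py \
  --model_org base.safetensors \
  --model_tuned finetuned.safetensors \
  --save_to extracted_lora.safetensors \
  --dim 32 --device cuda
```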
For FLUX models, use networks/flux_extract_lora.py instead.
Metadata inspection
tools/show_metadata.py reads the metadata block embedded in a .safetensors file and prints it as formatted JSON. Training scripts store hyperparameters, dataset hashes, model type, and other information in this block automatically.
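Under the hood, a .safetensors file begins with an 8-byte little-endian header length followed by a JSON header, and metadata is stored under the reserved __metadata__ key. The snippet below reads it directly with the standard library (the ss_network_dim key and file path are illustrative examples, not guaranteed contents):

```python
import json
import os
import struct
import tempfile

def read_safetensors_metadata(path):
    # Header layout: 8-byte little-endian length, then that many bytes of JSON.
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_len))
    return header.get("__metadata__", {})

# Build a minimal metadata-only file to demonstrate the round-trip.
header = json.dumps({"__metadata__": {"ss_network_dim": "16"}}).encode("utf-8")
path = os.path.join(tempfile.gettempdir(), "demo.safetensors")
with open(path, "wb") as f:
    f.write(struct.pack("<Q", len(header)))
    f.write(header)

print(read_safetensors_metadata(path))  # {'ss_network_dim': '16'}
```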
