
Overview

The sd-scripts training scripts expose a large set of advanced options beyond the basic --network_dim and --learning_rate flags. This page covers the most impactful options for users who want precise control over training behavior.
The examples on this page use sdxl_train_network.py for illustration, but most options also apply to train_network.py, flux_train_network.py, and sd3_train_network.py.

Block-wise LoRA dimensions and alphas

By default, every layer in the U-Net gets the same rank (--network_dim) and alpha (--network_alpha). Block-wise settings let you assign different ranks to different parts of the network, which is useful when you want to concentrate the adapter’s capacity in specific layers.
For SDXL, the U-Net is divided into 23 blocks. Pass a comma-separated list of 23 integers to block_dims and block_alphas via --network_args:
--network_args \
  "block_dims=2,2,2,2,4,4,4,4,8,8,8,8,8,8,8,8,4,4,4,4,2,2,2" \
  "block_alphas=1,1,1,1,2,2,2,2,4,4,4,4,4,4,4,4,2,2,2,2,1,1,1"
Any block not listed falls back to the global --network_dim / --network_alpha values.
To also control the 3×3 convolution layers block by block, add conv_block_dims and conv_block_alphas:
--network_args \
  "block_dims=2,2,2,2,4,4,4,4,8,8,8,8,8,8,8,8,4,4,4,4,2,2,2" \
  "block_alphas=1,1,1,1,2,2,2,2,4,4,4,4,4,4,4,4,2,2,2,2,1,1,1" \
  "conv_block_dims=2,2,2,2,2,2,2,2,4,4,4,4,4,4,4,4,2,2,2,2,2,2,2" \
  "conv_block_alphas=1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1"
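Note that the alpha values in the examples above track the dims, keeping the usual LoRA scaling of alpha ÷ dim constant across blocks. A quick sanity check, using the example values:

```python
# Per-block effective LoRA scale is alpha / dim (standard LoRA scaling).
# Values copied from the block_dims / block_alphas example above (23 blocks).
block_dims = [2] * 4 + [4] * 4 + [8] * 8 + [4] * 4 + [2] * 3
block_alphas = [1] * 4 + [2] * 4 + [4] * 8 + [2] * 4 + [1] * 3

scales = [a / d for a, d in zip(block_alphas, block_dims)]
# Every block keeps the same 0.5 scale; only capacity (rank) varies.
```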

LoRA+

LoRA+ sets the learning rate of the UP (B) weight matrices to a multiple of the DOWN (A) matrices’ learning rate. The paper argues this speeds up learning because the two matrices have different optimal learning rates, and recommends a ratio of 16.
--network_args "loraplus_lr_ratio=16"
You can also set separate ratios for U-Net and text encoders:
--network_args \
  "loraplus_unet_lr_ratio=16" \
  "loraplus_text_encoder_lr_ratio=4"
LoRA+ is not compatible with auto-LR optimizers such as DAdaptation or Prodigy.

DyLoRA

DyLoRA trains a range of ranks simultaneously, so you can select the effective rank at inference time without retraining. Use networks.dylora as the network module and specify the rank step with unit:
--network_module=networks.dylora \
--network_dim=64 \
--network_args "unit=4"
This trains ranks 4, 8, 12, …, 64 simultaneously. At inference you can use any multiple of unit up to network_dim by adjusting the LoRA multiplier.
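The set of selectable ranks can be enumerated directly; a sketch with the values from the example above:

```python
network_dim, unit = 64, 4

# DyLoRA trains every multiple of `unit` up to network_dim simultaneously.
selectable_ranks = list(range(unit, network_dim + 1, unit))
# [4, 8, 12, 16, ..., 64] — 16 selectable ranks in total.
```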

Learning rate schedulers

cosine: Decays the learning rate along a cosine curve from the initial value to zero over the full training run.
--lr_scheduler="cosine"
cosine_with_restarts: Like cosine, but restarts the cosine curve N times during training. Useful for escaping local minima.
--lr_scheduler="cosine_with_restarts" \
--lr_scheduler_num_cycles=3
polynomial: Decays the learning rate according to a polynomial function; control the curve’s shape with --lr_scheduler_power.
--lr_scheduler="polynomial" \
--lr_scheduler_power=2
constant_with_warmup: Keeps the learning rate constant after a warmup phase. Useful when you want the optimizer to stabilize before full-speed learning.
--lr_scheduler="constant_with_warmup" \
--lr_warmup_steps=500
When --lr_warmup_steps is less than 1, it is interpreted as a fraction of the total number of training steps:
--lr_warmup_steps=0.05
This sets the warmup to 5% of the total training steps.
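The resolution logic can be sketched as follows (the helper name is illustrative, not sd-scripts’ actual function):

```python
def resolve_warmup_steps(lr_warmup_steps: float, max_train_steps: int) -> int:
    # Values below 1 are treated as a fraction of the total training steps;
    # values of 1 or more are used as an absolute step count.
    if lr_warmup_steps < 1:
        return int(lr_warmup_steps * max_train_steps)
    return int(lr_warmup_steps)

print(resolve_warmup_steps(0.05, 10000))  # 500
print(resolve_warmup_steps(500, 10000))   # 500
```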

Optimizer options

Adafactor: Highly memory-efficient optimizer, useful when VRAM is critically limited. Recommended with relative_step=True and the adafactor scheduler.
--optimizer_type="Adafactor" \
--optimizer_args "relative_step=True" "scale_parameter=True" "warmup_init=True" \
--lr_scheduler="adafactor"
Lion: Sign-based gradient-update optimizer that can converge faster than AdamW on some tasks. Requires the lion-pytorch package.
--optimizer_type="Lion" \
--learning_rate=1e-5
Use a learning rate roughly 10× lower than you would with AdamW.
Prodigy: Auto-adjusting learning-rate optimizer. Set the initial learning rate to 1.0 and let Prodigy tune it during training.
--optimizer_type="Prodigy" \
--learning_rate=1.0 \
--lr_scheduler="constant"
Prodigy is not compatible with LoRA+.
Use --optimizer_args to pass key=value pairs to the optimizer:
--optimizer_args "weight_decay=0.01" "betas=0.9,0.999"

Mixed precision

Both fp16 and bf16 reduce VRAM usage compared to full float32 training.
| Format | Dynamic range | Precision | Best for |
|--------|---------------|-----------|----------|
| fp16 | Smaller | Higher | SD 1.x/2.x, older GPUs |
| bf16 | Larger | Lower | SDXL, FLUX.1, SD3; RTX 3000+, A100 |
--mixed_precision="bf16"
Use bf16 whenever your GPU supports it (Ampere and later, or any Tensor Core GPU). It avoids the NaN issues that can occur with the SDXL VAE under fp16.
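The difference comes down to bit layout: fp16 spends more bits on the mantissa, bf16 on the exponent. Computing each format’s largest finite value from its bit layout shows why fp16 can overflow into NaN where bf16 does not:

```python
# Largest finite value in each format, derived from the bit layouts:
# fp16: 5 exponent bits, 10 mantissa bits; bf16: 8 exponent bits, 7 mantissa bits.
fp16_max = (2 - 2**-10) * 2.0**15    # 65504.0
bf16_max = (2 - 2**-7) * 2.0**127    # ~3.39e38, same exponent range as float32

# Any activation above 65504 overflows to inf in fp16 but fits easily in bf16.
```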
When VRAM is critically limited, you can keep gradients entirely in half precision as well:
--full_bf16
# or
--full_fp16
This can cause training instability. Monitor your loss carefully and consider adding --max_grad_norm=1.0.
Load the base model in FP8 to save significant VRAM. Requires PyTorch 2.1+.
--fp8_base
--fp8_base_unet loads only the U-Net in FP8, leaving text encoders in the default precision.

Gradient checkpointing

Gradient checkpointing trades compute time for memory. Activations are not stored during the forward pass; instead they are recomputed during backpropagation.
--gradient_checkpointing
Reduces VRAM by roughly 30–50% for large models. Training becomes 10–20% slower.
Accumulate gradients over multiple steps before each optimizer update. The effective batch size becomes train_batch_size × gradient_accumulation_steps.
--gradient_accumulation_steps=4
Use this to simulate a larger batch size when VRAM is limited.
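The accumulation pattern can be sketched as a loop that steps the optimizer once per N micro-batches (an illustrative pseudo-training loop, not real gradient code):

```python
def training_steps(num_batches: int, accumulation_steps: int) -> int:
    """Count how many optimizer updates occur over num_batches micro-batches."""
    optimizer_steps = 0
    for batch_idx in range(1, num_batches + 1):
        # loss.backward() would accumulate gradients here on every micro-batch.
        if batch_idx % accumulation_steps == 0:
            optimizer_steps += 1  # optimizer.step(); optimizer.zero_grad()
    return optimizer_steps

print(training_steps(100, 4))  # 25 updates, each over an effective batch 4x larger
```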
Clip the gradient norm to prevent instability when the learning rate is high.
--max_grad_norm=1.0
Set to 0 to disable gradient clipping entirely.

Saving checkpoints

Save a checkpoint every N epochs or every N steps:
--save_every_n_epochs=2
--save_every_n_steps=500
Both can be specified simultaneously. Each triggers independently.
Prevent disk from filling up by keeping only the most recent M checkpoints:
--save_last_n_epochs=5
# or
--save_last_n_steps=3000
Save the full training state (optimizer, step counter) so you can resume later:
--save_state \
--save_last_n_epochs_state=2
Use --save_state_on_train_end to save the state only at the end of a run.

Resuming training

Use --resume to continue from a state directory saved by --save_state. This restores the optimizer state, step counter, and epoch counter.
--resume="./output/my_lora-state-epoch00005"
--resume restores the full training state. If you only want to start from existing LoRA weights (without restoring optimizer state), use --network_weights instead.
Load pre-trained LoRA weights and continue training from them without restoring optimizer state:
--network_weights="./output/my_lora.safetensors"
Add --dim_from_weights to automatically read the rank from the weight file:
--network_weights="./output/my_lora.safetensors" \
--dim_from_weights

Noise techniques

Noise offset: Adds a constant offset to the noise during training, improving the model’s ability to generate very bright or very dark images. SDXL base models are trained with a noise offset, so enabling it during LoRA training can help match the base model’s distribution. Adding --noise_offset_random_strength randomizes the strength between 0 and the given value at each step.
--noise_offset=0.0357 \
--noise_offset_random_strength
Multi-resolution (pyramid) noise: Adds noise at multiple frequency scales simultaneously. Can improve fine-detail reproduction.
--multires_noise_iterations=6 \
--multires_noise_discount=0.3
Min-SNR weighting: Re-weights the training loss across timesteps by scaling each timestep’s loss by min(SNR, γ)/SNR, so that no timestep range dominates the gradient. A gamma of 5 follows the paper’s recommendation.
--min_snr_gamma=5
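A sketch of the weighting rule from the Min-SNR paper, with γ = 5:

```python
def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    # Loss weight for epsilon-prediction: min(SNR, gamma) / SNR.
    # High-SNR (low-noise) timesteps are down-weighted; noisy timesteps keep weight 1.
    return min(snr, gamma) / snr

print(min_snr_weight(100.0))  # 0.05 — low-noise timestep, strongly down-weighted
print(min_snr_weight(1.0))    # 1.0  — noisy timestep, unchanged
```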
Input perturbation noise: Adds a small amount of extra noise to the input latents for regularization.
--ip_noise_gamma=0.1

Network training scope

By default, both the U-Net and the text encoder receive LoRA modules. You can restrict training to one part:
# Train only U-Net LoRA (required when --cache_text_encoder_outputs is set)
--network_train_unet_only

# Train only text encoder LoRA
--network_train_text_encoder_only

Weight norm scaling

Scale the magnitude of LoRA weights during training to help control overfitting:
--scale_weight_norms=1.0
A value of 1.0 is a reasonable starting point.
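The mechanism can be sketched as rescaling any weight vector whose norm exceeds the threshold (an illustration of the idea, not sd-scripts’ exact implementation):

```python
import math

def scale_weight_norm(weights: list[float], max_norm: float = 1.0) -> list[float]:
    # If the weight norm exceeds max_norm, scale the weights down so the
    # norm equals max_norm; otherwise leave them untouched.
    norm = math.sqrt(sum(w * w for w in weights))
    if norm > max_norm:
        ratio = max_norm / norm
        return [w * ratio for w in weights]
    return weights

scaled = scale_weight_norm([3.0, 4.0])  # norm 5 -> rescaled to norm 1
```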

Differential LoRA (merging existing weights)

Merge one or more existing LoRA files into the base model before starting a new training run. This lets you train the “difference” from an existing LoRA.
--base_weights="./existing_lora.safetensors" \
--base_weights_multiplier=1.0
Multiple weight files and multipliers can be specified by repeating the arguments.

Logging and tracking

To log to TensorBoard:
--logging_dir="./logs" \
--log_with="tensorboard"
Then launch TensorBoard:
tensorboard --logdir ./logs
To log to Weights & Biases instead:
--logging_dir="./logs" \
--log_with="wandb" \
--wandb_api_key="your_api_key" \
--wandb_run_name="my_experiment"
Install it with pip install wandb before use.
Record the full training configuration at the start of each run for reproducibility:
--log_config

Using a config file instead of command-line arguments

For long training commands, store all arguments in a TOML file and pass it with --config_file. Keys use the same names as the command-line flags, minus the leading dashes. An example training_config.toml:
pretrained_model_name_or_path = "/path/to/model.safetensors"
dataset_config = "my_dataset.toml"
output_dir = "./output"
output_name = "my_lora"
network_module = "networks.lora"
network_dim = 32
network_alpha = 16
learning_rate = 1e-4
optimizer_type = "AdamW8bit"
lr_scheduler = "cosine_with_restarts"
max_train_epochs = 10
mixed_precision = "bf16"
gradient_checkpointing = true
Then launch with:
accelerate launch sdxl_train_network.py --config_file="training_config.toml"
Use --output_config to dump the current command-line arguments to a TOML file you can reuse later.
