The network implementation is selected with --network_module. Different modules target different model architectures.
Available modules
| Module | File | Target architecture |
|---|---|---|
| networks.lora | networks/lora.py | SD 1.x / 2.x |
| networks.lora_flux | networks/lora_flux.py | FLUX.1 |
| networks.lora_sd3 | networks/lora_sd3.py | SD3 / SD3.5 |
| networks.lora_lumina | networks/lora_lumina.py | Lumina |
| networks.lora_hunyuan_image | networks/lora_hunyuan_image.py | HunyuanImage |
| networks.lora_anima | networks/lora_anima.py | Anima |
| networks.dylora | networks/dylora.py | SD 1.x / 2.x (DyLoRA) |
| networks.lora_fa | networks/lora_fa.py | SD (FA variant) |
Core network args
You pass additional options to the network module using --network_args. Each value is a quoted key=value string.
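For example, two options passed as separate quoted strings (the paths and values here are only illustrative):

```shell
accelerate launch train_network.py \
  --network_module networks.lora \
  --network_dim 16 --network_alpha 8 \
  --network_args "conv_dim=8" "conv_alpha=4"
```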
Conv2d extension
By default, networks.lora targets only Linear and Conv2d 1×1 layers. To extend LoRA to Conv2d 3×3 layers (e.g., ResNet blocks in SD 1.x/2.x), set conv_dim and conv_alpha.

conv_dim: Rank for Conv2d 3×3 layers. When set, LoRA is applied to ResnetBlock2D, Downsample2D, and Upsample2D modules in addition to attention layers. Has no effect if omitted.

conv_alpha: Alpha scaling value for Conv2d 3×3 LoRA layers. Defaults to 1.0 when conv_dim is set but conv_alpha is not.

LoRA+ differential learning rates
LoRA+ applies a higher learning rate to the up-projection matrix (lora_up) than to the down-projection matrix (lora_down), which can improve convergence.
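The mechanism can be sketched as optimizer parameter groups. This is an illustration only, not the sd-scripts implementation; the function and its naming convention are hypothetical:

```python
# Illustrative sketch of LoRA+ parameter grouping -- NOT the sd-scripts
# implementation. Parameters whose names contain "lora_up" go into a group
# whose learning rate is base_lr * loraplus_lr_ratio.
def build_param_groups(named_params, base_lr, loraplus_lr_ratio):
    """Return optimizer-style parameter groups with a boosted LR for lora_up."""
    up_params, down_params = [], []
    for name, param in named_params:
        (up_params if "lora_up" in name else down_params).append(param)
    return [
        {"params": down_params, "lr": base_lr},
        {"params": up_params, "lr": base_lr * loraplus_lr_ratio},
    ]
```

An optimizer such as torch.optim.AdamW accepts a list of groups like this directly, which is how a per-matrix learning-rate split is typically wired up.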
loraplus_lr_ratio: Multiplier applied to lora_up weights relative to the base learning rate. Applies to both the UNet and the text encoder. For example, loraplus_lr_ratio=4 sets the up-projection learning rate to 4× the base.

loraplus_unet_lr_ratio: LoRA+ ratio applied only to UNet modules. Overrides loraplus_lr_ratio for the UNet.

loraplus_text_encoder_lr_ratio: LoRA+ ratio applied only to text encoder modules. Overrides loraplus_lr_ratio for the text encoder.

Per-block dimensions
You can assign different ranks to each block in the UNet rather than using a single global rank. This lets you allocate more capacity to blocks that matter most for your use case.

block_dims: Comma-separated list of integer ranks, one per UNet block. For SD 1.x/2.x, provide 25 values (12 down + 1 mid + 12 up). For SDXL, provide 23 values. Example: "4,4,4,4,8,8,8,8,16,16,16,16,16,16,16,16,16,16,16,8,8,8,8,4,4".

block_alphas: Comma-separated list of alpha values corresponding to each entry in block_dims. Defaults to the global --network_alpha for any block not explicitly set.

conv_block_dims: Comma-separated list of Conv2d 3×3 ranks per block, same length as block_dims. Requires block_dims to be set.

conv_block_alphas: Comma-separated list of Conv2d 3×3 alpha values per block, same length as block_dims.

Dropout
rank_dropout: Probability of zeroing individual rank dimensions during training. Operates on the hidden state after lora_down. For example, rank_dropout=0.1 drops 10% of rank channels per forward pass. Has no effect at inference.

module_dropout: Probability of skipping an entire LoRA module for a given forward pass during training. When a module is dropped, the original pre-trained weight is used unchanged. For example, module_dropout=0.1 skips each module 10% of the time.

Additional options
Enable Tucker decomposition for Conv2d 3×3 LoRA layers. Tucker decomposition factors the kernel dimensions through a separate tensor, which can be more parameter-efficient than flattening the kernel into the input dimension.
Train an additional scalar parameter per LoRA module. The scalar multiplies the LoRA output, giving the optimizer more flexibility to adjust the effective magnitude of each module’s contribution.
Include normalization layers (e.g., LayerNorm, GroupNorm) as training targets in addition to Linear and Conv2d layers. This can improve fidelity for certain fine-tuning tasks but increases the risk of overfitting.
Enable DoRA (Weight-Decomposed Low-Rank Adaptation). DoRA decomposes the weight update into a magnitude component and a direction component (the LoRA matrices), similar to weight normalization. This can improve fine-tuning quality, especially for tasks that require significant style changes.
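The decomposition can be sketched numerically. This is a simplified, assumption-laden illustration (column-wise normalization, training details omitted), not any library's actual implementation:

```python
import numpy as np

# Simplified sketch of the DoRA idea: split the adapted weight into a
# direction (base weight + low-rank update, normalized per column) and a
# learned per-column magnitude. Function name and shapes are illustrative.
def dora_weight(base, lora_up, lora_down, magnitude, scale=1.0):
    """Return magnitude * unit-direction of (base + scale * lora_up @ lora_down)."""
    directed = base + scale * (lora_up @ lora_down)   # direction component
    col_norms = np.linalg.norm(directed, axis=0, keepdims=True)
    return magnitude * directed / col_norms           # rescale each column
```

With a zero low-rank update, the result is simply the base weight with each column rescaled to the learned magnitude, which is what makes the magnitude and direction independently trainable.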
DyLoRA
networks.dylora implements DyLoRA (Dynamic Low-Rank Adaptation), which trains a nested set of ranks simultaneously. At inference you can extract a LoRA at any rank up to the maximum trained rank without retraining.
Use networks.dylora in place of networks.lora and set the block size with --network_args "unit=N":

unit: Block size for DyLoRA training. Ranks are trained in increments of this value up to --network_dim. For example, with --network_dim 32 and unit=4, the module is trained at ranks 4, 8, 12, …, 32 simultaneously.

Use networks/extract_lora_from_dylora.py to extract a LoRA at a specific rank.
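An extraction invocation might look like the following; the flag names are assumptions and should be checked against the script's --help:

```shell
python networks/extract_lora_from_dylora.py \
  --model dylora_trained.safetensors \
  --save_to lora_rank8.safetensors \
  --unit 8
```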
FLUX.1 LoRA (networks.lora_flux)
networks.lora_flux targets FLUX.1’s DoubleStreamBlock and SingleStreamBlock modules. It supports several FLUX-specific --network_args:
Block-type dims
You can set separate ranks for different layer types within FLUX blocks:
| Arg | Layer type |
|---|---|
| img_attn_dim | Image attention in DoubleStreamBlock |
| txt_attn_dim | Text attention in DoubleStreamBlock |
| img_mlp_dim | Image MLP in DoubleStreamBlock |
| txt_mlp_dim | Text MLP in DoubleStreamBlock |
| img_mod_dim | Image modulation in DoubleStreamBlock |
| txt_mod_dim | Text modulation in DoubleStreamBlock |
| single_dim | Linear layers in SingleStreamBlock |
| single_mod_dim | Modulation in SingleStreamBlock |
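For example, to give attention layers a higher rank than modulation layers (the values here are illustrative, not recommendations):

```shell
--network_args "img_attn_dim=8" "txt_attn_dim=8" "img_mod_dim=2" "txt_mod_dim=2"
```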
Block selection
Train only specific blocks using indices:
train_blocks=double or train_blocks=single restricts training to only that block type.
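For example, to adapt only the single-stream blocks:

```shell
--network_args "train_blocks=single"
```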
T5XXL text encoder
By default, only the CLIP text encoder is trained. To also train the T5XXL encoder:
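This is exposed as a network argument; the exact flag below is an assumption based on the upstream sd-scripts FLUX documentation and should be verified there:

```shell
--network_args "train_t5xxl=True"
```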
Split QKV
FLUX combines Q, K, and V into a single projection. Set split_qkv=True to train them with separate LoRA adapters.

Full example
The following command trains a LoRA for SD 1.x with Conv2d extension, LoRA+, and rank dropout. To adapt it for FLUX.1, replace the training script with flux_train_network.py, the model path with your FLUX checkpoint, and --network_module with networks.lora_flux.
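A representative invocation might look like this (paths, dataset options, and hyperparameter values are placeholders, not prescriptions):

```shell
accelerate launch train_network.py \
  --pretrained_model_name_or_path sd15_model.safetensors \
  --train_data_dir dataset_dir \
  --output_dir output_dir --output_name my_lora \
  --network_module networks.lora \
  --network_dim 16 --network_alpha 8 \
  --network_args "conv_dim=8" "conv_alpha=4" "rank_dropout=0.1" "loraplus_lr_ratio=4" \
  --learning_rate 1e-4 --max_train_steps 2000 \
  --mixed_precision fp16 --save_model_as safetensors
```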