LoHa and LoKr support is experimental. Behavior may change in future releases.
In addition to standard LoRA, sd-scripts supports LoHa and LoKr as alternative parameter-efficient fine-tuning methods. Both are based on techniques from the LyCORIS project by KohakuBlueleaf.

networks.loha

LoHa — Low-rank Hadamard Product. Represents weight updates as the element-wise product of two low-rank matrix pairs. Roughly twice the parameters of LoRA at the same rank, with greater expressivity.

networks.lokr

LoKr — Low-rank Kronecker Product. Represents weight updates using a Kronecker product with optional low-rank decomposition. Tends to produce smaller models than LoRA at the same rank.

How they work

LoHa

LoHa represents the weight update as a Hadamard (element-wise) product of two low-rank matrix pairs:
ΔW = (W1a × W1b) ⊙ (W2a × W2b)
W1a, W1b, W2a, and W2b are all low-rank matrices with rank network_dim. Because the update involves two independent pairs, LoHa has approximately twice the trainable parameters of LoRA at the same rank. This extra capacity lets it capture more complex weight interactions. For Conv2d 3×3+ layers with Tucker decomposition enabled, each matrix pair also includes a Tucker tensor T, and the reconstruction becomes:
einsum("i j ..., j r, i p -> p r ...", T, Wb, Wa)
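As a concrete illustration, the flat-mode LoHa reconstruction and its parameter count can be sketched in NumPy (the shapes here are illustrative, not taken from any real model):

```python
import numpy as np

out_dim, in_dim, dim = 64, 64, 8  # illustrative sizes; dim plays the role of network_dim

# Two independent low-rank pairs, each reconstructing an out_dim x in_dim matrix
w1a = np.random.randn(out_dim, dim)
w1b = np.random.randn(dim, in_dim)
w2a = np.random.randn(out_dim, dim)
w2b = np.random.randn(dim, in_dim)

# Hadamard (element-wise) product of the two low-rank reconstructions
delta_w = (w1a @ w1b) * (w2a @ w2b)
assert delta_w.shape == (out_dim, in_dim)

# Parameter count: exactly twice a rank-`dim` LoRA on the same layer
lora_params = dim * (out_dim + in_dim)
loha_params = 2 * lora_params
```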

LoKr

LoKr represents the weight update using a Kronecker product:
ΔW = W1 ⊗ W2    (where W2 = W2a × W2b in low-rank mode)
The original weight dimensions are factorized — for example, a 512×512 weight might be split so that W1 is 16×16 and W2 is 32×32. W1 is always a full matrix (small), while W2 is low-rank-decomposed unless network_dim is large enough relative to the factorized dimensions, in which case a full matrix is used for W2 automatically (a warning is logged in this case).
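A minimal NumPy sketch of the Kronecker reconstruction for the 512×512 example above (the (16, 32) split and the rank are illustrative):

```python
import numpy as np

# Illustrative 512x512 layer factorized as (16, 32) on both sides
out_dim = in_dim = 512
f1, f2 = 16, 32          # out_dim == f1 * f2 (same split used for in_dim here)
dim = 8                  # network_dim for the low-rank part of W2

w1 = np.random.randn(f1, f1)       # small full matrix
w2a = np.random.randn(f2, dim)     # low-rank decomposition of W2
w2b = np.random.randn(dim, f2)
w2 = w2a @ w2b

delta_w = np.kron(w1, w2)          # Kronecker product rebuilds the full shape
assert delta_w.shape == (out_dim, in_dim)
```

Note how few parameters this needs: 16×16 for W1 plus two 32×8 matrices for W2, versus 512×8×2 for a rank-8 LoRA on the same layer.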

Comparison

Property                | LoRA              | LoHa                   | LoKr
Update formula          | W_up × W_down     | (W1a×W1b) ⊙ (W2a×W2b)  | W1 ⊗ W2
Parameters at same rank | Baseline          | ~2× LoRA               | Typically < LoRA
Model file size         | Medium            | Larger                 | Smaller
Architecture support    | SD, FLUX, SD3, …  | SDXL, Anima            | SDXL, Anima
Conv2d 3×3 support      | Yes (conv_dim)    | Yes (conv_dim)         | Yes (conv_dim)

Supported architectures

LoHa and LoKr automatically detect the model architecture and apply appropriate default targets.
  • SDXL: Targets Transformer2DModel for the UNet and CLIPAttention/CLIPMLP for text encoders. Conv2d layers in ResnetBlock2D, Downsample2D, and Upsample2D are also targeted when conv_dim is specified.
  • Anima: Targets Block, PatchEmbed, TimestepEmbedding, and FinalLayer for the DiT, and Qwen3Attention/Qwen3MLP for the text encoder. Default exclude_patterns automatically skip modulation, normalization, embedder, and final_layer modules.

Training

To use LoHa or LoKr, change --network_module in your training command. All other options (dataset config, optimizer, scheduler, etc.) are the same as LoRA.
accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 sdxl_train_network.py \
  --pretrained_model_name_or_path path/to/sdxl.safetensors \
  --dataset_config path/to/dataset.toml \
  --mixed_precision bf16 --fp8_base \
  --optimizer_type adamw8bit \
  --learning_rate 2e-4 \
  --gradient_checkpointing \
  --network_module networks.loha \
  --network_dim 32 \
  --network_alpha 16 \
  --max_train_epochs 16 \
  --save_every_n_epochs 1 \
  --output_dir path/to/output \
  --output_name my-loha

Network args

Pass options to LoHa and LoKr using --network_args. Each value is a quoted key=value string.

Conv2d extension

conv_dim
int
Rank for Conv2d 3×3 layers. When set, LoHa/LoKr is also applied to ResnetBlock2D, Downsample2D, and Upsample2D modules. Has no effect if omitted.
conv_alpha
float
Alpha scaling value for Conv2d 3×3 layers. Should be set alongside conv_dim.

Tucker decomposition (Conv2d 3×3+)

use_tucker
bool
default: False
Enable Tucker decomposition for Conv2d 3×3+ layers. Without Tucker, the kernel dimensions are flattened into the input dimension (flat mode). With use_tucker=True, a separate Tucker tensor handles the kernel dimensions, which is generally more parameter-efficient.
--network_args "conv_dim=16" "conv_alpha=8" "use_tucker=True"
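The Tucker-mode reconstruction (the einsum given in the LoHa section) can be sketched with NumPy; all shapes here are illustrative:

```python
import numpy as np

out_ch, in_ch, dim, k = 32, 16, 4, 3  # illustrative conv shapes; dim = network_dim

t = np.random.randn(dim, dim, k, k)   # Tucker core tensor carrying the kernel dims
wa = np.random.randn(dim, out_ch)     # (i, p): maps rank -> output channels
wb = np.random.randn(dim, in_ch)      # (j, r): maps rank -> input channels

# Reconstruction per the einsum in the docs: "i j ..., j r, i p -> p r ..."
rebuilt = np.einsum("i j ..., j r, i p -> p r ...", t, wb, wa)
assert rebuilt.shape == (out_ch, in_ch, k, k)
```

Only the small core tensor scales with the kernel size, which is why Tucker mode is generally more parameter-efficient than flattening the kernel into the input dimension.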

Scalar parameter

use_scalar
bool
default: False
Train an additional scalar multiplier per module. The scalar adjusts the effective magnitude of each module’s output, giving the optimizer more flexibility.

Dropout

rank_dropout
float
Probability of zeroing individual rank dimensions during training. For example, rank_dropout=0.1 drops 10% of rank channels per forward pass. Has no effect at inference.
module_dropout
float
Probability of skipping an entire module for a given forward pass. When a module is dropped, the original pre-trained weight is used unchanged.
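To make rank_dropout concrete, here is a NumPy sketch of masking rank channels at the low-rank bottleneck. Rescaling the surviving channels by 1/(1−p) is a common dropout convention and an assumption here, not a statement about the exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, rank_dropout = 8, 0.25

# Per-forward-pass mask: each rank channel is kept with probability 1 - rank_dropout
mask = rng.random(dim) > rank_dropout
# Assumed convention: rescale survivors so the expected magnitude is unchanged
scale = 1.0 / (1.0 - rank_dropout)

h = rng.standard_normal(dim)      # intermediate activation at the rank bottleneck
h_dropped = h * mask * scale      # dropped channels contribute exactly zero
```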

Module selection

exclude_patterns
string
List of regex patterns for module names to skip, in addition to any architecture defaults. For example, to skip all MLP layers:
--network_args "exclude_patterns=[r'.*mlp.*']"
include_patterns
string
Overrides excludes: modules matching these patterns are trained even if they also match exclude_patterns.

Per-module learning rates and dims

network_reg_lrs
string
Set per-module learning rates using regex patterns, in regex=lr format separated by commas. For example:
--network_args "network_reg_lrs=.*attn.*=5e-4,.*mlp.*=1e-4"
network_reg_dims
string
Set per-module rank (dim) using regex patterns, in regex=dim format separated by commas.
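A hypothetical sketch of parsing such comma-separated regex=value strings (`parse_reg_pairs` is an illustrative helper, not part of sd-scripts; the real parser may differ, for example with patterns containing commas):

```python
import re

def parse_reg_pairs(spec: str) -> list[tuple[str, float]]:
    """Split 'regex=value,regex=value' into (pattern, value) pairs.

    Illustrative helper: splits each comma-separated chunk on its last '='.
    """
    pairs = []
    for chunk in spec.split(","):
        pattern, _, value = chunk.rpartition("=")
        pairs.append((pattern, float(value)))
    return pairs

rules = parse_reg_pairs(".*attn.*=5e-4,.*mlp.*=1e-4")
# First matching pattern wins for a given module name
lr = next((v for p, v in rules if re.fullmatch(p, "blocks.0.attn.qkv")), None)
# -> 5e-4 for this module name
```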

LoKr-specific: factor

factor
int
default: -1
Controls how LoKr factorizes weight dimensions for the Kronecker product.
  • -1 (default): Automatically find the most balanced factorization. For example, dimension 512 is split into (16, 32).
  • Positive integer: Force the first factor to exactly this value. For example, factor=4 splits dimension 512 into (4, 128).
--network_args "factor=4"
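The factorization behavior can be sketched as follows; `factorize` is an illustrative stand-in, not the exact routine used by sd-scripts:

```python
def factorize(dim: int, factor: int = -1) -> tuple[int, int]:
    """Illustrative LoKr-style factorization.

    factor=-1 picks the most balanced integer split; a positive factor
    forces the first term (when it divides dim).
    """
    if factor > 0 and dim % factor == 0:
        return factor, dim // factor
    # Most balanced split: largest divisor of dim not exceeding sqrt(dim)
    m = int(dim ** 0.5)
    while dim % m != 0:
        m -= 1
    return m, dim // m

print(factorize(512))     # balanced split of 512 -> (16, 32)
print(factorize(512, 4))  # forced first factor -> (4, 128)
```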
decompose_both
bool
default: False
When True, apply low-rank decomposition to both Kronecker factor matrices instead of only the second. This increases parameter count but can improve expressivity.

Anima-specific: LLM adapter

train_llm_adapter
bool
default: False
Include LLMAdapterTransformerBlock modules as training targets. Applies only to Anima models.
--network_args "train_llm_adapter=True"

LoRA+

LoRA+ is also supported with LoHa and LoKr. For LoHa, the second matrix pair (hada_w2_a) receives the higher learning rate. For LoKr, the scale factor (lokr_w1) receives the higher learning rate.
loraplus_lr_ratio
float
Multiplier for the “plus” parameter group relative to the base learning rate.
--network_args "loraplus_lr_ratio=4"

Inference

Trained LoHa and LoKr weights are saved in safetensors format, identical to LoRA. Load them the same way using --network_module and --network_weights.

SDXL

python gen_img.py \
  --ckpt path/to/sdxl.safetensors \
  --network_module networks.loha \
  --network_weights path/to/my-loha.safetensors \
  --prompt "your prompt" \
  ...
Replace networks.loha with networks.lokr for LoKr weights.

Anima

LoRA, LoHa, and LoKr weights are detected and merged automatically:
python anima_minimal_inference.py \
  --dit path/to/dit \
  --prompt "your prompt" \
  --lora_weight path/to/my-loha.safetensors \
  ...

ComfyUI conversion

To use Anima LoHa/LoKr weights in ComfyUI, convert them with the provided utility:
python networks/convert_anima_lora_to_comfy.py \
  --input path/to/my-loha.safetensors \
  --output path/to/my-loha-comfy.safetensors
