OFT (Orthogonal Fine-Tuning) adapts a model by learning orthogonal rotation matrices rather than low-rank additive adapters. Where LoRA adds a low-rank delta to each weight matrix, OFT rotates the weight matrix with a block-diagonal orthogonal matrix. Because the rotation preserves the pairwise angles between weight vectors, OFT is a more conservative fine-tuning approach.
Because OFT preserves hyperspherical energy (the pairwise angular relationships between weight vectors), it tends to retain the base model’s style and composition while adapting its content. This makes OFT well-suited for style preservation tasks where LoRA might introduce unwanted drift.
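The angle-preservation property can be checked numerically. Here is a minimal NumPy sketch (illustrative, not taken from the OFT implementation): rotating a weight matrix by any orthogonal matrix leaves the pairwise cosines between its column vectors (the "neurons") unchanged, which is exactly what an additive update does not guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))                    # toy weight matrix, columns = neurons

# Build a random orthogonal matrix via QR decomposition.
R, _ = np.linalg.qr(rng.standard_normal((6, 6)))
W_rot = R @ W                                      # OFT-style update: rotate, don't add

def pairwise_cosines(M):
    # Cosine of the angle between every pair of columns of M.
    Mn = M / np.linalg.norm(M, axis=0, keepdims=True)
    return Mn.T @ Mn

# The weights changed, but every pairwise angle is preserved.
assert not np.allclose(W, W_rot)
assert np.allclose(pairwise_cosines(W), pairwise_cosines(W_rot))
```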

Available modules

| Module | File | Target architecture |
| --- | --- | --- |
| `networks.oft` | `networks/oft.py` | SD 1.x / 2.x (and SDXL) |
| `networks.oft_flux` | `networks/oft_flux.py` | FLUX.1 |

How OFT differs from LoRA

| Property | LoRA | OFT |
| --- | --- | --- |
| Update type | Low-rank additive delta | Block-diagonal orthogonal rotation |
| Preserves angles | No | Yes |
| Parameter structure | Two low-rank matrices per layer | One block-diagonal matrix per layer |
| Constraint | None (unconstrained delta) | Norm constraint on skew-symmetric matrices |
| Typical `network_alpha` | 1–32 | Small values like 1e-3 |
| Good for | General fine-tuning | Style preservation, conservative adaptation |
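For intuition on the parameter-structure row, here is a rough parameter-count sketch for a single linear layer. The helper functions are illustrative, not part of the library, and the count ignores the skew-symmetry constraint, which roughly halves OFT's free parameters:

```python
def lora_params(out_dim, in_dim, rank):
    # LoRA stores two low-rank factors: (out_dim x rank) and (rank x in_dim).
    return rank * (out_dim + in_dim)

def oft_params(out_dim, num_blocks):
    # OFT stores one (block_size x block_size) matrix per block of the output dim.
    block_size = out_dim // num_blocks
    return num_blocks * block_size * block_size

# A 320x320 projection (e.g. an SD 1.x attention layer):
lora_params(320, 320, 8)   # 5120
oft_params(320, 8)         # 8 blocks of 40x40 -> 12800
```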

OFT for SD 1.x/2.x (networks.oft)

networks.oft targets attention layers in the UNet by default (CrossAttention). You can expand coverage to all linear layers in Transformer2DModel blocks or to Conv2d layers. The network_dim argument sets the number of orthogonal blocks (not a rank in the LoRA sense). A larger network_dim gives finer-grained rotation matrices but increases parameter count.
The default network_alpha for OFT is 1e-3, not 1.0. Using network_alpha >= 1 will produce a warning. Set a small value such as 1e-3 or 1e-4.

network_args for networks.oft

- `enable_all_linear` (bool, default: `False`): Expand OFT coverage from attention-only (CrossAttention) to all linear layers inside Transformer2DModel blocks, including feed-forward layers. Increases the number of trained parameters.
- `enable_conv` (bool, default: `False`): Also apply OFT to Conv2d layers in ResnetBlock2D, Downsample2D, and Upsample2D modules. Useful when fine-tuning for textures or styles that are encoded in the ResNet layers.

Training example

```bash
accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 train_network.py \
  --pretrained_model_name_or_path path/to/sd15.safetensors \
  --dataset_config path/to/dataset.toml \
  --mixed_precision bf16 \
  --optimizer_type adamw8bit \
  --learning_rate 5e-5 \
  --gradient_checkpointing \
  --network_module networks.oft \
  --network_dim 8 \
  --network_alpha 1e-3 \
  --max_train_epochs 10 \
  --save_every_n_epochs 1 \
  --output_dir path/to/output \
  --output_name my-oft
```

To also train feed-forward layers and Conv2d layers, add:

```bash
  --network_args "enable_all_linear=True" "enable_conv=True"
```

OFT for FLUX.1 (networks.oft_flux)

networks.oft_flux targets FLUX.1’s DoubleStreamBlock and SingleStreamBlock modules. Because FLUX combines Q, K, and V into a single projection (qkv), oft_flux handles split dimensions automatically — each sub-projection (Q, K, V) gets its own block-diagonal rotation matrix. The constraint parameter (network_alpha) scales proportionally to the output dimension of each sub-projection rather than the full combined QKV output, which is the primary behavioral difference from networks.oft.
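The split can be sketched as follows. This is a hypothetical illustration, not the actual networks.oft_flux code, and the 3072 hidden size is an assumption used for the arithmetic: each sub-projection's output dimension is divided into blocks separately, instead of dividing the full fused output.

```python
def split_block_sizes(split_dims, num_blocks):
    # Block size of the rotation matrices for each sub-projection (Q, K, V).
    return [d // num_blocks for d in split_dims]

# FLUX.1 fuses Q, K, V into one projection; treat it as three sub-projections
# of the hidden size (3072 here, an assumption for illustration).
hidden = 3072
per_split = split_block_sizes([hidden, hidden, hidden], num_blocks=8)   # [384, 384, 384]
fused = split_block_sizes([3 * hidden], num_blocks=8)                   # [1152]
# Splitting gives each of Q, K, V its own 8 blocks of 384x384 rotations,
# rather than 8 blocks of 1152x1152 over the fused 9216-dim output.
```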

network_args for networks.oft_flux

- `enable_all_linear` (bool, default: `False`): Expand coverage from attention-only (SelfAttention) to all linear layers inside DoubleStreamBlock and SingleStreamBlock, including MLP layers.

Training example

```bash
accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 flux_train_network.py \
  --pretrained_model_name_or_path path/to/flux1-dev.safetensors \
  --clip_l path/to/clip_l.safetensors \
  --t5xxl path/to/t5xxl.safetensors \
  --ae path/to/ae.safetensors \
  --dataset_config path/to/dataset.toml \
  --mixed_precision bf16 \
  --optimizer_type adamw8bit \
  --learning_rate 5e-5 \
  --gradient_checkpointing \
  --network_module networks.oft_flux \
  --network_dim 8 \
  --network_alpha 1e-3 \
  --max_train_epochs 10 \
  --save_every_n_epochs 1 \
  --output_dir path/to/output \
  --output_name my-oft-flux
```

To include MLP layers in FLUX training, add:

```bash
  --network_args "enable_all_linear=True"
```

OFT internals

OFT learns a block-diagonal skew-symmetric matrix Q per layer. The orthogonal rotation matrix R is computed via the Cayley map:
R = (I + Q)(I - Q)^{-1}
The constraint (set by network_alpha) limits the Frobenius norm of Q to prevent the rotation from deviating too far from the identity. At network_alpha = 0, no constraint is applied. At inference, the rotated weight is:
W' = R · W    (W's output dimension reshaped into num_blocks blocks of size block_size, with one rotation per block)
The network_dim argument controls how many blocks the output dimension is divided into. Each block has its own block_size × block_size rotation matrix. A larger network_dim means more, smaller blocks — each capturing finer-grained rotations.
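The steps above can be sketched in NumPy. This is an illustrative reimplementation, not the library's code; in particular, the constraint scaling (alpha times the output dimension) is an assumption, and the actual formula may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
num_blocks, block_size = 4, 8
out_dim, in_dim = num_blocks * block_size, 16
W = rng.standard_normal((out_dim, in_dim))

# Learnable parameter: one skew-symmetric matrix per block (Q = A - A^T).
A = rng.standard_normal((num_blocks, block_size, block_size)) * 1e-2
Q = A - A.transpose(0, 2, 1)

# Hypothetical constraint: cap the Frobenius norm of Q (scaling is an assumption).
alpha = 1e-3
limit = alpha * out_dim
norm = np.linalg.norm(Q)
if norm > limit:
    Q = Q * (limit / norm)   # rescaling keeps Q skew-symmetric

# Cayley map per block: R = (I + Q)(I - Q)^{-1} is orthogonal for skew-symmetric Q.
I = np.eye(block_size)
R = np.stack([(I + q) @ np.linalg.inv(I - q) for q in Q])

# Apply each block's rotation to its slice of the output dimension.
W_blocks = W.reshape(num_blocks, block_size, in_dim)
W_rot = np.einsum("bij,bjk->bik", R, W_blocks).reshape(out_dim, in_dim)

# Every block rotation is orthogonal: R R^T = I.
for r in R:
    assert np.allclose(r @ r.T, np.eye(block_size))
```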
