Because OFT preserves hyperspherical energy (the pairwise angular relationships between weight vectors), it tends to retain the base model’s style and composition while adapting its content. This makes OFT well-suited for style preservation tasks where LoRA might introduce unwanted drift.
Available modules
| Module | File | Target architecture |
|---|---|---|
| networks.oft | networks/oft.py | SD 1.x / 2.x (and SDXL) |
| networks.oft_flux | networks/oft_flux.py | FLUX.1 |
How OFT differs from LoRA
| Property | LoRA | OFT |
|---|---|---|
| Update type | Low-rank additive delta | Block-diagonal orthogonal rotation |
| Preserves angles | No | Yes |
| Parameter structure | Two low-rank matrices per layer | One block-diagonal matrix per layer |
| Constraint | None (unconstrained delta) | Norm constraint on skew-symmetric matrices |
| Typical network_alpha | 1–32 | Small values like 1e-3 |
| Good for | General fine-tuning | Style preservation, conservative adaptation |
OFT for SD 1.x/2.x (networks.oft)
networks.oft targets attention layers in the UNet by default (CrossAttention). You can expand coverage to all linear layers in Transformer2DModel blocks or to Conv2d layers.
The network_dim argument sets the number of orthogonal blocks (not a rank in the LoRA sense). A larger network_dim gives finer-grained rotation matrices but increases parameter count.
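To make the "number of blocks, not a rank" point concrete, here is a small illustrative sketch (the function name is hypothetical, not part of the sd-scripts API) of how network_dim partitions a layer's output dimension:

```python
# Sketch: how network_dim partitions a layer's output dimension into
# orthogonal blocks (illustrative only, not the sd-scripts implementation).

def oft_block_shape(out_dim: int, network_dim: int) -> tuple[int, int]:
    """Return (num_blocks, block_size) for an OFT layer.

    network_dim = number of block-diagonal rotation blocks; each block
    rotates a contiguous block_size-sized slice of the output features.
    """
    assert out_dim % network_dim == 0, "out_dim must be divisible by network_dim"
    block_size = out_dim // network_dim
    return network_dim, block_size

# A 320-channel projection with network_dim=4 gets four 80x80 rotation
# blocks; raising network_dim to 16 gives sixteen 20x20 blocks.
print(oft_block_shape(320, 4))   # (4, 80)
print(oft_block_shape(320, 16))  # (16, 20)
```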
network_args for networks.oft
- Expand OFT coverage from attention-only (CrossAttention) to all linear layers inside Transformer2DModel blocks, including feed-forward layers. This increases the number of trained parameters.
- Also apply OFT to Conv2d layers in ResnetBlock2D, Downsample2D, and Upsample2D modules. Useful when fine-tuning for textures or styles that are encoded in the ResNet layers.

Training example
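A minimal command sketch for training with networks.oft. Paths, the dataset config, and all hyperparameter values are placeholders, not tested or recommended settings:

```shell
accelerate launch train_network.py \
  --pretrained_model_name_or_path=<base model> \
  --dataset_config=<dataset.toml> \
  --output_dir=<output dir> \
  --network_module=networks.oft \
  --network_dim=4 \
  --network_alpha=1e-3 \
  --learning_rate=1e-4 \
  --max_train_epochs=10
```

Note that network_alpha here acts as the norm constraint, so it takes small values like 1e-3 rather than the 1–32 range typical of LoRA.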
OFT for FLUX.1 (networks.oft_flux)
networks.oft_flux targets FLUX.1’s DoubleStreamBlock and SingleStreamBlock modules. Because FLUX combines Q, K, and V into a single projection (qkv), oft_flux handles split dimensions automatically — each sub-projection (Q, K, V) gets its own block-diagonal rotation matrix.
The constraint parameter (network_alpha) scales proportionally to the output dimension of each sub-projection rather than the full combined QKV output, which is the primary behavioral difference from networks.oft.
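To make the scaling difference concrete, here is a hypothetical arithmetic sketch (dimension values are assumptions for illustration, and this is not the actual oft_flux code) of a constraint proportional to each sub-projection versus one proportional to the fused QKV output:

```python
# Sketch: constraint scaling for a fused qkv projection (illustrative).
# FLUX fuses Q, K, and V into one linear layer with output dim 3 * hidden.

hidden = 3072                 # per-sub-projection output width (assumed)
qkv_out = 3 * hidden          # fused qkv output width
alpha = 1e-3                  # network_alpha

# networks.oft-style: constraint proportional to the full output dim.
constraint_full = alpha * qkv_out

# networks.oft_flux-style: constraint proportional to each sub-projection,
# applied separately to the Q, K, and V rotation matrices.
constraint_per_sub = alpha * hidden

print(constraint_full, constraint_per_sub)
```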
network_args for networks.oft_flux
- Expand coverage from attention-only (SelfAttention) to all linear layers inside DoubleStreamBlock and SingleStreamBlock, including MLP layers.

Training example
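A minimal command sketch for training with networks.oft_flux. Paths and hyperparameter values are placeholders, not tested settings:

```shell
accelerate launch flux_train_network.py \
  --pretrained_model_name_or_path=<flux model> \
  --clip_l=<clip_l model> --t5xxl=<t5xxl model> --ae=<ae model> \
  --dataset_config=<dataset.toml> \
  --output_dir=<output dir> \
  --network_module=networks.oft_flux \
  --network_dim=4 \
  --network_alpha=1e-3
```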
OFT internals
OFT learns a block-diagonal skew-symmetric matrix Q per layer. The orthogonal rotation matrix R is computed via the Cayley map:

R = (I + Q)(I − Q)⁻¹

where I is the identity matrix. Because Q is skew-symmetric (Qᵀ = −Q), R is guaranteed to be orthogonal.
The constraint parameter (network_alpha) limits the Frobenius norm of Q to prevent the rotation from deviating too far from the identity. At network_alpha = 0, no constraint is applied. At inference, the rotated weight is:

W′ = R W₀

where W₀ is the frozen pretrained weight matrix.
The network_dim argument controls how many blocks the output dimension is divided into. Each block has its own block_size × block_size rotation matrix, where block_size = out_dim / network_dim. A larger network_dim therefore means more, smaller blocks, each capturing a finer-grained rotation.
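The whole construction can be sketched end to end with NumPy. This is a toy standalone demo of the math described above (Cayley map, block-diagonal rotation, angle preservation), not the sd-scripts implementation; all dimensions are arbitrary:

```python
import numpy as np

def cayley(Q: np.ndarray) -> np.ndarray:
    """Cayley map: skew-symmetric Q -> orthogonal R = (I + Q)(I - Q)^-1."""
    I = np.eye(Q.shape[0])
    return (I + Q) @ np.linalg.inv(I - Q)

rng = np.random.default_rng(0)

out_dim, num_blocks = 8, 2           # network_dim=2 -> two 4x4 blocks
block_size = out_dim // num_blocks

# Build one small skew-symmetric Q per block and map it through Cayley.
blocks = []
for _ in range(num_blocks):
    A = rng.normal(size=(block_size, block_size))
    Q = A - A.T                      # skew-symmetric: Q^T = -Q
    blocks.append(cayley(Q))

# Assemble the block-diagonal orthogonal rotation R.
R = np.zeros((out_dim, out_dim))
for i, B in enumerate(blocks):
    s = i * block_size
    R[s:s + block_size, s:s + block_size] = B

W0 = rng.normal(size=(out_dim, 5))   # toy "pretrained" weight
W = R @ W0                           # rotated weight used at inference

# R is orthogonal, so column norms (and pairwise angles) are preserved.
print(np.allclose(R @ R.T, np.eye(out_dim)))                               # True
print(np.allclose(np.linalg.norm(W, axis=0), np.linalg.norm(W0, axis=0)))  # True
```

The last two checks are the "preserves hyperspherical energy" property from the introduction: rotating the frozen weight leaves lengths and angles between weight vectors unchanged.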