
Overview

flux_train_network.py trains LoRA adapters for the FLUX.1 model family (dev and schnell variants). FLUX.1 differs fundamentally from Stable Diffusion: it uses a Transformer-based architecture (DiT) instead of a U-Net, requires two separate text encoders (CLIP-L and T5-XXL), and uses a dedicated AutoEncoder (AE) instead of a standard VAE.
This guide assumes you are familiar with basic LoRA training concepts. See LoRA Training Overview and LoRA Training for SD 1.x/2.x for background.

Architecture differences

| Feature | SD 1.x/2.x | FLUX.1 |
| --- | --- | --- |
| Image model | U-Net | Transformer (DiT) |
| Text encoders | 1× CLIP | CLIP-L + T5-XXL |
| Latent encoder | VAE | AutoEncoder (AE) |
| Network module | networks.lora | networks.lora_flux |
| Model file arg | --pretrained_model_name_or_path | --pretrained_model_name_or_path |
| Additional model args | (none) | --clip_l, --t5xxl, --ae |

Required model files

You need four separate model files before training:
  • FLUX.1 model: download flux1-dev.safetensors from the black-forest-labs/FLUX.1-dev repository on Hugging Face. The weights inside the repository subfolders are in Diffusers format and are not compatible with this script.
  • AutoEncoder: download ae.safetensors from the same black-forest-labs/FLUX.1-dev repository.
  • T5-XXL text encoder: download t5xxl_fp16.safetensors from the comfyanonymous/flux_text_encoders repository on Hugging Face.
  • CLIP-L text encoder: download clip_l.safetensors from the same comfyanonymous/flux_text_encoders repository.

Training command

accelerate launch --num_cpu_threads_per_process 1 flux_train_network.py \
  --pretrained_model_name_or_path="/path/to/flux1-dev.safetensors" \
  --clip_l="/path/to/clip_l.safetensors" \
  --t5xxl="/path/to/t5xxl_fp16.safetensors" \
  --ae="/path/to/ae.safetensors" \
  --dataset_config="my_flux_dataset_config.toml" \
  --output_dir="./output" \
  --output_name="my_flux_lora" \
  --save_model_as=safetensors \
  --network_module=networks.lora_flux \
  --network_dim=16 \
  --network_alpha=1 \
  --learning_rate=1e-4 \
  --optimizer_type="AdamW8bit" \
  --lr_scheduler="constant" \
  --sdpa \
  --max_train_epochs=10 \
  --save_every_n_epochs=1 \
  --mixed_precision="fp16" \
  --gradient_checkpointing \
  --guidance_scale=1.0 \
  --timestep_sampling="flux_shift" \
  --model_prediction_type="raw" \
  --blocks_to_swap=18 \
  --cache_text_encoder_outputs \
  --cache_latents
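The file passed to --dataset_config uses the sd-scripts TOML dataset format. A minimal sketch, with placeholder paths and illustrative values to adapt to your data:

```toml
# Hypothetical my_flux_dataset_config.toml; paths and values are placeholders.
[general]
resolution = 1024          # FLUX.1 is typically trained around 1024x1024
batch_size = 1
enable_bucket = true       # aspect-ratio bucketing for mixed image sizes
caption_extension = ".txt" # one caption file per image

[[datasets]]

  [[datasets.subsets]]
  image_dir = "/path/to/images"
  num_repeats = 1
```

Multiple [[datasets.subsets]] entries can point at different image folders with their own repeat counts.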

FLUX.1-specific arguments

Model loading

--pretrained_model_name_or_path
string
required
Path to the FLUX.1 or Chroma .safetensors file. Diffusers-format directories are not supported.
--model_type
string
default:"flux"
Base model type. Use flux for FLUX.1 dev/schnell or chroma for Chroma models.
--clip_l
string
required for flux
Path to the CLIP-L text encoder .safetensors file. Required when --model_type=flux. Omit for Chroma.
--t5xxl
string
required
Path to the T5-XXL text encoder .safetensors file.
--ae
string
required
Path to the FLUX.1-compatible AutoEncoder .safetensors file.

Training behavior

--guidance_scale
number
default:"3.5"
Guidance scale for the distilled FLUX.1 dev model. Set to 1.0 during training to disable embedded guidance. For Chroma, set to 0.0. Usually ignored for the schnell variant.
--timestep_sampling
string
default:"sigma"
Method for sampling timesteps during training. Options: sigma, uniform, sigmoid, shift, flux_shift. Recommended value for FLUX.1 is flux_shift. For Chroma, use sigmoid.
--sigmoid_scale
number
default:"1.0"
Scale factor when --timestep_sampling is sigmoid, shift, or flux_shift.
--model_prediction_type
string
default:"sigma_scaled"
What the model predicts. Options: raw, additive, sigma_scaled. Recommended value is raw.
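The three prediction types can be illustrated with a short sketch. This is a simplified model, not the script's actual API: it assumes the rectified-flow noising used by FLUX.1 (noisy = (1 - sigma) * latents + sigma * noise, with the network nominally predicting the velocity noise - latents), and the function name and signature are hypothetical.

```python
def apply_model_prediction_type(pred, noisy_input, sigma, mode="raw"):
    """Map the raw network output to the quantity compared with the loss target.

    Sketch only. Assumes rectified-flow noising:
        noisy_input = (1 - sigma) * latents + sigma * noise
    where the "raw" prediction target is the velocity (noise - latents).
    """
    if mode == "raw":
        # use the network output as-is (the recommended setting)
        return pred
    if mode == "additive":
        # treat the output as a residual on the noisy input
        return pred + noisy_input
    if mode == "sigma_scaled":
        # recover an estimate of the clean latents:
        # -sigma * (noise - latents) + noisy_input == latents
        return pred * -sigma + noisy_input
    raise ValueError(f"unknown mode: {mode}")
```

With latents = 2.0, noise = 0.5, sigma = 0.3 (so noisy_input = 1.55), a perfect velocity prediction of -1.5 under sigma_scaled maps back to the clean latent value 2.0, which is why that mode pairs with a clean-latent target.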
--discrete_flow_shift
number
default:"3.0"
Scheduler shift value for Flow Matching. Only used when --timestep_sampling=shift.
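How --timestep_sampling, --sigmoid_scale, and --discrete_flow_shift interact can be sketched as follows. This is a simplified illustration based on common Flow Matching shift formulas; the function name, the sequence-length interpolation constants, and the omission of the sigma method are assumptions, and the script's exact implementation may differ.

```python
import math
import random

def sample_timestep(method="flux_shift", sigmoid_scale=1.0,
                    discrete_flow_shift=3.0, image_seq_len=1024):
    """Return a training timestep t in (0, 1); larger t means more noise.

    Simplified sketch; the "sigma" method is omitted here.
    """
    if method == "uniform":
        return random.random()
    # the remaining methods start from a logit-normal draw,
    # scaled by --sigmoid_scale before the sigmoid
    t = 1.0 / (1.0 + math.exp(-sigmoid_scale * random.gauss(0.0, 1.0)))
    if method == "sigmoid":
        return t
    if method == "shift":
        shift = discrete_flow_shift  # constant shift from --discrete_flow_shift
    elif method == "flux_shift":
        # resolution-dependent shift: mu interpolated over the latent
        # sequence length (constants here are illustrative)
        mu = 0.5 + (1.15 - 0.5) * (image_seq_len - 256) / (4096 - 256)
        shift = math.exp(mu)
    else:
        raise ValueError(f"unknown method: {method}")
    # shift > 1 pushes sampled timesteps toward the noisier end
    return shift * t / (1.0 + (shift - 1.0) * t)
```

Note that shift reduces to the identity when the shift factor is 1.0, so the shift and sigmoid methods coincide in that case.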
--apply_t5_attn_mask
boolean
Applies an attention mask to T5-XXL outputs. Required for Chroma models.

Memory optimization

FLUX.1 is a large model. Use the following options to fit training into limited VRAM:
--fp8_base
Loads the FLUX.1 DiT, CLIP-L, and T5-XXL in FP8 format to significantly reduce VRAM. Requires PyTorch 2.1 or later. Results may differ slightly from full-precision training.
--blocks_to_swap=18
Offloads the given number of Transformer blocks from GPU to CPU, reducing peak VRAM at the cost of training speed. FLUX.1 supports up to 35 blocks for swapping. Cannot be used together with --cpu_offload_checkpointing.
--cpu_offload_checkpointing
Offloads gradient checkpoints to CPU. Saves up to 1 GB of VRAM but reduces training speed by around 15%. Cannot be used with --blocks_to_swap. Not supported for Chroma models.
--cache_text_encoder_outputs, --cache_latents
Pre-compute and cache text encoder and AE outputs to skip those passes during each training step. Add --cache_text_encoder_outputs_to_disk and --cache_latents_to_disk to persist the cache between runs.
| GPU VRAM | Recommended settings |
| --- | --- |
| 24 GB | Basic settings, batch size 2 |
| 16 GB | Batch size 1, --blocks_to_swap=8 |
| 12 GB | --blocks_to_swap=16, 8-bit AdamW |
| 10 GB | --blocks_to_swap=22, fp8 T5-XXL recommended |
| 8 GB | --blocks_to_swap=28, fp8 T5-XXL recommended |

Incompatible arguments

The following SD-specific flags have no effect in FLUX.1 training and should be omitted:
  • --v2, --v_parameterization
  • --clip_skip
  • --max_token_length (use --t5xxl_max_token_length instead)
  • --split_mode (deprecated; use --blocks_to_swap)

Using the trained LoRA

After training, load the .safetensors file in an inference environment that supports FLUX.1, such as ComfyUI with the FLUX nodes, or any tool that supports the FLUX architecture.
