Overview
flux_train_network.py trains LoRA adapters for the FLUX.1 model family (dev and schnell variants). FLUX.1 differs fundamentally from Stable Diffusion: it uses a Transformer-based architecture (DiT) instead of a U-Net, requires two separate text encoders (CLIP-L and T5-XXL), and uses a dedicated AutoEncoder (AE) instead of a standard VAE.
This guide assumes you are familiar with basic LoRA training concepts. See LoRA Training Overview and LoRA Training for SD 1.x/2.x for background.
Architecture differences
| Feature | SD 1.x/2.x | FLUX.1 |
|---|---|---|
| Image model | U-Net | Transformer (DiT) |
| Text encoders | 1× CLIP | CLIP-L + T5-XXL |
| Latent encoder | VAE | AutoEncoder (AE) |
| Network module | networks.lora | networks.lora_flux |
| Model file arg | --pretrained_model_name_or_path | --pretrained_model_name_or_path |
| Additional model args | — | --clip_l, --t5xxl, --ae |
Required model files
You need four separate model files before training:
FLUX.1 DiT model (flux1-dev.safetensors)
Download flux1-dev.safetensors from the black-forest-labs/FLUX.1-dev repository on Hugging Face. The weights inside the repository subfolders are in Diffusers format and are not compatible with this script.
AutoEncoder (ae.safetensors)
Download ae.safetensors from the same black-forest-labs/FLUX.1-dev repository.
T5-XXL text encoder (t5xxl_fp16.safetensors)
Download t5xxl_fp16.safetensors (the fp16 version) from the comfyanonymous/flux_text_encoders repository on Hugging Face.
CLIP-L text encoder (clip_l.safetensors)
Download clip_l.safetensors from the same comfyanonymous/flux_text_encoders repository.
Training command
Commands for the two supported base models follow the same pattern, differing mainly in model paths and the variant-specific arguments described below:
- FLUX.1 dev
- Chroma
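As a concrete starting point, a FLUX.1 dev invocation might look like the sketch below. The model-file arguments and the flags in the "FLUX.1-specific arguments" section are taken from this guide; the dataset config path, network_dim, optimizer, learning rate, and output names are placeholder choices to adapt, not fixed recommendations:

```shell
# Sketch of a FLUX.1 dev LoRA training run (adjust paths and hyperparameters)
accelerate launch flux_train_network.py \
  --pretrained_model_name_or_path flux1-dev.safetensors \
  --clip_l clip_l.safetensors \
  --t5xxl t5xxl_fp16.safetensors \
  --ae ae.safetensors \
  --network_module networks.lora_flux \
  --network_dim 16 \
  --dataset_config dataset.toml \
  --guidance_scale 1.0 \
  --timestep_sampling flux_shift \
  --mixed_precision bf16 \
  --optimizer_type adamw8bit --learning_rate 1e-4 \
  --cache_latents_to_disk --cache_text_encoder_outputs_to_disk \
  --save_model_as safetensors \
  --output_dir ./output --output_name my-flux-lora
```

For Chroma, swap in the Chroma checkpoint, drop --clip_l, and change the variant-specific values as described in the argument reference below.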
FLUX.1-specific arguments
Model loading
- --pretrained_model_name_or_path: Path to the FLUX.1 or Chroma .safetensors file. Diffusers-format directories are not supported.
- --model_type: Base model type. Use flux for FLUX.1 dev/schnell or chroma for Chroma models.
- --clip_l: Path to the CLIP-L text encoder .safetensors file. Required when --model_type=flux. Omit for Chroma.
- --t5xxl: Path to the T5-XXL text encoder .safetensors file.
- --ae: Path to the FLUX.1-compatible AutoEncoder .safetensors file.
Training behavior
- --guidance_scale: Guidance scale for the distilled FLUX.1 dev model. Set to 1.0 during training to disable embedded guidance. For Chroma, set to 0.0. Usually ignored for the schnell variant.
- --timestep_sampling: Method for sampling timesteps during training. Options: sigma, uniform, sigmoid, shift, flux_shift. The recommended value for FLUX.1 is flux_shift; for Chroma, use sigmoid.
- --sigmoid_scale: Scale factor when --timestep_sampling is sigmoid, shift, or flux_shift.
- --model_prediction_type: What the model predicts. Options: raw, additive, sigma_scaled. The recommended value is raw.
- --discrete_flow_shift: Scheduler shift value for Flow Matching. Only used when --timestep_sampling=shift.
- --apply_t5_attn_mask: Applies an attention mask to T5-XXL outputs. Required for Chroma models.
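The variant-dependent recommendations above boil down to two flag sets. This is a summary sketch of the values given in this section, not an exhaustive configuration:

```shell
# FLUX.1 dev: disable embedded guidance, use flux_shift timestep sampling
--guidance_scale 1.0 --timestep_sampling flux_shift --model_prediction_type raw

# Chroma: guidance 0.0, sigmoid sampling, T5 attention mask required
--guidance_scale 0.0 --timestep_sampling sigmoid --model_prediction_type raw --apply_t5_attn_mask
```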
Memory optimization
FLUX.1 is a large model. Use the following options to fit training into limited VRAM:
FP8 base model (--fp8_base)
Loads the FLUX.1 DiT, CLIP-L, and T5-XXL in FP8 format to significantly reduce VRAM. Requires PyTorch 2.1 or later. Results may differ slightly from full-precision training.
Block swapping (--blocks_to_swap)
Offloads a number of Transformer blocks from GPU to CPU, reducing peak VRAM at the cost of training speed. FLUX.1 supports swapping up to 35 blocks. Cannot be used together with --cpu_offload_checkpointing.
CPU offload checkpointing (--cpu_offload_checkpointing)
Offloads gradient checkpoints to CPU. Saves up to 1 GB of VRAM but slows training by around 15%. Cannot be used with --blocks_to_swap. Not supported for Chroma models.
Caching text encoder and latent outputs
Pre-compute and cache text encoder and AE outputs to skip those passes during each training step. Add --cache_text_encoder_outputs_to_disk and --cache_latents_to_disk to persist the cache between runs.
Recommended settings by GPU VRAM
| GPU VRAM | Recommended settings |
|---|---|
| 24 GB | Basic settings, batch size 2 |
| 16 GB | Batch size 1, --blocks_to_swap=8 |
| 12 GB | --blocks_to_swap=16, 8-bit AdamW |
| 10 GB | --blocks_to_swap=22, fp8 T5-XXL recommended |
| 8 GB | --blocks_to_swap=28, fp8 T5-XXL recommended |
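For example, the 12 GB row translates into a flag set like the following sketch. The --fp8_base and caching flags are optional additions from the memory-optimization options above, and the adamw8bit spelling for --optimizer_type is an assumption about your optimizer argument:

```shell
# Roughly the 12 GB configuration from the table above
--fp8_base \
--blocks_to_swap=16 \
--optimizer_type adamw8bit \
--cache_latents_to_disk --cache_text_encoder_outputs_to_disk
```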
Incompatible arguments
The following SD-specific flags have no effect in FLUX.1 training and should be omitted:
- --v2, --v_parameterization
- --clip_skip
- --max_token_length (use --t5xxl_max_token_length instead)
- --split_mode (deprecated; use --blocks_to_swap)
Using the trained LoRA
After training, load the .safetensors file in an inference environment that supports FLUX.1, such as ComfyUI with its FLUX nodes, or any other tool that supports the FLUX architecture.