FLUX.1 is a Transformer-based image generation model from Black Forest Labs. Unlike Stable Diffusion 1.x/2.x and SDXL, FLUX.1 uses a Diffusion Transformer (DiT) architecture with two text encoders and its own AutoEncoder (AE), which is not interchangeable with the Stable Diffusion VAE.
## Architecture
FLUX.1 departs from the UNet-based pipeline used in SD 1.x/2.x and SDXL:
- DiT (Diffusion Transformer) — replaces the UNet. Operates on patchified latent representations using bidirectional attention.
- Dual text encoders:
  - CLIP-L — fast encoder for short-to-medium prompts.
  - T5-XXL — large language model encoder for long, complex prompts (up to 512 tokens by default).
- AutoEncoder (AE) — encodes and decodes between pixel and latent space. Not VAE-compatible with SD 1.x/2.x or SDXL.
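As a rough illustration of how the DiT consumes patchified latents, the sketch below computes the token shapes involved. The specific numbers (8x AE spatial downsampling, 16 latent channels, 2x2 patches) are assumptions for illustration, not values read from the training scripts:

```python
# Sketch of DiT input shapes. Assumed constants: 8x AE downsampling,
# 16 latent channels, 2x2 patchification (illustration only).
def dit_token_shape(height, width, ae_down=8, latent_ch=16, patch=2):
    """Return (sequence_length, token_dim) for a height x width image."""
    lat_h, lat_w = height // ae_down, width // ae_down   # latent grid
    seq_len = (lat_h // patch) * (lat_w // patch)        # one token per 2x2 patch
    token_dim = latent_ch * patch * patch                # channels folded into the token
    return seq_len, token_dim

print(dit_token_shape(1024, 1024))  # -> (4096, 64)
```

Under these assumptions a 1024x1024 image becomes a sequence of 4096 tokens, which is why attention cost grows quickly with resolution.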
## Versions
| Version | Guidance | Use case |
|---|---|---|
| FLUX.1-dev | Distilled guidance embedding | General-purpose, recommended for training |
| FLUX.1-schnell | Timestep-distilled for few-step sampling | Fast inference |
FLUX.1-dev is guidance-distilled, with specific guidance scale values baked into the model. During training, set --guidance_scale=1.0 to disable the guidance scale; the default value (3.5) is intended for inference, not training.
## Required model files
Download the following files before training:
- flux1-dev.safetensors — DiT model weights
- ae.safetensors — AutoEncoder weights
- clip_l.safetensors — CLIP-L text encoder
- t5xxl_fp16.safetensors — T5-XXL text encoder
Do not use the weights from the Diffusers-format subfolders inside the FLUX.1-dev repository; the training scripts cannot load them directly. Use the top-level flux1-dev.safetensors and ae.safetensors files.
## Available training methods
| Method | Script | Notes |
|---|---|---|
| LoRA | flux_train_network.py | Primary training method |
| Fine-tuning | flux_train.py | Full model training |
| ControlNet | flux_train_control_net.py | ControlNet training |
## LoRA training
Use flux_train_network.py with --network_module=networks.lora_flux:
```bash
accelerate launch --num_cpu_threads_per_process 1 flux_train_network.py \
  --pretrained_model_name_or_path="flux1-dev.safetensors" \
  --clip_l="clip_l.safetensors" \
  --t5xxl="t5xxl_fp16.safetensors" \
  --ae="ae.safetensors" \
  --dataset_config="my_flux_dataset_config.toml" \
  --output_dir="./output" \
  --output_name="my_flux_lora" \
  --save_model_as=safetensors \
  --network_module=networks.lora_flux \
  --network_dim=16 \
  --network_alpha=1 \
  --learning_rate=1e-4 \
  --optimizer_type="AdamW8bit" \
  --lr_scheduler="constant" \
  --sdpa \
  --max_train_epochs=10 \
  --save_every_n_epochs=1 \
  --mixed_precision="fp16" \
  --gradient_checkpointing \
  --guidance_scale=1.0 \
  --timestep_sampling="flux_shift" \
  --model_prediction_type="raw" \
  --blocks_to_swap=18 \
  --cache_text_encoder_outputs \
  --cache_latents
```
--timestep_sampling="flux_shift" and --model_prediction_type="raw" are the recommended settings for FLUX.1-dev LoRA training.
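The --dataset_config flag in the command above points at a dataset config file. A minimal sketch of such a TOML (directory paths, resolution, and repeat counts are placeholders you must adapt to your data):

```toml
# Minimal dataset config sketch for my_flux_dataset_config.toml.
# All values here are illustrative placeholders.
[general]
shuffle_caption = false
caption_extension = ".txt"

[[datasets]]
resolution = 1024
batch_size = 1
enable_bucket = true

  [[datasets.subsets]]
  image_dir = "/path/to/images"
  num_repeats = 1
```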
## Memory optimization
FLUX.1 is a large model. Use these options to reduce VRAM usage:
| GPU VRAM | Recommended settings |
|---|---|
| 24 GB | Standard settings (batch size 2) |
| 16 GB | Batch size 1 + --blocks_to_swap |
| 12 GB | --blocks_to_swap 16 + 8-bit AdamW |
| 10 GB | --blocks_to_swap 22 + fp8 T5-XXL |
| 8 GB | --blocks_to_swap 28 + fp8 T5-XXL |
### Key memory options
- --fp8_base — runs FLUX.1, CLIP-L, and T5-XXL in FP8 format. Significantly reduces VRAM at a potential quality cost.
- --blocks_to_swap <n> — offloads n Transformer blocks to CPU. FLUX.1 supports up to 35 blocks. Cannot be combined with --cpu_offload_checkpointing.
- --cache_text_encoder_outputs — caches CLIP-L and T5-XXL outputs; reduces memory usage but disables text encoder LoRA training.
- --cache_latents / --cache_latents_to_disk — caches AE outputs so images are encoded once instead of on every step.
## Key training parameters
| Parameter | Description | Recommendation |
|---|---|---|
| --network_module | Network module | networks.lora_flux |
| --network_dim | LoRA rank | 16 |
| --guidance_scale | Guidance scale during training | 1.0 for dev |
| --timestep_sampling | Timestep sampling method | flux_shift |
| --model_prediction_type | Prediction processing | raw |
| --t5xxl_max_token_length | T5-XXL max tokens | 512 (default) |
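To get a feel for what --network_dim (the LoRA rank) and --network_alpha mean in parameter terms, here is a back-of-the-envelope sketch. The 3072 feature width is a hypothetical layer size used purely for illustration:

```python
# Back-of-the-envelope LoRA sizing: a rank-r adapter stores two factors per
# adapted weight, down (r x in) and up (out x r). The 3072 width is a
# hypothetical layer size, not taken from the model definition.
def lora_param_count(in_features, out_features, rank):
    """Parameters in one LoRA pair: rank*in (down) + out*rank (up)."""
    return rank * in_features + out_features * rank

rank, alpha = 16, 1                         # matches --network_dim=16 --network_alpha=1
full = 3072 * 3072                          # full weight matrix: 9,437,184 params
lora = lora_param_count(3072, 3072, rank)   # 98,304 params, about 1% of full
scale = alpha / rank                        # LoRA update is scaled by alpha/rank
print(lora, full, scale)
```

This is why a low rank keeps LoRA files small, and why network_alpha=1 with rank 16 applies the learned update at a scale of 1/16.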
## Incompatible options
The following SD 1.x/2.x arguments are not used for FLUX.1 training and should not be specified:
- --v2, --v_parameterization, --clip_skip
- --max_token_length (use --t5xxl_max_token_length instead)
- --split_mode (deprecated; use --blocks_to_swap)