FLUX.1 is a Transformer-based image generation model from Black Forest Labs. Unlike Stable Diffusion, FLUX.1 uses a Diffusion Transformer (DiT) architecture with two text encoders and its own AutoEncoder (AE), which is not interchangeable with Stable Diffusion's VAE.

Architecture

FLUX.1 departs from the UNet-based pipeline used in SD 1.x/2.x and SDXL:
  • DiT (Diffusion Transformer) — replaces the UNet. Operates on patchified latent representations using bidirectional attention.
  • Dual text encoders
    • CLIP-L — fast encoder for short-to-medium prompts.
    • T5-XXL — large language model encoder for long, complex prompts (up to 512 tokens by default).
  • AutoEncoder (AE) — encodes and decodes between pixel and latent space. Not VAE-compatible with SD 1.x/2.x or SDXL.
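As a rough illustration of the data flow: the AE downsamples images 8× spatially into 16-channel latents, and the DiT then packs 2×2 latent patches into tokens. These factors are the commonly cited FLUX.1 values; this is a sketch of the arithmetic, not code from the training scripts:

```python
def flux_token_count(height: int, width: int) -> int:
    """Approximate FLUX.1 token math (assumed factors: AE downsamples
    8x spatially into 16-channel latents; the DiT packs 2x2 latent
    patches into one token each)."""
    lat_h, lat_w = height // 8, width // 8  # AE: 8x spatial downsample
    return (lat_h // 2) * (lat_w // 2)      # DiT: 2x2 patchify

print(flux_token_count(1024, 1024))  # 4096 image tokens for a 1024x1024 image
```

This is why sequence length (and thus attention cost) grows quadratically with image resolution.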

Versions

| Version | Guidance | Use case |
| --- | --- | --- |
| FLUX.1-dev | Distilled guidance embedding | General-purpose, recommended for training |
| FLUX.1-schnell | Flow matching, fewer steps | Fast inference, fewer training steps needed |
FLUX.1-dev is guidance-distilled with specific guidance-scale values. During training, set `--guidance_scale=1.0` to disable the guidance scale. The default value (3.5) is intended for inference, not training.

Required model files

Download the following files before training:
| Component | File | Source |
| --- | --- | --- |
| DiT | flux1-dev.safetensors | black-forest-labs/FLUX.1-dev |
| AE | ae.safetensors | black-forest-labs/FLUX.1-dev |
| T5-XXL | t5xxl_fp16.safetensors | comfyanonymous/flux_text_encoders |
| CLIP-L | clip_l.safetensors | comfyanonymous/flux_text_encoders |
Do not use the weights from the Diffusers-format subfolders inside the FLUX.1-dev repository; they cannot be loaded directly by these scripts. Use the top-level flux1-dev.safetensors and ae.safetensors files.
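The table above can be expressed as a small download helper, e.g. with `huggingface_hub` (using `hf_hub_download` and a local `models` directory are my assumptions, not part of the training scripts):

```python
# Component -> (repo_id, filename), matching the table above.
FLUX_FILES = {
    "dit":    ("black-forest-labs/FLUX.1-dev", "flux1-dev.safetensors"),
    "ae":     ("black-forest-labs/FLUX.1-dev", "ae.safetensors"),
    "t5xxl":  ("comfyanonymous/flux_text_encoders", "t5xxl_fp16.safetensors"),
    "clip_l": ("comfyanonymous/flux_text_encoders", "clip_l.safetensors"),
}

if __name__ == "__main__":
    # pip install huggingface_hub; FLUX.1-dev is a gated repo, so
    # authenticate first (e.g. `huggingface-cli login`).
    from huggingface_hub import hf_hub_download

    for name, (repo_id, filename) in FLUX_FILES.items():
        path = hf_hub_download(repo_id=repo_id, filename=filename,
                               local_dir="models")
        print(f"{name}: {path}")
```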

Available training methods

| Method | Script | Notes |
| --- | --- | --- |
| LoRA | flux_train_network.py | Primary training method |
| Fine-tuning | flux_train.py | Full model training |
| ControlNet | flux_train_control_net.py | ControlNet training |

LoRA training

Use `flux_train_network.py` with `--network_module=networks.lora_flux`:

```bash
accelerate launch --num_cpu_threads_per_process 1 flux_train_network.py \
  --pretrained_model_name_or_path="flux1-dev.safetensors" \
  --clip_l="clip_l.safetensors" \
  --t5xxl="t5xxl_fp16.safetensors" \
  --ae="ae.safetensors" \
  --dataset_config="my_flux_dataset_config.toml" \
  --output_dir="./output" \
  --output_name="my_flux_lora" \
  --save_model_as=safetensors \
  --network_module=networks.lora_flux \
  --network_dim=16 \
  --network_alpha=1 \
  --learning_rate=1e-4 \
  --optimizer_type="AdamW8bit" \
  --lr_scheduler="constant" \
  --sdpa \
  --max_train_epochs=10 \
  --save_every_n_epochs=1 \
  --mixed_precision="fp16" \
  --gradient_checkpointing \
  --guidance_scale=1.0 \
  --timestep_sampling="flux_shift" \
  --model_prediction_type="raw" \
  --blocks_to_swap=18 \
  --cache_text_encoder_outputs \
  --cache_latents
```
`--timestep_sampling="flux_shift"` and `--model_prediction_type="raw"` are the recommended settings for FLUX.1-dev LoRA training.
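The file passed to `--dataset_config` is a standard sd-scripts dataset TOML. A minimal example (the paths and values are placeholders to adapt to your data):

```toml
[general]
caption_extension = ".txt"
shuffle_caption = false

[[datasets]]
resolution = 1024
batch_size = 1
enable_bucket = true

[[datasets.subsets]]
image_dir = "/path/to/train/images"  # placeholder path
num_repeats = 1
```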

Memory optimization

FLUX.1 is a large model. Use these options to reduce VRAM usage:
| GPU VRAM | Recommended settings |
| --- | --- |
| 24 GB | Standard settings (batch size 2) |
| 16 GB | Batch size 1 + `--blocks_to_swap` |
| 12 GB | `--blocks_to_swap 16` + 8-bit AdamW |
| 10 GB | `--blocks_to_swap 22` + fp8 T5-XXL |
| 8 GB | `--blocks_to_swap 28` + fp8 T5-XXL |

Key memory options

  • `--fp8_base` — trains FLUX.1, CLIP-L, and T5-XXL in FP8 format. Significantly reduces VRAM at a potential quality cost.
  • `--blocks_to_swap <n>` — offloads n Transformer blocks to CPU. FLUX.1 supports up to 35 blocks. Cannot be combined with `--cpu_offload_checkpointing`.
  • `--cache_text_encoder_outputs` — caches CLIP-L and T5-XXL outputs; reduces memory usage but disables text encoder LoRA training.
  • `--cache_latents` / `--cache_latents_to_disk` — caches AE outputs in memory or on disk, respectively.

Key training parameters

| Parameter | Description | Recommendation |
| --- | --- | --- |
| `--network_module` | Network module | `networks.lora_flux` |
| `--network_dim` | LoRA rank | 16 |
| `--guidance_scale` | Guidance scale during training | 1.0 for dev |
| `--timestep_sampling` | Timestep sampling method | `flux_shift` |
| `--model_prediction_type` | Prediction processing | `raw` |
| `--t5xxl_max_token_length` | T5-XXL max token length | 512 (default) |
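To give a feel for what `--network_dim` and `--network_alpha` control: a rank-r LoRA adds two small matrices per targeted weight, and the learned update is scaled by alpha/rank. A sketch of the arithmetic, using an assumed 3072-wide linear layer for illustration (not code from the scripts):

```python
def lora_overhead(in_features: int, out_features: int, rank: int) -> int:
    """Extra parameters a LoRA adds to one linear layer:
    down matrix (in x rank) plus up matrix (rank x out)."""
    return in_features * rank + rank * out_features

def lora_scale(alpha: float, rank: int) -> float:
    """LoRA applies its update as (alpha / rank) * (up @ down @ x)."""
    return alpha / rank

# One hypothetical 3072x3072 linear layer at the recommended --network_dim=16:
print(lora_overhead(3072, 3072, 16))   # 98304 extra parameters
print(lora_scale(alpha=1, rank=16))    # 0.0625
```

With `--network_alpha=1` and `--network_dim=16`, updates are scaled by 1/16, which is why this combination is typically paired with a relatively high learning rate such as 1e-4.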

Incompatible options

The following SD 1.x/2.x arguments are not used for FLUX.1 training and should not be specified:
  • `--v2`, `--v_parameterization`, `--clip_skip`
  • `--max_token_length` (use `--t5xxl_max_token_length` instead)
  • `--split_mode` (deprecated; use `--blocks_to_swap`)
