Lumina Image 2.0 is a Next-generation Diffusion Transformer (Next-DiT) model. It uses a single Gemma2 language model as its text encoder together with the FLUX.1 AutoEncoder, making its architecture and text-conditioning pipeline distinct from Stable Diffusion and FLUX.1.

Architecture

  • Next-DiT — Next-generation Diffusion Transformer architecture. Replaces the UNet with a transformer that natively handles variable-resolution inputs.
  • Gemma2 (2B) — single text encoder based on Google’s Gemma2 language model. Handles both prompt encoding and system prompt conditioning.
  • AutoEncoder (AE) — the same AE used by FLUX.1 (ae.safetensors).

Required model files

Download the following files before training:
| Component | File | Source |
|---|---|---|
| Lumina Image 2.0 DiT | lumina-image-2.safetensors (full precision) or lumina_2_model_bf16.safetensors (bf16) | rockerBOO/lumina-image-2 / Comfy-Org/Lumina_Image_2.0_Repackaged |
| Gemma2 2B text encoder | gemma_2_2b_fp16.safetensors | Comfy-Org/Lumina_Image_2.0_Repackaged |
| AutoEncoder | ae.safetensors | Comfy-Org/Lumina_Image_2.0_Repackaged |
The AutoEncoder for Lumina Image 2.0 is the same file as the FLUX.1 AE. If you already have ae.safetensors from a FLUX.1 setup, you can reuse it here.

Available training methods

| Method | Script | Notes |
|---|---|---|
| LoRA | lumina_train_network.py | Uses networks.lora_lumina |
| Fine-tuning | lumina_train.py | Full model training |

LoRA training

Use lumina_train_network.py with --network_module=networks.lora_lumina:
accelerate launch --num_cpu_threads_per_process 1 lumina_train_network.py \
  --pretrained_model_name_or_path="lumina-image-2.safetensors" \
  --gemma2="gemma_2_2b_fp16.safetensors" \
  --ae="ae.safetensors" \
  --dataset_config="my_lumina_dataset_config.toml" \
  --output_dir="./output" \
  --output_name="my_lumina_lora" \
  --save_model_as=safetensors \
  --network_module=networks.lora_lumina \
  --network_dim=8 \
  --network_alpha=8 \
  --learning_rate=1e-4 \
  --optimizer_type="AdamW" \
  --lr_scheduler="constant" \
  --timestep_sampling="nextdit_shift" \
  --discrete_flow_shift=6.0 \
  --model_prediction_type="raw" \
  --system_prompt="You are an assistant designed to generate high-quality images based on user prompts." \
  --max_train_epochs=10 \
  --save_every_n_epochs=1 \
  --mixed_precision="bf16" \
  --gradient_checkpointing \
  --cache_latents \
  --cache_text_encoder_outputs
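
The --dataset_config file referenced above uses the standard sd-scripts TOML format. A minimal sketch (the directory path and values here are placeholders, not recommendations):

```toml
# Minimal example dataset config; adjust paths and sizes for your data.
[general]
caption_extension = ".txt"
shuffle_caption = false
keep_tokens = 0

[[datasets]]
resolution = 1024
batch_size = 1
enable_bucket = true

  [[datasets.subsets]]
  image_dir = "/path/to/images"
  num_repeats = 1
```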

System prompts

Lumina Image 2.0 is conditioned on a system prompt in addition to the image caption. You must provide a system prompt during training to match the model’s pre-training setup.
--system_prompt="You are an assistant designed to generate high-quality images based on user prompts."

Key training parameters

| Parameter | Description | Default | Recommendation |
|---|---|---|---|
| --network_module | Network module | | networks.lora_lumina |
| --timestep_sampling | Timestep sampling method | shift | nextdit_shift |
| --discrete_flow_shift | Euler Discrete Scheduler shift | 6.0 | 6.0 |
| --model_prediction_type | Prediction processing | raw | raw |
| --mixed_precision | Mixed precision dtype | | bf16 |
| --gemma2_max_token_length | Gemma2 max token length | 256 | 256 |
Use --mixed_precision="bf16" for Lumina training. The model was pre-trained in bfloat16, so fp16 can be less numerically stable.
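
The exact nextdit_shift implementation lives in the training scripts, but the general idea behind a discrete flow shift can be sketched with the standard flow-matching timestep warp (this generic formula is an illustration, not necessarily the code sd-scripts runs):

```python
def shift_timestep(t: float, shift: float = 6.0) -> float:
    """Warp a uniform timestep t in [0, 1] toward the noisy end.

    Generic flow-matching 'shift': larger shift values concentrate
    training steps at higher noise levels. Endpoints are preserved.
    """
    return shift * t / (1.0 + (shift - 1.0) * t)

print(shift_timestep(0.0))  # 0.0
print(shift_timestep(1.0))  # 1.0
print(shift_timestep(0.5))  # ~0.857 with shift=6.0
```

With shift=6.0, the midpoint t=0.5 maps to about 0.857, so most sampled timesteps land in the high-noise regime.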

Per-component LoRA rank control

Use --network_args to set different ranks for each model component:
--network_args \
  "attn_dim=8" \
  "mlp_dim=4" \
  "mod_dim=4" \
  "refiner_dim=4" \
  "embedder_dims=[4,4,4]"
The three values in embedder_dims correspond to: x_embedder, t_embedder, and caption_embedder.
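
Conceptually, per-component ranks amount to choosing a LoRA rank from the target module's name. A hypothetical sketch of that pattern matching (the module-name substrings below are illustrative assumptions, not Lumina's actual parameter names):

```python
# Hypothetical name-pattern -> rank table, mirroring the
# attn_dim / mlp_dim / mod_dim / refiner_dim split above.
RANKS = {
    "attn": 8,        # attention projections
    "mlp": 4,         # feed-forward layers
    "modulation": 4,  # adaLN modulation
    "refiner": 4,     # refiner blocks
}
DEFAULT_RANK = 8

def rank_for(module_name: str) -> int:
    """Return the LoRA rank for a module based on its name."""
    for pattern, rank in RANKS.items():
        if pattern in module_name:
            return rank
    return DEFAULT_RANK

print(rank_for("layers.0.attn.qkv"))  # 8
print(rank_for("layers.0.mlp.fc1"))   # 4
```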

Memory optimization

| Option | Effect |
|---|---|
| --blocks_to_swap=<n> | Offloads n transformer blocks to CPU |
| --cache_text_encoder_outputs | Caches Gemma2 outputs |
| --cache_latents / --cache_latents_to_disk | Caches AE outputs |
| --fp8_base | Trains the base model in FP8 precision |
| --use_flash_attn | Enables Flash Attention (requires pip install flash-attn) |
| --use_sage_attn | Enables Sage Attention |
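
The idea behind --blocks_to_swap is to keep only part of the transformer resident on the GPU and page the rest in from CPU memory as each block is needed. A toy, device-agnostic sketch of that bookkeeping (illustration only; sd-scripts' real implementation differs):

```python
class BlockSwapper:
    """Toy block-swapping sketch: at most `num_blocks - blocks_to_swap`
    blocks stay on the GPU; the least recently used block is evicted
    to CPU when a new one must be loaded."""

    def __init__(self, num_blocks: int, blocks_to_swap: int):
        self.resident_limit = num_blocks - blocks_to_swap
        self.device = {i: "cpu" for i in range(num_blocks)}
        self.resident = []  # LRU order, oldest first

    def use(self, block: int) -> None:
        if self.device[block] == "cpu":
            if len(self.resident) >= self.resident_limit:
                evicted = self.resident.pop(0)
                self.device[evicted] = "cpu"  # offload oldest block
            self.device[block] = "gpu"        # load the needed block
            self.resident.append(block)
        else:
            # Already resident: refresh its LRU position.
            self.resident.remove(block)
            self.resident.append(block)

swapper = BlockSwapper(num_blocks=8, blocks_to_swap=5)
for i in range(8):  # one forward pass over all blocks
    swapper.use(i)
print(sum(d == "gpu" for d in swapper.device.values()))  # 3
```

With 8 blocks and 5 swapped out, only 3 ever occupy GPU memory at once, trading transfer time for a smaller peak footprint.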

Inference

After training, use lumina_minimal_inference.py to generate images with your LoRA:
python lumina_minimal_inference.py \
  --pretrained_model_name_or_path "lumina-image-2.safetensors" \
  --gemma2_path "gemma_2_2b_fp16.safetensors" \
  --ae_path "ae.safetensors" \
  --output_dir "./outputs" \
  --offload \
  --seed 1234 \
  --prompt "A mountain landscape at sunset" \
  --system_prompt "You are an assistant designed to generate high-quality images based on user prompts." \
  --lora_weights "my_lumina_lora.safetensors;1.0"
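
The --lora_weights value packs a file path and a strength multiplier into one string separated by a semicolon. A small helper illustrating how such a "path;multiplier" spec can be parsed (an assumption based on the form shown above, not the script's actual parser):

```python
def parse_lora_weights(spec: str) -> tuple:
    """Split a 'path;multiplier' spec; multiplier defaults to 1.0."""
    path, sep, mult = spec.partition(";")
    return path, float(mult) if sep else 1.0

print(parse_lora_weights("my_lumina_lora.safetensors;0.8"))
# ('my_lumina_lora.safetensors', 0.8)
```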

Incompatible options

The following arguments are for SD 1.x/2.x and must not be used for Lumina:
  • --v2, --v_parameterization, --clip_skip
