Fine-tuning retrains all (or most) of the weights in a pre-trained model. This gives you the deepest level of control over the model’s behavior, at the cost of higher VRAM requirements and larger output files.

Fine-tuning vs LoRA

Both fine-tuning and LoRA let you teach a model new concepts or styles, but they work very differently:
| Feature | Fine-tuning | LoRA |
| --- | --- | --- |
| Training target | All model weights | Additional adapter network only |
| VRAM / compute cost | High | Low |
| Training time | Long | Short |
| Output file size | Large (several GB) | Small (a few MB to hundreds of MB) |
| Overfitting risk | High | Low |
| Best suited for | Major style changes, concept learning | Adding specific characters or styles |
As a general rule, start with LoRA when you want to add a specific character or style. Fine-tuning is the right choice when you need more fundamental style changes or want to produce a high-quality base model.
Fine-tuning is prone to overfitting. If you train too long, the model can lose the diversity and generalization of the original weights. Monitor your outputs carefully at each checkpoint and stop training before quality degrades.

Two fine-tuning approaches

sd-scripts supports two distinct fine-tuning styles for SD 1.x/2.x: train_db.py and fine_tune.py.

train_db.py uses the DreamBooth dataset format: a directory of images, each optionally paired with a .txt caption file of the same name. It supports regularization images to help preserve the model's prior knowledge. fine_tune.py instead reads captions from a metadata .json file prepared in advance.

Key arguments unique to train_db.py:

| Argument | Description |
| --- | --- |
| --learning_rate_te | Separate learning rate for the text encoder |
| --stop_text_encoder_training | Stop text encoder training after N steps (-1 = never train it) |
| --no_token_padding | Disable token padding (matches Diffusers DreamBooth behavior) |
| --no_half_vae | Use a full-precision (float32) VAE instead of an fp16/bf16 VAE in mixed precision |
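A train_db.py invocation follows the same pattern as the architecture-specific scripts below. The paths and hyperparameter values here are placeholders, not tuned recommendations:

```bash
accelerate launch --mixed_precision fp16 train_db.py \
  --pretrained_model_name_or_path "model.safetensors" \
  --dataset_config "dataset_config.toml" \
  --output_dir "output" \
  --output_name "db_finetuned" \
  --save_model_as safetensors \
  --learning_rate 1e-6 \
  --learning_rate_te 5e-7 \
  --stop_text_encoder_training 5000 \
  --max_train_steps 10000 \
  --optimizer_type AdamW8bit
```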

Supported architectures

Each architecture has its own training script. All share a common command structure but expose architecture-specific options.

SDXL (sdxl_train.py)

Trains the U-Net and, optionally, the two Text Encoders (CLIP ViT-L and OpenCLIP ViT-bigG).

VRAM requirement: 24 GB+ recommended. Use --gradient_checkpointing and --cache_latents on lower-VRAM GPUs.
| Argument | Description |
| --- | --- |
| --train_text_encoder | Include both text encoders in training |
| --learning_rate_te1 | Per-encoder learning rate for CLIP ViT-L |
| --learning_rate_te2 | Per-encoder learning rate for OpenCLIP ViT-bigG |
| --block_lr | Set a different learning rate per U-Net block (23 blocks total); fine-tuning only |
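The value passed to --block_lr is a comma-separated list with one learning rate per U-Net block. The helper below is a hypothetical sketch (parse_block_lr is not part of sd-scripts) that just illustrates the expected shape, assuming 23 blocks:

```python
def parse_block_lr(value: str, n_blocks: int = 23) -> list[float]:
    """Parse a comma-separated --block_lr string into per-block learning rates."""
    rates = [float(v) for v in value.split(",")]
    if len(rates) != n_blocks:
        raise ValueError(f"expected {n_blocks} learning rates, got {len(rates)}")
    return rates

# 23 values: one per U-Net block (illustrative numbers only)
lrs = parse_block_lr(",".join(["1e-5"] * 20 + ["5e-6"] * 3))
print(len(lrs))  # 23
```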
```bash
accelerate launch --mixed_precision bf16 sdxl_train.py \
  --pretrained_model_name_or_path "sd_xl_base_1.0.safetensors" \
  --dataset_config "dataset_config.toml" \
  --output_dir "output" \
  --output_name "sdxl_finetuned" \
  --save_model_as safetensors \
  --train_text_encoder \
  --learning_rate 1e-5 \
  --learning_rate_te1 5e-6 \
  --learning_rate_te2 2e-6 \
  --max_train_steps 10000 \
  --optimizer_type AdamW8bit
```

SD3 (sd3_train.py)

Trains Stable Diffusion 3 Medium. SD3 uses three Text Encoders (CLIP-L, CLIP-G, T5-XXL) and an MMDiT backbone.

VRAM requirement: 24 GB+ recommended. Use --blocks_to_swap to offload MMDiT blocks to CPU on lower-VRAM systems.
| Argument | Description |
| --- | --- |
| --train_text_encoder | Train CLIP-L and CLIP-G |
| --train_t5xxl | Train T5-XXL (very large; requires significant VRAM) |
| --blocks_to_swap | Swap N MMDiT blocks to CPU to reduce VRAM usage |
| --num_last_block_to_freeze | Freeze the last N MMDiT blocks, focusing training on earlier layers |
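As a rough intuition for --blocks_to_swap, the toy calculation below (assuming uniform block sizes and ignoring activations, optimizer state, and the text encoders) shows what fraction of the backbone's block weights stays resident on the GPU:

```python
def resident_fraction(total_blocks: int, blocks_to_swap: int) -> float:
    """Fraction of backbone blocks kept on the GPU when N blocks are swapped to CPU."""
    if not 0 <= blocks_to_swap <= total_blocks:
        raise ValueError("blocks_to_swap must be between 0 and total_blocks")
    return (total_blocks - blocks_to_swap) / total_blocks

# e.g. swapping 10 of 24 blocks keeps ~58% of block weights on the GPU
print(f"{resident_fraction(24, 10):.0%}")  # 58%
```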
```bash
accelerate launch --mixed_precision bf16 sd3_train.py \
  --pretrained_model_name_or_path "sd3_medium.safetensors" \
  --dataset_config "dataset_config.toml" \
  --output_dir "output" \
  --output_name "sd3_finetuned" \
  --save_model_as safetensors \
  --train_text_encoder \
  --learning_rate 4e-6 \
  --blocks_to_swap 10 \
  --max_train_steps 10000 \
  --optimizer_type AdamW8bit
```

FLUX.1 (flux_train.py)

Trains FLUX.1 models. FLUX.1 uses two types of Transformer blocks internally: Double Blocks and Single Blocks.

VRAM requirement: 24 GB+ recommended. Use --blocks_to_swap to reduce memory pressure.
| Argument | Description |
| --- | --- |
| --blocks_to_swap | Swap N Transformer blocks to CPU for memory optimization |
| --blockwise_fused_optimizers | Experimental: apply individual optimizers per block for more efficient training |
```bash
accelerate launch --mixed_precision bf16 flux_train.py \
  --pretrained_model_name_or_path "FLUX.1-dev.safetensors" \
  --dataset_config "dataset_config.toml" \
  --output_dir "output" \
  --output_name "flux1_finetuned" \
  --save_model_as safetensors \
  --learning_rate 1e-5 \
  --blocks_to_swap 18 \
  --max_train_steps 10000 \
  --optimizer_type AdamW8bit
```

Lumina (lumina_train.py)

Trains Lumina-Next DiT models. Options largely follow the same patterns as the other scripts.

VRAM requirement: varies by model size. Use --gradient_checkpointing as needed.
| Argument | Description |
| --- | --- |
| --use_flash_attn | Enable Flash Attention for faster computation |
```bash
accelerate launch --mixed_precision bf16 lumina_train.py \
  --pretrained_model_name_or_path "Lumina-Next-DiT-B.safetensors" \
  --dataset_config "dataset_config.toml" \
  --output_dir "output" \
  --output_name "lumina_finetuned" \
  --save_model_as safetensors \
  --learning_rate 1e-5 \
  --max_train_steps 10000 \
  --optimizer_type AdamW8bit
```

Key configuration options

The following options apply across all fine-tuning scripts:
| Argument | Description |
| --- | --- |
| --pretrained_model_name_or_path | Path to the base model (.safetensors, .ckpt, or a Diffusers directory) |
| --dataset_config | Path to your dataset TOML configuration file |
| --output_dir | Directory in which to save trained model checkpoints |
| --output_name | Base filename for the output model |
| --save_model_as | Save format: safetensors (recommended) or ckpt |
| --max_train_steps | Total number of training steps |
| --learning_rate | Base learning rate (typically 1e-5 to 4e-6 for fine-tuning) |
| --optimizer_type | Optimizer: AdamW8bit (memory-efficient), AdamW, Lion, etc. |
| --mixed_precision | Use bf16 or fp16 for lower VRAM usage |
| --gradient_checkpointing | Trade speed for reduced VRAM usage |
| --cache_latents | Pre-encode images with the VAE to save VRAM during training |
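Every script reads its dataset from the file passed to --dataset_config. The TOML below is a minimal illustrative sketch; the paths and values are placeholders, and the full schema is described in the sd-scripts dataset configuration documentation:

```toml
[general]
resolution = 1024          # training resolution (e.g. 1024 for SDXL)
caption_extension = ".txt" # caption file extension next to each image
enable_bucket = true       # group images of similar aspect ratios

[[datasets]]
batch_size = 2

  [[datasets.subsets]]
  image_dir = "train/images"   # images with matching .txt captions
  num_repeats = 1
```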

Architecture-specific fine-tuning differences

| Architecture | Fine-tuning-only options | Key difference from LoRA |
| --- | --- | --- |
| SDXL | --block_lr | Per-block U-Net learning rate control is exclusive to fine-tuning |
| SD3 | --train_text_encoder, --train_t5xxl, --num_last_block_to_freeze | Full Text Encoder training; LoRA trains only adapter parts |
| FLUX.1 | --blockwise_fused_optimizers | Entire model weights updated; more experimental optimizer options available |
| Lumina | (few specific options) | Fine-tuning updates all model weights |

Full command example

The following example shows a complete fine-tuning run for SDXL with common memory-saving options enabled:
```bash
accelerate launch --mixed_precision bf16 sdxl_train.py \
  --pretrained_model_name_or_path "sd_xl_base_1.0.safetensors" \
  --dataset_config "dataset_config.toml" \
  --output_dir "./output" \
  --output_name "sdxl_finetuned_v1" \
  --save_model_as safetensors \
  --train_text_encoder \
  --learning_rate 1e-5 \
  --learning_rate_te1 5e-6 \
  --learning_rate_te2 2e-6 \
  --max_train_steps 10000 \
  --optimizer_type AdamW8bit \
  --mixed_precision bf16 \
  --gradient_checkpointing \
  --cache_latents \
  --save_every_n_epochs 1
```
