Fine-tuning retrains all (or most) of the weights in a pre-trained model. This gives you the deepest level of control over the model’s behavior, at the cost of higher VRAM requirements and larger output files.

Fine-tuning vs LoRA

Both fine-tuning and LoRA let you teach a model new concepts or styles, but they work very differently:
| Feature | Fine-tuning | LoRA |
| --- | --- | --- |
| Training target | All model weights | Additional adapter network only |
| VRAM / compute cost | High | Low |
| Training time | Long | Short |
| Output file size | Large (several GB) | Small (a few MB to hundreds of MB) |
| Overfitting risk | High | Low |
| Best suited for | Major style changes, concept learning | Adding specific characters or styles |
As a general rule, start with LoRA when you want to add a specific character or style. Fine-tuning is the right choice when you need more fundamental style changes or want to produce a high-quality base model.
Fine-tuning is prone to overfitting. If you train too long, the model can lose the diversity and generalization of the original weights. Monitor your outputs carefully at each checkpoint and stop training before quality degrades.

Two fine-tuning approaches

sd-scripts supports two distinct fine-tuning styles for SD 1.x/2.x: train_db.py and fine_tune.py.

train_db.py uses the DreamBooth dataset format: a directory of images, each optionally paired with a .txt caption file of the same name. It supports regularization images to help preserve the model's prior knowledge. fine_tune.py instead reads captions from a metadata .json file prepared in advance.

Key arguments unique to train_db.py:

| Argument | Description |
| --- | --- |
| --learning_rate_te | Separate learning rate for the text encoder |
| --stop_text_encoder_training | Stop text encoder training after N steps (-1 = never train it) |
| --no_token_padding | Disable token padding (matches Diffusers DreamBooth behavior) |
| --no_half_vae | Use a full-precision (float32) VAE instead of an fp16/bf16 VAE in mixed precision |
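A train_db.py invocation follows the same pattern as the architecture-specific scripts below. The paths and hyperparameter values here are placeholders, not tuned recommendations:

```bash
accelerate launch --mixed_precision fp16 train_db.py \
  --pretrained_model_name_or_path "model.safetensors" \
  --dataset_config "dataset_config.toml" \
  --output_dir "output" \
  --output_name "db_finetuned" \
  --save_model_as safetensors \
  --learning_rate 1e-6 \
  --learning_rate_te 5e-7 \
  --stop_text_encoder_training 5000 \
  --max_train_steps 10000 \
  --optimizer_type AdamW8bit
```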

Supported architectures

Each architecture has its own training script. All share a common command structure but expose architecture-specific options.

SDXL (sdxl_train.py)

Trains the U-Net and, optionally, the two Text Encoders (CLIP ViT-L and OpenCLIP ViT-bigG).

VRAM requirement: 24 GB+ recommended. Use --gradient_checkpointing and --cache_latents on lower-VRAM GPUs.
| Argument | Description |
| --- | --- |
| --train_text_encoder | Include both text encoders in training |
| --learning_rate_te1 | Per-encoder learning rate for CLIP ViT-L |
| --learning_rate_te2 | Per-encoder learning rate for OpenCLIP ViT-bigG |
| --block_lr | Set a different learning rate per U-Net block (23 blocks total); fine-tuning only |
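The value passed to --block_lr is a comma-separated list with one learning rate per U-Net block. The helper below is a hypothetical sketch (parse_block_lr is not part of sd-scripts) that just illustrates the expected shape, assuming 23 blocks:

```python
def parse_block_lr(value: str, n_blocks: int = 23) -> list[float]:
    """Parse a comma-separated --block_lr string into per-block learning rates."""
    rates = [float(v) for v in value.split(",")]
    if len(rates) != n_blocks:
        raise ValueError(f"expected {n_blocks} learning rates, got {len(rates)}")
    return rates

# 23 values: one per U-Net block (illustrative numbers only)
lrs = parse_block_lr(",".join(["1e-5"] * 20 + ["5e-6"] * 3))
print(len(lrs))  # 23
```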
```bash
accelerate launch --mixed_precision bf16 sdxl_train.py \
  --pretrained_model_name_or_path "sd_xl_base_1.0.safetensors" \
  --dataset_config "dataset_config.toml" \
  --output_dir "output" \
  --output_name "sdxl_finetuned" \
  --save_model_as safetensors \
  --train_text_encoder \
  --learning_rate 1e-5 \
  --learning_rate_te1 5e-6 \
  --learning_rate_te2 2e-6 \
  --max_train_steps 10000 \
  --optimizer_type AdamW8bit
```

SD3 (sd3_train.py)

Trains Stable Diffusion 3 Medium. SD3 uses three Text Encoders (CLIP-L, CLIP-G, T5-XXL) and an MMDiT backbone.

VRAM requirement: 24 GB+ recommended. Use --blocks_to_swap to offload MMDiT blocks to CPU on lower-VRAM systems.
| Argument | Description |
| --- | --- |
| --train_text_encoder | Train CLIP-L and CLIP-G |
| --train_t5xxl | Train T5-XXL (very large; requires significant VRAM) |
| --blocks_to_swap | Swap N MMDiT blocks to CPU to reduce VRAM usage |
| --num_last_block_to_freeze | Freeze the last N MMDiT blocks, focusing training on earlier layers |
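As a rough intuition for --blocks_to_swap, the toy calculation below (assuming uniform block sizes and ignoring activations, optimizer state, and the text encoders) shows what fraction of the backbone's block weights stays resident on the GPU:

```python
def resident_fraction(total_blocks: int, blocks_to_swap: int) -> float:
    """Fraction of backbone blocks kept on the GPU when N blocks are swapped to CPU."""
    if not 0 <= blocks_to_swap <= total_blocks:
        raise ValueError("blocks_to_swap must be between 0 and total_blocks")
    return (total_blocks - blocks_to_swap) / total_blocks

# e.g. swapping 10 of 24 blocks keeps ~58% of block weights on the GPU
print(f"{resident_fraction(24, 10):.0%}")  # 58%
```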
```bash
accelerate launch --mixed_precision bf16 sd3_train.py \
  --pretrained_model_name_or_path "sd3_medium.safetensors" \
  --dataset_config "dataset_config.toml" \
  --output_dir "output" \
  --output_name "sd3_finetuned" \
  --save_model_as safetensors \
  --train_text_encoder \
  --learning_rate 4e-6 \
  --blocks_to_swap 10 \
  --max_train_steps 10000 \
  --optimizer_type AdamW8bit
```

FLUX.1 (flux_train.py)

Trains FLUX.1 models. FLUX.1 uses two types of Transformer blocks internally: Double Blocks and Single Blocks.

VRAM requirement: 24 GB+ recommended. Use --blocks_to_swap to reduce memory pressure.
| Argument | Description |
| --- | --- |
| --blocks_to_swap | Swap N Transformer blocks to CPU for memory optimization |
| --blockwise_fused_optimizers | Experimental: apply individual optimizers per block for more efficient training |
```bash
accelerate launch --mixed_precision bf16 flux_train.py \
  --pretrained_model_name_or_path "FLUX.1-dev.safetensors" \
  --dataset_config "dataset_config.toml" \
  --output_dir "output" \
  --output_name "flux1_finetuned" \
  --save_model_as safetensors \
  --learning_rate 1e-5 \
  --blocks_to_swap 18 \
  --max_train_steps 10000 \
  --optimizer_type AdamW8bit
```

Lumina (lumina_train.py)

Trains Lumina-Next DiT models. Options largely follow the same patterns as the other scripts.

VRAM requirement: varies by model size. Use --gradient_checkpointing as needed.
| Argument | Description |
| --- | --- |
| --use_flash_attn | Enable Flash Attention for faster computation |
```bash
accelerate launch --mixed_precision bf16 lumina_train.py \
  --pretrained_model_name_or_path "Lumina-Next-DiT-B.safetensors" \
  --dataset_config "dataset_config.toml" \
  --output_dir "output" \
  --output_name "lumina_finetuned" \
  --save_model_as safetensors \
  --learning_rate 1e-5 \
  --max_train_steps 10000 \
  --optimizer_type AdamW8bit
```

Key configuration options

The following options apply across all fine-tuning scripts:
| Argument | Description |
| --- | --- |
| --pretrained_model_name_or_path | Path to the base model (.safetensors, .ckpt, or a Diffusers directory) |
| --dataset_config | Path to your dataset TOML configuration file |
| --output_dir | Directory in which to save trained model checkpoints |
| --output_name | Base filename for the output model |
| --save_model_as | Save format: safetensors (recommended) or ckpt |
| --max_train_steps | Total number of training steps |
| --learning_rate | Base learning rate (typically 1e-5 to 4e-6 for fine-tuning) |
| --optimizer_type | Optimizer: AdamW8bit (memory-efficient), AdamW, Lion, etc. |
| --mixed_precision | Use bf16 or fp16 for lower VRAM usage |
| --gradient_checkpointing | Trade speed for reduced VRAM usage |
| --cache_latents | Pre-encode images with the VAE to save VRAM during training |
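Every script reads its dataset from the file passed to --dataset_config. The TOML below is a minimal illustrative sketch; the paths and values are placeholders, and the full schema is described in the sd-scripts dataset configuration documentation:

```toml
[general]
resolution = 1024          # training resolution (e.g. 1024 for SDXL)
caption_extension = ".txt" # caption file extension next to each image
enable_bucket = true       # group images of similar aspect ratios

[[datasets]]
batch_size = 2

  [[datasets.subsets]]
  image_dir = "train/images"   # images with matching .txt captions
  num_repeats = 1
```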

Architecture-specific fine-tuning differences

| Architecture | Fine-tuning-only options | Key difference from LoRA |
| --- | --- | --- |
| SDXL | --block_lr | Per-block U-Net learning rate control is exclusive to fine-tuning |
| SD3 | --train_text_encoder, --train_t5xxl, --num_last_block_to_freeze | Full Text Encoder training; LoRA trains only adapter parts |
| FLUX.1 | --blockwise_fused_optimizers | Entire model weights updated; more experimental optimizer options available |
| Lumina | (few specific options) | Fine-tuning updates all model weights |

Full command example

The following example shows a complete fine-tuning run for SDXL with common memory-saving options enabled:
```bash
accelerate launch --mixed_precision bf16 sdxl_train.py \
  --pretrained_model_name_or_path "sd_xl_base_1.0.safetensors" \
  --dataset_config "dataset_config.toml" \
  --output_dir "./output" \
  --output_name "sdxl_finetuned_v1" \
  --save_model_as safetensors \
  --train_text_encoder \
  --learning_rate 1e-5 \
  --learning_rate_te1 5e-6 \
  --learning_rate_te2 2e-6 \
  --max_train_steps 10000 \
  --optimizer_type AdamW8bit \
  --mixed_precision bf16 \
  --gradient_checkpointing \
  --cache_latents \
  --save_every_n_epochs 1
```
