# Fine-tuning vs LoRA
Both fine-tuning and LoRA let you teach a model new concepts or styles, but they work very differently:

| Feature | Fine-tuning | LoRA |
|---|---|---|
| Training target | All model weights | Additional adapter network only |
| VRAM / compute cost | High | Low |
| Training time | Long | Short |
| Output file size | Large (several GB) | Small (few MB to hundreds of MB) |
| Overfitting risk | High | Low |
| Best suited for | Major style changes, concept learning | Adding specific characters or styles |
## Two fine-tuning approaches
sd-scripts supports two distinct fine-tuning styles for SD 1.x/2.x:

- DreamBooth-style (`train_db.py`)
- Native fine-tuning (`fine_tune.py`)
`train_db.py` uses the DreamBooth dataset format: a directory of images, each optionally paired with a `.txt` caption. It supports regularization images to help preserve the model's prior knowledge.

Key arguments unique to `train_db.py`:

| Argument | Description |
|---|---|
| `--learning_rate_te` | Separate learning rate for the text encoder |
| `--stop_text_encoder_training` | Stop text encoder training after N steps (-1 = never train it) |
| `--no_token_padding` | Disable token padding (matches Diffusers DreamBooth behavior) |
| `--no_half_vae` | Use a full-precision (float32) VAE instead of an fp16/bf16 VAE in mixed precision |
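Putting these flags together, a DreamBooth-style run might be invoked as in the sketch below. The `accelerate launch` wrapper and every path, step count, and learning rate here are illustrative placeholders, not recommended values; only the flag names come from the tables in this document.

```bash
# Illustrative sketch: all paths and hyperparameter values are placeholders.
accelerate launch train_db.py \
  --pretrained_model_name_or_path="base_model.safetensors" \
  --dataset_config="dataset.toml" \
  --output_dir="output" \
  --output_name="db_model" \
  --save_model_as=safetensors \
  --max_train_steps=1600 \
  --learning_rate=1e-6 \
  --learning_rate_te=5e-7 \
  --stop_text_encoder_training=800 \
  --mixed_precision=bf16 \
  --gradient_checkpointing \
  --cache_latents
```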
## Supported architectures
Each architecture has its own training script. All share a common command structure but expose architecture-specific options.

### SDXL — sdxl_train.py

Trains both the U-Net and, optionally, the two Text Encoders (CLIP ViT-L and OpenCLIP ViT-bigG).

VRAM requirement: 24 GB+ recommended. Use `--gradient_checkpointing` and `--cache_latents` on lower-VRAM GPUs.

| Argument | Description |
|---|---|
| `--train_text_encoder` | Include both text encoders in training |
| `--learning_rate_te1` | Per-encoder learning rate for CLIP ViT-L |
| `--learning_rate_te2` | Per-encoder learning rate for OpenCLIP ViT-bigG |
| `--block_lr` | Set a different learning rate per U-Net block (23 blocks total); fine-tuning only |
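As an illustration, an SDXL run that trains both text encoders at their own learning rates might look like the following sketch. `accelerate launch` and all values are placeholder assumptions, not prescribed defaults.

```bash
# Illustrative sketch: values are placeholders, not recommendations.
accelerate launch sdxl_train.py \
  --pretrained_model_name_or_path="sdxl_base.safetensors" \
  --dataset_config="dataset.toml" \
  --output_dir="output" \
  --output_name="sdxl_ft" \
  --save_model_as=safetensors \
  --max_train_steps=2000 \
  --learning_rate=4e-6 \
  --train_text_encoder \
  --learning_rate_te1=2e-6 \
  --learning_rate_te2=2e-6 \
  --mixed_precision=bf16 \
  --gradient_checkpointing \
  --cache_latents
```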
### SD3 — sd3_train.py
Trains Stable Diffusion 3 Medium. SD3 uses three Text Encoders (CLIP-L, CLIP-G, T5-XXL) and an MMDiT backbone.

VRAM requirement: 24 GB+ recommended. Use `--blocks_to_swap` to offload MMDiT blocks to CPU on lower-VRAM systems.

| Argument | Description |
|---|---|
| `--train_text_encoder` | Train CLIP-L and CLIP-G |
| `--train_t5xxl` | Train T5-XXL (very large; requires significant VRAM) |
| `--blocks_to_swap` | Swap N MMDiT blocks to CPU to reduce VRAM usage |
| `--num_last_block_to_freeze` | Freeze the last N MMDiT blocks, focusing training on earlier layers |
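A lower-VRAM SD3 run using block swapping could be sketched as below. All values are illustrative placeholders, and depending on your sd-scripts version the script may also require paths to the individual text encoder and VAE weights; check the script's `--help` output.

```bash
# Illustrative sketch: values are placeholders.
accelerate launch sd3_train.py \
  --pretrained_model_name_or_path="sd3_medium.safetensors" \
  --dataset_config="dataset.toml" \
  --output_dir="output" \
  --output_name="sd3_ft" \
  --save_model_as=safetensors \
  --max_train_steps=2000 \
  --learning_rate=1e-5 \
  --train_text_encoder \
  --blocks_to_swap=10 \
  --mixed_precision=bf16 \
  --gradient_checkpointing
```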
### FLUX.1 — flux_train.py
Trains FLUX.1 models. FLUX.1 uses two types of Transformer blocks internally: Double Blocks and Single Blocks.

VRAM requirement: 24 GB+ recommended. Use `--blocks_to_swap` to reduce memory pressure.

| Argument | Description |
|---|---|
| `--blocks_to_swap` | Swap N Transformer blocks to CPU for memory optimization |
| `--blockwise_fused_optimizers` | Experimental: apply individual optimizers per block for more efficient training |
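A FLUX.1 run with block swapping might look like the sketch below. Values are placeholders, and depending on your sd-scripts version the script may additionally require paths to the text encoders and autoencoder; check the script's `--help` output before running.

```bash
# Illustrative sketch: values are placeholders.
accelerate launch flux_train.py \
  --pretrained_model_name_or_path="flux1-dev.safetensors" \
  --dataset_config="dataset.toml" \
  --output_dir="output" \
  --output_name="flux_ft" \
  --save_model_as=safetensors \
  --max_train_steps=2000 \
  --learning_rate=5e-6 \
  --blocks_to_swap=18 \
  --mixed_precision=bf16 \
  --gradient_checkpointing
```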
### Lumina — lumina_train.py
Trains Lumina-Next DiT models. Options largely follow the same patterns as the other scripts.

VRAM requirement: varies by model size. Use `--gradient_checkpointing` as needed.

| Argument | Description |
|---|---|
| `--use_flash_attn` | Enable Flash Attention for faster computation |
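Since Lumina follows the same command structure, a run might be sketched as below; values are placeholders, and your sd-scripts version may require further model-component paths.

```bash
# Illustrative sketch: values are placeholders.
accelerate launch lumina_train.py \
  --pretrained_model_name_or_path="lumina_next.safetensors" \
  --dataset_config="dataset.toml" \
  --output_dir="output" \
  --output_name="lumina_ft" \
  --save_model_as=safetensors \
  --max_train_steps=2000 \
  --learning_rate=1e-5 \
  --use_flash_attn \
  --mixed_precision=bf16 \
  --gradient_checkpointing
```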
## Key configuration options
The following options apply across all fine-tuning scripts.

### Common arguments
| Argument | Description |
|---|---|
| `--pretrained_model_name_or_path` | Path to the base model (`.safetensors`, `.ckpt`, or Diffusers directory) |
| `--dataset_config` | Path to your dataset TOML configuration file |
| `--output_dir` | Directory to save trained model checkpoints |
| `--output_name` | Base filename for the output model |
| `--save_model_as` | Save format: `safetensors` (recommended) or `ckpt` |
| `--max_train_steps` | Total number of training steps |
| `--learning_rate` | Base learning rate (typically 1e-5 to 4e-6 for fine-tuning) |
| `--optimizer_type` | Optimizer: `AdamW8bit` (memory-efficient), `AdamW`, `Lion`, etc. |
| `--mixed_precision` | Use `bf16` or `fp16` for lower VRAM usage |
| `--gradient_checkpointing` | Trade speed for reduced VRAM usage |
| `--cache_latents` | Pre-encode images with the VAE to save VRAM during training |
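Several of these flags point at a dataset TOML file via `--dataset_config`. As a rough sketch (key names and values here are illustrative; consult your sd-scripts version's dataset configuration guide for the authoritative schema), such a file might look like:

```toml
# Illustrative dataset.toml sketch; paths and sizes are placeholders.
[general]
caption_extension = ".txt"    # caption file suffix expected next to each image
enable_bucket = true          # group images into aspect-ratio buckets

[[datasets]]
resolution = 1024             # training resolution for this dataset
batch_size = 2

  [[datasets.subsets]]
  image_dir = "train/images"  # directory containing the training images
  num_repeats = 1             # times each image is repeated per epoch
```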
### Architecture-specific fine-tuning differences
| Architecture | Fine-tuning-only options | Key difference from LoRA |
|---|---|---|
| SDXL | `--block_lr` | Per-block U-Net learning rate control is exclusive to fine-tuning |
| SD3 | `--train_text_encoder`, `--train_t5xxl`, `--num_last_block_to_freeze` | Full Text Encoder training; LoRA only trains adapter parts |
| FLUX.1 | `--blockwise_fused_optimizers` | Entire model weights updated; more experimental optimizer options available |
| Lumina | (few specific options) | Core difference is that fine-tuning updates all model weights |
