What is validation loss?

Validation loss measures how well your model performs on data it has never seen during training. By holding out a small portion of your dataset and evaluating on it periodically, you can detect overfitting: the point at which the model memorizes the training images instead of learning general concepts. Without a validation set you only have training loss, which decreases continuously whether the model is genuinely improving or merely memorizing. When training loss is still falling but validation loss is rising, the model has started to overfit; stop training or reduce your learning rate.
Every validation run uses the same random seed for noise generation and timestep selection. This makes the metric deterministic: any change in the validation loss reflects a real change in model weights, not random variance in the evaluation process.
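The fixed-seed behavior can be sketched in a few lines of Python. This is an illustrative sketch, not the trainer's actual code; the function name and the use of the standard library RNG are invented for the example:

```python
import random

def validation_noise_and_timesteps(seed, batch_size, num_timesteps=1000):
    # A dedicated, fixed-seed RNG makes every validation pass identical,
    # so a change in validation loss reflects a change in model weights,
    # not sampling variance. (Hypothetical sketch; the real trainer uses
    # its own noise and timestep sampling.)
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
    timesteps = [rng.randrange(num_timesteps) for _ in range(batch_size)]
    return noise, timesteps
```

Calling this twice with the same seed yields identical noise and timesteps, which is exactly what makes the metric comparable across evaluation passes.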

How to enable validation

There are two ways to set up a validation split. The dataset config TOML approach is recommended because it gives you precise control over which images are used for validation.

Configuration options

| Argument | TOML key | Description |
| --- | --- | --- |
| `--validation_split <f>` | `validation_split` | Fraction of data to hold out for validation. The command-line argument applies globally; the TOML key applies per subset. |
| `--validate_every_n_steps <n>` | | Run a validation pass every N training steps. |
| `--validate_every_n_epochs <n>` | | Run a validation pass every N epochs. Defaults to once per epoch when not specified. |
| `--max_validation_steps <n>` | | Cap the number of validation batches per evaluation pass; useful if your validation set is large. Omit to use the entire validation set. |
| `--validation_seed <n>` | `validation_seed` | Seed for validation dataloader shuffling. Falls back to the main training `--seed` when not set. |
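A fractional `validation_split` divides each subset into a training portion and a held-out portion. The helper below illustrates the idea; the function name, rounding, and shuffle order are invented for this sketch and may differ from the trainer's actual implementation:

```python
import random

def split_subset(image_paths, validation_split, seed=42):
    # Hypothetical illustration of how a fractional split might divide
    # one subset: shuffle with a fixed seed, then carve off the first
    # fraction as validation. validation_split=1.0 reserves the whole
    # subset for validation, as in the TOML example below.
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * validation_split)
    return paths[n_val:], paths[:n_val]  # (train, validation)
```

With 100 images and `validation_split = 0.1`, 90 images would be trained on and 10 held out for the validation metric.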

Complete example

Step 1: Prepare your dataset config

Create a dataset_config.toml that separates training and validation images:
```toml
[general]
shuffle_caption = true
keep_tokens = 1

[[datasets]]
resolution = "1024,1024"
batch_size = 2

  [[datasets.subsets]]
  image_dir = "path/to/your_images"
  caption_extension = ".txt"
  num_repeats = 10

  [[datasets.subsets]]
  image_dir = "path/to/your_validation_images"
  caption_extension = ".txt"
  validation_split = 1.0
```
Step 2: Run training

Launch training with logging enabled so you can view the validation metric in TensorBoard:
```bash
accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path="sd_xl_base_1.0.safetensors" \
  --dataset_config="dataset_config.toml" \
  --output_dir="output" \
  --output_name="my_lora" \
  --network_module=networks.lora \
  --network_dim=32 \
  --network_alpha=16 \
  --save_every_n_epochs=1 \
  --learning_rate=1e-4 \
  --optimizer_type="AdamW8bit" \
  --mixed_precision="bf16" \
  --logging_dir=logs
```
Step 3: Monitor in TensorBoard

Open TensorBoard and watch the loss/validation metric alongside loss/current:
```bash
tensorboard --logdir logs
```
Navigate to http://localhost:6006 in your browser. Look for loss/validation in the Scalars panel.

Reading the results

The validation loss is logged to TensorBoard (or Weights & Biases if you use --log_with=wandb) as loss/validation. A healthy training run looks like this:
  • Training loss decreases steadily over time.
  • Validation loss also decreases, tracking training loss loosely.
  • If validation loss starts rising while training loss keeps falling, the model is overfitting. Consider stopping training, reducing the learning rate, or increasing validation_split to give the model more validation signal.
Keep a separate folder of validation images that you do not include in the training subset. Images the model has never trained on give you a more honest overfitting signal than a randomly split subset that shares the same distribution.
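The divergence pattern described above (training loss falling while validation loss rises) can be checked programmatically once you export the two loss curves. This is a hypothetical helper, not part of the training scripts; the function name and the window-based comparison are invented for illustration:

```python
def is_overfitting(train_losses, val_losses, window=5):
    # Flags overfitting by comparing the mean of the most recent
    # `window` values against the mean of the `window` values before
    # them: training loss still improving while validation loss worsens.
    if len(train_losses) < 2 * window or len(val_losses) < 2 * window:
        return False  # not enough history to compare two windows

    def mean(xs):
        return sum(xs) / len(xs)

    train_recent = mean(train_losses[-window:])
    train_prev = mean(train_losses[-2 * window:-window])
    val_recent = mean(val_losses[-window:])
    val_prev = mean(val_losses[-2 * window:-window])
    return train_recent < train_prev and val_recent > val_prev
```

Averaging over a window rather than comparing single points smooths out the step-to-step noise that both loss curves naturally exhibit.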