## Two approaches to dataset organization
sd-scripts supports two complementary dataset styles. You can use either one, or mix them in the same training run by defining them as separate `[[datasets]]` entries in your config file.
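As a sketch of such a mixed config (paths and values are illustrative, and `num_repeats` is included here as an assumed per-subset key):

```toml
[general]
resolution = 512
batch_size = 4

# DreamBooth-style dataset: captions from .txt files or class tokens
[[datasets]]

  [[datasets.subsets]]
  image_dir = "train/dreambooth_images"   # illustrative path
  class_tokens = "sks dog"
  num_repeats = 10

# Fine-tuning-style dataset: captions from a JSON metadata file
[[datasets]]

  [[datasets.subsets]]
  image_dir = "train/finetune_images"     # illustrative path
  metadata_file = "train/meta.json"       # illustrative path
```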
### DreamBooth style

Point to an image directory and supply class tokens (e.g. `sks dog`). Caption `.txt` files are optional: if no caption file exists for an image, the class tokens are used automatically.

### Fine-tuning style
Point to an image directory and a JSON metadata file that maps each image to its caption and tags. This gives you precise per-image control and is required when using pre-generated metadata from scripts like `merge_captions_to_metadata.py`.

| | DreamBooth style | Fine-tuning style |
|---|---|---|
| Caption source | `.txt` / `.caption` files or `class_tokens` | JSON metadata file |
| Metadata file required | No | Yes (`metadata_file`) |
| Regularization images | Yes (`is_reg = true`) | No |
| Typical use case | LoRA, DreamBooth concepts | Full fine-tunes, large curated datasets |
When mixing both styles in one training run, each style must live in its own `[[datasets]]` block. You cannot mix DreamBooth and fine-tuning subsets within the same dataset.

## Datasets and subsets
A dataset is a group of images that share training settings like `resolution` and `batch_size`. A subset is a partition of a dataset, typically a single image directory.
Settings are inherited down the hierarchy `[general]` → `[[datasets]]` → `[[datasets.subsets]]`; more specific settings always win. This lets you set `shuffle_caption = true` globally and override it for one subset that should not shuffle.
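A minimal sketch of that inheritance (paths are illustrative):

```toml
[general]
shuffle_caption = true          # global default

[[datasets]]
resolution = 512
batch_size = 4

  [[datasets.subsets]]
  image_dir = "train/photos"    # inherits shuffle_caption = true

  [[datasets.subsets]]
  image_dir = "train/ordered"
  shuffle_caption = false       # most specific setting wins
```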
## Directory structure
Images must be placed directly inside the directory you specify as `image_dir`. Sub-directories are not traversed automatically (use `--recursive` only when generating tags).
A common convention is to prefix folders with a repeat count:
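For instance (folder and file names are hypothetical), a folder named `10_sks dog` is repeated 10 times per epoch with class tokens `sks dog`:

```
train_data/
├── 10_sks dog/
│   ├── img001.jpg
│   └── img002.jpg
└── 5_sks dog wearing a hat/
    └── img003.jpg
```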
When launching without a TOML config (no `--dataset_config`), the `N_token` folder-name convention is parsed automatically. In TOML config mode you have full explicit control.
## Caption file format
For DreamBooth-style subsets, place a caption file next to each image with the same base name (e.g. `image01.jpg` → `image01.txt` or `image01.caption`). A caption can span multiple lines when `enable_wildcard = true`; one line is then selected randomly per training step.
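For example, with `enable_wildcard = true`, a caption file like the following (contents illustrative) contributes one randomly chosen line per step:

```
a photo of sks dog
sks dog sitting on grass
a close-up portrait of sks dog
```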
## Aspect ratio bucketing
By default, all images are resized to the square resolution you specify (e.g. `512` or `1024`). Enabling aspect ratio bucketing lets images keep their original proportions by grouping them into resolution buckets and padding or cropping minimally.
Enable it in your dataset config:
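A minimal sketch using the options below (values are illustrative, not recommendations):

```toml
[[datasets]]
enable_bucket = true
min_bucket_reso = 256         # must be divisible by bucket_reso_steps
max_bucket_reso = 1024
bucket_reso_steps = 64        # default
bucket_no_upscale = false
```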
| Option | Description |
|---|---|
| `enable_bucket` | Turn on aspect ratio bucketing |
| `min_bucket_reso` | Minimum bucket side length (must be divisible by `bucket_reso_steps`) |
| `max_bucket_reso` | Maximum bucket side length |
| `bucket_reso_steps` | Step size between bucket resolutions (default 64) |
| `bucket_no_upscale` | Skip buckets that would upscale images smaller than the target |
Aspect ratio bucketing is strongly recommended for real-photograph datasets and any mixed-orientation image sets. It significantly reduces distortion from forced squaring.
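As a rough illustration only (not sd-scripts' actual bucketing algorithm, which also pairs widths and heights), candidate bucket side lengths simply step from the minimum to the maximum in increments of `bucket_reso_steps`:

```python
def bucket_side_lengths(min_reso: int = 256, max_reso: int = 1024,
                        steps: int = 64) -> list[int]:
    """Enumerate candidate bucket side lengths (illustrative sketch)."""
    if min_reso % steps or max_reso % steps:
        raise ValueError("bucket bounds must be divisible by bucket_reso_steps")
    return list(range(min_reso, max_reso + 1, steps))

print(bucket_side_lengths())  # 256, 320, ..., 960, 1024
```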
## Next steps

- **Dataset configuration**: full reference for the TOML config file format, including all dataset and subset options.
- **Automatic image tagging**: use WD14 Tagger to generate per-image caption files automatically.
