
Two approaches to dataset organization

sd-scripts supports two complementary dataset styles. You can use either one, or mix them in the same training run by defining them as separate [[datasets]] entries in your config file.

DreamBooth style

Point to an image directory and supply class tokens (e.g. sks dog). Caption .txt files are optional — if no caption file exists for an image, the class tokens are used automatically.

Fine-tuning style

Point to an image directory and a JSON metadata file that maps each image to its caption and tags. This style gives you precise per-image control and is required when using pre-generated metadata from scripts such as merge_captions_to_metadata.py.

The key difference is how captions are supplied:

| | DreamBooth style | Fine-tuning style |
|---|---|---|
| Caption source | .txt / .caption files or class_tokens | JSON metadata file |
| Metadata file required | No | Yes (metadata_file) |
| Regularization images | Yes (is_reg = true) | No |
| Typical use case | LoRA, DreamBooth concepts | Full fine-tunes, large curated datasets |

When mixing both styles in one training run, each style must live in its own [[datasets]] block. You cannot mix DreamBooth and fine-tuning subsets within the same dataset.
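
As a rough sketch of mixing the two styles, the config below defines one DreamBooth-style dataset and one fine-tuning-style dataset in separate [[datasets]] blocks. All paths, the metadata file name, and the resolution, batch_size, and num_repeats values are placeholders chosen for illustration, not recommendations.

```toml
# Sketch only: every path and numeric value here is a placeholder.

# Dataset 1: DreamBooth style (captions come from caption files or class_tokens)
[[datasets]]
resolution = 512
batch_size = 4

  [[datasets.subsets]]
  image_dir = "/path/to/dreambooth_images"   # hypothetical path
  class_tokens = "sks dog"                   # fallback when an image has no caption file
  num_repeats = 10

# Dataset 2: fine-tuning style (captions come from a JSON metadata file)
[[datasets]]
resolution = 512
batch_size = 4

  [[datasets.subsets]]
  image_dir = "/path/to/finetune_images"                # hypothetical path
  metadata_file = "/path/to/finetune_images/meta.json"  # hypothetical file name
```

Keeping each style in its own dataset block means the two subsets never share a parent [[datasets]] entry, which is the constraint described above.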

Datasets and subsets

A dataset is a group of images that share training settings like resolution and batch_size. A subset is a partition of a dataset — typically a single image directory.
[general]          ← options applying to all datasets and subsets
  [[datasets]]     ← per-dataset options (resolution, batch_size, …)
    [[datasets.subsets]]   ← per-directory options (image_dir, class_tokens, …)
    [[datasets.subsets]]
  [[datasets]]
    [[datasets.subsets]]
Settings cascade from [general] → [[datasets]] → [[datasets.subsets]], and the more specific setting always wins. For example, you can set shuffle_caption = true globally and override it for one subset that should not shuffle.
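
The shuffle_caption override described above might look like the following sketch. The paths, class tokens, resolution, and batch_size are illustrative assumptions.

```toml
# Sketch only: paths and numeric values are placeholders.
[general]
shuffle_caption = true              # default for every dataset and subset

[[datasets]]
resolution = 512
batch_size = 4

  [[datasets.subsets]]
  image_dir = "/path/to/subset_a"   # hypothetical path; inherits shuffle_caption = true
  class_tokens = "sks dog"

  [[datasets.subsets]]
  image_dir = "/path/to/subset_b"   # hypothetical path
  class_tokens = "sks dog"
  shuffle_caption = false           # overrides the [general] value for this subset only
```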

Directory structure

Images must be placed directly inside the directory you specify as image_dir. Sub-directories are not automatically traversed (use --recursive only when generating tags). A common convention is to prefix folders with a repeat count:
train_data/
├── 10_my_character/     ← 10 repeats, token "my_character"
│   ├── img001.png
│   ├── img001.txt       ← optional caption
│   └── img002.png
├── 5_background_scene/
│   ├── bg001.jpg
│   └── bg001.txt
└── reg/                 ← regularization images (is_reg = true)
    └── person001.jpg
In TOML config mode, the folder-name prefix (e.g. 10_my_character) is only a naming convention: the scripts do not parse it, so set num_repeats and class_tokens explicitly in your config.
In legacy command-line mode (without --dataset_config), the N_token folder-name convention is parsed automatically.
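
For the example tree above, an explicit TOML config could look roughly like this. The repeat counts mirror the folder prefixes; the class tokens for the second and third subsets, the num_repeats for the regularization subset, and the resolution and batch_size values are assumptions made for illustration.

```toml
# Sketch only: mirrors the example directory tree; numeric values are placeholders.
[general]
caption_extension = ".txt"

[[datasets]]
resolution = 512
batch_size = 2

  [[datasets.subsets]]
  image_dir = "train_data/10_my_character"
  class_tokens = "my_character"
  num_repeats = 10                  # stated explicitly; the "10_" prefix is not parsed

  [[datasets.subsets]]
  image_dir = "train_data/5_background_scene"
  class_tokens = "background_scene" # assumed token, by analogy with the folder name
  num_repeats = 5

  [[datasets.subsets]]
  image_dir = "train_data/reg"
  is_reg = true                     # marks this subset as regularization images
  class_tokens = "person"           # assumed class token for the regularization set
  num_repeats = 1                   # assumed value
```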

Caption file format

For DreamBooth-style subsets, place a caption file next to each image with the same base name:
img001.png
img001.txt        ← preferred; set caption_extension = ".txt"
img002.jpg
img002.caption    ← also accepted; set caption_extension = ".caption"
Caption files contain plain text. By default, tags are separated by a comma (,). When enable_wildcard = true, a caption file can span multiple lines, and one line is then selected at random per training step. For example, a three-line caption file:
1girl, hatsune miku, vocaloid, upper body, looking at viewer, microphone, stage
a girl with a microphone standing on a stage
detailed digital art of a girl with a microphone on a stage
Use the WD14 Tagger to generate caption .txt files automatically from your images.
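
The caption settings in this section could be combined in a subset as sketched below. This assumes enable_wildcard is accepted at the subset level; the path and resolution are placeholders.

```toml
# Sketch only: path and resolution are placeholders.
[[datasets]]
resolution = 512

  [[datasets.subsets]]
  image_dir = "/path/to/images"     # hypothetical path
  caption_extension = ".txt"        # read img001.txt next to img001.png
  enable_wildcard = true            # assumed subset-level option; allows multi-line captions
```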

Aspect ratio bucketing

By default, all images are resized to the square resolution you specify (e.g. 512 or 1024). Enabling aspect ratio bucketing lets images keep their original proportions by grouping them into resolution buckets and padding or cropping minimally. Enable it in your dataset config:
[[datasets]]
resolution = 1024
enable_bucket = true
min_bucket_reso = 512
max_bucket_reso = 2048
bucket_reso_steps = 64

| Option | Description |
|---|---|
| enable_bucket | Turn on aspect ratio bucketing |
| min_bucket_reso | Minimum bucket side length (must be divisible by bucket_reso_steps) |
| max_bucket_reso | Maximum bucket side length |
| bucket_reso_steps | Step size between bucket resolutions (default 64) |
| bucket_no_upscale | Skip buckets that would upscale images smaller than the target |

Aspect ratio bucketing is strongly recommended for real-photograph datasets and any mixed-orientation image sets. It significantly reduces distortion from forced squaring.

Next steps

Dataset configuration

Full reference for the TOML config file format, including all dataset and subset options.

Automatic image tagging

Use WD14 Tagger to generate per-image caption files automatically.
