## Two approaches to dataset organization
sd-scripts supports two complementary dataset styles. You can use either one, or mix them in the same training run by defining them as separate `[[datasets]]` entries in your config file.
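As a sketch of such a mixed config (paths and values are illustrative, and `num_repeats` is included here as an assumed per-subset key):

```toml
[general]
resolution = 512
batch_size = 4

# DreamBooth-style dataset: captions from .txt files or class tokens
[[datasets]]

  [[datasets.subsets]]
  image_dir = "train/dreambooth_images"   # illustrative path
  class_tokens = "sks dog"
  num_repeats = 10

# Fine-tuning-style dataset: captions from a JSON metadata file
[[datasets]]

  [[datasets.subsets]]
  image_dir = "train/finetune_images"     # illustrative path
  metadata_file = "train/meta.json"       # illustrative path
```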
### DreamBooth style

Point to an image directory and supply class tokens (e.g. `sks dog`). Caption `.txt` files are optional: if no caption file exists for an image, the class tokens are used automatically.

### Fine-tuning style
Point to an image directory and a JSON metadata file that maps each image to its caption and tags. This gives you precise per-image control and is required when using pre-generated metadata from scripts like `merge_captions_to_metadata.py`.

| | DreamBooth style | Fine-tuning style |
|---|---|---|
| Caption source | `.txt` / `.caption` files or `class_tokens` | JSON metadata file |
| Metadata file required | No | Yes (`metadata_file`) |
| Regularization images | Yes (`is_reg = true`) | No |
| Typical use case | LoRA, DreamBooth concepts | Full fine-tunes, large curated datasets |
When mixing both styles in one training run, each style must live in its own `[[datasets]]` block. You cannot mix DreamBooth and fine-tuning subsets within the same dataset.

## Datasets and subsets
A dataset is a group of images that share training settings like `resolution` and `batch_size`. A subset is a partition of a dataset, typically a single image directory.
Settings are inherited down the hierarchy `[general]` → `[[datasets]]` → `[[datasets.subsets]]`; more specific settings always win. This lets you set `shuffle_caption = true` globally and override it for one subset that should not shuffle.
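A minimal sketch of that inheritance (paths are illustrative):

```toml
[general]
shuffle_caption = true          # global default

[[datasets]]
resolution = 512
batch_size = 4

  [[datasets.subsets]]
  image_dir = "train/photos"    # inherits shuffle_caption = true

  [[datasets.subsets]]
  image_dir = "train/ordered"
  shuffle_caption = false       # most specific setting wins
```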
## Directory structure
Images must be placed directly inside the directory you specify as `image_dir`. Sub-directories are not traversed automatically (use `--recursive` only when generating tags).
A common convention is to prefix folders with a repeat count:
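For instance (folder and file names are hypothetical), a folder named `10_sks dog` is repeated 10 times per epoch with class tokens `sks dog`:

```
train_data/
├── 10_sks dog/
│   ├── img001.jpg
│   └── img002.jpg
└── 5_sks dog wearing a hat/
    └── img003.jpg
```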
When launching without a TOML config (no `--dataset_config`), the `N_token` folder-name convention is parsed automatically. In TOML config mode you have full explicit control.
## Caption file format
For DreamBooth-style subsets, place a caption file next to each image with the same base name (e.g. `image01.jpg` → `image01.txt` or `image01.caption`). A caption can span multiple lines when `enable_wildcard = true`; one line is then selected randomly per training step.
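For example, with `enable_wildcard = true`, a caption file like the following (contents illustrative) contributes one randomly chosen line per step:

```
a photo of sks dog
sks dog sitting on grass
a close-up portrait of sks dog
```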
## Aspect ratio bucketing
By default, all images are resized to the square resolution you specify (e.g. `512` or `1024`). Enabling aspect ratio bucketing lets images keep their original proportions by grouping them into resolution buckets and padding or cropping minimally.
Enable it in your dataset config:
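A minimal sketch using the options below (values are illustrative, not recommendations):

```toml
[[datasets]]
enable_bucket = true
min_bucket_reso = 256         # must be divisible by bucket_reso_steps
max_bucket_reso = 1024
bucket_reso_steps = 64        # default
bucket_no_upscale = false
```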
| Option | Description |
|---|---|
| `enable_bucket` | Turn on aspect ratio bucketing |
| `min_bucket_reso` | Minimum bucket side length (must be divisible by `bucket_reso_steps`) |
| `max_bucket_reso` | Maximum bucket side length |
| `bucket_reso_steps` | Step size between bucket resolutions (default 64) |
| `bucket_no_upscale` | Skip buckets that would upscale images smaller than the target |
Aspect ratio bucketing is strongly recommended for real-photograph datasets and any mixed-orientation image sets. It significantly reduces distortion from forced squaring.
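As a rough illustration only (not sd-scripts' actual bucketing algorithm, which also pairs widths and heights), candidate bucket side lengths simply step from the minimum to the maximum in increments of `bucket_reso_steps`:

```python
def bucket_side_lengths(min_reso: int = 256, max_reso: int = 1024,
                        steps: int = 64) -> list[int]:
    """Enumerate candidate bucket side lengths (illustrative sketch)."""
    if min_reso % steps or max_reso % steps:
        raise ValueError("bucket bounds must be divisible by bucket_reso_steps")
    return list(range(min_reso, max_reso + 1, steps))

print(bucket_side_lengths())  # 256, 320, ..., 960, 1024
```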
## Next steps

- **Dataset configuration**: full reference for the TOML config file format, including all dataset and subset options.
- **Automatic image tagging**: use WD14 Tagger to generate per-image caption files automatically.
