Overview
Pass a TOML (or JSON) config file to any training script with the --dataset_config option.
When a config file is provided, it replaces the --train_data_dir, --reg_data_dir, and --in_json command-line arguments. For any other option that exists in both places, the config file value takes priority.
Configuration structure
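Settings are organized into three scopes that cascade from general to specific: [general], then [[datasets]], then [[datasets.subsets]]. A value set at a more specific scope overrides the same option at a broader scope. For example, a keep_tokens value in [[datasets.subsets]] overrides the value in [[datasets]], which overrides [general].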
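As a minimal sketch of the cascade, the fragment below sets keep_tokens at all three scopes. The directory paths and class tokens are placeholders chosen for illustration.

```toml
[general]
keep_tokens = 0            # fallback for every dataset and subset

[[datasets]]
keep_tokens = 1            # overrides [general] for subsets in this dataset

  [[datasets.subsets]]
  image_dir = "/path/to/subset_a"   # placeholder path
  class_tokens = "sks"              # placeholder trigger word
  keep_tokens = 2                   # overrides [[datasets]]; effective value is 2

  [[datasets.subsets]]
  image_dir = "/path/to/subset_b"   # placeholder path
  class_tokens = "sks"              # placeholder trigger word
  # keep_tokens is not set here, so the [[datasets]] value (1) applies
```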
Complete example
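The following is an illustrative sketch of a full config that uses all three scopes, built only from options described in this document. The paths, class tokens, and numeric values are assumptions, not recommended settings.

```toml
[general]
shuffle_caption = true
caption_extension = ".txt"
keep_tokens = 1
enable_bucket = true

[[datasets]]
resolution = [768, 512]    # [width, height]
batch_size = 4

  [[datasets.subsets]]
  image_dir = "/path/to/character_images"   # placeholder path
  class_tokens = "sks character"            # placeholder trigger words
  num_repeats = 10

  [[datasets.subsets]]
  image_dir = "/path/to/style_images"       # placeholder path
  class_tokens = "sks style"                # placeholder trigger words
  num_repeats = 2
  keep_tokens = 2                           # overrides the [general] value for this subset
```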
[general] section
Options in [general] apply to every dataset and subset unless overridden at a lower scope.
shuffle_caption: Randomly shuffle the comma-separated tags in each caption before training. Helps the model learn tags independently rather than as positional sequences.
caption_extension: File extension for caption sidecar files. Common values are ".txt" and ".caption".
keep_tokens: Number of comma-separated tags at the start of each caption to keep in place when shuffle_caption is enabled. Set to 1 to keep the trigger word first.
enable_bucket: Enable aspect ratio bucketing across all datasets. Images are grouped into resolution buckets to preserve their original proportions.
resolution: Training resolution. Accepts a single integer (square) or a [width, height] pair. Can be overridden per dataset.
batch_size: Number of images per training step. Equivalent to --train_batch_size.
[[datasets]] options
Each [[datasets]] block defines one dataset. Subsets nested inside share these settings.
Resolution and batching
resolution: Training resolution for this dataset. Use a single integer for a square (e.g. 512) or a [width, height] array for a rectangle (e.g. [768, 512]).
batch_size: Images per training step for this dataset. Equivalent to --train_batch_size.
Aspect ratio bucketing
enable_bucket: Enable aspect ratio bucketing for this dataset. When enabled, images are resized to the nearest bucket resolution to preserve proportions.
bucket_reso_steps: Step size in pixels between bucket resolutions. The min_bucket_reso and max_bucket_reso values must both be divisible by this number.
min_bucket_reso: Minimum bucket resolution (shortest side). Must be divisible by bucket_reso_steps.
max_bucket_reso: Maximum bucket resolution (longest side). Must be divisible by bucket_reso_steps.
bucket_no_upscale: When true, images smaller than a bucket are not upscaled to fill it. Recommended for datasets that mix large and small images.
skip_image_resolution: Skip images whose original area is at or below this resolution. Useful when the same directory is shared across multiple datasets at different resolutions, because it prevents small images from appearing in high-resolution datasets.
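A sketch of the bucketing options used together, assuming a dataset trained at 1024 pixels. The numbers are illustrative. The constraint they respect is that min_bucket_reso and max_bucket_reso are both divisible by bucket_reso_steps.

```toml
[[datasets]]
resolution = 1024
enable_bucket = true
bucket_reso_steps = 64     # 512 / 64 = 8 and 2048 / 64 = 32, so both limits are valid
min_bucket_reso = 512
max_bucket_reso = 2048
bucket_no_upscale = true   # do not enlarge images that are smaller than their bucket

  [[datasets.subsets]]
  image_dir = "/path/to/images"   # placeholder path
  class_tokens = "sks"            # placeholder trigger word
```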
[[datasets.subsets]] options
Each subset points to one image directory. Multiple subsets can belong to the same dataset.
Common options
image_dir: Absolute path to the image directory. Images must be placed directly inside this directory; subdirectories are not scanned.
num_repeats: Number of times to repeat each image per epoch. Equivalent to --dataset_repeats for fine-tuning. Use higher values for small subsets to balance training time.
flip_aug: Randomly flip images horizontally during training. Do not use for asymmetric subjects (text, faces, characters with distinctive left/right features).
color_aug: Apply random color jitter during training. Incompatible with latent caching.
shuffle_caption: Shuffle caption tags for images in this subset. Overrides the [general] setting.
keep_tokens: Number of tags at the start of each caption to keep fixed when shuffling. Overrides higher-scope settings.
keep_tokens_separator: A delimiter that splits a caption into a fixed prefix, a shuffled/dropped middle, and a fixed suffix. For example, with "|||", the caption "trigger ||| tag1, tag2 ||| quality tags" keeps trigger and quality tags fixed while shuffling the middle.
caption_extension: Caption file extension for this subset.
caption_prefix: String prepended to every caption. Included when shuffling.
caption_suffix: String appended to every caption. Included when shuffling.
caption_separator: Separator between tags in the caption. Normally you do not need to change this.
secondary_separator: An additional separator. Tags grouped by this separator are treated as a single unit for shuffling and dropout. For example, "sky;;;cloud;;;day" with secondary_separator = ";;;" becomes "sky,cloud,day" and is shuffled or dropped as one tag.
enable_wildcard: Enable wildcard and multi-line caption notation. With wildcards, {simple|white} background randomly picks one value. With multi-line captions, one line is selected per step.
random_crop: Randomly crop images instead of center-cropping. Cannot be used with enable_bucket.
cache_info: Cache image dimensions and captions to metadata_cache.json in image_dir. Speeds up subsequent runs on large datasets.
DreamBooth-specific options
class_tokens: Class tokens (trigger words) for this subset. Used as the caption when no caption file exists for an image. If neither class_tokens nor a caption file is found for an image, training fails with an error.
is_reg: Mark this subset as a regularization (prior-preservation) subset. Regularization images are used to prevent language drift and are not the target of fine-tuning.
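A DreamBooth-style dataset with one training subset and one regularization subset might look like the following sketch. The paths, class tokens, and repeat counts are hypothetical.

```toml
[[datasets]]
resolution = 512

  [[datasets.subsets]]
  image_dir = "/path/to/train_images"   # placeholder path
  class_tokens = "sks dog"              # used for images that have no caption file
  num_repeats = 10

  [[datasets.subsets]]
  image_dir = "/path/to/reg_images"     # placeholder path
  class_tokens = "dog"
  is_reg = true                         # regularization (prior-preservation) subset
  num_repeats = 1
```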
Fine-tuning-specific options
metadata_file: Path to the JSON metadata file for this subset. Required for fine-tuning-style subsets. The file maps image paths to captions and tags. Equivalent to --in_json.
Caption dropout options
These options control caption dropout, which trains the model to work with and without captions.
caption_dropout_rate: Probability (0–1) that the entire caption is dropped for a given image at a given step.
caption_dropout_every_n_epochs: Drop all captions every N epochs.
caption_tag_dropout_rate: Probability (0–1) that each individual tag is dropped from the caption.
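The shuffling, separator, and dropout options can be combined in a single subset. The sketch below assumes captions written in the form "trigger ||| tag1, tag2 ||| quality tags". The path and the dropout rates are illustrative values, not recommendations.

```toml
[[datasets]]
resolution = 512

  [[datasets.subsets]]
  image_dir = "/path/to/images"      # placeholder path
  caption_extension = ".txt"
  shuffle_caption = true             # shuffle only the middle part of each caption
  keep_tokens_separator = "|||"      # fixed prefix ||| shuffled middle ||| fixed suffix
  secondary_separator = ";;;"        # "sky;;;cloud;;;day" is shuffled or dropped as one tag
  caption_dropout_rate = 0.05        # drop the whole caption 5% of the time
  caption_tag_dropout_rate = 0.1     # drop each individual tag 10% of the time
```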
Dataset style examples
DreamBooth style
Use when you have images in a directory and want to associate them with a trigger word. Caption files are optional.
Fine-tuning style
Use when you have a pre-built metadata JSON file (generated by merge_captions_to_metadata.py or similar).
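A fine-tuning-style dataset could be sketched as follows. Both paths are placeholders, and the metadata file is assumed to already exist.

```toml
[[datasets]]
resolution = 512
batch_size = 2

  [[datasets.subsets]]
  image_dir = "/path/to/images"              # placeholder path
  metadata_file = "/path/to/metadata.json"   # placeholder path to the metadata JSON
```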
Mixed style (DreamBooth + fine-tuning)
Both dataset styles can coexist in a single config. Each style must be in its own [[datasets]] block.
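As a sketch, a mixed config keeps the two styles in separate [[datasets]] blocks. All paths and the class token are hypothetical.

```toml
# DreamBooth-style dataset: image directory plus class tokens
[[datasets]]
resolution = 512

  [[datasets.subsets]]
  image_dir = "/path/to/dreambooth_images"   # placeholder path
  class_tokens = "sks"                       # placeholder trigger word

# Fine-tuning-style dataset: image directory plus metadata file
[[datasets]]
resolution = 512

  [[datasets.subsets]]
  image_dir = "/path/to/finetune_images"     # placeholder path
  metadata_file = "/path/to/metadata.json"   # placeholder path
```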
Multi-resolution with skip_image_resolution
Train the same images at multiple resolutions and exclude small images from high-resolution datasets.
Duplicate subset handling
If two subsets in the same dataset point to the same image_dir (DreamBooth) or metadata_file (fine-tuning), the second is ignored. Subsets in different datasets pointing to the same directory are not considered duplicates and are both used. This is how multi-resolution training works.
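The multi-resolution pattern can be sketched as two datasets that share one image directory. This example assumes skip_image_resolution accepts a single integer in the same way resolution does. The path, class token, and resolutions are illustrative.

```toml
[general]
enable_bucket = true

# Low-resolution dataset: uses every image in the directory
[[datasets]]
resolution = 512

  [[datasets.subsets]]
  image_dir = "/path/to/images"   # placeholder path
  class_tokens = "sks"            # placeholder trigger word

# High-resolution dataset: same directory, small images skipped
[[datasets]]
resolution = 1024
skip_image_resolution = 512       # assumed format; skips images at or below this size

  [[datasets.subsets]]
  image_dir = "/path/to/images"   # same directory; not a duplicate across datasets
  class_tokens = "sks"
```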
Command-line arguments overridden by config
When --dataset_config is provided, these command-line arguments are ignored entirely:
- --train_data_dir
- --reg_data_dir
- --in_json
For other options that can be set in both places (such as --resolution, --train_batch_size, and --shuffle_caption), the config file value takes priority over the command-line value.
Common errors
| Error | Cause |
|---|---|
| required key not provided @ data['datasets'][0]['subsets'][0]['image_dir'] | image_dir is missing from a subset |
| expected int for dictionary value | A numeric option has the wrong type (e.g. a string instead of a number) |
| extra keys not allowed | An option name is misspelled or not supported at that scope level |
