How it works
When you train Textual Inversion, you:

- Choose a unique token string (e.g., `mychar`) that doesn’t exist in the model’s vocabulary.
- Initialize that token’s embedding vector from a known word (e.g., `girl`).
- Train the embedding so that, when you write `mychar` in a prompt, the model generates your concept.

The result is a small `.safetensors` file (kilobytes to a few megabytes) containing only the learned embedding vectors.
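The three steps above can be sketched in miniature (an assumed simplification: real Textual Inversion optimizes the new embedding through the frozen diffusion model’s denoising loss, not a plain MSE target):

```python
# Toy illustration of Textual Inversion's core idea.
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Frozen token-embedding table of the text encoder (toy vocabulary).
vocab = {"girl": 0, "dog": 1, "painting": 2}
emb = rng.normal(size=(len(vocab), dim))

# Step 1: add a token string that is not in the vocabulary.
vocab["mychar"] = len(vocab)

# Step 2: initialize its vector from a known word ("girl").
emb = np.vstack([emb, emb[vocab["girl"]].copy()])

# Step 3: train only that one row; every other weight stays frozen.
target = rng.normal(size=dim)  # stand-in for the concept's ideal embedding
lr = 0.1
for _ in range(300):
    grad = emb[vocab["mychar"]] - target      # gradient of 0.5*||e - target||^2
    emb[vocab["mychar"]] -= lr * grad

print(np.allclose(emb[vocab["mychar"]], target))   # -> True
```

Only a single embedding row is updated, which is why the resulting file is so small compared to a fine-tuned checkpoint.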
Fine-tuning vs LoRA vs Textual Inversion
| | Fine-tuning | LoRA | Textual Inversion |
|---|---|---|---|
| What’s trained | All model weights | Adapter network | Embedding vectors only |
| File size | Several GB | Few MB to hundreds of MB | Kilobytes to a few MB |
| VRAM requirement | High | Medium | Low |
| Expressive power | Highest | High | Limited to text-space concepts |
| Best for | Full style overhaul | Characters, styles | Simple concepts, styles |
Supported models
| Script | Models |
|---|---|
| `train_textual_inversion.py` | Stable Diffusion 1.x and 2.x |
| `sdxl_train_textual_inversion.py` | Stable Diffusion XL |
Textual Inversion is not currently supported for FLUX, SD3, or Lumina. For those architectures, use LoRA or fine-tuning instead.
Dataset requirements
Textual Inversion typically needs fewer images than LoRA or fine-tuning. 5–20 images of the concept you want to teach are often sufficient, though more images and varied compositions improve generalization.

Create a TOML dataset configuration file, and make sure every training caption contains your `--token_string`. For example, if your token is `mychar`, write captions like `mychar 1girl`.
You can confirm the token is registered by running with `--debug_dataset` — look for token IDs ≥ 49408 (those are your new custom tokens).
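A minimal dataset config sketch (directory path, resolution, and repeat count are placeholder assumptions; adjust them for your data):

```toml
[general]
enable_bucket = true

[[datasets]]
resolution = 512
batch_size = 1

  [[datasets.subsets]]
  image_dir = "/path/to/images"       # folder with your 5-20 concept images
  caption_extension = ".txt"          # each caption must contain the token string
  num_repeats = 10
```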
Key arguments
Textual Inversion-specific arguments

`--token_string`
The unique token string for your concept. Must not already exist in the tokenizer’s vocabulary. Use this string in all your training captions (e.g., `mychar 1girl`).

`--init_word`
The word used to initialize the embedding vector. Choose something conceptually close to what you’re teaching (e.g., `girl`, `dog`, `painting`). Must resolve to a single token.

`--num_vectors_per_token`
Number of embedding vectors to use for this token. More vectors give greater expressiveness but consume tokens from the 77-token prompt limit. Values of 2–4 are common. With `--token_string=mychar` and `--num_vectors_per_token=4`, the system creates: `mychar`, `mychar1`, `mychar2`, `mychar3`.

`--weights`
Path to an existing embedding file to resume training from.

`--use_object_template`
Ignore captions and use built-in object templates like `"a photo of a {}"`. Matches the original Textual Inversion paper implementation. Good for characters and objects.

`--use_style_template`
Ignore captions and use built-in style templates like `"a painting in the style of {}"`. Good for artistic styles.
Training parameters
| Argument | Recommended value | Notes |
|---|---|---|
--learning_rate | 1e-6 | Lower than LoRA training; adjust based on results |
--max_train_steps | 1000–2000 | Fewer steps needed vs fine-tuning |
--optimizer_type | AdamW8bit | Memory-efficient |
--mixed_precision | fp16 or bf16 | Reduces VRAM usage |
--cache_latents | — | Pre-encode VAE outputs; reduces VRAM usage |
--gradient_checkpointing | — | Additional VRAM savings |
SDXL-specific memory options
| Argument | Description |
|---|---|
--cache_text_encoder_outputs | Cache text encoder outputs to save VRAM |
--mixed_precision bf16 | Use bf16 on RTX 30 series or later |
Training commands
- SD 1.x / 2.x
- SDXL
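The two tabs above would hold commands along these lines. This is a sketch: file paths, model names, and hyperparameter values are illustrative assumptions drawn from the tables above, not prescriptions.

```bash
# SD 1.x / 2.x (paths and values are placeholders)
accelerate launch --num_cpu_threads_per_process 1 train_textual_inversion.py \
  --pretrained_model_name_or_path=/path/to/model.safetensors \
  --dataset_config=dataset.toml \
  --output_dir=output --output_name=mychar_ti \
  --save_model_as=safetensors \
  --token_string=mychar --init_word=girl --num_vectors_per_token=4 \
  --learning_rate=1e-6 --max_train_steps=1500 \
  --optimizer_type=AdamW8bit --mixed_precision=fp16 \
  --cache_latents --gradient_checkpointing

# SDXL: same idea with the SDXL script plus the SDXL memory options
accelerate launch --num_cpu_threads_per_process 1 sdxl_train_textual_inversion.py \
  --pretrained_model_name_or_path=/path/to/sdxl.safetensors \
  --dataset_config=dataset.toml \
  --output_dir=output --output_name=mychar_ti_xl \
  --save_model_as=safetensors \
  --token_string=mychar --init_word=girl --num_vectors_per_token=4 \
  --learning_rate=1e-6 --max_train_steps=1500 \
  --optimizer_type=AdamW8bit --mixed_precision=bf16 \
  --cache_latents --gradient_checkpointing --cache_text_encoder_outputs
```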
Using the trained embedding
The trained embedding file is a `.safetensors` file saved to `--output_dir`. To use it:
AUTOMATIC1111 WebUI
Place the `.safetensors` file in the `embeddings/` folder. Use the token string in your prompt.

ComfyUI
Load the embedding with the appropriate embedding node and use the token string in prompts.
Diffusers
Load the embedding file directly using the embedding path in your pipeline.
"mychar standing in a park" — and the model will apply the learned embedding automatically.
Troubleshooting
Token string already exists in tokenizer
Use a unique string that doesn’t appear in the model’s vocabulary. Try adding numbers or uncommon character combinations (e.g., `mychar123`, `xyzperson`).

No improvement after training
- Make sure every caption includes the token string.
- Try a lower learning rate (e.g., `5e-7`).
- Increase the number of training steps.
- Enable `--cache_latents` to stabilize training.
Out-of-memory errors
- Reduce batch size in the dataset config.
- Add `--gradient_checkpointing`.
- Add `--cache_latents`.
