Anima is a Diffusion Transformer (DiT) model based on the MiniTrainDIT design with Rectified Flow training. It uses a Qwen3-0.6B text encoder, an LLM Adapter bridge, and a Qwen-Image VAE with a 16-channel, 8× spatial downscale latent space.
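As a quick illustration of the latent geometry implied by the 16-channel, 8× downscale VAE (a sketch; the function name is ours, not from sd-scripts): an H×W image is encoded to a 16×(H/8)×(W/8) latent.

```python
def anima_latent_shape(height: int, width: int, channels: int = 16, downscale: int = 8):
    """Latent tensor shape for an image under a 16-channel, 8x-downscale VAE."""
    assert height % downscale == 0 and width % downscale == 0, "dimensions must be divisible by 8"
    return (channels, height // downscale, width // downscale)

# A 1024x1024 training image becomes a 16x128x128 latent.
print(anima_latent_shape(1024, 1024))
```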
Anima support was added in sd-scripts v0.10.1. Model weights are available from the circlestone-labs/Anima repository on Hugging Face.

Supported training methods

| Method | Script | Supported |
| --- | --- | --- |
| LoRA | anima_train_network.py | Yes |
| Full fine-tuning | anima_train.py | Yes |
| LoHa / LoKr | anima_train_network.py | Yes (experimental) |
| Textual Inversion | | No |
| ControlNet | | No |

Required model files

You need four components before training:
| Component | Description | Source |
| --- | --- | --- |
| Anima DiT | Base DiT model .safetensors | circlestone-labs/Anima |
| Qwen3-0.6B | Text encoder (HuggingFace dir or .safetensors) | Qwen3-0.6B |
| Qwen-Image VAE | VAE model .safetensors or .pth | circlestone-labs/Anima |
| LLM Adapter | 6-layer Transformer bridge (optional; loaded from DiT if bundled) | Bundled in DiT file |
The LLM Adapter is typically bundled inside the DiT model file. If the key llm_adapter.out_proj.weight is present in the DiT weights, you do not need to specify --llm_adapter_path separately.
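You can check for the bundled adapter without loading the weights. The sketch below reads only the safetensors header (an 8-byte little-endian length followed by a JSON tensor index) using the standard library; the function names are ours, not part of sd-scripts.

```python
import json
import struct

def safetensors_keys(path: str) -> list[str]:
    """List tensor names from a .safetensors file without loading any weights.

    The format begins with an 8-byte little-endian header length, followed by
    a JSON header mapping tensor names to dtype/shape/offset metadata.
    """
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_len))
    return [k for k in header if k != "__metadata__"]

def adapter_is_bundled(dit_path: str) -> bool:
    """True if the DiT file carries the LLM Adapter weights."""
    return "llm_adapter.out_proj.weight" in safetensors_keys(dit_path)
```

If this returns True for your DiT file, `--llm_adapter_path` can be omitted.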

Training command

accelerate launch --num_cpu_threads_per_process 1 anima_train_network.py \
  --pretrained_model_name_or_path="/path/to/anima-dit.safetensors" \
  --qwen3="/path/to/Qwen3-0.6B" \
  --vae="/path/to/qwen-image-vae.safetensors" \
  --dataset_config="my_anima_dataset_config.toml" \
  --output_dir="./output" \
  --output_name="my_anima_lora" \
  --save_model_as=safetensors \
  --network_module=networks.lora_anima \
  --network_dim=8 \
  --learning_rate=1e-4 \
  --optimizer_type="AdamW8bit" \
  --lr_scheduler="constant" \
  --timestep_sampling="sigmoid" \
  --discrete_flow_shift=1.0 \
  --max_train_epochs=10 \
  --save_every_n_epochs=1 \
  --mixed_precision="bf16" \
  --gradient_checkpointing \
  --cache_latents \
  --cache_text_encoder_outputs \
  --vae_chunk_size=64 \
  --vae_disable_cache
If training loss becomes NaN, verify that your PyTorch version is 2.5 or higher.

Key Anima-specific arguments

--pretrained_model_name_or_path (string, required)
Path to the Anima DiT model .safetensors file. ComfyUI format with a net. key prefix is supported.

--qwen3 (string, required)
Path to the Qwen3-0.6B text encoder. Can be a HuggingFace model directory or a single .safetensors file. The text encoder is always frozen during training.

--vae (string, required)
Path to the Qwen-Image VAE .safetensors or .pth file. The architecture is fixed: dim=96, z_dim=16.

--llm_adapter_path (string)
Path to a separate LLM adapter weights file. If omitted, the adapter is loaded from the DiT file when the key llm_adapter.out_proj.weight exists.

--timestep_sampling (string, default: "sigmoid")
Timestep sampling method. Options: sigma, uniform, sigmoid, shift, flux_shift. Same options as FLUX training.

--discrete_flow_shift (number, default: 1.0)
Shift for the timestep distribution. Used when --timestep_sampling=shift.

--vae_chunk_size (integer)
Process the VAE in chunks of this size to reduce memory usage. Recommended: 64.

--vae_disable_cache (boolean)
Disables the VAE cache to reduce memory usage. Use alongside --vae_chunk_size.

--attn_mode (string, default: "torch")
Attention implementation. Options: torch, xformers, flash, sageattn. Note: sageattn is inference-only and cannot be used for training.
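To make the sampling options concrete, here is a rough sketch of sigmoid timestep sampling and the discrete flow shift, using the common rectified-flow convention t' = shift·t / (1 + (shift − 1)·t). This is an illustration only; the actual implementation lives in sd-scripts, and the function names here are ours.

```python
import math
import random

def sample_timestep_sigmoid(rng: random.Random) -> float:
    """Sigmoid sampling: squash a standard-normal draw into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-rng.gauss(0.0, 1.0)))

def apply_flow_shift(t: float, shift: float = 1.0) -> float:
    """Discrete flow shift in the common rectified-flow convention.

    With shift=1.0 (the Anima default) this is a no-op; larger shifts
    push the distribution toward higher (noisier) timesteps.
    """
    return shift * t / (1.0 + (shift - 1.0) * t)

rng = random.Random(0)
samples = [apply_flow_shift(sample_timestep_sigmoid(rng), shift=1.0) for _ in range(1000)]
assert all(0.0 < t < 1.0 for t in samples)
```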

Network module

Use networks.lora_anima as the --network_module:
--network_module=networks.lora_anima
LoHa and LoKr are also supported for Anima — see the LoHa / LoKr page for details.

Converting to ComfyUI format

After training, convert the LoRA to ComfyUI format using:
python networks/convert_anima_lora_to_comfy.py \
  --input "/path/to/my_anima_lora.safetensors" \
  --output "/path/to/my_anima_lora_comfy.safetensors"
