This guide covers various techniques to improve ComfyUI’s performance based on your hardware configuration.

VRAM Management

ComfyUI includes smart memory management that automatically optimizes VRAM usage. Understanding the different VRAM modes helps you choose the best configuration.
ComfyUI operates in different VRAM states depending on your hardware:
HIGH_VRAM Mode (Default for GPUs with sufficient VRAM):
python main.py --highvram
  • Keeps models in VRAM
  • Fastest performance
  • Recommended for GPUs with 12GB+ VRAM
NORMAL_VRAM Mode (Default):
python main.py
  • Balanced approach
  • Moves models between RAM and VRAM as needed
  • Works for most configurations
LOW_VRAM Mode:
python main.py --lowvram
  • Offloads models more aggressively
  • Slower but works with limited VRAM
  • Recommended for GPUs with 4-8GB VRAM
NO_VRAM Mode:
python main.py --novram
  • Keeps minimal data in VRAM
  • Very slow but works with < 2GB VRAM
  • Last resort for extremely limited GPUs
GPU_ONLY Mode:
python main.py --gpu-only
  • Never offload to CPU
  • Useful when you want to keep everything on GPU
  • Requires sufficient VRAM for your workflow
Reserve VRAM for other applications to prevent conflicts:
python main.py --reserve-vram 2.0
This reserves 2GB of VRAM. Adjust based on your needs:
  • Windows Default: 600MB (due to shared memory overhead)
  • Linux Default: 400MB
  • 16GB+ GPUs: Additional 100MB reserved automatically
A lower reservation leaves more memory for ComfyUI but may cause conflicts with other GPU applications.
ComfyUI automatically pins memory for faster transfers between RAM and VRAM on NVIDIA and AMD GPUs.
Automatic pinning limits:
  • Windows: 45% of system RAM (OS limit ~50%)
  • Linux: 95% of system RAM
To disable pinned memory:
python main.py --disable-pinned-memory
Only disable if you experience crashes or memory issues.

Attention Mechanisms

Different attention implementations offer varying performance characteristics.
xformers provides memory-efficient attention but has compatibility issues.
Install xformers:
pip install xformers
Warning: Avoid xformers 0.0.18 - it causes black images at high resolutions.
Disable xformers if you experience issues:
python main.py --disable-xformers
Note: PyTorch attention is now preferred over xformers on most systems.
Alternative attention modes for specific hardware:
Split Cross Attention:
python main.py --use-split-cross-attention
  • Reduces memory usage
  • Slower than standard attention
  • Useful for very limited VRAM
Quad Cross Attention:
python main.py --use-quad-cross-attention
  • Even more memory efficient
  • Significantly slower
  • For extreme VRAM limitations
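These attention flags compose with the VRAM modes above. For example, a 4-8GB card might pair aggressive offloading with the more memory-efficient attention (an illustrative combination, not a recommendation for every setup):

```shell
# Hypothetical low-VRAM launch: offload aggressively and trade some speed
# for a smaller attention memory footprint
python main.py --lowvram --use-split-cross-attention
```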

Precision and Data Types

Choosing the right precision can significantly impact performance and memory usage.
BFloat16 offers better numerical stability than FP16 with similar performance.
Automatically used on:
  • NVIDIA GPUs with compute capability 8.0+
  • AMD RDNA3+ GPUs
  • Apple Silicon (macOS 14+)
  • Intel XPU with BF16 support
Force BF16:
python main.py --bf16-unet
Note: BF16 is automatically selected when supported and beneficial.
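The stability difference is mostly about exponent width: FP16 has a 5-bit exponent (largest finite value 65504), while BF16 keeps FP32's full 8-bit exponent and gives up mantissa bits instead. A stdlib-only sketch makes this concrete (the `to_bf16`/`to_fp16` helpers are illustrative, not ComfyUI code):

```python
import struct

def to_bf16(x: float) -> float:
    # bfloat16 keeps the top 16 bits of an FP32 pattern:
    # sign + 8-bit exponent + 7-bit mantissa
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def to_fp16(x: float) -> float:
    # IEEE half precision: sign + 5-bit exponent + 10-bit mantissa
    return struct.unpack(">e", struct.pack(">e", x))[0]

print(to_bf16(70000.0))       # 69632.0 -- survives, with reduced precision
try:
    to_fp16(70000.0)
except OverflowError:
    print("FP16 overflow")    # 70000 exceeds FP16's maximum of 65504
```

This is why large activations that overflow to inf in FP16 often run cleanly in BF16.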
FP8 precision offers significant memory savings on compatible hardware.
Supported on:
  • NVIDIA GPUs: RTX 40 series and H100 (compute capability 8.9+)
  • AMD GPUs: RDNA3+ with ROCm 6.4+ and PyTorch 2.7+
Force FP8:
python main.py --fp8-e4m3fn-unet
Note: FP8 is automatically used when models are loaded in FP8 format and the GPU supports FP8 compute.
If you experience quality issues, force full precision:
python main.py --force-fp32
This disables all FP16/BF16 optimizations. Use it only for debugging; it is significantly slower and uses more VRAM.

Caching and Execution

ComfyUI only re-executes parts of the workflow that change between runs.
Cache Types:
Classic Cache (Default):
  • Caches all node outputs
  • Best for most workflows
LRU Cache:
python main.py --cache-lru 100
  • Least Recently Used cache with size limit
  • Value = number of cached items
  • Good for memory-constrained systems
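The LRU policy behind `--cache-lru` can be sketched in a few lines (a conceptual illustration of the eviction behavior, not ComfyUI's actual cache implementation):

```python
from collections import OrderedDict

class LRUCache:
    """Toy node-output cache with a fixed item limit, like --cache-lru 100."""
    def __init__(self, max_items: int):
        self.max_items = max_items
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)         # mark as recently used
        return self._store[key]

    def put(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.max_items:
            self._store.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("node_a", 1)
cache.put("node_b", 2)
cache.get("node_a")          # touch node_a, so node_b is now least recent
cache.put("node_c", 3)       # evicts node_b
print(cache.get("node_b"))   # None
```

Re-running an unchanged node whose output is still cached costs a lookup instead of a re-execution; rarely revisited outputs fall out first.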
RAM Pressure Cache:
python main.py --cache-ram 0.8
  • Dynamically adjusts cache based on RAM usage
  • Value = RAM usage threshold (0.0-1.0)
No Cache:
python main.py --cache-none
  • Disables all caching
  • Useful for debugging only
Smart memory is enabled by default and automatically manages model loading and unloading.
Disable for manual control:
python main.py --disable-smart-memory
Warning: Only disable if you know what you’re doing. Smart memory prevents OOM errors.

Async Operations

Asynchronous offloading improves performance by overlapping memory transfers with computation.
Automatically enabled on NVIDIA and AMD GPUs with 2 streams.
Customize stream count:
python main.py --async-offload 3
Disable if experiencing issues:
python main.py --disable-async-offload
Benefits:
  • Reduces idle time during model loading
  • Improves overall throughput
  • Most effective with multiple large models
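The overlap idea can be sketched with a single background thread standing in for a transfer stream (a conceptual illustration only; the real implementation uses GPU streams, not Python threads, and the function names here are made up):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_weights(name):        # stands in for a RAM -> VRAM transfer
    time.sleep(0.01)
    return f"{name}-weights"

def run_inference(weights):    # stands in for GPU compute
    time.sleep(0.01)
    return f"ran {weights}"

models = ["unet", "clip", "vae"]
results = []
with ThreadPoolExecutor(max_workers=1) as transfer_stream:
    pending = transfer_stream.submit(load_weights, models[0])
    for i in range(len(models)):
        weights = pending.result()           # wait for the current transfer
        if i + 1 < len(models):              # start the next transfer early,
            pending = transfer_stream.submit(load_weights, models[i + 1])
        results.append(run_inference(weights))  # ...so it overlaps this compute
print(results)  # ['ran unet-weights', 'ran clip-weights', 'ran vae-weights']
```

Because each transfer runs while the previous model computes, total time approaches max(transfer, compute) per model instead of their sum.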
Additional NVIDIA-specific optimizations:
Disable CUDA Malloc (older GPUs):
python main.py --disable-cuda-malloc
Fast Mode (enables multiple optimizations):
python main.py --fast
Enables:
  • FP16 accumulation
  • cuDNN auto-tuning (first run slower, subsequent runs faster)
Force Channels Last (experimental):
python main.py --force-channels-last
May improve performance on some models.

Platform-Specific Optimizations

AMD-specific settings for better performance:
Enable AOTriton (RDNA3+):
TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python main.py --use-pytorch-cross-attention
Enable TunableOp:
PYTORCH_TUNABLEOP_ENABLED=1 python main.py
  • First run is very slow (builds optimization database)
  • Subsequent runs are faster
Disable cuDNN for RDNA3+:
MIOpen (cuDNN's equivalent on AMD) is automatically disabled on RDNA3+ for better performance. To re-enable it:
COMFYUI_ENABLE_MIOPEN=1 python main.py
Intel Arc GPU optimizations:
IPEX Optimization (enabled by default): Automatically optimizes models for Intel XPU.
Disable if experiencing issues:
python main.py --disable-ipex-optimize
OneAPI Device Selector:
python main.py --oneapi-device-selector "level_zero:0"
Selects specific Intel device when multiple are available.
macOS-specific settings:
MPS is automatically detected and used on Apple Silicon Macs.
Force CPU mode if experiencing issues:
python main.py --cpu
VAE on CPU (reduces VRAM usage):
python main.py --cpu-vae
Note: Non-blocking transfers are disabled on MPS due to PyTorch limitations.

Workflow Optimization Tips

  1. Minimize dynamic changes: ComfyUI only re-executes changed nodes. Keep static parts of your workflow unchanged.
  2. Use appropriate image sizes: Larger images require disproportionately more VRAM and processing time.
  3. Batch processing: Process multiple images in a single batch when possible.
  4. Preview settings: Use low-resolution previews during development:
    python main.py --preview-method auto
    
  5. Model selection: Smaller models (SD 1.5) are faster than larger ones (SDXL, SD3).
  6. LoRA usage: Multiple LoRAs increase memory usage and loading time.
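On tip 2, the cost of larger images is easy to underestimate: doubling both dimensions quadruples the pixel count, and self-attention memory grows roughly with the square of the token count on top of that:

```python
w, h = 512, 512  # a common SD 1.5 working size
scale = ((2 * w) * (2 * h)) // (w * h)
print(scale)       # 4 -- doubling each side quadruples the pixels
# Self-attention memory scales roughly quadratically with token count,
# so attention maps can grow on the order of 16x for the same doubling.
print(scale ** 2)  # 16
```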

Monitoring and Diagnostics

Enable verbose logging to see detailed performance information:
python main.py --verbose DEBUG
Check VRAM usage:
  • ComfyUI logs total VRAM and RAM at startup
  • Model loading/unloading is logged
  • Performance warnings appear in console
Deterministic mode (for reproducible results):
python main.py --deterministic
  • Ensures reproducible outputs
  • Slightly slower
  • Disables some optimizations
