This guide covers GPU-specific troubleshooting for NVIDIA, AMD, Intel, and other accelerators supported by ComfyUI.

NVIDIA GPU Issues

Your NVIDIA GPU is not being recognized by PyTorch.
Diagnosis: Check if CUDA is available:
import torch
print(torch.cuda.is_available())
print(torch.version.cuda)
Solution:
  1. Update NVIDIA drivers to the latest version
  2. Reinstall PyTorch with CUDA support:
    pip uninstall torch torchvision torchaudio
    pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu130
    
  3. For older GPUs (GTX 10 series), use CUDA 12.6:
    pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu126
    
  4. Verify CUDA installation:
    nvidia-smi
    
Async memory allocation may not be supported on your GPU. ComfyUI checks for blacklisted cards and shows a warning if one is detected.
Solution:
python main.py --disable-cuda-malloc
This disables async allocation and uses standard CUDA malloc.
GTX 16 series cards have broken FP16 support in their hardware.
Solution: Force FP32 computation:
python main.py --force-fp32
This will use more VRAM and be slower, but produces correct results.
Affected cards:
  • GTX 1660 (all variants)
  • GTX 1650 (all variants)
  • GTX 1630
  • T500, T550, T600, T1000, T1200, T2000
  • MX450, MX550
  • CMP 30HX
Modern NVIDIA GPUs should automatically use optimal precision.
Verification: ComfyUI automatically enables FP16 on GPUs with compute capability 8.0+ (RTX 30/40 series).
Force optimizations:
python main.py --fast
This enables:
  • FP16 accumulation for matrix operations
  • cuDNN auto-tuning for optimal kernels
FP8 Support (RTX 40 series, H100):
python main.py --fp8-e4m3fn-unet
FP8 is automatically used when models are loaded in FP8 format on compute capability 8.9+ GPUs.
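The precision rules above (FP32 forced on FP16-broken cards, FP16 on compute capability 8.0+, FP8 on 8.9+) can be sketched as a small helper. This is an illustration of the heuristic described in this section, not ComfyUI's actual code; the function name and return values are ours:

```python
def pick_unet_precision(major, minor, fp16_blacklisted=False):
    """Illustrative precision heuristic (not ComfyUI's actual code).

    (major, minor) is the CUDA compute capability, e.g. (8, 9) for RTX 40.
    fp16_blacklisted: True for cards with broken FP16 (GTX 16 series etc.).
    """
    if fp16_blacklisted:
        return "fp32"       # e.g. GTX 1660: FP16 silently produces garbage
    if (major, minor) >= (8, 9):
        return "fp8_ok"     # FP8-format models can run natively (RTX 40, H100)
    if (major, minor) >= (8, 0):
        return "fp16"       # RTX 30 series and newer
    return "fp16_maybe"     # older cards: FP16 often works but is not guaranteed

print(pick_unet_precision(8, 9))                         # fp8_ok
print(pick_unet_precision(7, 5, fp16_blacklisted=True))  # fp32
```

You can read your card's compute capability with `torch.cuda.get_device_capability(0)`.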
Choose which GPU to use when multiple are available.
Solution: Set the CUDA device:
python main.py --cuda-device 0
Or use environment variable:
CUDA_VISIBLE_DEVICES=1 python main.py
Set default device (reorders device priority):
python main.py --default-device 1
Check available GPUs:
nvidia-smi -L
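Note that CUDA_VISIBLE_DEVICES renumbers devices: with `CUDA_VISIBLE_DEVICES=1`, the second physical GPU becomes `cuda:0` inside the process. A stdlib sketch of that mapping (the helper name is ours, not a real API):

```python
import os

def visible_gpu_map(env=None):
    """Map in-process device indices -> physical GPU indices under
    CUDA_VISIBLE_DEVICES (illustrative helper, not a real API)."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # no masking: cuda:N is physical GPU N
    tokens = [t for t in raw.split(",") if t.strip()]
    return {i: int(tok) for i, tok in enumerate(tokens)}

# CUDA_VISIBLE_DEVICES=1 -> the process's cuda:0 is physical GPU 1
print(visible_gpu_map({"CUDA_VISIBLE_DEVICES": "1"}))    # {0: 1}
print(visible_gpu_map({"CUDA_VISIBLE_DEVICES": "2,0"}))  # {0: 2, 1: 0}
```

This is why `nvidia-smi` (which ignores the variable) and PyTorch can report different indices for the same card.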
VRAM is not being released after model unloading.
Solution: ComfyUI has automatic garbage collection and cache clearing. If issues persist:
  1. Check for circular references in custom nodes (ComfyUI will log warnings about potential memory leaks)
  2. Manually trigger cleanup via the UI or API
  3. Enable deterministic mode to disable some caching:
    python main.py --deterministic
    
  4. Restart ComfyUI to clear all memory
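Step 1 matters because objects caught in reference cycles are not freed the moment they go out of scope; they wait for Python's cycle collector, and a tensor held in such a cycle keeps its VRAM until then. A stdlib illustration (the `Holder` class is a stand-in for a custom node):

```python
import gc

class Holder:
    """Stand-in for a custom node caching, say, a large tensor."""
    def __init__(self):
        self.peer = None

a, b = Holder(), Holder()
a.peer, b.peer = b, a     # reference cycle: refcounts never reach zero
del a, b                  # objects are NOT freed here...
collected = gc.collect()  # ...only when the cycle collector runs
print(collected >= 2)     # True: both Holder instances were reclaimed
```

Avoiding cycles (or breaking them explicitly, e.g. setting `self.peer = None` when done) lets memory return immediately instead of at the next collection.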

AMD GPU Issues

ROCm is not properly installed or configured.
Solution:
  1. Install ROCm following AMD’s official guide
  2. Install PyTorch with ROCm support:
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.1
    
  3. For latest ROCm 7.2 (may improve performance):
    pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.2
    
  4. Verify detection:
    import torch
    print(torch.version.hip)
    print(torch.cuda.is_available())
    
Windows support for AMD is experimental and limited to RDNA 3, 3.5, and 4.
Supported Cards:
  • RDNA 3: RX 7000 series
  • RDNA 3.5: Strix Halo, Ryzen AI Max+ 395
  • RDNA 4: RX 9000 series
Solution: Install the architecture-specific PyTorch build.
RDNA 3 (RX 7000):
pip install --pre torch torchvision torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx110X-all/
RDNA 3.5 (Strix Halo):
pip install --pre torch torchvision torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx1151/
RDNA 4 (RX 9000):
pip install --pre torch torchvision torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/
Older or unsupported AMD GPUs need an architecture version override.
Solution: For RDNA2 and older (RX 6700, 6600, etc.):
HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py
For RDNA3 (RX 7600):
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py
Older GCN cards (RX 580, Vega): May require specific GFX version. Check your card’s GCN architecture version.
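The overrides above can also be applied from a wrapper script instead of the shell. A minimal sketch; the mapping table simply restates the values given above, and the dictionary keys are our labels, not real flags:

```python
import os

# Values from the examples above
GFX_OVERRIDES = {
    "rdna2_and_older": "10.3.0",  # RX 6700 / 6600, etc.
    "rdna3": "11.0.0",            # RX 7600
}

env = dict(os.environ, HSA_OVERRIDE_GFX_VERSION=GFX_OVERRIDES["rdna2_and_older"])
print(env["HSA_OVERRIDE_GFX_VERSION"])  # 10.3.0
# Then launch ComfyUI with the override in place:
# subprocess.run(["python", "main.py"], env=env)
```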
AMD GPUs need specific optimizations for best performance.
Solution:
  1. Enable PyTorch Cross Attention (RDNA3+):
    TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python main.py --use-pytorch-cross-attention
    
  2. Enable TunableOp (first run slow, subsequent faster):
    PYTORCH_TUNABLEOP_ENABLED=1 python main.py
    
  3. For RDNA2 and older, re-enable MIOpen (ROCm’s cuDNN equivalent) if experiencing issues:
    COMFYUI_ENABLE_MIOPEN=1 python main.py
    
    By default, MIOpen is disabled on RDNA3+ for better performance.
  4. Check ROCm version compatibility:
    import torch
    print(torch.version.hip)
    
    • ROCm 7.0+ required for RDNA4
    • ROCm 6.4+ recommended for best performance
FP8 support on AMD requires specific hardware and software versions.
Requirements:
  • GPU: RDNA4 (RX 9000) or MI300 series
  • PyTorch: 2.7+
  • ROCm: 6.4+
Enable FP8: ComfyUI automatically detects FP8 support. To force:
python main.py --fp8-e4m3fn-unet
Verify support: Check console output during startup for “FP8 ops supported” message.
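The requirements above amount to an architecture check plus two version checks. A hedged sketch, not ComfyUI's actual code; the `gfx120`/`gfx942` prefixes are assumptions based on RDNA4 and MI300 architecture names:

```python
def amd_fp8_capable(gfx_arch, torch_version, rocm_version):
    """Illustrative check mirroring the requirements above (not ComfyUI code).

    gfx_arch: architecture string such as "gfx1201", e.g. from
    torch.cuda.get_device_properties(0).gcnArchName on ROCm builds.
    """
    def ver(s):  # "2.7.1" -> (2, 7)
        return tuple(int(p) for p in s.split(".")[:2])

    arch_ok = gfx_arch.startswith(("gfx120", "gfx942"))  # RDNA4 / MI300 (assumed)
    return arch_ok and ver(torch_version) >= (2, 7) and ver(rocm_version) >= (6, 4)

print(amd_fp8_capable("gfx1201", "2.7.0", "6.4.1"))  # True
print(amd_fp8_capable("gfx1100", "2.7.0", "6.4.1"))  # False (RDNA3)
```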

Intel GPU Issues

Intel XPU support requires a specific PyTorch build.
Solution:
  1. Install Intel Extension for PyTorch:
    pip install intel-extension-for-pytorch
    
  2. Install PyTorch XPU:
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
    
  3. For latest features:
    pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu
    
  4. Verify detection:
    import torch
    print(torch.xpu.is_available())
    print(torch.xpu.device_count())
    
Intel GPUs benefit from IPEX optimizations.
Solution:
  1. IPEX optimization is enabled by default. If experiencing issues:
    python main.py --disable-ipex-optimize
    
  2. Select specific device:
    python main.py --oneapi-device-selector "level_zero:0"
    
  3. Check driver version: Intel frequently updates drivers with performance improvements. Ensure you have the latest Intel graphics drivers.
  4. FP16 support: ComfyUI automatically detects FP16 capability. On older PyTorch (before 2.3), FP16 is always enabled.

Other Accelerators

Ascend NPU
Requirements:
  • Ascend Basekit (driver, firmware, CANN)
  • torch-npu package
Installation:
  1. Install Ascend Basekit following official guide
  2. Install torch-npu following installation instructions
  3. Run ComfyUI normally - NPU will be automatically detected
Device Selection:
python main.py --cuda-device 0
(ASCEND_RT_VISIBLE_DEVICES is set automatically)
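Since --cuda-device is translated into ASCEND_RT_VISIBLE_DEVICES on Ascend, selecting a device is equivalent to exporting the variable yourself. A minimal sketch; the helper name is ours, not ComfyUI's actual function:

```python
import os

def select_ascend_device(index, env=None):
    """Mirror what --cuda-device does on Ascend: restrict which NPUs are
    visible to the process (illustrative helper)."""
    env = os.environ if env is None else env
    env["ASCEND_RT_VISIBLE_DEVICES"] = str(index)
    return env

env = select_ascend_device(0, {})
print(env)  # {'ASCEND_RT_VISIBLE_DEVICES': '0'}
```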
Cambricon MLU
Requirements:
  • Cambricon CNToolkit
  • torch_mlu package
Installation:
  1. Install CNToolkit following official guide
  2. Install PyTorch MLU following installation guide
  3. Run ComfyUI - MLU will be automatically detected
Iluvatar Corex
Requirements:
  • Iluvatar Corex Toolkit
  • Compatible PyTorch build
Installation:
  1. Install Iluvatar Corex Toolkit following official documentation
  2. Run ComfyUI - Corex will be automatically detected

Apple Silicon (MPS)

Apple Silicon Macs use Metal Performance Shaders (MPS) for GPU acceleration.
Automatic Detection: MPS is automatically detected and used on macOS 12.3+.
Known Issues:
macOS 14.5+ Black Image Bug: Automatically mitigated by forcing upcast attention:
python main.py --force-upcast-attention
Non-blocking transfers disabled: MPS doesn’t support non-blocking memory transfers due to a PyTorch limitation. This is handled automatically.
BF16 support: Available on macOS 14+ with Apple Silicon.
Force CPU mode if issues persist:
python main.py --cpu
VAE on CPU (save VRAM):
python main.py --cpu-vae

General GPU Diagnostics

At startup, ComfyUI logs:
  • Total VRAM
  • Total RAM
  • Device name and type
  • PyTorch version
  • CUDA/ROCm/XPU version
  • Allocator backend
Enable verbose logging:
python main.py --verbose DEBUG
Check VRAM state: ComfyUI logs the selected VRAM state (HIGH_VRAM, NORMAL_VRAM, LOW_VRAM, etc.).
Python script to check detection:
import torch
import comfy.model_management as mm

print(f"Device: {mm.get_torch_device()}")
print(f"Device name: {mm.get_torch_device_name(mm.get_torch_device())}")
print(f"Total VRAM: {mm.get_total_memory() / (1024**3):.2f} GB")
print(f"Free VRAM: {mm.get_free_memory() / (1024**3):.2f} GB")
print(f"VRAM State: {mm.vram_state}")
GPU is not being fully utilized.
Common Causes:
  1. CPU bottleneck: Check CPU usage while generating
  2. Slow storage: Loading models from slow drives
  3. Wrong precision: Using FP32 when FP16 is supported
  4. Insufficient VRAM: Models constantly swapping
Solutions:
  1. Enable fast mode:
    python main.py --fast
    
  2. Use appropriate VRAM mode:
    python main.py --highvram  # If you have enough VRAM
    
  3. Enable async offloading:
    python main.py --async-offload 2
    
  4. Check preview method (previews can slow generation):
    python main.py --preview-method auto
    
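When testing the flags above, compare wall-clock time per run rather than eyeballing GPU utilization. A tiny stdlib timer for that purpose; the function name is illustrative, and you would pass a closure that runs one prompt:

```python
import time

def avg_seconds(step, warmup=1, iters=5):
    """Average wall-clock time of one generation step, after warmup runs
    (warmup excludes one-time costs like model loading and kernel tuning)."""
    for _ in range(warmup):
        step()
    t0 = time.perf_counter()
    for _ in range(iters):
        step()
    return (time.perf_counter() - t0) / iters

# Usage sketch: run once per flag combination and compare the averages.
print(avg_seconds(lambda: sum(range(100_000)), iters=3) > 0)  # True
```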
GPU is overheating and throttling performance.
Monitor GPU temperature:
  • NVIDIA: nvidia-smi -l 1
  • AMD: rocm-smi
  • Intel: Use system monitoring tools
Solutions:
  1. Improve case airflow
  2. Clean GPU heatsink and fans
  3. Reduce power limit if necessary
  4. Use lower precision to reduce heat:
    python main.py --fp16-unet
    
Note: ComfyUI itself doesn’t cause abnormal heat, but long generation sessions will heat up the GPU normally.
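The nvidia-smi polling above can be scripted for long sessions. The query flags below are real nvidia-smi options; the parser is plain stdlib string handling, shown on a canned sample so the example runs without a GPU:

```python
import subprocess

CMD = ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"]

def parse_temps(csv_text):
    """Parse one temperature (deg C) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]

# Real use: parse_temps(subprocess.check_output(CMD, text=True))
print(parse_temps("67\n72\n"))  # [67, 72]
```

Logging these values once per minute makes it easy to correlate slowdowns with thermal throttling.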
