This guide will walk you through creating your first complete workflow in ComfyUI. By the end, you’ll understand how nodes connect together to generate images from text prompts.

What You’ll Build

A basic text-to-image workflow that:
  • Loads a Stable Diffusion model
  • Encodes positive and negative prompts
  • Generates an image
  • Saves the result

Prerequisites

  • ComfyUI installed and running
  • A Stable Diffusion checkpoint (e.g., v1-5-pruned-emaonly.safetensors) in models/checkpoints/

1. Load the model
  • Right-click on the canvas and search for “CheckpointLoaderSimple”
  • Select your checkpoint file from the dropdown
  • This node outputs three connections:
    • MODEL: The diffusion model for generating images
    • CLIP: Text encoder for understanding prompts
    • VAE: Encoder/decoder for converting between pixels and latents

2. Create your prompts
  • Add two “CLIPTextEncode” nodes (search by typing or use the positive/negative aliases)
  • Connect the CLIP output from CheckpointLoaderSimple to both nodes
  • In the first node (positive), enter: masterpiece best quality girl
  • In the second node (negative), enter: bad hands
The CLIP model converts your text descriptions into embeddings the model can understand.

3. Set up the latent space
  • Add an “EmptyLatentImage” node
  • Set dimensions:
    • width: 512
    • height: 512
    • batch_size: 1
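The latent tensor is much smaller than the final image: for SD1.x models the VAE downscales by a factor of 8 and uses 4 channels. A quick sketch of the shape EmptyLatentImage allocates (the helper name `latent_shape` is just for illustration):

```python
def latent_shape(width: int, height: int, batch_size: int = 1) -> tuple:
    """Shape of the tensor EmptyLatentImage allocates for SD1.x models:
    [batch, 4 channels, height//8, width//8] -- the VAE downscales by 8."""
    return (batch_size, 4, height // 8, width // 8)

# A 512x512 image is generated inside a 64x64 latent:
print(latent_shape(512, 512))  # (1, 4, 64, 64)
```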
This creates a blank canvas in latent space where your image will be generated.

4. Configure the sampler
  • Add a “KSampler” node
  • Make the following connections:
    • model: from CheckpointLoaderSimple
    • positive: from your positive CLIPTextEncode
    • negative: from your negative CLIPTextEncode
    • latent_image: from EmptyLatentImage
  • Configure sampling parameters:
    • seed: Any number (controls randomness)
    • steps: 20 (how many denoising steps)
    • cfg: 8.0 (how closely to follow the prompt)
    • sampler_name: euler
    • scheduler: normal
    • denoise: 1.0 (full denoising for text-to-image)

5. Decode to pixels
  • Add a “VAEDecode” node
  • Connect:
    • samples: from KSampler output
    • vae: from CheckpointLoaderSimple
This converts the latent representation back into a visible image.

6. Save your image
  • Add a “SaveImage” node
  • Connect images from VAEDecode
  • Set a filename_prefix like “my_first_image”
Images are saved to the output/ directory by default.

7. Generate
    Press Ctrl+Enter (or Cmd+Enter on Mac) to queue your workflow. Watch the progress bar and see your first generated image!

    Complete Workflow JSON

    Here’s the API format for this basic workflow:
    {
      "3": {
        "class_type": "KSampler",
        "inputs": {
          "cfg": 8,
          "denoise": 1,
          "latent_image": ["5", 0],
          "model": ["4", 0],
          "negative": ["7", 0],
          "positive": ["6", 0],
          "sampler_name": "euler",
          "scheduler": "normal",
          "seed": 8566257,
          "steps": 20
        }
      },
      "4": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {
          "ckpt_name": "v1-5-pruned-emaonly.safetensors"
        }
      },
      "5": {
        "class_type": "EmptyLatentImage",
        "inputs": {
          "batch_size": 1,
          "height": 512,
          "width": 512
        }
      },
      "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {
          "clip": ["4", 1],
          "text": "masterpiece best quality girl"
        }
      },
      "7": {
        "class_type": "CLIPTextEncode",
        "inputs": {
          "clip": ["4", 1],
          "text": "bad hands"
        }
      },
      "8": {
        "class_type": "VAEDecode",
        "inputs": {
          "samples": ["3", 0],
          "vae": ["4", 2]
        }
      },
      "9": {
        "class_type": "SaveImage",
        "inputs": {
          "filename_prefix": "ComfyUI",
          "images": ["8", 0]
        }
      }
    }
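You can also queue this JSON without the UI by POSTing it to ComfyUI's /prompt endpoint. A minimal sketch, assuming the default server address 127.0.0.1:8188 and the workflow above saved as workflow_api.json (a hypothetical filename):

```python
import json
import urllib.request

def build_payload(workflow: dict) -> bytes:
    """Wrap an API-format workflow in the envelope /prompt expects."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_prompt(workflow: dict, server: str = "127.0.0.1:8188") -> dict:
    """POST the workflow; the response includes a prompt_id on success."""
    req = urllib.request.Request(
        f"http://{server}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (requires a running ComfyUI server):
# with open("workflow_api.json") as f:
#     print(queue_prompt(json.load(f))["prompt_id"])
```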
    

    Understanding the Workflow

    Node Connections

    Each top-level key (like "3" or "4") is a node ID. An input value like ["4", 0] means:
    • "4": Node ID to connect from
    • 0: Output index from that node
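To make the reference format concrete, here is a small sketch that resolves a ["node_id", output_index] pair against two nodes from the workflow above (the helper name `resolve` is just for illustration):

```python
import json

# Two nodes from the workflow above, in API format.
workflow = json.loads("""
{
  "3": {"class_type": "KSampler",
        "inputs": {"model": ["4", 0], "steps": 20}},
  "4": {"class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}}
}
""")

def resolve(workflow: dict, ref: list) -> tuple:
    """Turn a ["node_id", output_index] reference into
    (source node's class_type, output index)."""
    node_id, output_index = ref
    return workflow[node_id]["class_type"], output_index

# The KSampler's "model" input comes from output 0 of node "4":
print(resolve(workflow, workflow["3"]["inputs"]["model"]))
# ('CheckpointLoaderSimple', 0)
```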

    The Generation Process

    1. CheckpointLoaderSimple loads the model weights
    2. CLIPTextEncode converts text prompts into embeddings
    3. EmptyLatentImage allocates an empty latent tensor (KSampler adds the initial noise)
    4. KSampler iteratively denoises the latent based on your prompts
    5. VAEDecode converts the latent to pixel space
    6. SaveImage writes the PNG to disk
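Conceptually, step 4 is a loop. The toy sketch below mimics KSampler's classifier-free guidance with plain floats; real latents are tensors, and `predict_noise` is a made-up stand-in for the UNet, not real model code:

```python
from typing import Optional

def predict_noise(latent: float, prompt: Optional[str]) -> float:
    # Made-up stand-in for the UNet's noise prediction (illustration only).
    return latent * (0.10 if prompt else 0.05)

def denoise_loop(latent: float, steps: int = 20, cfg: float = 8.0) -> float:
    """Toy version of the sampling loop: each step predicts noise with and
    without the prompt, combines the two with CFG, and removes a fraction."""
    for step in range(steps):
        uncond = predict_noise(latent, None)       # unconditional / negative
        cond = predict_noise(latent, "girl")       # positive prompt
        guided = uncond + cfg * (cond - uncond)    # classifier-free guidance
        sigma = 1.0 - step / steps                 # toy noise schedule
        latent -= sigma * guided
    return latent

print(denoise_loop(1.0))  # the "noise" shrinks toward 0 over the 20 steps
```

Raising cfg scales up the (cond - uncond) term, which is why higher values follow the prompt more closely.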

    Next Steps

    • Experiment with different prompts
    • Try changing the seed for variations
    • Adjust CFG scale (higher = follows prompt more closely)
    • Increase steps for potentially better quality
    • Learn about image-to-image workflows
    • Explore ControlNet for precise control

    Tips

    • Use (word:1.2) to increase emphasis on specific terms
    • Use (word:0.8) to decrease emphasis
    • Escape special characters: \(, \), \{, \}
    • Try different samplers (euler, dpm++, etc.) for different styles
    • The scheduler controls how the noise level decreases across sampling steps; "normal" works for most cases
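The emphasis syntax can be illustrated with a simplified parser. This is not ComfyUI's actual tokenizer, just a sketch of how (word:weight) pairs read:

```python
import re

def parse_emphasis(prompt: str) -> list:
    """Extract (text, weight) pairs from (word:1.2)-style emphasis.
    Simplified sketch; ComfyUI's real parser also handles nesting
    and escaped \\( \\) characters."""
    pattern = r"\(([^():]+):([\d.]+)\)"
    return [(text.strip(), float(w)) for text, w in re.findall(pattern, prompt)]

print(parse_emphasis("a (red:1.2) car, (blurry:0.8)"))
# [('red', 1.2), ('blurry', 0.8)]
```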
