What You’ll Build
A basic text-to-image workflow that:
- Loads a Stable Diffusion model
- Encodes positive and negative prompts
- Generates an image
- Saves the result
Prerequisites
- ComfyUI installed and running
- A Stable Diffusion checkpoint (e.g., v1-5-pruned-emaonly.safetensors) in models/checkpoints/
CheckpointLoaderSimple provides three outputs:
- MODEL: The diffusion model for generating images
- CLIP: Text encoder for understanding prompts
- VAE: Encoder/decoder for converting between pixels and latents
Example prompts:
- Positive: masterpiece best quality girl
- Negative: bad hands
KSampler inputs:
- model: from CheckpointLoaderSimple
- positive: from your positive CLIPTextEncode
- negative: from your negative CLIPTextEncode
- latent_image: from EmptyLatentImage
- seed: Any number (controls randomness)
- steps: 20 (how many denoising steps)
- cfg: 8.0 (how closely to follow the prompt)
- sampler_name: euler
- scheduler: normal
- denoise: 1.0 (full denoising for text-to-image)
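As a rough illustration, the settings above could be written as one KSampler entry in API format. This is a sketch, not the original workflow file: the helper name `ksampler_node` is hypothetical, and the node IDs "4", "6", "7" and "5" are assumed for the loader, the two prompt encoders and the latent node.

```python
def ksampler_node(seed, steps=20, cfg=8.0, sampler_name="euler",
                  scheduler="normal", denoise=1.0):
    """Build a KSampler entry using the values listed above as defaults."""
    return {
        "class_type": "KSampler",
        "inputs": {
            # Links are [source_node_id, output_index]; the IDs are assumptions.
            "model": ["4", 0],         # from CheckpointLoaderSimple
            "positive": ["6", 0],      # from the positive CLIPTextEncode
            "negative": ["7", 0],      # from the negative CLIPTextEncode
            "latent_image": ["5", 0],  # from EmptyLatentImage
            "seed": seed,
            "steps": steps,
            "cfg": cfg,
            "sampler_name": sampler_name,
            "scheduler": scheduler,
            "denoise": denoise,
        },
    }

node = ksampler_node(seed=42)       # arbitrary seed
variation = ksampler_node(seed=43)  # same settings, different randomness
```

Changing only `seed` keeps every other setting fixed, which is a simple way to produce variations of the same prompt.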
Complete Workflow JSON
Here’s the API format for this basic workflow:
Understanding the Workflow
Node Connections
Each node ID (like “3”, “4”, etc.) references a node. Inputs like ["4", 0] mean:
- "4": Node ID to connect from
- 0: Output index from that node
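To make the link notation concrete, here is a minimal sketch that reads the links of one node in a two-node graph. The helper names `is_link` and `describe_links` are hypothetical, and assigning ID "4" to the checkpoint loader and "3" to the sampler is an assumption made for this example.

```python
# A two-node excerpt; which ID belongs to which node is assumed here.
graph = {
    "4": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}},
    "3": {"class_type": "KSampler",
          "inputs": {"model": ["4", 0], "steps": 20}},
}

def is_link(value):
    """A link is a two-item list: [source node ID, output index]."""
    return (isinstance(value, list) and len(value) == 2
            and isinstance(value[0], str) and isinstance(value[1], int))

def describe_links(graph, node_id):
    """Map each linked input name to (source class_type, output index)."""
    links = {}
    for name, value in graph[node_id]["inputs"].items():
        if is_link(value):
            source_id, output_index = value
            links[name] = (graph[source_id]["class_type"], output_index)
    return links

print(describe_links(graph, "3"))
```

Plain values such as `steps` are skipped because they are not two-item lists; only `model` is reported as a connection.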
The Generation Process
- CheckpointLoaderSimple loads the model weights
- CLIPTextEncode converts text prompts into embeddings
- EmptyLatentImage creates an empty latent tensor at the target size; KSampler adds the seeded noise
- KSampler iteratively denoises the latent based on your prompts
- VAEDecode converts the latent to pixel space
- SaveImage writes the PNG to disk
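The steps above can be sketched as a complete API-format graph in Python. Several details are assumptions for illustration: every node ID, the 512x512 latent size, the `filename_prefix` value, the seed, and a local server at 127.0.0.1:8188. The `execution_order` helper is a plain dependency walk written for this example, not ComfyUI's own scheduler, and `build_queue_request` only prepares the HTTP request without sending it.

```python
import json
import urllib.request

workflow = {
    "4": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "masterpiece best quality girl", "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "bad hands", "clip": ["4", 1]}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 42, "steps": 20, "cfg": 8.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0, "model": ["4", 0],
                     "positive": ["6", 0], "negative": ["7", 0],
                     "latent_image": ["5", 0]}},
    "8": {"class_type": "VAEDecode",
          "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
    "9": {"class_type": "SaveImage",
          "inputs": {"filename_prefix": "ComfyUI", "images": ["8", 0]}},
}

def execution_order(graph, target):
    """Return node IDs so that every node appears after the nodes it links to."""
    order = []

    def visit(node_id):
        if node_id in order:
            return
        for value in graph[node_id]["inputs"].values():
            # Follow [node_id, output_index] links back to their source node.
            if isinstance(value, list) and len(value) == 2 and value[0] in graph:
                visit(value[0])
        order.append(node_id)

    visit(target)
    return order

def build_queue_request(graph, host="127.0.0.1:8188"):
    """Prepare (but do not send) a POST to the /prompt endpoint."""
    payload = json.dumps({"prompt": graph}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}/prompt", data=payload,
        headers={"Content-Type": "application/json"})

for node_id in execution_order(workflow, "9"):
    print(node_id, workflow[node_id]["class_type"])

request = build_queue_request(workflow)
# With ComfyUI running, this would queue the job:
# urllib.request.urlopen(request)
```

Walking back from the SaveImage node visits the loader first and the sampler only after both prompts and the latent exist, which mirrors the order of the list above.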
Next Steps
- Experiment with different prompts
- Try changing the seed for variations
- Adjust CFG scale (higher = follows prompt more closely)
- Increase steps for potentially better quality
- Learn about image-to-image workflows
- Explore ControlNet for precise control
Tips
- Use (word:1.2) to increase emphasis on specific terms
- Use (word:0.8) to decrease emphasis
- Escape special characters: \(, \), \{, \}
- Try different samplers (euler, dpm++, etc.) for different styles
- The scheduler controls how the noise level is reduced across the steps; “normal” works for most cases
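The emphasis and escaping tips can be applied with a couple of small string helpers. This is a minimal sketch; `emphasize` and `escape_literal` are hypothetical names, not part of ComfyUI.

```python
import re

def emphasize(term, weight):
    """Wrap a term in the (term:weight) emphasis syntax."""
    return f"({term}:{weight})"

def escape_literal(text):
    """Backslash-escape (, ), { and } so they are read as literal characters."""
    return re.sub(r"([(){}])", r"\\\1", text)

prompt = ", ".join([
    emphasize("sunset", 1.2),        # stronger emphasis
    emphasize("clouds", 0.8),        # weaker emphasis
    escape_literal("photo (2024)"),  # parentheses kept as plain text
])
print(prompt)
```

Escaping should be applied only to text meant literally; running it over a term that is already wrapped by `emphasize` would cancel the emphasis syntax.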