# ComfyUI for Beginners: Master AI Image Workflows
If you have spent any time exploring AI image generation, you have likely encountered Stable Diffusion. You might have started with a simple web UI, dragging and dropping prompts, feeling like you were in control. Then, you hit a wall. You wanted to chain a specific upscaler after an image-to-image pass, or use a ControlNet to guide the pose. The simple UI suddenly felt like a straight line, unable to bend.
Enter ComfyUI. It is a powerful, node-based interface for Stable Diffusion that looks like a wiring diagram for a spaceship. It is intimidating at first, but once you understand the logic, you will realize it is the most efficient way to build complex AI workflows.
This guide is for complete beginners. We will break down the "scary" node graph into simple, understandable steps. By the end, you will not only run your first workflow but also understand how to modify it.
## Why ComfyUI? The "Lego Blocks" Approach
Think of the standard Stable Diffusion UI as a pre-built toy car. You can paint it, change the wheels, or swap the driver. But you cannot turn that car into a helicopter without breaking the whole thing.
ComfyUI is like a bucket of Lego bricks. You want a car? You build the chassis, attach the wheels, and add the steering wheel. You want a helicopter? You swap the wheels for rotors and add an engine block. You have total freedom.
## The Core Concept: Nodes and Edges
- Nodes: These are the "functions." Each node does one specific thing: load a model, process text, generate an image, or save a file.
- Edges (Wires): These are the "data pipes." You connect the output of one node (e.g., a model) to the input of another node (e.g., a sampler).
This visual structure makes your AI workflow transparent. If your image has a weird artifact, you can trace the wires back and see exactly where the pipeline broke.
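To make the node-and-edge idea concrete: ComfyUI itself stores every graph as plain JSON (you can export yours via the "Save (API Format)" option; in older versions this requires enabling dev mode). Here is a rough sketch of that shape as a Python dict; the field names match what ComfyUI exports, but verify against a workflow you export yourself. I will build up a full example in this format alongside the steps that follow.

```python
# Sketch: the general shape of a ComfyUI API-format graph, as a Python dict.
# Keys are node ids (arbitrary strings). Literal widget values and wires
# share the "inputs" map; a wire is [source_node_id, source_output_index].
graph_shape = {
    "some_id": {
        "class_type": "NodeClassName",   # which node this entry represents
        "inputs": {
            "a_widget": 42,              # a value typed directly into the node
            "a_wire": ["other_id", 0],   # an edge from another node's output 0
        },
    },
}
```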
## Your First Workflow: From Text to Image
Let's build the most basic "Text to Image" workflow. You do not need to write code from scratch; ComfyUI is a visual language. However, understanding the *logic* is crucial.
#### Step 1: The Checkpoint Model (The Brain)
Every image starts with a base model. In ComfyUI, this is the `Load Checkpoint` node. It loads the main Stable Diffusion model (like SDXL or SD 1.5).
- Find it: Right-click on the canvas -> `Add Node` -> `loaders` -> `Load Checkpoint`.
- Outputs: This node has three output sockets (see the sketch after this list):
  - `MODEL`: The denoising UNet. This is the core brain.
  - `CLIP`: The text encoder. It turns your words into numbers the model understands.
  - `VAE`: The decoder. It turns latent noise into a final pixel image.
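In API-format JSON, Step 1 is a single entry. A minimal sketch, assuming an SDXL checkpoint filename (substitute whatever file you actually downloaded):

```python
# Step 1: Load Checkpoint. Its three output sockets are addressed by index
# when you wire them: 0 = MODEL, 1 = CLIP, 2 = VAE.
workflow = {
    "1": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"},  # assumed filename
    },
}
```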
#### Step 2: The Prompt (The Instructions)
You need to tell the brain what to draw. You need two copies of the same node, `CLIP Text Encode (Prompt)`: one for the positive prompt and one for the negative prompt.
- Add them: Right-click -> `Add Node` -> `conditioning` -> `CLIP Text Encode (Prompt)`. Do this twice.
- Connect: Drag a wire from the `CLIP` output of the `Load Checkpoint` node to the `clip` input of each text encoder node.
- Write your prompts (mirrored in the sketch below): In one node, type your positive prompt (e.g., "a beautiful mountain landscape, cinematic lighting, high detail"). In the other, type your negative prompt (e.g., "blurry, ugly, watermark, text").
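Continuing the sketch: both prompt nodes are instances of the same `CLIPTextEncode` class, and each `clip` input is a wire back to node 1's `CLIP` output (socket index 1).

```python
# Step 2: positive and negative prompts share one node class.
workflow["2"] = {
    "class_type": "CLIPTextEncode",
    "inputs": {
        "text": "a beautiful mountain landscape, cinematic lighting, high detail",
        "clip": ["1", 1],  # wire from Load Checkpoint's CLIP output
    },
}
workflow["3"] = {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "blurry, ugly, watermark, text", "clip": ["1", 1]},
}
```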
#### Step 3: The Empty Latent Image (The Canvas)
You need a blank canvas to start drawing on. This is the `Empty Latent Image` node.
- Add it: Right-click -> `Add Node` -> `latent` -> `Empty Latent Image`.
- Configure: Set your desired `width`, `height`, and `batch_size` (how many images to generate at once), as in the snippet below.
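Step 3 in the same sketch has no incoming wires, just widget values (1024x1024 suits SDXL; use 512x512 for SD 1.5):

```python
# Step 3: the blank latent canvas.
workflow["4"] = {
    "class_type": "EmptyLatentImage",
    "inputs": {"width": 1024, "height": 1024, "batch_size": 1},
}
```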
#### Step 4: The Sampler (The Artist)
This is the engine room. The `KSampler` node takes the model, the prompt, and the empty canvas, and runs the denoising process.
- Add it: Right-click -> `Add Node` -> `sampling` -> `KSampler`.
- Connect the wires (mirrored in the sketch below):
  1. `Load Checkpoint` (`MODEL`) -> `KSampler` (`model`)
  2. Positive `CLIP Text Encode` (`CONDITIONING`) -> `KSampler` (`positive`)
  3. Negative `CLIP Text Encode` (`CONDITIONING`) -> `KSampler` (`negative`)
  4. `Empty Latent Image` (`LATENT`) -> `KSampler` (`latent_image`)
- Set Parameters: `seed` (any number; it makes results reproducible), `steps` (20 is a good start), `cfg` (7), `sampler_name` (`dpmpp_2m`), `scheduler` (`karras`). The last two together are the "DPM++ 2M Karras" combination you may know from other UIs.
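Here is Step 4 in the running sketch. Every wire from the checklist above becomes a `[node_id, output_index]` pair; `denoise` stays at 1.0 for pure text-to-image.

```python
# Step 4: the KSampler gathers every wire and runs the denoising loop.
workflow["5"] = {
    "class_type": "KSampler",
    "inputs": {
        "model": ["1", 0],         # Load Checkpoint MODEL
        "positive": ["2", 0],      # positive conditioning
        "negative": ["3", 0],      # negative conditioning
        "latent_image": ["4", 0],  # the empty canvas
        "seed": 42,
        "steps": 20,
        "cfg": 7.0,
        "sampler_name": "dpmpp_2m",
        "scheduler": "karras",
        "denoise": 1.0,            # 1.0 = start from pure noise
    },
}
```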
#### Step 5: The Decoder and Preview (The Final Output)
The `KSampler` outputs a latent image. You cannot see it yet. You need the VAE to decode it into pixels.
- Add a `VAE Decode` node: Right-click -> `Add Node` -> `latent` -> `VAE Decode`.
- Connect:
  1. `KSampler` (`LATENT`) -> `VAE Decode` (`samples`)
  2. `Load Checkpoint` (`VAE`) -> `VAE Decode` (`vae`)
- Add a `Preview Image` node: Right-click -> `Add Node` -> `image` -> `Preview Image`.
- Connect: `VAE Decode` (`IMAGE`) -> `Preview Image` (`images`). (The sketch after this list completes the code version of the graph.)
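And Step 5, completing the graph (swap `PreviewImage` for `SaveImage` if you want the file written to disk):

```python
# Step 5: decode latents into pixels, then display the result.
workflow["6"] = {
    "class_type": "VAEDecode",
    "inputs": {"samples": ["5", 0], "vae": ["1", 2]},
}
workflow["7"] = {
    "class_type": "PreviewImage",
    "inputs": {"images": ["6", 0]},
}
```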
Congratulations! You have just built a complete Stable Diffusion workflow from scratch. Click the `Queue Prompt` button in the top right (or queue it from a script, as sketched below). You should see your image generate in real time.
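If you ever want to queue a graph without touching the browser, the stock server exposes a small HTTP API on port 8188. A minimal sketch, assuming a default local install (verify the endpoint against your version):

```python
import json
import urllib.request

# Queue the workflow we built; this mirrors the "Queue Prompt" button.
payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",  # default ComfyUI address
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # includes a prompt_id you can look up later
```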
## Moving to Intermediate: The "Upscale" Workflow
Now that you understand the basic flow, let's make it useful. A common problem is generating a 512x512 image that looks noisy. You want to upscale it. In a simple UI, this is a separate tab. In ComfyUI, you just add more nodes to the existing AI workflow.
Let's build a simple "Image to Image + Upscale" workflow.
1. Start with the previous workflow. Keep the `Load Checkpoint`, `CLIP Text Encode`, and `KSampler` nodes.
2. Change the input. Instead of `Empty Latent Image`, use a `Load Image` node (Right-click -> `Add Node` -> `image` -> `Load Image`). This lets you start from a real photo or a previous generation.
3. Add an image upscaler. You need a dedicated upscaler model:
   - Right-click -> `Add Node` -> `loaders` -> `Load Upscale Model`.
   - Right-click -> `Add Node` -> `image` -> `upscaling` -> `Upscale Image (using Model)`.
   - Connect `Load Upscale Model` (`UPSCALE_MODEL`) -> `Upscale Image (using Model)` (`upscale_model`).
4. Wire the flow: `Load Image` (`IMAGE`) -> `Upscale Image (using Model)` (`image`). The upscaled image comes out as pixels. Connect it to a `VAE Encode` node (to turn pixels back into latent space) and feed that latent into the `KSampler` for a second, low-denoise pass.
5. Final output: Connect the final `VAE Decode` to a `Save Image` node to save the high-resolution result to your computer. (See the sketch after this list.)
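For reference, here is the new part of the chain in the same API-format sketch (node class names as ComfyUI exports them; both filenames are assumptions, so substitute your own):

```python
# Image-to-image + upscale: new nodes added to the earlier workflow dict.
workflow["10"] = {
    "class_type": "LoadImage",
    "inputs": {"image": "input.png"},                # a file in ComfyUI/input/
}
workflow["11"] = {
    "class_type": "UpscaleModelLoader",              # "Load Upscale Model"
    "inputs": {"model_name": "4x-UltraSharp.pth"},   # assumed upscaler file
}
workflow["12"] = {
    "class_type": "ImageUpscaleWithModel",           # "Upscale Image (using Model)"
    "inputs": {"upscale_model": ["11", 0], "image": ["10", 0]},
}
workflow["13"] = {
    "class_type": "VAEEncode",                       # pixels back into latent space
    "inputs": {"pixels": ["12", 0], "vae": ["1", 2]},
}
# Point the KSampler's latent_image at ["13", 0] and lower its denoise
# (around 0.3-0.5) so the second pass refines rather than repaints.
```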
This is the power of ComfyUI. You are not limited by tabs or buttons. You are limited only by the nodes you connect.
## Debugging Common Issues (The "Red Node" Problem)
You will inevitably see a red outline around a node. This means an error. Do not panic.
- Check the wires: Is a connection broken? Did you disconnect a wire by accident?
- Missing models: Is the `Load Checkpoint` node red? You probably don't have the model file in the correct folder (`ComfyUI/models/checkpoints/`).
- Type mismatch: You cannot connect a `MODEL` output to a `VAE` input. ComfyUI color-codes its sockets (in the default theme, purple for models, orange for conditioning, pink for latents, blue for images). Make sure the colors match.
## Building Your Own Library of Workflows
The best way to learn is to deconstruct. Download workflows from Civitai or share your own. Save your .json files (File -> Save) so you can reload them later.
As you master ComfyUI, you will find yourself thinking differently about image generation. You will stop looking for the "right button" and start looking for the "right node." This shift in perspective is what separates a casual user from a power user.
For a curated list of essential workflows and a deep dive into node functions, check out the Tool Library section at www.aiflowyou.com. We break down complex Stable Diffusion concepts into simple, actionable guides. If you prefer learning on the go, our WeChat Mini Program "AI快速入门手册" offers bite-sized lessons and cheat sheets for mastering ComfyUI on your mobile device.
## Summary: Your Action Plan
1. Install ComfyUI: Follow the official GitHub instructions (it is simpler than it looks).
2. Download a model: Get a popular SDXL model (like "RealVisXL" or "Juggernaut XL") and place it in the `models/checkpoints` folder.
3. Rebuild the "Text to Image" workflow from this guide. Do not copy-paste the JSON. Build it manually to learn the connections.
4. Experiment: Change the `sampler_name` from `dpmpp_2m` to `euler`. See how the image changes.
5. Join the community: The ComfyUI community is incredibly helpful. When you get stuck, search for the specific node name.
You now have the keys to the kingdom. Go build something amazing.