Installation
ControlNet requires OpenCV for image processing.

Quick start
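A minimal quick start might look like the following. The install command is standard pip usage; the script path matches the file referenced later in this page, but any required config or arguments are omitted here (check the Configuration section and the repo for the exact invocation):

```shell
# Install the OpenCV dependency used for control-image preprocessing
pip install opencv-python

# Generate with the SD 1.4 ControlNet script (arguments/config omitted;
# see the Configuration section for the available parameters)
python src/maxdiffusion/controlnet/generate_controlnet_replicated.py
```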
Supported models
MaxDiffusion supports ControlNet for:
- Stable Diffusion 1.4: Uses the runwayml/stable-diffusion-v1-5 base model
- Stable Diffusion XL: Uses the SDXL base model with ControlNet
Architecture
ControlNet adds trainable copies of the encoder layers to enable conditioning:
- Control input: Edge map, depth map, or other conditioning signal
- ControlNet: Trainable encoder that processes control input
- Base model: Stable Diffusion UNet with injected control features
- Conditioning scale: Adjustable influence of control signal
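The injection mechanism above can be sketched in a few lines of NumPy. This is an illustrative toy, not MaxDiffusion code: real ControlNet injects residuals at every UNet encoder resolution, and routes them through zero-initialized projections ("zero convolutions") so that training starts from the base model's behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, w):
    """Stand-in for a UNet encoder block."""
    return np.tanh(x @ w)

w_base = rng.normal(size=(8, 8))  # frozen base encoder weights
w_ctrl = w_base.copy()            # ControlNet starts as a trainable copy
w_zero = np.zeros((8, 8))         # zero-initialized projection ("zero conv")

x = rng.normal(size=(4, 8))       # latent features
c = rng.normal(size=(4, 8))       # embedded control signal (e.g. an edge map)

def forward(x, c, conditioning_scale):
    base = encoder(x, w_base)
    ctrl = encoder(x + c, w_ctrl) @ w_zero    # control residual
    return base + conditioning_scale * ctrl   # scaled injection into the base path

# With the zero-initialized projection, the control branch is initially inert:
assert np.allclose(forward(x, c, 1.0), encoder(x, w_base))

# After training, w_zero is nonzero -- but conditioning_scale = 0.0 still
# disables control entirely, recovering standard generation:
w_zero = rng.normal(size=(8, 8))
assert np.allclose(forward(x, c, 0.0), encoder(x, w_base))
```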
SD 1.4 ControlNet
Basic usage
The SD 1.4 pipeline (src/maxdiffusion/controlnet/generate_controlnet_replicated.py) uses a Canny edge detector.
Configuration
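Config values are typically supplied as key=value overrides on the command line; this is an assumed convention (borrowed from MaxText-style launchers), so verify the exact mechanism against the repo:

```shell
# Hypothetical override syntax -- parameter names come from the table below
python src/maxdiffusion/controlnet/generate_controlnet_replicated.py \
    prompt="a futuristic city at dusk" \
    controlnet_conditioning_scale=1.0 \
    num_inference_steps=50
```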
Customize via config parameters.

Implementation
The SD 1.4 pipeline loads ControlNet and base model (generate_controlnet_replicated.py:41-47):
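The loading step can be sketched with the Flax classes from diffusers, which MaxDiffusion builds on. The model IDs mirror the supported-models list above and from_pt matches the controlnet_from_pt default; treat this as a sketch of the pattern, not the script's exact code:

```python
import jax.numpy as jnp
from diffusers import FlaxControlNetModel, FlaxStableDiffusionControlNetPipeline

# Load ControlNet weights, converting from the PyTorch checkpoint
controlnet, controlnet_params = FlaxControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", from_pt=True, dtype=jnp.float32
)

# Load the base Stable Diffusion pipeline with the ControlNet attached
pipe, params = FlaxStableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    revision="flax",
    dtype=jnp.float32,
)
params["controlnet"] = controlnet_params
```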
Inference
The pipeline processes control image and generates conditioned output (generate_controlnet_replicated.py:52-74):
SDXL ControlNet
Basic usage
The SDXL pipeline (src/maxdiffusion/controlnet/generate_controlnet_sdxl_replicated.py) includes Canny edge detection.
Edge detection
The SDXL pipeline applies Canny edge detection to input images (generate_controlnet_sdxl_replicated.py:44-49):
Implementation
SDXL ControlNet uses bfloat16 precision (generate_controlnet_sdxl_replicated.py:51-63):
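bfloat16 halves parameter memory relative to float32 while keeping float32's exponent range, which is why it is a common inference dtype on TPUs. A small JAX illustration of the trade-off (not the script's code):

```python
import jax.numpy as jnp

x = jnp.array([1.0, 1e-3, 3.14159265], dtype=jnp.float32)
x_bf16 = x.astype(jnp.bfloat16)  # 2 bytes per element instead of 4

# bfloat16 keeps float32's exponent range but only ~8 bits of mantissa,
# so pi rounds to roughly 3.140625 -- usually acceptable for inference.
pi_roundtrip = float(x_bf16[2].astype(jnp.float32))
```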
Parameters
| Parameter | Description | Default |
|---|---|---|
| prompt | Text description of desired image | Required |
| negative_prompt | Concepts to avoid | Empty |
| controlnet_image | URL or path to control image | Required |
| controlnet_model_name_or_path | ControlNet model checkpoint | Required |
| controlnet_from_pt | Convert from PyTorch format | True |
| controlnet_conditioning_scale | Control signal strength (0.0-2.0) | 1.0 |
| num_inference_steps | Denoising steps | 50 |
| per_device_batch_size | Images per device | 1 |
| seed | Random seed | 0 |
Conditioning scale
The controlnet_conditioning_scale parameter controls how strongly the control signal influences generation:
- 0.0: No control (standard generation)
- 0.5: Light control, more creative freedom
- 1.0: Balanced control (recommended)
- 1.5: Strong control adherence
- 2.0: Very strict control following
Example: Adjusting control strength
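The original example here was stripped; assuming the key=value override style described in the Configuration section (a hypothetical syntax, to be verified against the repo), a sweep over control strength might look like:

```shell
# Loose guidance: keep only rough composition
python src/maxdiffusion/controlnet/generate_controlnet_replicated.py \
    controlnet_conditioning_scale=0.5

# Strict guidance: follow the control image closely
python src/maxdiffusion/controlnet/generate_controlnet_replicated.py \
    controlnet_conditioning_scale=1.5
```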
Control signal types
ControlNet supports various conditioning types:

Canny edges
- Use case: Preserve structural composition
- Model: lllyasviel/sd-controlnet-canny
- Preprocessing: Canny edge detection (thresholds 100 and 200)
Depth maps
- Use case: Control spatial depth and 3D structure
- Model: lllyasviel/sd-controlnet-depth
- Preprocessing: MiDaS depth estimation
Segmentation
- Use case: Control object layout and positioning
- Model: lllyasviel/sd-controlnet-seg
- Preprocessing: Semantic segmentation
Human pose
- Use case: Control human figure poses
- Model: lllyasviel/sd-controlnet-openpose
- Preprocessing: OpenPose skeleton detection
Custom control images
Provide custom edge maps or control signals. Control images should:
- Match the target generation resolution
- Be grayscale or 3-channel (RGB)
- Clearly define structural elements
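A small helper enforcing the checklist above, as a NumPy-only sketch (real code would load with PIL or OpenCV and use a proper resampling filter; the nearest-neighbor resize here just keeps the example dependency-free):

```python
import numpy as np

def prepare_control_image(img: np.ndarray, size: tuple) -> np.ndarray:
    """Normalize a custom control image to (H, W, 3) uint8 at the target size."""
    if img.ndim == 2:                      # grayscale -> 3 channels
        img = np.stack([img] * 3, axis=-1)
    if img.shape[2] != 3:
        raise ValueError(f"expected 1 or 3 channels, got {img.shape[2]}")
    if img.shape[:2] != size:              # crude nearest-neighbor resize
        ys = (np.arange(size[0]) * img.shape[0] / size[0]).astype(int)
        xs = (np.arange(size[1]) * img.shape[1] / size[1]).astype(int)
        img = img[ys][:, xs]
    return img.astype(np.uint8)

# Example: a 300x400 binary edge map brought up to a 512x512 target
edge_map = (np.random.rand(300, 400) > 0.9).astype(np.uint8) * 255
control = prepare_control_image(edge_map, (512, 512))
```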
Multi-device inference
ControlNet uses pmap for multi-device replication.
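pmap compiles one function and runs it on every local device, with parameters replicated and the batch sharded along a leading device axis. A self-contained sketch with a toy step function (not the actual pipeline):

```python
import jax
import jax.numpy as jnp

n_devices = jax.local_device_count()

def denoise_step(params, latents):
    """Toy stand-in for one diffusion step."""
    return latents - params["scale"] * latents

# Replicate parameters across devices; shard the batch along a device axis
params = {"scale": jnp.float32(0.1)}
replicated = jax.device_put_replicated(params, jax.local_devices())
batch = jnp.ones((n_devices, 2, 64, 64, 4))  # (devices, per_device_batch, H, W, C)

p_step = jax.pmap(denoise_step)
out = p_step(replicated, batch)  # shape (n_devices, 2, 64, 64, 4)
```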
Output
Generated images are saved as generated_image.png. The first image in the batch is saved by default.
To save all images:
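Assuming the pipeline returns a list of PIL images (as diffusers pipelines do), a loop like this saves every batch element instead of only the first. The `images` list here is a stand-in for the real pipeline output:

```python
from PIL import Image

# Stand-in for the pipeline output; real code would use the pipeline's results
images = [Image.new("RGB", (64, 64), color=(i * 80, 0, 0)) for i in range(3)]

for i, img in enumerate(images):
    img.save(f"generated_image_{i}.png")
```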
Examples
Building from edges
Portrait from pose
Landscape from depth
Next steps
- SDXL inference: Higher quality base model for ControlNet
- Stable Diffusion: Standard SD inference without control
- Training overview: Train custom ControlNet models
- Configuration: Full configuration reference