ControlNet
Model description
ControlNet is a neural network structure that controls diffusion models by adding extra conditions, and it brings unprecedented levels of control to Stable Diffusion for AI image generation. The revolutionary thing about ControlNet is its solution to the problem of spatial consistency.
The task here is simple: we want Stable Diffusion to fill a circle with colors, and the prompt contains a description of our target.
Stable Diffusion is trained on billions of images, so it already knows what "cyan", "circle", "pink", and "background" mean.
But it does not know the meaning of the control image (source image). Our goal is to teach it.
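The mechanism that makes this training safe is ControlNet's "zero convolution": the control branch is wired into the frozen model through convolutions whose weights and biases start at exactly zero, so at initialization the combined model behaves identically to the original Stable Diffusion. A minimal NumPy sketch of that idea (the shapes and names are illustrative, not ControlNet's actual modules):

```python
import numpy as np

def conv1x1(x, w, b):
    """Pointwise (1x1) convolution on a (C, H, W) feature map."""
    return np.tensordot(w, x, axes=([1], [0])) + b[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))   # a feature map from the frozen SD branch
ctrl = rng.normal(size=(4, 8, 8))   # features from the control branch

# Zero convolution: weights and bias initialized to exactly zero.
w_zero = np.zeros((4, 4))
b_zero = np.zeros(4)

out = feat + conv1x1(ctrl, w_zero, b_zero)
print(np.allclose(out, feat))  # True: the control branch adds nothing yet
```

During training these weights grow away from zero, letting the control signal gradually influence the output without ever destroying the pretrained behavior.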
Step 1: Installation
pip3 install open_clip_torch transformers einops omegaconf
pip3 install pytorch-lightning==1.9.5
pip3 install urllib3==1.26
yum install -y mesa-libGL
- Build the Stable Diffusion model to control
You need to decide which Stable Diffusion Model you want to control. In this example, we will just use standard SD1.5. You can download it from the official page of Stability. You want the file "v1-5-pruned.ckpt". (Or "v2-1_512-ema-pruned.ckpt" if you are using SD2.)
# We provide a simple script for you to achieve this easily.
# If your SD filename is "./models/v1-5-pruned.ckpt" and you want the script to save the processed model (SD+ControlNet) at location "./models/control_sd15_ini.ckpt", you can just run:
python3 tool_add_control.py ./models/v1-5-pruned.ckpt ./models/control_sd15_ini.ckpt
# Or if you are using SD2:
python3 tool_add_control_sd21.py ./models/v2-1_512-ema-pruned.ckpt ./models/control_sd21_ini.ckpt
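Conceptually, this script builds the SD+ControlNet model and then fills its state dict from the plain SD checkpoint: backbone layers are copied as-is, and control-branch layers are seeded from the matching encoder weights so training starts from pretrained features. A dict-based sketch of that copying logic (the prefixes and the fallback value are simplified assumptions, not the script's exact code):

```python
def merge_checkpoint(sd_weights, control_model_keys):
    """Fill a ControlNet state dict from a plain SD checkpoint.

    sd_weights: dict of parameter name -> tensor from the SD .ckpt
    control_model_keys: all parameter names of the SD+ControlNet model
    """
    merged = {}
    for name in control_model_keys:
        if name in sd_weights:
            # Shared backbone layer: copy the pretrained weight as-is.
            merged[name] = sd_weights[name]
        elif name.startswith("control_model."):
            # Control branch: seed it from the matching encoder layer.
            src = name.replace("control_model.", "model.diffusion_model.")
            merged[name] = sd_weights.get(src, 0.0)  # 0.0 stands in for zero init
        else:
            merged[name] = 0.0
    return merged

sd = {"model.diffusion_model.block1.weight": 1.5}
keys = ["model.diffusion_model.block1.weight",
        "control_model.block1.weight",
        "control_model.zero_conv.weight"]
print(merge_checkpoint(sd, keys))
```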
Step 2: Preparing datasets
Just download the Fill50K dataset from our huggingface page (training/fill50k.zip, the file is only 200M!). Make sure that the data is decompressed as
training/
└── fill50k
    ├── prompt.json
    ├── source
    └── target
In the folder "fill50k/source", you will have 50k images of circle lines.
In the folder "fill50k/target", you will have 50k images of filled circles.
In the "fill50k/prompt.json", you will have their filenames and prompts. Each prompt is like "a {some color} circle in a {some other color} background."
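The training code reads prompt.json record by record, each record pairing a source image path, a target image path, and a prompt. A minimal loader sketch (the one-JSON-object-per-line layout shown here is an assumption about the Fill50K release, and the example prompt is illustrative):

```python
import json

def load_fill50k_index(path):
    """Parse prompt.json, assuming one JSON object per line."""
    items = []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            items.append((rec["source"], rec["target"], rec["prompt"]))
    return items

# Example of one hypothetical line from prompt.json:
line = ('{"source": "source/0.png", "target": "target/0.png", '
        '"prompt": "pale golden rod circle with old lace background"}')
rec = json.loads(line)
print(rec["prompt"])
```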
Step 3: Training
# One GPU
python3 tutorial_train.py
# 8 GPUs
python3 tutorial_train_dist.py
Results

| GPUs | Throughput |
|------|------------|
| BI-V100 x8 | 5.02 s/it |
Go to ./image_log/train/ to check the image results.
Reference