ControlNet
Model description
ControlNet is a neural network structure that controls diffusion models by adding extra conditions, and it brings unprecedented levels of control to Stable Diffusion for AI image generation. The revolutionary thing about ControlNet is its solution to the problem of spatial consistency.
The task here is simple: we want Stable Diffusion to fill a circle with colors, and the prompt contains a description of our target.
Stable Diffusion is trained on billions of images, so it already knows what "cyan", "circle", "pink", and "background" mean.
But it does not know the meaning of the control image (source image). Our goal is to teach it.
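The mechanism that makes this training safe is ControlNet's "zero convolution": the control branch is wired into the frozen model through convolutions whose weights and biases start at exactly zero, so at initialization the combined model behaves identically to the original Stable Diffusion. A minimal NumPy sketch of that idea (the shapes and names are illustrative, not ControlNet's actual modules):

```python
import numpy as np

def conv1x1(x, w, b):
    """Pointwise (1x1) convolution on a (C, H, W) feature map."""
    return np.tensordot(w, x, axes=([1], [0])) + b[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))   # a feature map from the frozen SD branch
ctrl = rng.normal(size=(4, 8, 8))   # features from the control branch

# Zero convolution: weights and bias initialized to exactly zero.
w_zero = np.zeros((4, 4))
b_zero = np.zeros(4)

out = feat + conv1x1(ctrl, w_zero, b_zero)
print(np.allclose(out, feat))  # True: the control branch adds nothing yet
```

During training these weights grow away from zero, letting the control signal gradually influence the output without ever destroying the pretrained behavior.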
Step 1: Installation
pip3 install open_clip_torch transformers einops omegaconf
pip3 install pytorch-lightning==1.9.5
pip3 install urllib3==1.26
yum install -y mesa-libGL
- Build the Stable Diffusion model to control
You need to decide which Stable Diffusion Model you want to control. In this example, we will just use standard SD1.5. You can download it from the official page of Stability. You want the file "v1-5-pruned.ckpt". (Or "v2-1_512-ema-pruned.ckpt" if you are using SD2.)
# We provide a simple script for you to achieve this easily.
# If your SD filename is "./models/v1-5-pruned.ckpt" and you want the script to save the processed model (SD+ControlNet) at location "./models/control_sd15_ini.ckpt", you can just run:
python3 tool_add_control.py ./models/v1-5-pruned.ckpt ./models/control_sd15_ini.ckpt
# Or if you are using SD2:
python3 tool_add_control_sd21.py ./models/v2-1_512-ema-pruned.ckpt ./models/control_sd21_ini.ckpt
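Conceptually, this script builds the SD+ControlNet model and then fills its state dict from the plain SD checkpoint: backbone layers are copied as-is, and control-branch layers are seeded from the matching encoder weights so training starts from pretrained features. A dict-based sketch of that copying logic (the prefixes and the fallback value are simplified assumptions, not the script's exact code):

```python
def merge_checkpoint(sd_weights, control_model_keys):
    """Fill a ControlNet state dict from a plain SD checkpoint.

    sd_weights: dict of parameter name -> tensor from the SD .ckpt
    control_model_keys: all parameter names of the SD+ControlNet model
    """
    merged = {}
    for name in control_model_keys:
        if name in sd_weights:
            # Shared backbone layer: copy the pretrained weight as-is.
            merged[name] = sd_weights[name]
        elif name.startswith("control_model."):
            # Control branch: seed it from the matching encoder layer.
            src = name.replace("control_model.", "model.diffusion_model.")
            merged[name] = sd_weights.get(src, 0.0)  # 0.0 stands in for zero init
        else:
            merged[name] = 0.0
    return merged

sd = {"model.diffusion_model.block1.weight": 1.5}
keys = ["model.diffusion_model.block1.weight",
        "control_model.block1.weight",
        "control_model.zero_conv.weight"]
print(merge_checkpoint(sd, keys))
```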
Step 2: Preparing datasets
Just download the Fill50K dataset from our huggingface page (training/fill50k.zip, the file is only 200M!). Make sure that the data is decompressed as
training/
└── fill50k
    ├── prompt.json
    ├── source
    └── target
In the folder "fill50k/source", you will have 50k images of circle lines.
In the folder "fill50k/target", you will have 50k images of filled circles.
In the "fill50k/prompt.json", you will have their filenames and prompts. Each prompt is like "a {some color} circle in a {some other color} background."
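The training code reads prompt.json record by record, each record pairing a source image path, a target image path, and a prompt. A minimal loader sketch (the one-JSON-object-per-line layout shown here is an assumption about the Fill50K release, and the example prompt is illustrative):

```python
import json

def load_fill50k_index(path):
    """Parse prompt.json, assuming one JSON object per line."""
    items = []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            items.append((rec["source"], rec["target"], rec["prompt"]))
    return items

# Example of one hypothetical line from prompt.json:
line = ('{"source": "source/0.png", "target": "target/0.png", '
        '"prompt": "pale golden rod circle with old lace background"}')
rec = json.loads(line)
print(rec["prompt"])
```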
Step 3: Training
# One GPU
python3 tutorial_train.py
# 8 GPUs
python3 tutorial_train_dist.py
Results

| GPUs | Throughput |
|------|------------|
| BI-V100 x8 | 5.02 s/it |
Go to ./image_log/train/ to check the image results.
Reference