This repository is the MindSpore implementation of AnimateDiff.
Install the required packages:

```shell
pip install -r requirements.txt
```

If the `decord` package is not available on your platform, try `pip install eva-decord` instead.
For EulerOS, instructions on ffmpeg and decord installation are as follows.

1. Install ffmpeg 4, referring to https://ffmpeg.org/releases:

```shell
wget https://ffmpeg.org/releases/ffmpeg-4.0.1.tar.bz2 --no-check-certificate
tar -xvf ffmpeg-4.0.1.tar.bz2
mv ffmpeg-4.0.1 ffmpeg
cd ffmpeg
./configure --enable-shared  # --enable-shared is needed for sharing libavcodec with decord
make -j 64
make install
```
2. Install decord, referring to https://github.com/dmlc/decord?tab=readme-ov-file#install-from-source:

```shell
git clone --recursive https://github.com/dmlc/decord
cd decord
rm -rf build && mkdir build && cd build   # -rf: `build` may be a non-empty directory or absent
cmake .. -DUSE_CUDA=0 -DCMAKE_BUILD_TYPE=Release
make -j 64
make install
cd ../python
python3 setup.py install --user
```
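To sanity-check the installation, a minimal Python snippet can verify that the `decord` module is importable without actually loading a video (note that both `decord` and `eva-decord` install a module named `decord`):

```python
import importlib.util

def decord_available() -> bool:
    """Return True if a `decord` module (from decord or eva-decord) can be imported."""
    return importlib.util.find_spec("decord") is not None

print("decord importable:", decord_available())
```

If this prints `False`, revisit the build steps above or fall back to `eva-decord`.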
First, download the torch pretrained weights, referring to the torch AnimateDiff checkpoints.

To download the ToonYou-Beta3 dreambooth model, please refer to the civitai website, or use the following command:

```shell
wget https://civitai.com/api/download/models/78755 -P models/torch_ckpts/ --content-disposition --no-check-certificate
```

After downloading this dreambooth checkpoint to `animatediff/models/torch_ckpts/`, convert it using:

```shell
cd ../examples/stable_diffusion_v2
python tools/model_conversion/convert_weights.py --source ../animatediff/models/torch_ckpts/toonyou_beta3.safetensors --target models/toonyou_beta3.ckpt --model sdv1 --source_version pt
```

In addition, please download the RealisticVision V5.1 dreambooth checkpoint and convert it in the same way.
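Checkpoint conversion between frameworks largely amounts to renaming parameter keys and re-serializing the tensors. A toy sketch of the renaming step follows; the rules below are purely illustrative, not the actual mapping table used by `convert_weights.py`:

```python
def rename_key(key: str, rules) -> str:
    """Apply ordered substring-substitution rules to one parameter name."""
    for old, new in rules:
        key = key.replace(old, new)
    return key

# Hypothetical pt -> MindSpore rules, for illustration only.
rules = [
    ("model.diffusion_model.", "unet."),
    ("to_q.", "query."),
]

src = {"model.diffusion_model.attn.to_q.weight": [0.1, 0.2]}
dst = {rename_key(k, rules): v for k, v in src.items()}
print(dst)  # {'unet.attn.query.weight': [0.1, 0.2]}
```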
Convert the motion module checkpoint:

```shell
cd ../examples/animatediff/tools
python motion_module_convert.py --src ../torch_ckpts/mm_sd_v15_v2.ckpt --tar ../models/motion_module
```

If converting the AnimateDiff v3 motion module checkpoint, run:

```shell
cd ../examples/animatediff/tools
python motion_module_convert.py -v v3 --src ../torch_ckpts/v3_sd15_mm.ckpt --tar ../models/motion_module
```

Convert the motion lora checkpoint:

```shell
cd ../examples/animatediff/tools
python motion_lora_convert.py --src ../torch_ckpts/.ckpt --tar ../models/motion_lora
```

Convert the domain adapter lora checkpoint:

```shell
cd ../examples/animatediff/tools
python domain_adapter_lora_convert.py --src ../torch_ckpts/v3_sd15_adapter.ckpt --tar ../models/domain_adapter_lora
```

Convert the SparseCtrl encoder checkpoint:

```shell
cd ../examples/animatediff/tools
python sparsectrl_encoder_convert.py --src ../torch_ckpts/v3_sd15_sparsectrl_{}.ckpt --tar ../models/sparsectrl_encoder
```
The full tree of expected checkpoints is shown below:

```
models
├── domain_adapter_lora
│   └── v3_sd15_adapter.ckpt
├── dreambooth_lora
│   ├── realisticVisionV51_v51VAE.ckpt
│   └── toonyou_beta3.ckpt
├── motion_lora
│   └── v2_lora_ZoomIn.ckpt
├── motion_module
│   ├── mm_sd_v15.ckpt
│   ├── mm_sd_v15_v2.ckpt
│   └── v3_sd15_mm.ckpt
├── sparsectrl_encoder
│   ├── v3_sd15_sparsectrl_rgb.ckpt
│   └── v3_sd15_sparsectrl_scribble.ckpt
└── stable_diffusion
    └── sd_v1.5-d0ab7146.ckpt
```
```shell
# download demo images
bash scripts/download_demo_images.sh

# under general T2V setting
python text_to_video.py --config configs/prompts/v3/v3-1-T2V.yaml

# image animation (on RealisticVision)
python text_to_video.py --config configs/prompts/v3/v3-2-animation-RealisticVision.yaml

# sketch-to-animation and storyboarding (on RealisticVision)
python text_to_video.py --config configs/prompts/v3/v3-3-sketch-RealisticVision.yaml
```
Results:

*(Sample grids omitted: RealisticVision input images with their animations, and input scribbles with their outputs.)*
To run on GPU, please append `--device_target GPU` to the end of the commands above. If you use a checkpoint converted from torch for inference, please also append `--vae_fp16=False`.
```shell
python text_to_video.py --config configs/prompts/v2/1-ToonYou.yaml --L 16 --H 512 --W 512
```

By default, DDIM sampling is used, and the sampling speed is 1.07 s/iter.

Results: *(sample animations omitted)*
```shell
python text_to_video.py --config configs/prompts/v2/1-ToonYou.yaml --L 16 --H 256 --W 256 --device_target GPU
```

If you use a checkpoint converted from torch for inference, please also append `--vae_fp16=False` to the command above.
```shell
python text_to_video.py --config configs/prompts/v2/1-ToonYou-MotionLoRA.yaml --L 16 --H 512 --W 512
```

By default, DDIM sampling is used, and the sampling speed is 1.07 s/iter.

Results using the Zoom-In motion lora: *(sample animations omitted)*

For GPU:

```shell
python text_to_video.py --config configs/prompts/v2/1-ToonYou-MotionLoRA.yaml --L 16 --H 256 --W 256 --device_target GPU
```
```shell
python train.py --config configs/training/image_finetune.yaml
```

For 910B, please set

```shell
export MS_ASCEND_CHECK_OVERFLOW_MODE="INFNAN_MODE"
```

before running training.
```shell
python train.py --config configs/training/mmv2_train.yaml
```

For 910B, please set

```shell
export MS_ASCEND_CHECK_OVERFLOW_MODE="INFNAN_MODE"
```

before running training.
You may change arguments such as the data path, output directory, and learning rate in the yaml config file. You can also override them via command-line arguments; see `args_train.py` or run `python train.py --help`.
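The override behavior can be sketched as follows: defaults come from the config file, and any flag passed on the command line wins. This is a minimal stand-in for the logic in `args_train.py`, using a plain dict in place of the yaml file and illustrative key names:

```python
import argparse

def merge_config(config: dict, cli_args: list) -> dict:
    """Overlay command-line flags on top of config-file defaults."""
    parser = argparse.ArgumentParser()
    for key, default in config.items():
        parser.add_argument(f"--{key}", type=type(default), default=default)
    return vars(parser.parse_args(cli_args))

# Defaults as they might appear in a training yaml (illustrative values).
config = {"output_path": "outputs", "start_learning_rate": 1e-4, "train_batch_size": 1}
merged = merge_config(config, ["--train_batch_size", "4"])
print(merged["train_batch_size"])  # 4  (CLI flag overrides the yaml default of 1)
```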
To infer with the trained model, run:

```shell
python text_to_video.py --config configs/prompts/v2/base_video.yaml \
    --motion_module_path {path to saved checkpoint} \
    --prompt {text prompt}
```

You can also create a new config yaml based on `configs/prompt/v2/base_video.yaml` to specify the prompts to test and the motion module path.
Here are some generation results after MM training on 512x512 resolution and 16-frame data.
*(Sample videos omitted; prompts: "Disco light leaks disco ball light reflections shaped rectangular and line with motion blur effect", "Cloudy moscow kremlin time lapse", "Sharp knife to cut delicious smoked fish", "A baker turns freshly baked loaves of sourdough bread".)*
Min-SNR weighting can be used to improve diffusion training convergence. You can enable it by appending `--snr_gamma=5.0` to the training command.
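Min-SNR-γ weighting rescales the per-timestep loss by min(SNR(t), γ)/SNR(t), so that easy high-SNR timesteps contribute less while low-SNR timesteps keep full weight. A standalone sketch of the weight computation (the actual training code derives SNR from the noise scheduler's alphas):

```python
def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Loss weight under Min-SNR-gamma: min(SNR, gamma) / SNR."""
    return min(snr, gamma) / snr

# High-SNR (easy) timesteps are down-weighted; low-SNR ones keep weight 1.
print(min_snr_weight(20.0))  # 0.25
print(min_snr_weight(2.0))   # 1.0
```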
```shell
python train.py --config configs/training/mmv2_lora.yaml
```

For 910B, please set

```shell
export MS_ASCEND_CHECK_OVERFLOW_MODE="INFNAN_MODE"
```

before running training.
To infer with the trained model, run:

```shell
python text_to_video.py --config configs/prompts/v2/base_video.yaml \
    --motion_lora_path {path to saved checkpoint} \
    --prompt {text prompt}
```
Here are some generation results after lora fine-tuning on 512x512 resolution and 16-frame data.
*(Sample videos omitted; prompts: "Disco light leaks disco ball light reflections shaped rectangular and line with motion blur effect", "Cloudy moscow kremlin time lapse", "Sharp knife to cut delicious smoked fish", "A baker turns freshly baked loaves of sourdough bread".)*
To train on GPU, please add `--device_target GPU` to the training commands above and adjust `image_size`/`num_frames`/`train_batch_size` to fit your device memory. Below is an example for a 3090:

```shell
# reduce num frames and batch size to avoid OOM on a 3090
python train.py --config configs/training/mmv2_train.yaml --data_path ../videocomposer/datasets/webvid5 --image_size 256 --num_frames=4 --device_target GPU --train_batch_size=1
```
| Model | Context | Scheduler | Steps | Resolution | Frames | Speed (step/s) | Time (s/video) |
|---|---|---|---|---|---|---|---|
| AnimateDiff v2 | D910*x1-MS2.2.10 | DDIM | 30 | 512x512 | 16 | 1.2 | 25 |

> Context: {Ascend chip}-{number of NPUs}-{MindSpore version}.
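The last column follows directly from the step rate: time per video ≈ sampling steps / (steps per second). A quick sanity check against the row above:

```python
steps = 30                 # DDIM sampling steps
speed_steps_per_s = 1.2    # measured throughput

time_per_video_s = steps / speed_steps_per_s
print(round(time_per_video_s, 2))  # 25.0
```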
| Model | Context | Task | Local BS x Grad. Accu. | Resolution | Frames | Step Time (s/step) |
|---|---|---|---|---|---|---|
| AnimateDiff v2 | D910*x1-MS2.2.10 | MM training | 1x1 | 512x512 | 16 | 1.29 |
| AnimateDiff v2 | D910*x1-MS2.2.10 | Motion Lora | 1x1 | 512x512 | 16 | 1.26 |
| AnimateDiff v2 | D910*x1-MS2.2.10 | MM training w/ Embed. cached | 1x1 | 512x512 | 16 | 0.75 |
| AnimateDiff v2 | D910*x1-MS2.2.10 | Motion Lora w/ Embed. cached | 1x1 | 512x512 | 16 | 0.71 |

> Context: {Ascend chip}-{number of NPUs}-{MindSpore version}.
> MM training: Motion Module training.
> Embed. cached: the video embeddings (VAE-encoder outputs) and text embeddings are pre-computed and stored before diffusion training.
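Embedding caching helps because the VAE and text encoders are frozen during motion-module training, so their outputs for each sample never change across epochs. The pattern can be sketched as follows, with `encode` standing in for the frozen (and, in practice, expensive) encoders:

```python
calls = {"n": 0}

def encode(sample):
    """Stand-in for the frozen VAE/text encoders (expensive in practice)."""
    calls["n"] += 1
    return [x * 0.5 for x in sample]  # pretend latent

dataset = [[1.0, 2.0], [3.0, 4.0]]

# Pass 1: pre-compute and store embeddings once, before diffusion training.
cache = [encode(s) for s in dataset]

# Training: every epoch reads the cache instead of re-running the encoders.
for _epoch in range(3):
    for latent in cache:
        pass  # the diffusion training step would consume `latent` here

print(calls["n"])  # 2  (one encode per sample, regardless of epoch count)
```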