An easy-to-understand TTS / SVS / SVC training framework.
Check our Wiki to get started!
This project uses diffusion models to solve a range of voice generation tasks. Compared with the original diffsvc repository, this repository has the following advantages and disadvantages:
Run the following commands in a conda environment with Python 3.10.
# Install PyTorch related core dependencies, skip if installed
# Reference: https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
# Install Poetry dependency management tool, skip if installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -
# Install the project dependencies
poetry install
Fish Diffusion requires the OpenVPI 44.1 kHz NSF-HiFiGAN vocoder to generate audio.
python tools/download_nsf_hifigan.py
Alternatively, download it manually: download nsf_hifigan_20221211.zip (the 44.1 kHz vocoder) and unzip it, then copy the nsf_hifigan folder to the checkpoints directory (create the directory if it does not exist).
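The manual steps above can be sketched in Python. This is a minimal illustration, not part of the project: the function name `install_vocoder` and its parameters are hypothetical, and it assumes the archive contains a folder named nsf_hifigan somewhere inside it.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_vocoder(zip_path, checkpoints_dir="checkpoints", folder_name="nsf_hifigan"):
    """Unzip the archive and copy its `folder_name` directory into `checkpoints_dir`.

    Hypothetical helper: assumes the zip contains a directory called `folder_name`.
    """
    checkpoints = Path(checkpoints_dir)
    checkpoints.mkdir(parents=True, exist_ok=True)  # create if it does not exist

    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)

        # Locate the extracted folder wherever it sits inside the archive.
        candidates = [p for p in Path(tmp).rglob(folder_name) if p.is_dir()]
        if not candidates:
            raise FileNotFoundError(f"no '{folder_name}' folder found in {zip_path}")

        target = checkpoints / folder_name
        shutil.copytree(candidates[0], target, dirs_exist_ok=True)
    return target
```

Calling `install_vocoder("nsf_hifigan_20221211.zip")` from the repository root would leave the vocoder files under checkpoints/nsf_hifigan, provided the assumption about the archive layout holds.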
Place your data in the dataset directory using the following file structure:
dataset
├───train
│   ├───xxx1-xxx1.wav
│   ├───...
│   ├───Lxx-0xx8.wav
│   └───speaker0 (Subdirectory is also supported)
│       └───xxx1-xxx1.wav
└───valid
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav
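Before preprocessing, it can help to confirm the layout is what you expect. The following is a small sketch, not a project tool: `count_wavs` is a hypothetical helper that only checks for the train and valid directories and counts .wav files in each, including those in speaker subdirectories.

```python
from pathlib import Path


def count_wavs(dataset_dir="dataset", splits=("train", "valid")):
    """Return the number of .wav files found under each split directory."""
    root = Path(dataset_dir)
    counts = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # rglob also picks up files in subdirectories such as train/speaker0.
        counts[split] = sum(1 for p in split_dir.rglob("*.wav") if p.is_file())
    return counts
```

A count of zero for either split suggests the files were placed in the wrong directory or do not have a .wav extension.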
# 1. Extract all data features, such as pitch, text features, mel features, etc.
python tools/preprocessing/extract_features.py --config configs/svc_hubert_soft.py --path dataset --clean
# 2. Generate training set statistics
python tools/preprocessing/generate_stats.py --input-dir dataset/train --output-file dataset/stats.json
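If you prefer to drive both preprocessing steps from one script, a wrapper along these lines is one option. It is only a sketch: the function names are hypothetical, the script paths and flags are copied from the commands above, and it assumes it is run from the repository root.

```python
import subprocess
import sys


def preprocessing_commands(config="configs/svc_hubert_soft.py", dataset="dataset"):
    """Return the two preprocessing commands in the order listed above."""
    return [
        # 1. Extract features for everything under the dataset directory.
        [sys.executable, "tools/preprocessing/extract_features.py",
         "--config", config, "--path", dataset, "--clean"],
        # 2. Generate statistics from the training split.
        [sys.executable, "tools/preprocessing/generate_stats.py",
         "--input-dir", f"{dataset}/train",
         "--output-file", f"{dataset}/stats.json"],
    ]


def run_preprocessing(**kwargs):
    """Run each command in turn, stopping at the first failure."""
    for command in preprocessing_commands(**kwargs):
        # check=True raises CalledProcessError, so step 2 does not run
        # after a failed step 1.
        subprocess.run(command, check=True)
```

Using `sys.executable` keeps the subprocesses in the same Python environment as the wrapper, which matters when the dependencies live in a conda or Poetry environment.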
The project is under active development; please back up your config file.
# Single-machine training on one GPU or multiple GPUs
python train.py --config configs/svc_hubert_soft.py
# Resume training
python train.py --config configs/svc_hubert_soft.py --resume [checkpoint]
python inference.py --config configs/svc_hubert_soft.py \
--checkpoint [checkpoint] \
--input [input audio] \
--output [output audio]
python tools/diff_svc_converter.py --config configs/svc_hubert_soft_diff_svc.py \
--input-path [DiffSVC ckpt] \
--output-path [Fish Diffusion ckpt]
If you have any questions, please submit an issue or pull request.
You should run tools/lint.sh before submitting a pull request.