Fish Diffusion

An easy-to-understand TTS / SVS / SVC training framework.

Check our Wiki to get started!

Summary

Fish Diffusion uses diffusion models to solve a range of voice generation tasks. Compared with the original diffsvc repository, this repository differs in the following ways:

Supports multiple speakers
Simpler, easier-to-understand code structure, with all modules decoupled
Supports the 44.1 kHz DiffSinger community vocoder
Supports multi-machine, multi-device training and half-precision training, which speeds up training and reduces memory use

Preparing the environment

Run the following commands in a conda environment with Python 3.10.

# Install PyTorch related core dependencies, skip if installed
# Reference: https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

# Install Poetry dependency management tool, skip if installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Install the project dependencies
poetry install

Vocoder preparation

Fish Diffusion requires the OpenVPI 44.1 kHz NSF-HiFiGAN vocoder to generate audio.

Automatic download

python tools/download_nsf_hifigan.py

Manual download

Download nsf_hifigan_20221211.zip from the 44.1 kHz vocoder page and unzip it.

Copy the nsf_hifigan folder to the checkpoints directory (create the directory if it does not exist).
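
The manual steps above can also be scripted. The following is a minimal sketch, not part of the repository: `extract_vocoder` is a hypothetical helper, and it assumes the archive contains the nsf_hifigan folder at its top level, so that unzipping directly into checkpoints produces the expected layout.

```python
import zipfile
from pathlib import Path


def extract_vocoder(archive_path, checkpoints_dir="checkpoints"):
    """Unzip the vocoder archive into checkpoints_dir, creating it if needed.

    Hypothetical helper. Assumes the archive holds the nsf_hifigan folder
    at its top level.
    """
    target = Path(checkpoints_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # Return the names of the top-level entries now present in the target.
    return sorted(p.name for p in target.iterdir())
```

Calling `extract_vocoder("nsf_hifigan_20221211.zip")` from the repository root returns the entries found in checkpoints, which makes it easy to confirm that nsf_hifigan is among them.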

Dataset preparation

Place your audio files in the dataset directory using the following structure:

dataset
├───train
│   ├───xxx1-xxx1.wav
│   ├───...
│   ├───Lxx-0xx8.wav
│   └───speaker0 (subdirectories are also supported)
│       └───xxx1-xxx1.wav
└───valid
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav
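
Before running the preprocessing commands, it can help to confirm that the layout matches the tree above. The sketch below is an illustration only: `summarize_dataset` is a hypothetical helper, not a tool shipped with the project. It counts .wav files under train and valid, descending into subdirectories such as speaker0.

```python
from pathlib import Path


def summarize_dataset(root="dataset"):
    """Count .wav files in the train and valid splits, including subdirectories.

    Hypothetical helper. Raises FileNotFoundError if a split is missing.
    """
    counts = {}
    for split in ("train", "valid"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # rglob walks nested speaker folders as well as the split root.
        counts[split] = sum(1 for _ in split_dir.rglob("*.wav"))
    return counts
```

A count of zero for either split usually means the files were placed one level too high or too low.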

# 1. Extract all data features, such as pitch, text features, mel features, etc.
python tools/preprocessing/extract_features.py --config configs/svc_hubert_soft.py --path dataset --clean

# 2. Generate training set statistics
python tools/preprocessing/generate_stats.py --input-dir dataset/train --output-file dataset/stats.json

Baseline training

The project is under active development, so please back up your config file.

# Single machine single card / multi-card training
python train.py --config configs/svc_hubert_soft.py

# Resume training
python train.py --config configs/svc_hubert_soft.py --resume [checkpoint]
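
When resuming, the `[checkpoint]` placeholder has to be filled in by hand. One way to find a candidate is to pick the most recently modified checkpoint file, as in the sketch below. This is an assumption-laden illustration: `latest_checkpoint` is a hypothetical helper, the `*.ckpt` pattern is assumed rather than confirmed by this document, and the directory to search depends on where your training run writes its checkpoints.

```python
from pathlib import Path


def latest_checkpoint(directory, pattern="*.ckpt"):
    """Return the most recently modified file matching pattern, or None.

    Hypothetical helper. The '*.ckpt' pattern is an assumption; adjust it
    to match the files your training run actually produces.
    """
    candidates = list(Path(directory).rglob(pattern))
    if not candidates:
        return None
    # Compare by modification time so the newest file wins.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path can then be passed to `--resume`.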

Inference

python inference.py --config configs/svc_hubert_soft.py \
    --checkpoint [checkpoint] \
    --input [input audio] \
    --output [output audio]
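
The command above converts one file at a time. To process a whole folder, the same flags can be assembled per file, as in this minimal sketch. `build_inference_commands` is a hypothetical helper, not part of the repository; it only builds argument lists from the flags shown above and does not run anything itself.

```python
from pathlib import Path


def build_inference_commands(input_dir, output_dir, checkpoint,
                             config="configs/svc_hubert_soft.py"):
    """Build one inference.py argument list per .wav file in input_dir.

    Hypothetical helper. Uses only the flags shown in the command above.
    """
    commands = []
    for wav in sorted(Path(input_dir).glob("*.wav")):
        commands.append([
            "python", "inference.py",
            "--config", config,
            "--checkpoint", str(checkpoint),
            "--input", str(wav),
            # Keep the original file name in the output directory.
            "--output", str(Path(output_dir) / wav.name),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Building the lists separately from running them makes it easy to print and review the commands first.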

Convert a DiffSVC model to Fish Diffusion

python tools/diff_svc_converter.py --config configs/svc_hubert_soft_diff_svc.py \
    --input-path [DiffSVC ckpt] \
    --output-path [Fish Diffusion ckpt]

Contributing

If you have any questions, please submit an issue or pull request.
You should run tools/lint.sh before submitting a pull request.

Credits

Thanks to all contributors for their efforts
