ECAPA-TDNN, introduced in 2020, achieved state-of-the-art results on the VoxCeleb1 evaluation trials. Compared with a vanilla TDNN, ECAPA-TDNN adds SE-blocks, Res2Blocks, and attentive statistics pooling. By increasing the channel count and enlarging the model, performance improves considerably.
Paper: Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck. "ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification." Interspeech 2020.
ECAPA-TDNN consists of several SE-Res2Blocks. The 1D convolution component of the SE-Res2Block and its dilation parameters are the same as in a conventional TDNN. Each SE-Res2Block consists of a 1×1 Conv, a 3×1 Conv, an SE-block, and a Res2Net block.
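To make the SE component concrete, here is a minimal sketch in MindSpore. This is a simplified illustration under assumed (batch, channels, time) shapes, not the implementation in src/:

```python
import mindspore.nn as nn

class SEBlock(nn.Cell):
    """Squeeze-and-Excitation: rescale each channel by a learned gate."""
    def __init__(self, channels, bottleneck=128):
        super().__init__()
        self.fc1 = nn.Conv1d(channels, bottleneck, kernel_size=1)
        self.relu = nn.ReLU()
        self.fc2 = nn.Conv1d(bottleneck, channels, kernel_size=1)
        self.sigmoid = nn.Sigmoid()

    def construct(self, x):                 # x: (batch, channels, time)
        s = x.mean(axis=2, keep_dims=True)  # squeeze: average over time
        s = self.sigmoid(self.fc2(self.relu(self.fc1(s))))
        return x * s                        # excite: channel-wise rescaling
```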
Download the voxceleb1 and voxceleb2 datasets.
Convert m4a to wav; you can use this script: https://gist.github.com/seungwonpark/4f273739beef2691cd53b5c39629d830 (a minimal alternative is sketched below).
Merge the train sets of voxceleb1 and voxceleb2 into the wav/ folder, i.e. voxceleb12/wav/id*/*.wav.
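If you prefer a small script over the gist, this hedged sketch converts a tree of m4a files, assuming ffmpeg is installed and on PATH (the path argument is a placeholder):

```python
import subprocess
from pathlib import Path

def convert_m4a_tree(root):
    """Convert every .m4a under root to a 16 kHz mono .wav next to it."""
    for m4a in Path(root).rglob("*.m4a"):
        wav = m4a.with_suffix(".wav")
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(m4a), "-ar", "16000", "-ac", "1", str(wav)],
            check=True,
        )

convert_m4a_tree("/home/abc000/data/voxceleb2")  # hypothetical path
```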
The directory structure is as follows:
voxceleb12
├── meta
└── wav
├── id10001
│ ├── 1zcIwhmdeo4
│ ├── 7gWzIy6yIIk
│ └── ...
├── id10002
│ ├── 0_laIeN-Q44
│ ├── 6WO410QOeuo
│ └── ...
├── ...
To test on voxceleb1, copy the trials file to voxceleb1/veri_test2.txt.
The directory structure of voxceleb1 is as follows:
voxceleb1
├── veri_test2.txt
└── wav
├── id10001
│ ├── 1zcIwhmdeo4
│ ├── 7gWzIy6yIIk
│ └── ...
├── id10002
│ ├── 0_laIeN-Q44
│ ├── 6WO410QOeuo
│ └── ...
├── ...
As MindSpore does not support on-the-fly fbank feature extraction, we do it offline. We augment the raw wav data and extract fbank features as the training data for the MindSpore script. The feature extraction script is borrowed from SpeechBrain, with minor edits; conceptually it works as sketched below.
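A conceptual sketch of the per-utterance preparation (hypothetical helper names; the real augmentation and fbank extraction are adapted from SpeechBrain):

```python
import numpy as np

def add_noise(wav, snr_db=15.0):
    """One example augmentation: additive white noise at a given SNR."""
    noise = np.random.randn(len(wav))
    scale = np.sqrt((wav ** 2).mean() / ((noise ** 2).mean() * 10 ** (snr_db / 10)))
    return wav + scale * noise

def prepare_utterance(wav, extract_fbank, out_path, n_aug=5):
    """Save n_aug augmented copies of one utterance as fbank features."""
    feats = [extract_fbank(add_noise(wav)) for _ in range(n_aug)]  # 80-dim fbank each
    np.save(out_path, np.stack(feats))
```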
Run data_prepare.sh. 5x augmentation takes several hours and consumes about 1.3 TB of disk space; reaching the target accuracy requires 50x augmentation, which takes about 13 TB.
bash data_prepare.sh
Then, if you want to accelerate data-loading I/O, run the script below:
python3 merge_data.py hparams/prepare_train.yaml
Note: see the Quick Start section for how to set the parameters.
The mixed precision training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware.
For FP16 operators, if the input data type is FP32, the MindSpore backend will automatically handle it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and searching for "reduce precision".
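For reference, mixed precision in MindSpore is typically enabled through the Model wrapper's amp_level argument. A hedged toy example (the placeholder network stands in for ECAPA-TDNN):

```python
import mindspore.nn as nn
from mindspore import Model

net = nn.Dense(80, 192)  # placeholder network, not the real model
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
opt = nn.Adam(net.trainable_params(), learning_rate=1e-4)

# "O2" casts most ops to FP16 while keeping batchnorm in FP32
model = Model(net, loss_fn=loss, optimizer=opt, amp_level="O2")
```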
pip3 install -r requirements.txt
After installing MindSpore and Python 3, you can start training and evaluation as follows:
# Change data set path on prepare_train.yaml file
data_folder: /home/abc000/data/voxceleb12 # path to train set
feat_folder: /home/abc000/data/feat_train/ # path to store training data for mindspore
# Change data set path on prepare_eval.yaml file
data_folder: /home/abc000/data/voxceleb1/ # path to test set
feat_eval_folder: /home/abc000/data/feat_eval/ # path to store test data for mindspore
feat_norm_folder: /home/abc000/data/feat_norm/ # path to store norm data for mindspore
# Change data set path on ecapa-tdnn_config.yaml file
train_data_path: /home/abc000/data/feat_train/
# run training example
python train.py > train.log 2>&1 &
# For Ascend devices, standalone training example (1P) by shell script
bash run_standalone_train_ascend.sh DEVICE_ID
# For Ascend devices, distributed training example (8P) by shell script
bash run_distribute_train.sh RANK_TABLE_FILE
# run evaluation example
bash run_eval_ascend.sh DEVICE_ID PATH_CHECKPOINT
For distributed training, an HCCL configuration file in JSON format needs to be created in advance.
Please follow the instructions in the link below:
https://gitee.com/mindspore/models/tree/master/utils/hccl_tools.
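For orientation only, the rank table generally has the shape below; the values are placeholders, so use the hccl_tools script from the link above to generate the real file:

```python
import json

# Assumed 8-device single-server layout; server_id and device_ip are placeholders.
rank_table = {
    "version": "1.0",
    "server_count": "1",
    "server_list": [{
        "server_id": "10.0.0.1",
        "device": [
            {"device_id": str(i), "device_ip": f"192.168.100.{i + 1}", "rank_id": str(i)}
            for i in range(8)
        ],
        "host_nic_ip": "reserve",
    }],
    "status": "completed",
}

with open("rank_table_8pcs.json", "w") as f:
    json.dump(rank_table, f, indent=2)
```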
ModelZoo_ECAPA-TDNN
├── data_prepare.sh # to prepare training and eval data for mindspore
├── ecapa-tdnn_config.yaml # parameter for training and eval
├── eval_data_prepare.py # script to prepare eval data
├── eval.py # eval script
├── export.py # export MindIR model for Ascend 310 inference
├── hparams # parameter to prepare training and eval data for mindspore
├── README_CN.md # description file
├── README.md # description file
├── requirements.txt # python dependent packages
├── scripts # script for train and eval etc.
├── src # model related python script
├── train_data_prepare.py # script to prepare train data
├── merge_data.py # script to merge data into a few large batches to accelerate data I/O
└── train.py # train script
Change the training feature-extraction parameters in hparams/prepare_train.yaml:
output_folder: ./augmented/ # path to store intermediate result
save_folder: !ref <output_folder>/save/
feat_folder: /home/abc000/data/feat_train/ # path to store training fbank feature
# Data files
data_folder: /home/abc000/data/voxceleb12 # path to raw wav training data
train_annotation: !ref <save_folder>/train.csv # pre-generated csv file, regenerated if it does not exist
valid_annotation: !ref <save_folder>/dev.csv # pre-generated csv file, regenerated if it does not exist
Change the evaluation feature-extraction parameters in hparams/prepare_eval.yaml:
output_folder: ./augmented_eval/ # path to store intermediate result
feat_eval_folder: /home/abc000/data/feat_eval/ # path to store eval related fbank feature
feat_norm_folder: /home/abc000/data/feat_norm/ # path to store norm related fbank feature
data_folder: /home/abc000/data/voxceleb1/ # path to voxceleb1
save_folder: !ref <output_folder>/save/ # path to store intermediate result
Parameters for both training and evaluation can be set in ecapa-tdnn_config.yaml.
Config for ECAPA-TDNN and the dataset:
inChannels: 80 # input channel size, same as the dim of fbank feature
channels: 1024 # channel size of middle layer feature map
base_lrate: 0.000001 # base learning rate of cyclic LR
max_lrate: 0.0001 # max learning rate of cyclic LR
momentum: 0.95 # momentum for optimizer
weightDecay: 0.000002 # weight decay for optimizer
num_epochs: 3 # number of training epochs
minibatch_size: 192 # batch size
emb_size: 192 # embedding dim
step_size: 65000 # steps to reach the max learning rate of the cyclic LR
CLASS_NUM: 7205 # speaker count of voxceleb1&2
pre_trained: False # whether a pre-trained model exists
train_data_path: "/home/abc000/data/feat_train/" # path to fbank training data
keep_checkpoint_max: 30 # max model number to save
checkpoint_path: "/ckpt/train_ecapa_vox2_full-2_664204.ckpt" # path to pre-trained model
ckpt_save_dir: "./ckpt/" # path to store train model
# eval
eval_data_path: "/home/abc000/data/feat_eval/" # path to eval fbank data
veri_file_path: "veri_test_bleeched.txt" # trials
model_path: "ckpt/train_ecapa_vox2_full-2_664204.ckpt" # path of eval model
score_norm: "s-norm" # score normalization method
train_norm_path: "/data/dataset/feat_norm/" # fbank data for norm
For more details, please refer to ecapa-tdnn_config.yaml. The cyclic learning-rate schedule implied by base_lrate, max_lrate, and step_size is sketched below.
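As a reading aid, the triangular cyclic schedule those three parameters describe can be written as follows (an illustration, not the code in src/):

```python
def cyclic_lr(step, base_lr=1e-6, max_lr=1e-4, step_size=65000):
    """Triangular cyclic LR: rise to max_lr over step_size steps, then fall back."""
    cycle = step // (2 * step_size)
    x = abs(step / step_size - 2 * cycle - 1)  # 1 -> 0 -> 1 within a cycle
    return base_lr + (max_lr - base_lr) * (1.0 - x)

# e.g. cyclic_lr(0) == 1e-6, cyclic_lr(65000) == 1e-4
```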
python3 train.py > train.log 2>&1 &
OR bash scripts/run_standalone_train_ascend.sh
The python command above will run in the background; you can view the results through the file train.log. After training, you'll get some checkpoint files under the script folder by default. The loss values will look as follows:
# grep "loss: " train.log
2022-02-13 13:58:33.898547, epoch:0/15, iter-719000/731560, aver loss:1.5836, cur loss:1.1664, acc_aver:0.7349
2022-02-13 14:08:44.639722, epoch:0/15, iter-720000/731560, aver loss:1.5797, cur loss:1.1057, acc_aver:0.7363
...
The model checkpoint will be saved in the ckpt_save_dir.
bash scripts/run_distribute_train_ascend.sh
The above shell script will run distributed training in the background. You can view the results through the file train_parallel[X]/log. The loss values will look as follows:
# grep "loss:" train_parallel0/log
2022-02-13 13:58:33.898547, epoch:0/15, iter-719000/731560, aver loss:1.5836, cur loss:1.1664, acc_aver:0.7349
2022-02-13 14:08:44.639722, epoch:0/15, iter-720000/731560, aver loss:1.5797, cur loss:1.1057, acc_aver:0.7363
...
Evaluation on the voxceleb1 dataset when running on Ascend:
Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to be the absolute full path, e.g., "/username/xxx/saved_model/xxx_20-215_176000.ckpt".
bash run_eval_ascend.sh DEVICE_ID PATH_CHECKPOINT
The above command will run in the background. You can view the result in eval.log:
# grep "eer" eval.log
eer xxx:0.0082
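For reference, the EER reported above can be computed from trial scores as sketched here (a hypothetical helper, not the repo's eval code):

```python
import numpy as np

def compute_eer(scores, labels):
    """EER: the error rate where false acceptance equals false rejection."""
    order = np.argsort(-np.asarray(scores))  # accept highest-scoring trials first
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                   # targets accepted at each threshold
    fp = np.cumsum(1 - labels)               # non-targets accepted
    frr = 1 - tp / labels.sum()              # false rejection rate
    far = fp / (1 - labels).sum()            # false acceptance rate
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2
```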
We can also set the wav_cut parameter in ecapa-tdnn_config.yaml to get the EER on 3 s wavs, which matches the Ascend 310 inference result (illustrated below).
wav_cut: true # whether to cut the test wav to 3s (same as train data), default false
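What wav_cut does can be illustrated as follows; this assumes a center crop at 16 kHz, while the actual cutting logic lives in the eval data preparation:

```python
import numpy as np

def cut_wav(wav, sample_rate=16000, seconds=3):
    """Crop a waveform to `seconds` seconds, matching the training segment length."""
    target = sample_rate * seconds
    if len(wav) <= target:
        return wav                    # short utterances are kept whole
    start = (len(wav) - target) // 2  # assumed center crop
    return wav[start:start + target]
```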
python3 export.py
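export.py produces the MindIR model used for Ascend 310 inference. A hedged minimal equivalent, with a placeholder network and input shape (see export.py for the real ones, including checkpoint loading):

```python
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, export

net = nn.Dense(80, 192)                        # placeholder for the ECAPA-TDNN built in src/
dummy = Tensor(np.zeros((1, 80), np.float32))  # illustrative input shape
export(net, dummy, file_name="ecapa-tdnn", file_format="MINDIR")
```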
Parameters | Ascend |
---|---|
Model Version | ECAPA-TDNN |
Resource | Ascend 910; CPU 2.60GHz, 56 cores; Memory 755G; OS Euler 2.8 |
uploaded Date | 2022-03-01 |
MindSpore Version | 1.5.1 |
Dataset | voxceleb1&voxceleb2 |
Training Parameters | epoch=3, steps=733560*epoch, batch_size = 192, min_lr=0.000001, max_lr=0.0001 |
Optimizer | Adam |
Loss Function | AAM-Softmax |
outputs | probability |
Speed | 1pc: 17 steps/sec |
Total time | 1pc: 264 hours |
Loss | 1.1 |
Parameters (M) | 13.0 |
Checkpoint for Fine tuning | 254 (.ckpt file) |
infer model | 76.60M (.mindir file) |
Parameters | Ascend |
---|---|
Model Version | ECAPA-TDNN |
Resource | Ascend 910; OS Euler 2.8 |
uploaded Date | 2022-03-01 |
MindSpore Version | 1.5.1 |
Dataset | voxceleb1-eval, 4715 wavs |
batch_size | 1 |
outputs | probability |
accuracy | 1p: EER=0.82% |
infer model | 76.60M (.mindir file) |
Please check the official homepage: Models of MindSpore (https://gitee.com/mindspore/models).