BERT
Model description
BERT was proposed by Google in 2018 and marked a breakthrough in NLP. A large network is pre-trained once and then fine-tuned for many text-based tasks by adding only an output layer, without modifying the network structure. The BERT backbone adopts the Encoder structure of the Transformer, whose attention mechanism lets the output layer capture high-dimensional global semantic information. Pre-training uses two denoising, auto-encoding style tasks: MLM (Masked Language Model) and NSP (Next Sentence Prediction). Because neither task needs labeled data, pre-training can run on massive text corpora, and only a small amount of data is then needed to fine-tune downstream tasks and obtain good results. The pre-training plus fine-tuning mode created by BERT has been widely adopted by subsequent NLP networks.
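The MLM task mentioned above can be illustrated with a minimal sketch. It assumes the masking recipe described in the BERT paper (about 15% of tokens selected; of those, 80% replaced by `[MASK]`, 10% by a random token, 10% left unchanged). The function name `mask_tokens`, its parameters, and the per-token random selection are illustrative simplifications, not the code used by this repository.

```python
import random

MASK_TOKEN = "[MASK]"
SPECIAL_TOKENS = ("[CLS]", "[SEP]")


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for a simplified MLM example.

    labels[i] holds the original token at every selected position and
    None elsewhere, so a loss would only be computed on selected positions.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Special tokens are never selected for prediction.
        if tok in SPECIAL_TOKENS:
            continue
        # Select each remaining token independently with probability mask_prob
        # (an assumption; the reference code picks a fixed number of positions).
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_TOKEN          # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: keep the original token
    return masked, labels
```

Calling `mask_tokens(["[CLS]", "the", "cat", "sat", "[SEP]"], vocab, seed=0)` returns a possibly corrupted copy of the input plus the labels the model would be trained to recover.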
Paper: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
Paper: Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen, Qun Liu. NEZHA: Neural Contextualized Representation for Chinese Language Understanding. arXiv preprint arXiv:1909.00204.
Step 1: Installing
pip3 install -r requirements.txt
Step 2: Prepare Datasets
1. Download the training dataset (.tf_record), the eval dataset (.json), vocab.txt, and the checkpoint bert_large_ascend_v130_enwiki_official_nlp_bs768_loss1.1.ckpt.
cd scripts
mkdir -p squad
Please download vocab.txt from the BERT repository.
- Create the fine-tuning dataset
- Download a dataset for fine-tuning and evaluation, such as CLUENER (Chinese named entity recognition), TNEWS (Chinese sentence classification), ChineseNER (Chinese named entity recognition), the SQuAD v1.1 train and eval datasets (English question answering), or the GLUE package (English sentence classification).
- Scripts for creating TFRecord files are not provided yet. To convert dataset files from JSON format to TFRecord format, refer to run_classifier.py or run_squad.py in the BERT repository, or to the official CLUE and CLUENER repositories.
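As a starting point for the JSON side of that conversion, the sketch below flattens a SQuAD v1.1 style file into one record per question. It assumes the standard SQuAD v1.1 layout (`data` > `paragraphs` > `qas` > `answers`); the helper names `iter_squad_examples` and `load_squad_examples` are hypothetical. It does not tokenize or write TFRecord files; that part is still handled by run_squad.py in the BERT repository.

```python
import json


def iter_squad_examples(squad):
    """Yield one flat dict per question from a parsed SQuAD v1.1 style object."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                # Keep only the first reference answer in this sketch.
                first = answers[0] if answers else None
                yield {
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    "answer_text": first["text"] if first else None,
                    "answer_start": first["answer_start"] if first else None,
                }


def load_squad_examples(path):
    """Read a SQuAD v1.1 style JSON file and return a list of flat examples."""
    with open(path, encoding="utf-8") as f:
        return list(iter_squad_examples(json.load(f)))
```

Each record keeps the character offset `answer_start`, so `context[answer_start:answer_start + len(answer_text)]` should reproduce the answer text, which is a cheap sanity check before building features.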
Several kinds of pretrained checkpoints are provided.
Step 3: Training
bash scripts/run_squad_gpu_distribute.sh 8
[Evaluation result]
Results on BI-V100
| GPUs | per step time | exact_match | F1 |
| --- | --- | --- | --- |
| 1*8 | 1.898s | 71.9678 | 81.422 |
Performance data: NV
Results on NV-V100s
| GPUs | per step time | exact_match | F1 |
| --- | --- | --- | --- |
| 1*8 | 1.877s | 71.9678 | 81.422 |