XLNet
Model description
XLNet (Generalized Autoregressive Pretraining for Language Understanding) is an unsupervised autoregressive pre-trained language model. Unlike traditional unidirectional autoregressive models, XLNet performs language modeling by maximizing the expected log-likelihood over all permutations of the factorization order of the input sequence, which allows it to attend to context from both directions. In addition, XLNet integrates Transformer-XL into its pre-training stage; the segment recurrence mechanism and relative positional encoding from Transformer-XL allow XLNet to accept longer input sequences, giving it strong performance on language tasks with long text.
Step 1: Installation
pip3 install sentencepiece
pip3 install urllib3==1.26.6
pip3 install paddlenlp==2.4.1
Step 2: Preparing datasets
The datasets for the GLUE evaluation tasks are provided as APIs in PaddleNLP, so no preparation is required in advance. They are downloaded automatically when run_glue.py is executed.
Step 3: Training
Taking the SST-2 task in GLUE as an example, Fine-tuning can be started as follows:
# Set --gpus to be "0" if run on 1 GPU
python3 -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./run_glue.py \
--model_name_or_path xlnet-base-cased \
--task_name SST-2 \
--max_seq_length 128 \
--batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--logging_steps 100 \
--save_steps 500 \
--output_dir ./tmp/
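With 8 cards and batch_size 32, each optimizer step processes a global batch of 256 samples. As a rough sanity check, the number of steps per epoch can be estimated from the training-set size (the figure of roughly 67,349 SST-2 training examples is an assumption, not stated above; adjust it for your copy of the data):

```python
import math

# Assumed SST-2 training-set size (~67,349 examples) -- not taken from this
# document, adjust if your dataset copy differs.
NUM_TRAIN_EXAMPLES = 67349

def steps_per_epoch(num_examples: int, per_card_batch: int, num_cards: int) -> int:
    """Estimate optimizer steps per epoch for data-parallel training."""
    global_batch = per_card_batch * num_cards
    return math.ceil(num_examples / global_batch)

# With the command above: batch_size 32 on 8 cards
print(steps_per_epoch(NUM_TRAIN_EXAMPLES, per_card_batch=32, num_cards=8))
```

Under these assumptions this gives about 264 steps per epoch, so with --logging_steps 100 you would see two to three log lines per epoch.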
The parameters are explained as follows:
model_name_or_path: the model with a specific configuration, corresponding to its pre-trained weights and the tokenizer used in pre-training. A local directory containing saved model files can also be provided here.
task_name: the GLUE task to fine-tune on.
max_seq_length: the maximum sentence length; longer inputs are truncated.
batch_size: the number of samples per card per iteration.
learning_rate: the base learning rate, which is multiplied by the value produced by the learning-rate scheduler to obtain the current learning rate.
num_train_epochs: the number of training epochs.
logging_steps: the interval (in steps) between log prints.
save_steps: the interval (in steps) between model saving and evaluation.
output_dir: the directory where the model is saved.
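The interaction between learning_rate and the scheduler can be sketched as follows. This is an illustrative linear warmup-then-decay schedule written from scratch, not the exact scheduler used by run_glue.py:

```python
def lr_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    """Illustrative scheduler: linear warmup to 1.0, then linear decay to 0.0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

BASE_LR = 2e-5  # the --learning_rate argument

# Current learning rate = base learning rate * scheduler multiplier
for step in (0, 50, 100, 550, 1000):
    print(step, BASE_LR * lr_multiplier(step, warmup_steps=100, total_steps=1000))
```

The multiplier ramps from 0 to 1 over the warmup steps, then decays back to 0 by the final step, so the actual learning rate peaks at the configured 2e-5.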
Results
| GPUs | FPS | ACC |
|------|-----|-----|
| BI-V100 x8 | 743.7 | 0.9450 |
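Assuming FPS counts training samples per second across all 8 cards (an interpretation, not stated above) and an SST-2 training-set size of roughly 67,349 examples (also an assumption), the measured throughput translates into a rough wall-clock estimate for the 3 training epochs:

```python
# Assumptions: FPS = samples/second over all cards; SST-2 train size ~67,349.
FPS = 743.7
NUM_TRAIN_EXAMPLES = 67349
EPOCHS = 3

total_seconds = NUM_TRAIN_EXAMPLES * EPOCHS / FPS
print(f"~{total_seconds / 60:.1f} minutes of pure training time")
```

Note this covers forward/backward passes only; data loading, evaluation at each save_steps interval, and checkpointing add overhead on top.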
Reference