XLNet
Model description
XLNet (Generalized Autoregressive Pretraining for Language Understanding) is an unsupervised autoregressive pre-trained language model. Unlike traditional unidirectional autoregressive models, XLNet performs language modeling by maximizing the expected log-likelihood over all permutations of the factorization order of the input sequence, which allows it to attend to context from both directions. In addition, XLNet integrates Transformer-XL into its pre-training stage; the segment recurrence mechanism and relative positional encoding from Transformer-XL allow XLNet to accept longer input sequences, giving it strong performance on language tasks with long text.
Step 1: Installation
pip3 install sentencepiece
pip3 install urllib3==1.26.6
pip3 install paddlenlp==2.4.1
Step 2: Preparing datasets
The datasets for the GLUE evaluation tasks are provided as APIs in PaddleNLP, so no preparation is required in advance. They are downloaded automatically when run_glue.py is executed.
Step 3: Training
Taking the SST-2 task in GLUE as an example, Fine-tuning can be started as follows:
# Set --gpus to be "0" if run on 1 GPU
python3 -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" ./run_glue.py \
--model_name_or_path xlnet-base-cased \
--task_name SST-2 \
--max_seq_length 128 \
--batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--logging_steps 100 \
--save_steps 500 \
--output_dir ./tmp/
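With 8 cards and batch_size 32, each optimizer step processes a global batch of 256 samples. As a rough sanity check, the number of steps per epoch can be estimated from the training-set size (the figure of roughly 67,349 SST-2 training examples is an assumption, not stated above; adjust it for your copy of the data):

```python
import math

# Assumed SST-2 training-set size (~67,349 examples) -- not taken from this
# document, adjust if your dataset copy differs.
NUM_TRAIN_EXAMPLES = 67349

def steps_per_epoch(num_examples: int, per_card_batch: int, num_cards: int) -> int:
    """Estimate optimizer steps per epoch for data-parallel training."""
    global_batch = per_card_batch * num_cards
    return math.ceil(num_examples / global_batch)

# With the command above: batch_size 32 on 8 cards
print(steps_per_epoch(NUM_TRAIN_EXAMPLES, per_card_batch=32, num_cards=8))
```

Under these assumptions this gives about 264 steps per epoch, so with --logging_steps 100 you would see two to three log lines per epoch.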
The parameters are explained as follows:
model_name_or_path: the model with a specific configuration, corresponding to its pre-trained weights and the tokenizer used in pre-training. A local directory containing saved model files can also be provided here.
task_name: the GLUE task to fine-tune on.
max_seq_length: the maximum sentence length; longer inputs are truncated.
batch_size: the number of samples per card per iteration.
learning_rate: the base learning rate, which is multiplied by the value produced by the learning-rate scheduler to obtain the current learning rate.
num_train_epochs: the number of training epochs.
logging_steps: the interval (in steps) between log prints.
save_steps: the interval (in steps) between model saving and evaluation.
output_dir: the directory where the model is saved.
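The interaction between learning_rate and the scheduler can be sketched as follows. This is an illustrative linear warmup-then-decay schedule written from scratch, not the exact scheduler used by run_glue.py:

```python
def lr_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    """Illustrative scheduler: linear warmup to 1.0, then linear decay to 0.0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

BASE_LR = 2e-5  # the --learning_rate argument

# Current learning rate = base learning rate * scheduler multiplier
for step in (0, 50, 100, 550, 1000):
    print(step, BASE_LR * lr_multiplier(step, warmup_steps=100, total_steps=1000))
```

The multiplier ramps from 0 to 1 over the warmup steps, then decays back to 0 by the final step, so the actual learning rate peaks at the configured 2e-5.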
Results
| GPUs | FPS | ACC |
|------|-----|-----|
| BI-V100 x8 | 743.7 | 0.9450 |
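Assuming FPS counts training samples per second across all 8 cards (an interpretation, not stated above) and an SST-2 training-set size of roughly 67,349 examples (also an assumption), the measured throughput translates into a rough wall-clock estimate for the 3 training epochs:

```python
# Assumptions: FPS = samples/second over all cards; SST-2 train size ~67,349.
FPS = 743.7
NUM_TRAIN_EXAMPLES = 67349
EPOCHS = 3

total_seconds = NUM_TRAIN_EXAMPLES * EPOCHS / FPS
print(f"~{total_seconds / 60:.1f} minutes of pure training time")
```

Note this covers forward/backward passes only; data loading, evaluation at each save_steps interval, and checkpointing add overhead on top.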
Reference