GPT2-Medium-EN

Model introduction

GPT-2/3 are language generation models obtained by pre-training on a large-scale unlabeled text corpus, using the Transformer decoder as the basic building block of the network.
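To make "Transformer decoder" concrete: a decoder-only model predicts each token from the tokens to its left only, which is usually enforced with a causal (lower-triangular) attention mask. The sketch below builds such a mask in plain Python. The function name causal_mask is illustrative and is not part of PaddleNLP.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len matrix where entry [i][j] is 1 if
    position i may attend to position j (that is, j <= i), else 0."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    # Each row shows which earlier positions a token is allowed to see.
    for row in causal_mask(4):
        print(row)
```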

Step 1: Installation

git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP/model_zoo/gpt
pip3 install -r requirements.txt
pip3 install paddlenlp
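A quick way to confirm the installation is to query the installed package versions. This is a minimal sketch using only the Python standard library. The helper name installed_version is hypothetical, and the paddlepaddle / paddlepaddle-gpu entries assume the Paddle framework itself was installed separately, which the commands above do not cover.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # paddlenlp comes from the pip command above; the framework packages
    # are an assumption about the environment.
    for name in ("paddlenlp", "paddlepaddle-gpu", "paddlepaddle"):
        print(name, installed_version(name) or "not installed")
```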

Step 2: Preparing dataset

This guide uses the SST-2 task from GLUE as an example. SST-2 (the Stanford Sentiment Treebank) is a single-sentence classification task built from sentences in movie reviews, each with a human-annotated sentiment. The goal is to predict the sentiment of a sentence as either positive (label 1) or negative (label 0). Only sentence-level labels are used, so this is a binary classification task. The dataset contains 67,349 training samples, 872 development samples, and 1,821 test samples.

wget https://dl.fbaipublicfiles.com/glue/data/SST-2.zip
mkdir -p dataset
unzip SST-2.zip -d dataset

The SST-2 dataset path structure should look like:

dataset/SST-2
├── original
│   ├── datasetSentences.txt
│   ├── datasetSplit.txt
│   ├── dictionary.txt
│   ├── original_rt_snippets.txt
│   ├── README.txt
│   ├── sentiment_labels.txt
│   ├── SOStr.txt
│   └── STree.txt
├── dev.tsv
├── test.tsv
└── train.tsv
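Before training, it can be useful to check that the extracted files are readable and that both labels are present. The sketch below counts label values in a GLUE-style TSV file. It assumes train.tsv and dev.tsv have a header row with `sentence` and `label` columns, and the helper name count_labels is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def count_labels(tsv_path):
    """Count label values in a TSV file with `sentence` and `label` columns."""
    counts = Counter()
    with open(tsv_path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: review sentences may contain bare quote characters.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            counts[row["label"]] += 1
    return counts


if __name__ == "__main__":
    data_dir = Path("dataset/SST-2")
    for split in ("train.tsv", "dev.tsv"):
        path = data_dir / split
        if path.exists():
            print(split, dict(count_labels(path)))
        else:
            print(split, "not found; run the download commands above first")
```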

Step 3: Training

# 1 GPU
export CUDA_VISIBLE_DEVICES=0
python3 run_glue.py \
  --model_name_or_path gpt2-medium-en \
  --task_name SST-2 \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --logging_steps 1 \
  --save_steps 500 \
  --output_dir ./output_dir/glue \
  --eval_steps 1 \
  --device gpu \
  --do_train true

# 8 GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3 -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_glue.py \
  --model_name_or_path gpt2-medium-en \
  --task_name SST-2 \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --logging_steps 1 \
  --save_steps 500 \
  --output_dir ./output_dir/glue \
  --eval_steps 1 \
  --device gpu \
  --do_train true
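The two commands differ in the effective batch size, which changes how many optimizer steps an epoch takes. The following back-of-the-envelope sketch assumes plain data parallelism (global batch = per-device batch x number of GPUs) and that the last partial batch is kept. Neither assumption is confirmed by run_glue.py here, and steps_per_epoch is a hypothetical helper.

```python
import math


def steps_per_epoch(num_samples, per_device_batch_size, num_gpus):
    """Optimizer steps per epoch, assuming data parallelism and no dropped batch."""
    global_batch = per_device_batch_size * num_gpus
    return math.ceil(num_samples / global_batch)


if __name__ == "__main__":
    # GLUE SST-2 training examples (train.tsv has one extra line for the header).
    train_samples = 67_349
    for gpus in (1, 8):
        per_epoch = steps_per_epoch(train_samples, 32, gpus)
        print(f"{gpus} GPU(s): {per_epoch} steps/epoch, {3 * per_epoch} steps in 3 epochs")
```

Under these assumptions an epoch takes about 2,105 steps on one GPU and 264 steps on eight, so with --save_steps 500 the number of checkpoints written differs considerably between the two setups.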

The script parameters are explained as follows:

model_type Indicates the model type.
model_name_or_path Indicates the model configuration to use, which determines the pre-trained weights and the tokenizer used during pre-training. If the model files are saved locally, the path to that directory can be given instead.
task_name Indicates the fine-tuning task.
max_seq_length Indicates the maximum sequence length; longer inputs are truncated.
batch_size Indicates the number of samples on each card per iteration (passed as --per_device_train_batch_size in the commands above).
learning_rate Indicates the base learning rate, which is multiplied by the value produced by the learning rate scheduler to give the current learning rate.
num_train_epochs Indicates the number of training epochs.
logging_steps Indicates the log printing interval, in steps.
save_steps Indicates the model save and evaluation interval, in steps.
output_dir Indicates the path where the model is saved.
device Indicates the device used for training: 'gpu' for GPU, 'xpu' for Baidu Kunlun cards, 'cpu' for CPU, 'npu' for Huawei Ascend cards.
use_amp Indicates whether automatic mixed-precision training is enabled.
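The learning_rate description can be illustrated with a small sketch: the current learning rate is the base rate multiplied by a scheduler factor. The linear warmup and linear decay shape used here, and the warmup_steps parameter, are assumptions for illustration only; the scheduler that run_glue.py actually uses is not stated in this document.

```python
def linear_schedule_factor(step, total_steps, warmup_steps=0):
    """Multiplier in [0, 1]: linear warmup, then linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        return step / warmup_steps
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


def current_lr(base_lr, step, total_steps, warmup_steps=0):
    """Current learning rate = base learning rate x scheduler factor."""
    return base_lr * linear_schedule_factor(step, total_steps, warmup_steps)


if __name__ == "__main__":
    # Base rate 2e-5 as in the training commands; 1000 total steps is arbitrary.
    for step in (0, 250, 500, 750, 1000):
        print(step, current_lr(2e-5, step, total_steps=1000))
```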

Step 4: Model Evaluation

Place the run_eval_sst2.py file in the PaddleNLP/model_zoo/gpt folder. This script is a modified version of PaddleNLP/model_zoo/gpt/run_eval.py.
Then execute the following command to evaluate:

python3 run_eval_sst2.py --model_name gpt2-medium-en \
    --eval_path dataset/SST-2/dev.tsv \
    --cloze_eval \
    --init_checkpoint_path ./output_dir/glue/checkpoint-500/model_state.pdparams \
    --batch_size 1 \
    --device gpu
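For a binary task such as SST-2, accuracy is the fraction of sentences whose predicted label matches the gold label. The sketch below shows that computation on its own. It is independent of run_eval_sst2.py, whose internals are not shown here, and the function name accuracy is illustrative.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that equal the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Toy example: three of four predictions match the gold labels.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```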

Results

GPUs	FPS	Accuracy
BI-V100	221	0.92087

Reference

PaddleNLP
