GPT-2/3 are language generation models pre-trained on a large-scale unlabeled text corpus, using the Transformer decoder as the basic building block of the network.
git clone https://github.com/PaddlePaddle/PaddleNLP.git
cd PaddleNLP/model_zoo/gpt
pip3 install -r requirements.txt
pip3 install paddlenlp
This guide uses the SST-2 task from GLUE as an example. SST-2 (the Stanford Sentiment Treebank) is a single-sentence classification task built from sentences in movie reviews, each with a human-annotated sentiment. The goal is to predict the sentiment of a sentence: positive (label 1) or negative (label 0). Only sentence-level labels are used, so this is a binary classification task. The dataset contains 67,350 training samples, 873 development samples, and 1,821 test samples.
wget https://dl.fbaipublicfiles.com/glue/data/SST-2.zip
mkdir -p dataset
unzip SST-2.zip -d dataset
The SST-2 dataset path structure should look like:
dataset/SST-2
├── original
│   ├── datasetSentences.txt
│   ├── datasetSplit.txt
│   ├── dictionary.txt
│   ├── original_rt_snippets.txt
│   ├── README.txt
│   ├── sentiment_labels.txt
│   ├── SOStr.txt
│   └── STree.txt
├── dev.tsv
├── test.tsv
└── train.tsv
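To check the data before training, the labeled files can be read with the Python standard library. The sketch below assumes that train.tsv and dev.tsv start with a `sentence<TAB>label` header row, as in the GLUE release; the helper name `read_sst2` and the two sample sentences are illustrative and not part of PaddleNLP.

```python
import csv
import io

# Label convention described above: 1 = positive, 0 = negative.
LABEL_NAMES = {0: "negative", 1: "positive"}


def read_sst2(stream):
    """Yield (sentence, label) pairs from a TSV stream.

    Assumes a header row with the columns 'sentence' and 'label'.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row["sentence"], int(row["label"])


# Illustrative in-memory sample standing in for dataset/SST-2/train.tsv.
sample = io.StringIO(
    "sentence\tlabel\n"
    "a charming film \t1\n"
    "too long and dull \t0\n"
)
examples = list(read_sst2(sample))
for sentence, label in examples:
    print(LABEL_NAMES[label], "|", sentence.strip())
```

To read the real file, pass `open("dataset/SST-2/train.tsv", encoding="utf-8", newline="")` instead of the in-memory sample.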
# 1 GPU
export CUDA_VISIBLE_DEVICES=0
python3 run_glue.py \
--model_name_or_path gpt2-medium-en \
--task_name SST-2 \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--logging_steps 1 \
--save_steps 500 \
--output_dir ./output_dir/glue \
--eval_steps 1 \
--device gpu \
--do_train true
# 8 GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3 -m paddle.distributed.launch --gpus "0,1,2,3,4,5,6,7" run_glue.py \
--model_name_or_path gpt2-medium-en \
--task_name SST-2 \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--logging_steps 1 \
--save_steps 500 \
--output_dir ./output_dir/glue \
--eval_steps 1 \
--device gpu \
--do_train true
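As a rough guide to what these flags imply, the sketch below estimates the number of optimizer steps for the 1-GPU and 8-GPU commands and shows how a scheduler factor scales the base learning rate. It makes two assumptions that the commands do not confirm: the final partial batch of each epoch is kept, and the scheduler is a linear warmup followed by linear decay. Both helper functions are hypothetical and do not belong to PaddleNLP.

```python
import math


def steps_per_epoch(num_samples, per_device_batch_size, num_devices):
    """Optimizer steps per epoch, keeping the final partial batch (assumption)."""
    global_batch = per_device_batch_size * num_devices
    return math.ceil(num_samples / global_batch)


def linear_decay_factor(step, total_steps, warmup_steps=0):
    """Hypothetical scheduler factor: linear warmup to 1.0, then linear decay to 0.0."""
    if warmup_steps > 0 and step < warmup_steps:
        return step / warmup_steps
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


TRAIN_SAMPLES = 67350  # training-set size quoted in this README
BASE_LR = 2e-5         # --learning_rate
EPOCHS = 3             # --num_train_epochs

for devices in (1, 8):
    per_epoch = steps_per_epoch(TRAIN_SAMPLES, 32, devices)
    total = per_epoch * EPOCHS
    # Current learning rate = base learning rate * scheduler factor.
    mid_lr = BASE_LR * linear_decay_factor(total // 2, total)
    print(f"{devices} GPU(s): {per_epoch} steps/epoch, {total} steps total, "
          f"lr at midpoint = {mid_lr:.2e}")
```

Under these assumptions one epoch takes 2105 steps on 1 GPU and 264 steps on 8 GPUs, so a checkpoint at step 500 (`--save_steps 500`) is written in both setups.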
The parameters are explained as follows:
model_type: Indicates the model type.
model_name_or_path: Indicates a model with a specific configuration, corresponding to its pre-trained weights and the tokenizer used for pre-training. If the model files are saved locally, the path to that directory can be given here instead.
task_name: Indicates the fine-tuning task.
max_seq_length: Indicates the maximum sequence length; longer inputs are truncated.
batch_size: Indicates the number of samples on each card per iteration.
learning_rate: Indicates the base learning rate, which is multiplied by the value produced by the learning rate scheduler to give the current learning rate.
num_train_epochs: Indicates the number of training epochs.
logging_steps: Indicates the log printing interval.
save_steps: Indicates the model save and evaluation interval.
output_dir: Indicates the path where the model is saved.
device: Indicates the device used for training: 'gpu' for GPU, 'xpu' for Baidu Kunlun cards, 'cpu' for CPU, and 'npu' for Huawei Ascend cards.
use_amp: Indicates whether automatic mixed-precision training is enabled.
Place the run_eval_sst2.py file in the PaddleNLP/model_zoo/gpt folder. run_eval_sst2.py is a modified version of PaddleNLP/model_zoo/gpt/run_eval.py.
python3 run_eval_sst2.py --model_name gpt2-medium-en \
--eval_path dataset/SST-2/dev.tsv \
--cloze_eval \
--init_checkpoint_path ./output_dir/glue/checkpoint-500/model_state.pdparams \
--batch_size 1 \
--device gpu
GPUs | FPS | Accuracy |
---|---|---|
BI-V100 | 221 | 0.92087 |
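The Accuracy column is the fraction of evaluation sentences whose predicted label matches the reference label. A minimal sketch of that metric follows; the `accuracy` helper and the toy label lists are illustrative and are not taken from run_eval_sst2.py.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that equal the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy example: three of the four predictions match the reference labels.
toy_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(toy_accuracy)
```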