This repository implements P-Tuning v2 fine-tuning of the ChatGLM-6B model. P-Tuning v2 reduces the number of trainable parameters to 0.1% of full fine-tuning; combined with model quantization, gradient checkpointing, and other techniques, it can run with as little as 7 GB of GPU memory.
The following uses the ADGEN (advertisement generation) dataset as an example to show how to use the code.
Running P-Tuning requires `transformers` version 4.27.1. In addition to the dependencies of ChatGLM-6B, the following packages are required:

```shell
pip install rouge_chinese nltk jieba datasets
```
The task of the ADGEN dataset is to generate an advertisement (summary) from a list of product attributes (content):

```json
{
    "content": "类型#上衣*版型#宽松*版型#显瘦*图案#线条*衣样式#衬衫*衣袖型#泡泡袖*衣款式#抽绳",
    "summary": "这件衬衫的款式非常的宽松,利落的线条可以很好的隐藏身材上的小缺点,穿在身上有着很好的显瘦效果。领口装饰了一个可爱的抽绳,漂亮的绳结展现出了十足的个性,配合时尚的泡泡袖型,尽显女性甜美可爱的气息。"
}
```
Download the processed ADGEN dataset from Google Drive or Tsinghua Cloud, and put the decompressed `AdvertiseGen` directory into this directory.
Run the following command to start training:

```shell
bash train.sh
```
`PRE_SEQ_LEN` and `LR` in `train.sh` are the soft prompt length and the training learning rate, respectively; both can be tuned for best results. The P-Tuning v2 method freezes all of the original model's parameters, and the quantization level of the original model can be set with `quantization_bit`. If this option is omitted, the model is loaded in FP16 precision.
Under the default configuration of `per_device_train_batch_size=1` and `gradient_accumulation_steps=16`, the frozen model parameters are quantized to INT4, and one training step performs 16 accumulated forward and backward passes with a batch size of 1, which is equivalent to a total batch size of 16; in this setting with `quantization_bit=4`, only 6.7 GB of GPU memory is required. To improve training efficiency at the same total batch size, you can increase `per_device_train_batch_size` while keeping the product of the two unchanged, but this also consumes more GPU memory, so adjust it according to your hardware.
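The equivalence between accumulated micro-batches and one large-batch update can be sketched in pure Python. This is a toy scalar model with a hypothetical squared-error loss, not the actual training loop:

```python
# Toy sketch of gradient accumulation: averaging 16 gradients computed on
# batch-size-1 micro-batches, then applying one update, matches a single
# update on an effective batch of 16. (Illustrative scalar model only.)

def grad(w, x, y):
    # d/dw of the squared error (w*x - y)**2
    return 2 * (w * x - y) * x

gradient_accumulation_steps = 16
lr = 0.01
data = [(float(i % 4), 1.0) for i in range(gradient_accumulation_steps)]

w, acc = 0.0, 0.0
for step, (x, y) in enumerate(data):
    acc += grad(w, x, y) / gradient_accumulation_steps  # accumulate averaged grads
    if (step + 1) % gradient_accumulation_steps == 0:
        w -= lr * acc  # one optimizer step per effective batch
        acc = 0.0

# Same result as one update on the whole batch of 16 examples:
w_full = -lr * sum(grad(0.0, x, y) for x, y in data) / gradient_accumulation_steps
assert abs(w - w_full) < 1e-12
print(round(w, 6))  # 0.03
```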
If you want to load the model from a local copy, change `THUDM/chatglm-6b` in `train.sh` to your local model path.
To fine-tune all parameters, install DeepSpeed and then run:

```shell
bash ds_train_finetune.sh
```
During P-Tuning v2 training, only the parameters of the PrefixEncoder are saved, so at inference time both the original ChatGLM-6B model and the PrefixEncoder weights must be loaded. Specify the corresponding arguments in `evaluate.sh`:

```shell
--model_name_or_path THUDM/chatglm-6b
--ptuning_checkpoint $CHECKPOINT_PATH
```

Old checkpoints saved with full parameters are still supported; just set `model_name_or_path` as before:

```shell
--model_name_or_path $CHECKPOINT_PATH
```
The evaluation metrics are Chinese ROUGE scores and BLEU-4. The generated results are saved in `./output/adgen-chatglm-6b-pt-8-1e-2/generated_predictions.txt`.
| | Finetune | P-tuning v2 | LoRA |
| --- | --- | --- | --- |
| BLEU-4 | 8.01 | 8.10 | 7.62 |
| Rouge-1 | 31.23 | 31.12 | 30.60 |
| Rouge-2 | 7.36 | 7.11 | 6.96 |
| Rouge-l | 25.08 | 24.97 | 24.80 |
| Training Loss | 3.00 | 3.74 | 3.32 |
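For reference, the core of BLEU-4 (modified n-gram precision combined with a brevity penalty) can be sketched in a few lines of pure Python. The actual evaluation in this repository uses `nltk` with `jieba` tokenization plus `rouge_chinese`, so this simplified, smoothed version is illustrative only:

```python
import math
from collections import Counter

def bleu4(candidate, reference):
    """Minimal sentence-level BLEU-4 sketch: geometric mean of modified
    1..4-gram precisions times a brevity penalty. Real evaluations should
    use nltk.translate.bleu_score on jieba-tokenized text."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    precisions = []
    for n in range(1, 5):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth zero counts

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(
        1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)

# Identical sentences score 1.0.
print(round(bleu4(list("宽松显瘦衬衫"), list("宽松显瘦衬衫")), 4))  # 1.0
```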
The experiment settings were `max_source_length=64`, `max_target_length=64`, and `max_steps=3000`, with the following per-method hyperparameters:

- P-tuning v2: `pre_seq_len=128`, `learning_rate=2e-2`, `quantization_bit=4`, `per_device_train_batch_size=16`, `gradient_accumulation_steps=1`
- Finetune: `fp16`, `num_gpus=4`, `per_device_train_batch_size=4`, `gradient_accumulation_steps=1`, `learning_rate=1e-4`
- LoRA: the implementation uses `simple_thu_chatglm6b`, with `learning_rate=5e-4`, `per_device_train_batch_size=16`, `gradient_accumulation_steps=1`
First load the tokenizer and the model, then load the PrefixEncoder weights from the P-Tuning v2 checkpoint (`CHECKPOINT_PATH` is the path to your checkpoint directory):

```python
import os

import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# Load the original model with pre_seq_len set to the value used in training
config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True)

# Load only the PrefixEncoder weights from the P-Tuning v2 checkpoint
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
```
Note that you may need to change `pre_seq_len` to the value actually used in your training. If you load the model from a local copy, change `THUDM/chatglm-6b` to the local model path (not the checkpoint path). For an old checkpoint saved with full parameters, load it directly instead:

```python
model = AutoModel.from_pretrained(CHECKPOINT_PATH, trust_remote_code=True)
```
The model can then be quantized as needed, or used directly:

```python
# Comment out the following line if you don't use quantization
model = model.quantize(4)
model = model.half().cuda()
model.transformer.prefix_encoder.float()
model = model.eval()

response, history = model.chat(tokenizer, "Hello", history=[])
```
[23/04/19] You can also run the web demo directly, which supports loading a P-Tuning v2 checkpoint:

```shell
bash web_demo.sh
```

You may need to modify the contents of `web_demo.sh` to match your actual checkpoint.
To train on your own data, modify `train_file`, `validation_file`, and `test_file` in `train.sh` and `evaluate.sh` to the paths of your own JSON-format dataset, and change `prompt_column` and `response_column` to the keys in the JSON file that hold the input and output text. You may also need to increase `max_source_length` and `max_target_length` to match the maximum input and output lengths in your dataset.
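To pick reasonable values, a quick scan of the dataset can report the longest input and output. A minimal sketch, assuming a JSON-Lines file with `content`/`summary` keys and using character counts as a rough proxy for token counts:

```python
import json

def max_lengths(lines, prompt_column="content", response_column="summary"):
    """Return the maximum character lengths of inputs and outputs in an
    iterable of JSON lines. Character counts only roughly approximate token
    counts, so add headroom when setting max_source/target_length."""
    max_src = max_tgt = 0
    for line in lines:
        record = json.loads(line)
        max_src = max(max_src, len(record[prompt_column]))
        max_tgt = max(max_tgt, len(record[response_column]))
    return max_src, max_tgt

# In practice, pass an open file:
#     with open("train.json", encoding="utf-8") as f:
#         print(max_lengths(f))
sample = [json.dumps({"content": "类型#上衣", "summary": "宽松的衬衫"}, ensure_ascii=False)]
print(max_lengths(sample))  # (5, 5)
```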
If you need to train the model with multi-round dialogue data, you can provide the chat history. For example, the following is the training data for a three-round dialogue:

```json
{"prompt": "长城h3风扇不转。继电器好的。保险丝好的传感器新的风扇也新的这是为什么。就是继电器缺一个信号线", "response": "用电脑能读数据流吗?水温多少", "history": []}
{"prompt": "95", "response": "上下水管温差怎么样啊?空气是不是都排干净了呢?", "history": [["长城h3风扇不转。继电器好的。保险丝好的传感器新的风扇也新的这是为什么。就是继电器缺一个信号线", "用电脑能读数据流吗?水温多少"]]}
{"prompt": "是的。上下水管都好的", "response": "那就要检查线路了,一般风扇继电器是由电脑控制吸合的,如果电路存在断路,或者电脑坏了的话会出现继电器不吸合的情况!", "history": [["长城h3风扇不转。继电器好的。保险丝好的传感器新的风扇也新的这是为什么。就是继电器缺一个信号线", "用电脑能读数据流吗?水温多少"], ["95", "上下水管温差怎么样啊?空气是不是都排干净了呢?"]]}
```
During training, specify `--history_column` as the key of the chat history in the data (`history` in this example), and the chat history will be stitched together automatically. Note that content exceeding `max_source_length` will be truncated.
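The stitching can be pictured with a small helper. The `[Round i]` template below is an assumption modeled on ChatGLM-6B's chat format; the repository's preprocessing code is the authoritative version:

```python
def build_prompt(query, history):
    """Illustrative sketch of stitching chat history into a single prompt.
    The "[Round i]\n问：…\n答：…" template here is an assumption; check the
    repo's preprocessing for the exact format used in training."""
    prompt = ""
    for i, (old_query, old_response) in enumerate(history):
        prompt += "[Round {}]\n问：{}\n答：{}\n".format(i, old_query, old_response)
    prompt += "[Round {}]\n问：{}\n答：".format(len(history), query)
    return prompt

stitched = build_prompt("95", [("长城h3风扇不转", "用电脑能读数据流吗？水温多少")])
print(stitched)
```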
You can refer to the following script:

```shell
bash train_chat.sh
```
```bibtex
@inproceedings{liu2022p,
  title={P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks},
  author={Liu, Xiao and Ji, Kaixuan and Fu, Yicheng and Tam, Weng and Du, Zhengxiao and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  pages={61--68},
  year={2022}
}
```