# Training the Pangu 2.6B Chinese dialogue model with RLHF using trlx

Our pipeline is adapted from the open-source code accompanying the OpenAI paper "Learning to Summarize from Human Feedback".
## Preparation

1). Set up the environment for the trlx library; see [trlx](https://github.com/CarperAI/trlx):

```shell
git clone https://github.com/CarperAI/trlx.git
cd trlx
pip install torch --extra-index-url https://download.pytorch.org/whl/cu116  # for CUDA
pip install -e .
```
2). Download the Pangu-2.6B model:

https://huggingface.co/imone/pangu_2_6B
3). Prepare the SFT dataset (using webtext as an example):

https://paperswithcode.com/dataset/webtext

Save it to: ./dialogue_dir/demo.json
4). Collect human feedback data.

Save it to: ./reward_data_dir/processed/demo.json
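The schema of these demo.json files is not documented here. Below is a minimal sketch of plausible record formats; the field names `prompt`, `response`, `chosen`, and `rejected` are assumptions following common SFT/RLHF conventions, not the actual schema expected by dataprocess.py:

```python
import json
import os
import tempfile

# Assumed SFT dialogue record: one prompt with a reference response.
sft_example = {"prompt": "问:今天天气怎么样?", "response": "答:今天晴朗。"}

# Assumed human-feedback record: one preferred and one rejected
# response for the same prompt, as used for reward-model training.
reward_example = {
    "prompt": "问:今天天气怎么样?",
    "chosen": "答:今天晴朗,适合外出。",
    "rejected": "答:不知道。",
}

def write_demo(path, records):
    """Write records as a UTF-8 JSON array at the given path."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)

root = tempfile.mkdtemp()  # replace with "." to write into the repo layout
write_demo(os.path.join(root, "dialogue_dir/demo.json"), [sft_example])
write_demo(os.path.join(root, "reward_data_dir/processed/demo.json"),
           [reward_example])
```

Check the actual keys consumed by dataprocess.py and the training scripts before adopting this layout.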
## Training

1). Supervised fine-tuning (SFT):

```shell
cd sft/ && deepspeed train_SFT.py
```
2). Train the reward model:

```shell
cd reward_model/ && deepspeed train_reward_model.py
```
3). Reinforcement learning with PPO:

```shell
accelerate launch --config_file configs/default_accelerate_config.yaml trlx_pangu_rlhf.py
```
Note: at least one V100 GPU is required.
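For intuition, the two objectives driving steps 2 and 3 can be sketched in plain Python (illustrative only; the training scripts and trlx implement these over PyTorch tensors): the reward model minimizes a pairwise ranking loss, as in "Learning to Summarize from Human Feedback", and PPO maximizes a clipped surrogate objective:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_reward_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected): lower when the reward model
    scores the human-preferred response higher than the rejected one."""
    return -math.log(sigmoid(r_chosen - r_rejected))

def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate for a single action/token.
    ratio = pi_new / pi_old; clipping keeps each policy update small."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped * advantage)

# The reward loss shrinks as the chosen-vs-rejected margin grows:
assert pairwise_reward_loss(2.0, -1.0) < pairwise_reward_loss(0.5, 0.0)

# With a positive advantage, the objective is capped once the
# probability ratio exceeds 1 + clip_eps:
print(ppo_clipped_objective(logp_new=0.5, logp_old=0.0, advantage=1.0))  # prints 1.2
```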