Optimizing models with human feedback is an effective way to align large language models with human values. At present, such methods still face problems such as the high cost of acquiring high-quality human feedback data, and reward models that suffer from over-optimization or are easily attacked. The following open challenges in this research direction therefore remain for researchers to explore:
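To make the reward-model over-optimization problem concrete, a common mitigation in standard RLHF pipelines is to penalize the policy's per-token divergence from a frozen reference (SFT) model, so the policy cannot drift into regions where the reward model is unreliable. The sketch below is a generic illustration of that KL-penalized reward, not code from this repository; the function name and the `beta` default are assumptions for illustration.

```python
def kl_penalized_rewards(rm_score, policy_logprobs, ref_logprobs, beta=0.1):
    """Per-token RLHF rewards with a KL penalty against the reference model.

    rm_score        -- scalar reward-model score for the whole response
    policy_logprobs -- log-probs of the sampled tokens under the policy
    ref_logprobs    -- log-probs of the same tokens under the reference model
    beta            -- KL coefficient (hypothetical default, typically tuned)
    """
    # Penalize every token where the policy assigns higher probability than
    # the reference model (an estimate of the per-token KL contribution).
    rewards = [-beta * (p - r) for p, r in zip(policy_logprobs, ref_logprobs)]
    # The sequence-level reward-model score is credited at the final token.
    rewards[-1] += rm_score
    return rewards

# Example: a policy drifting above the reference is penalized token by token,
# which caps how far reward-model exploitation can push the policy.
rewards = kl_penalized_rewards(
    rm_score=1.0,
    policy_logprobs=[-0.5, -0.7, -0.2],
    ref_logprobs=[-1.0, -0.9, -0.8],
    beta=0.1,
)
```

Larger `beta` trades reward-model score for staying close to the reference model; tuning it is one practical handle on the over-optimization problem described above.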
To address these challenges, this project open-sources our existing work on algorithms, data construction, and related areas, providing a reference for researchers studying human-feedback-based model optimization. Specifically, the open-source release includes the following:
If you have any questions about using this project or its code, please open an issue. You can also contact us directly by email at xuchx@pcl.ac.cn.
Peng Cheng Laboratory, Harbin Institute of Technology, National University of Defense Technology.
Open Issues in LLM Fine-tuning Based on Human Feedback