Latest commit: Xin Yan ce42c3af28, 1 year ago

Name | Last updated |
---|---|
docs | 1 year ago |
modelscope | 1 year ago |
paddlepaddle | 1 year ago |
测试样本汇总 (test sample summary) | 1 year ago |
LICENSE | 1 year ago |
README.md | 1 year ago |
requirements.txt | 1 year ago |
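This project builds on the ModelScope (魔搭) community and the PaddlePaddle (飞桨) AI Studio community. Drawing on the large user bases of these platforms, it explores giving developers a hands-on way to evaluate LLMs, using either open-source model inference or API access. For a given prompt, it generates one result from each of several models, so developers can compare the models based on their outputs.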
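The workflow described above, one prompt sent to several models with the outputs collected for side-by-side comparison, can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `compare_models`, `print_side_by_side`, and the stub backends are hypothetical, and real ModelScope, AI Studio, or API backends would have to be wrapped as plain `prompt -> text` functions to fit this shape.

```python
from typing import Callable, Dict

# Hypothetical abstraction: a "model" is any function that maps a prompt to text.
# Local open-source inference or a remote API client would be wrapped to fit this.
ModelFn = Callable[[str], str]


def compare_models(prompt: str, models: Dict[str, ModelFn]) -> Dict[str, str]:
    """Send the same prompt to every model and collect the outputs by model name."""
    results: Dict[str, str] = {}
    for name, generate in models.items():
        try:
            results[name] = generate(prompt)
        except Exception as exc:
            # Record the failure instead of aborting the whole comparison.
            results[name] = f"<error: {exc}>"
    return results


def print_side_by_side(prompt: str, results: Dict[str, str]) -> None:
    """Print each model's output under its name for manual comparison."""
    print(f"Prompt: {prompt}")
    for name, output in results.items():
        print(f"--- {name} ---")
        print(output)


if __name__ == "__main__":
    # Stub backends standing in for real models (hypothetical, illustration only).
    stub_models: Dict[str, ModelFn] = {
        "echo-model": lambda p: p,
        "upper-model": lambda p: p.upper(),
    }
    print_side_by_side("hello", compare_models("hello", stub_models))
```

Catching errors per backend means one unavailable API does not block the comparison of the others; a stricter harness might prefer to fail fast instead.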
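The project is still an early prototype and its overall design is being planned. It aims to give the Chinese LLM community an evaluation mechanism that is as open and transparent as possible, with criteria that are as comprehensive and accurate as possible and results that are as genuine and authoritative as possible.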
Model | Parameters | Organization | Open source / API | Commercial use | ModelScope | AIStudio | Results |
---|---|---|---|---|---|---|---|
ChatYuan-large-v2 | 0.7B | 元语智能 (ClueAI) | Open source | No | √ | √ | |
ChatGLM-6B | 6B | Zhipu AI (智谱·AI) | Open source | No | √ | √ | |
Vicuna-7B | 7B | lm-sys | Open source | No | todo | × | |
rwkv-4-raven | 14B | BlinkDL | Open source | Yes | todo | × | |
MiniMax | Unknown | MiniMax | API | No | × | × | |
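Chinese-language LLMs still lack an evaluation system that is open, transparent, and produces authoritative results. This project sets out to explore this area together with the open-source community: from a neutral third-party standpoint, and using an evaluation mechanism that the community recognizes and that is open and transparent, it aims to assess Chinese LLMs as objectively and accurately as possible.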
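Because the project is still a prototype, every aspect of it, including evaluation dimensions, system development, human scoring, and compute resources, needs help from the open-source community. If you are interested, please contribute ideas, proposals, or code through Issues and PRs!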