First, download the preprocessed C-Eval dataset from Tsinghua Cloud and extract it into the evaluation directory. Then run:
cd evaluation
python evaluate_ceval.py
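This script runs prediction on the C-Eval validation set and prints the accuracy. To obtain results on the test set, change ./CEval/val/**/*.jsonl in the code to ./CEval/test/**/*.jsonl, save the results in the format required by C-Eval, and submit them on the official website.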
The reported results were produced with an internal parallel evaluation framework, so the numbers you obtain may differ slightly.
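To make the split switch concrete, here is a minimal sketch of how a glob pattern such as ./CEval/val/**/*.jsonl selects the data files and how an accuracy figure can be computed. It is not the code in evaluate_ceval.py: the helper names split_pattern, load_jsonl, and accuracy are hypothetical, and the sketch assumes only that each .jsonl file holds one JSON object per line.

```python
import glob
import json


def split_pattern(split, root="./CEval"):
    """Build the recursive glob pattern for a split, e.g. "val" or "test"."""
    return f"{root}/{split}/**/*.jsonl"


def load_jsonl(path):
    """Read one JSON object per non-empty line of a .jsonl file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


if __name__ == "__main__":
    # recursive=True is required for "**" to match nested subject folders.
    files = sorted(glob.glob(split_pattern("val"), recursive=True))
    print(f"found {len(files)} validation files")
```

Passing "test" instead of "val" to split_pattern yields the test-set pattern. Accuracy for the test set has to come from a submission on the official website rather than from a local computation like this one.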