Repository contents:
assets
README.md
convert2mindrecord_no_batch.py
data.py
eval_mindrecord2.py
lr_scheduler.py
network.py
train.py
Self-reported accuracy on the 20% validation split:
mae: 0.000254
r2_score: 0.998732
0.5*mae + 0.5*(1 - r2_score): 0.000761
Self-reported accuracy on the 80% training split:
mae: 0.000253
r2_score: 0.998763
0.5*mae + 0.5*(1 - r2_score): 0.000745
Accuracy on the validation and training splits is essentially identical.
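The combined score above can be reproduced with a short numpy sketch (the function name is mine, not taken from the repository's eval script):

```python
import numpy as np

def competition_metric(y_true, y_pred):
    """Return (MAE, R^2, 0.5*MAE + 0.5*(1 - R^2)); lower score is better."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    mae = np.mean(np.abs(y_true - y_pred))
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, r2, 0.5 * mae + 0.5 * (1.0 - r2)
```

A perfect prediction gives MAE 0, R² 1, and hence a score of 0; the reported 0.000761 and 0.000745 follow from plugging the listed mae and r2_score values into the same formula.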
Environment:
MindSpore 2.0.0 GPU build, CUDA 11.1, an RTX 3090 GPU, trained under Windows 11 + WSL2 in Docker. Other MindSpore versions and GPU setups should also work, as should Ascend. The above is the environment I used during the competition; accuracy may differ slightly across environments.
Training and evaluation:
First generate the mindrecord files (via convert2mindrecord_no_batch.py); the resulting files are shown in the screenshot below:
Note: during this conversion, 20% of the original CSV data is held out as the validation set (used only for evaluation, never for training) and the remaining 80% forms the training set. Labels are stored in the mindrecord at 1000x their original values, controlled by the output_data_scale parameter shown in the screenshot above.
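The split and label scaling described above amount to the following sketch (variable names such as output_data_scale are my own, mirroring the screenshot parameter; the real logic lives in convert2mindrecord_no_batch.py, and the FileWriter plumbing is omitted):

```python
import numpy as np

output_data_scale = 1000  # hypothetical name, mirrors the parameter in the screenshot
val_fraction = 0.2        # 20% of the CSV rows held out for validation

def split_and_scale(features, labels, seed=0):
    """Random 80/20 split plus 1000x label scaling, as described above.
    Evaluation must later divide labels and predictions by the same factor."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    idx = rng.permutation(n)
    n_val = int(n * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    scaled = np.asarray(labels, dtype=np.float32) * output_data_scale
    features = np.asarray(features)
    return (features[train_idx], scaled[train_idx]), (features[val_idx], scaled[val_idx])
```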
Once the mindrecord_8_12_12_1000_no_batch_2336347 folder has been created, run `python train.py` to start training. On my setup this takes roughly 5 hours (the OpenI Ascend 910 environment also works, but in my tests this model trains there at only one third the speed of my 3090, about 15 hours). Because the IO demands are high, the mindrecord data must be read from an SSD; in my tests, training from an HDD was about 20x slower.
After training you will find an output folder, shown below. Note: the number after mse in each ckpt filename is the MSE loss on the validation set after that epoch. Since my labels are scaled up by 1000 and the MSE formula squares the error, this number is actually 1,000,000 times the true MSE.
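The millionfold factor follows directly from MSE being quadratic: scaling both labels and predictions by 1000 scales every squared error, and hence their mean, by 1000² = 1,000,000. A quick check (the values below are made up for illustration):

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

y_true = np.array([0.001, 0.002, 0.003])
y_pred = np.array([0.0012, 0.0019, 0.0031])

raw = mse(y_true, y_pred)                    # true MSE on the original scale
scaled = mse(y_true * 1000, y_pred * 1000)   # what the ckpt filename reports
# scaled == 1e6 * raw, so the true MSE is the filename value divided by 1e6
```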
Pick the ckpt with the smallest mse value and run `python eval_mindrecord2.py` to evaluate on the validation set. Because training used labels scaled up by 1000, both the labels and the model outputs must be scaled back down by 1000 before computing the evaluation metrics: