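Problem description
The network contains several modules, each inheriting from nn.Cell. During training, memory usage keeps growing: with only 10 samples, a few epochs consume more than ten GB, and the process is eventually killed by the system. To reclaim memory and keep the process from being killed, I called mindspore.ms_memory_recycle(), but at runtime this fails with "Segmentation fault (core dumped)".
(When debugging locally, mindspore.ms_memory_recycle() runs, the memory is reclaimed, and no error occurs. When training on the Qizhi AI platform, the error appears two or three seconds after this call completes.)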
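To confirm the growth pattern described above independently of MindSpore, a minimal sketch (Linux only, standard library only) can record the process's resident memory after each epoch. It assumes /proc/self/status is readable; `train_one_epoch`, `current_rss_mb`, and `log_memory_per_epoch` are hypothetical names used for illustration, and the mindspore.ms_memory_recycle() call is deliberately left out so the sketch runs anywhere.

```python
import sys


def current_rss_mb() -> float:
    """Return this process's resident set size in MiB (Linux only).

    Reads the VmRSS field from /proc/self/status and returns -1.0 if the
    file or the field is unavailable (e.g. on non-Linux systems).
    """
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    # The line looks like "VmRSS:    123456 kB".
                    return int(line.split()[1]) / 1024.0
    except OSError:
        pass
    return -1.0


def log_memory_per_epoch(train_one_epoch, num_epochs):
    """Call the hypothetical `train_one_epoch(epoch)` repeatedly and
    record RSS after each epoch; a steadily rising series suggests that
    something is being retained between epochs."""
    history = []
    for epoch in range(num_epochs):
        train_one_epoch(epoch)
        rss = current_rss_mb()
        history.append(rss)
        print(f"epoch {epoch}: RSS = {rss:.1f} MiB", file=sys.stderr)
    return history
```

For example, passing a dummy step that appends a 20 MiB bytes object to a global list on every call should produce a clearly increasing series, while a step that holds no references across epochs should stay roughly flat.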
Environment (GPU/NPU)
GPU/CPU
Cluster (Qizhi / Zhisuan)
Zhisuan
Task type (debug / train / inference)
Debug
Task name
nocol202401202054280
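Log description or screenshot
In the network's construct:
Error:
Expected solution or suggestion
mindspore.ms_memory_recycle() should work as intended: reclaim memory without raising an error.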
Steps to reproduce the issue
4. cd okgr_last/pcdet/models/backbones_3d/Chamfer3D
5. python setup.py develop
7. python train.py
You could ask the official MindSpore team about this.