Erpim created NPU type debugging task erpim202409141183953
2 days ago
Erpim created NPU type debugging task erpim202409141183953 (deleted)
2 days ago
Erpim created NPU type debugging task erpim202409141183325
2 days ago
Erpim commented on issue OpenI/MSAdapter#946
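RuntimeError: Allocate memory failed
MindTorch generally does not cause memory leaks. A few intermediate development versions of the MindSpore framework did have spots that trigger device memory leaks, and these have been fixed in newer versions. For now, the only way to verify this is to replace the mindspore version in the environment.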
1 month ago
Erpim pushed to erpim_0806 at OpenI/MSAdapter
1 month ago
Erpim pushed to erpim_0806 at OpenI/MSAdapter
1 month ago
Erpim pushed to erpim_0806 at OpenI/MSAdapter
1 month ago
Erpim pushed to erpim_0806 at OpenI/MSAdapter
1 month ago
Erpim commented on issue OpenI/MSAdapter#946
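RuntimeError: Allocate memory failed
Another idea: the Cloud Brain environment allows the mindspore version to be replaced. Look up the historical versions at https://www.mindspore.cn/versions, install 2.3rc2 or 2.3.0, and check whether the problem still reproduces.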
1 month ago
Erpim commented on issue OpenI/MSAdapter#947
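RuntimeError: E40021 and E61001
> The latest GPU version I found on the official site is mindtorch0.2.1+mindspore2.2.14 (which I debugged on AutoDL), while the Ascend debugging task only has a mindtorch0.3 image available. So the versions do not match. I will provide a simplified version of the code later.
How about setting ms.set_context(device_target="CPU")? First check whether the flow runs correctly on other hardware.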
1 month ago
Erpim commented on issue OpenI/MSAdapter#947
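RuntimeError: E40021 and E61001
What is the difference between the debugging task and the training task here? I do not quite follow this description. Does it refer to the mode in which the Cloud Brain image is requested?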
1 month ago
Erpim pushed to erpim_0806 at OpenI/MSAdapter
1 month ago
Erpim deleted branch erpim_0402 from OpenI/MSAdapter
1 month ago
Erpim commented on issue OpenI/MSAdapter#946
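RuntimeError: Allocate memory failed
Hello, this error message most likely means that device memory has run out, which cannot be observed with the top command. Does the number of iterations completed change when you change the dataset size or the batch size? Also, does the error occur at the same iteration every time, or at a random one? Please also post the error you hit in graph mode.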
1 month ago
Erpim commented on issue OpenI/MSAdapter#947
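RuntimeError: E40021 and E61001
Hello, could you provide a simplified version of the code? Judging from the error message, this looks like an error raised by automatic differentiation, possibly caused by a framework bug. In general, when comparing GPU and NPU behavior, it is best to keep the mindspore and mindtorch versions identical on both, so you can quickly tell whether the framework or a specific hardware operator is responsible.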
1 month ago
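The version-consistency advice in the comment above can be checked with a small script. This is a minimal sketch using only the Python standard library; the distribution names `mindspore` and `mindtorch` are assumptions and may differ from what `pip list` shows in a given image, and the helper names `installed_versions` and `mismatches` are hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def mismatches(local, reference):
    """Return the sorted names whose version differs from the reference environment."""
    return sorted(name for name in reference if local.get(name) != reference[name])


if __name__ == "__main__":
    # Assumed distribution names; adjust them to match the actual environment.
    print(installed_versions(["mindspore", "mindtorch"]))
```

Running it once in the GPU environment and once in the NPU environment, then passing the two resulting dictionaries to `mismatches`, shows which packages need to be aligned before comparing results.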