Erpim
  • Joined on Oct 07, 2022

Erpim created NPU type debugging task erpim202409141183953

2 days ago

Erpim created NPU type debugging task erpim202409141183953(deleted)

2 days ago

Erpim created NPU type debugging task erpim202409141183325

2 days ago

Erpim commented on issue OpenI/MSAdapter#946

RuntimeError: Allocate memory failed

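MindTorch generally does not cause memory leaks. Some intermediate development versions of the MindSpore framework did have a few spots that could trigger device-memory leaks, and these have been fixed in newer releases. For now, the only way to verify this is to replace the MindSpore version installed in your environment.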

1 month ago

Erpim pushed to erpim_0806 at OpenI/MSAdapter

1 month ago

Erpim pushed to erpim_0806 at OpenI/MSAdapter

1 month ago

Erpim pushed to erpim_0806 at OpenI/MSAdapter

1 month ago

Erpim pushed to erpim_0806 at OpenI/MSAdapter

1 month ago

Erpim commented on issue OpenI/MSAdapter#946

RuntimeError: Allocate memory failed

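Another approach: the Cloudbrain environment lets you replace the MindSpore version. Check the historical releases at https://www.mindspore.cn/versions, install 2.3rc2 or 2.3.0, and see whether the problem still reproduces.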
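After swapping versions, it can help to confirm which MindSpore build is actually active before rerunning. A minimal sketch, assuming the package is installed under the distribution name mindspore and that "2.3rc2" corresponds to the PEP 440 version string "2.3.0rc2"; the helper names and the candidate set are illustrative, not part of MindSpore or MSAdapter:

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical candidate set: versions suggested for re-testing.
# "2.3.0rc2" is an assumed PEP 440 spelling of "2.3rc2".
CANDIDATE_VERSIONS = {"2.3.0rc2", "2.3.0"}


def installed_mindspore_version():
    """Return the installed mindspore version string, or None if it is not installed."""
    try:
        return version("mindspore")
    except PackageNotFoundError:
        return None


def is_candidate(ver):
    """True if ver is one of the versions suggested for re-testing."""
    return ver in CANDIDATE_VERSIONS


if __name__ == "__main__":
    ver = installed_mindspore_version()
    if ver is None:
        print("mindspore is not installed")
    else:
        print(f"mindspore {ver}: candidate={is_candidate(ver)}")
```

Reading the version from package metadata avoids importing the framework itself, so the check works even when the import is what fails.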

1 month ago

Erpim commented on issue OpenI/MSAdapter#947

RuntimeError: E40021 and E61001

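> The latest GPU version I could find on the official site is mindtorch 0.2.1 + mindspore 2.2.14 (I debugged on AutoDL), whereas the Ascend debugging tasks only offer images with mindtorch 0.3. So the versions are inconsistent; I'll provide a simplified version of the code later.

Could you try configuring ms.set_context(device_target="CPU")? First check whether the workflow runs correctly on other hardware.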
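The suggested check amounts to setting the device target before the network is built. A minimal sketch; the guard that skips when MindSpore is not installed and the function name try_cpu_backend are illustrative additions, not part of MindSpore or MSAdapter:

```python
import importlib.util


def try_cpu_backend():
    """Switch MindSpore to the CPU backend if MindSpore is importable.

    Returns the active device target, or None when MindSpore is not installed.
    """
    if importlib.util.find_spec("mindspore") is None:
        return None
    import mindspore as ms

    # Run the same script on CPU to see whether the failure is hardware-specific.
    ms.set_context(device_target="CPU")
    return ms.get_context("device_target")
```

Call it at the very top of the training script, before any model or tensor is created; if the script then runs cleanly on CPU, the problem is more likely tied to the NPU operators than to the model code.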

1 month ago

Erpim commented on issue OpenI/MSAdapter#947

RuntimeError: E40021 and E61001

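What is the difference between the debugging task and the training task here? I don't quite understand this description. Do you mean the mode in which the Cloudbrain image is requested?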

1 month ago

Erpim created pull request OpenI/MSAdapter#949

fix module kwargs bug

1 month ago

Erpim closed pull request OpenI/MSAdapter#948

fix module kwargs bug

1 month ago

Erpim pushed to erpim_0806 at OpenI/MSAdapter

1 month ago

Erpim pushed to r0.4_dev at OpenI/MSAdapter

1 month ago

Erpim created pull request OpenI/MSAdapter#948

fix module kwargs bug

1 month ago

Erpim pushed to erpim_0806 at OpenI/MSAdapter

1 month ago

Erpim deleted branch erpim_0402 from OpenI/MSAdapter

1 month ago

Erpim commented on issue OpenI/MSAdapter#946

RuntimeError: Allocate memory failed

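Hello, this error message most likely means the device memory has run out, which cannot be observed with the top command. Does the number of iterations completed before the failure change when you change the dataset size or the batch size? Also, does the error occur at the same iteration every time, or at a random one? Please also post the error you encountered in graph mode.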

1 month ago

Erpim commented on issue OpenI/MSAdapter#947

RuntimeError: E40021 and E61001

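Hello, could you provide a simplified version of the code? Judging from this error message, it looks like it was raised during automatic differentiation, possibly due to a framework bug. In general, when comparing GPU and NPU, it is best to keep the MindSpore and MindTorch versions identical on both, so you can quickly tell whether the problem comes from the framework or from a specific hardware operator.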
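One way to check that both environments match before comparing results is to record the installed versions in each and diff the records. A minimal sketch, assuming the PyPI distribution names are mindspore and mindtorch; collect_versions and version_mismatches are hypothetical helpers, not part of either project:

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names for the two frameworks.
PACKAGES = ("mindspore", "mindtorch")


def collect_versions(packages=PACKAGES):
    """Return {package: version or None} for the current environment."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def version_mismatches(env_a, env_b):
    """List, sorted by name, the packages whose versions differ between two records."""
    names = set(env_a) | set(env_b)
    return sorted(n for n in names if env_a.get(n) != env_b.get(n))


if __name__ == "__main__":
    # Print this on both the GPU and the NPU machine, then diff the two dicts.
    print(collect_versions())
```

Usage: run collect_versions() on each machine, paste both dicts into version_mismatches(); an empty list means the framework versions agree and any remaining difference points at the hardware operators.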

1 month ago