#5120 [Sample code] Intelligent-computing GCU training failed

Closed
created 3 months ago by wangj · 5 comments
wangj commented 3 months ago
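The intelligent-computing GCU training sample code failed. The log shows that installing the c2net SDK is not supported. Task name: wjtes202401161801723. Selected image: enflame-py3_train.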
wangj added this to the V20240116 milestone 3 months ago
wangj added the bug label 3 months ago
liuzx was assigned by wangj 3 months ago
liuzx added the test label 3 months ago
wangj removed the test label 3 months ago
wangj commented 3 months ago
Owner
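@liuzx tracked this down: the GCU image does contain a python3 environment, but its default Python version is 2.7, so the c2net package cannot be installed. The image needs to be rebuilt.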
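A pre-flight check along these lines could surface the interpreter mismatch before the c2net install step runs. This is only a sketch: the helper name `interpreter_ok` and the `(3, 0)` minimum are assumptions for illustration, since the thread only says that the default Python 2.7 is too old. The code avoids annotations and f-strings so that it also runs under Python 2.7 and can report the problem there.

```python
import sys

# Assumption: any Python 3 is new enough; the issue only rules out 2.7.
MIN_VERSION = (3, 0)


def interpreter_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given interpreter version meets the minimum."""
    # Compare only (major, minor) so patch releases do not matter.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not interpreter_ok():
        # Exit with a readable message instead of a failed package install.
        sys.exit(
            "default python is %d.%d, need >= %d.%d"
            % (sys.version_info[0], sys.version_info[1],
               MIN_VERSION[0], MIN_VERSION[1])
        )
```

Running the file with the image's default `python` exits with an error message when that interpreter is 2.7, and exits silently when it is Python 3.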
liuzx added the test label 3 months ago
liuzx commented 3 months ago
Collaborator
The GCU image has been updated.
wangj commented 3 months ago
Owner
Intelligent-computing GCU training can now install the c2net package, but the task still fails with this error: "raise RuntimeError('Attempting to deserialize object on a CUDA ' RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False." Task name: wjtes2024011916t115698532
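PyTorch raises this error when a checkpoint that was saved from CUDA tensors is loaded on a machine where `torch.cuda.is_available()` is False, and the usual remedy is to pass `map_location` to `torch.load`. The sketch below shows that pattern. The helper names `pick_map_location` and `load_checkpoint` are hypothetical and are not taken from the sample code, and moving the loaded tensors from the CPU onto the GCU afterwards is not shown because the thread does not describe that step.

```python
def pick_map_location(cuda_available):
    """Return the device string to use for torch.load(map_location=...)."""
    # Fall back to the CPU whenever CUDA is absent, as on the GCU image.
    return "cuda" if cuda_available else "cpu"


def load_checkpoint(path):
    """Load a checkpoint, remapping CUDA-saved storages if CUDA is missing."""
    # Imported lazily so pick_map_location stays usable without torch.
    import torch

    location = pick_map_location(torch.cuda.is_available())
    # map_location remaps storages that were saved on a CUDA device.
    return torch.load(path, map_location=torch.device(location))
```

With this in place, a model file produced on a CUDA machine would be deserialized onto the CPU instead of aborting the task.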
wangj removed the test label 3 months ago
wangj commented 3 months ago
Owner
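With the GCU sample code, the task runs successfully when no model is selected. When a model is selected, the task fails with the same error as above.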
wangj commented 3 months ago
Owner
With the GCU sample code, the task runs successfully when the selected model is one that was trained by the sample code.
wangj closed this issue 3 months ago
wangj added the test label 3 months ago