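#775 Monitoring API should return GPU utilization and GPU memory utilization for other chips such as Enflame (燧原), Ascend (昇腾), Cambricon (寒武纪), and Iluvatar (天数)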

Closed
created 1 year ago by linfj · 4 comments
linfj commented 1 year ago
The new fields accCardUtil and accCardMemUsage can be added for this.
wakinzhang was assigned by linfj 1 year ago
wakinzhang added the 测试中 (In testing) label 1 year ago
wakinzhang added the 开发完成 (Development complete) label 1 year ago
yangxzh1 commented 1 year ago
Collaborator
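For Iluvatar (天数) training jobs, the monitoring API always returns -1 for accCardUtil and accCardMemUsage.

![9878066f8dc888a28b6066524979ba7](/attachments/e0bc7d7a-34b2-4caa-b249-554ca4ba064a)

On the training job's detail page, however, the card utilization and card memory usage are displayed.

![e09a7454b9268eb15b4351b5fadcb6a](/attachments/4cc6553a-7c0e-4f02-bf08-dc50ab9e81d1)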
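For anyone consuming these fields, a minimal Python sketch of how a client might guard against the -1 value reported above. It assumes that -1 means "no sample available", which the issue does not confirm; the payload shape and the `read_metric` helper are hypothetical, and only the two field names come from this issue.

```python
from typing import Optional

# Assumption: -1 is a "no data" sentinel, not a real measurement.
NO_DATA = -1


def read_metric(sample: dict, field: str) -> Optional[float]:
    """Return the metric as a float, or None if it is absent or equals the sentinel."""
    value = sample.get(field, NO_DATA)
    if value == NO_DATA:
        return None
    return float(value)


# Hypothetical monitoring payload; real responses may be shaped differently.
sample = {"accCardUtil": -1, "accCardMemUsage": 42.5}
print(read_metric(sample, "accCardUtil"))      # None
print(read_metric(sample, "accCardMemUsage"))  # 42.5
```

Returning None rather than -1 keeps the sentinel from being plotted or averaged as if it were a real utilization value.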
yangxzh1 commented 1 year ago
Collaborator
The resource monitoring data for Iluvatar (天数) is now reported correctly.
linfj added this to the v4.3.4 milestone 1 year ago
yangxzh1 added the 测试通过 (Test passed) label 1 year ago
yangxzh1 removed the 开发完成 (Development complete) label 1 year ago
yangxzh1 removed the 测试中 (In testing) label 1 year ago
yangxzh1 commented 1 year ago
Collaborator
Recording the test data:

1. npu

   ![image](/attachments/a0456b43-2e7e-44d6-981c-de204511aac4)

2. gcu, command: `efsmi -dmon`

   ![image](/attachments/b1729668-87c5-41d3-951c-5f32d78ebc95)

3. mlu, command: `cnmon -r -t 1000`

   ![image](/attachments/85fbf939-c82e-4c13-a9c5-10ff919e38e2)
yangxzh1 commented 1 year ago
Collaborator
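GPU test data: on the job resource monitoring page, the memory utilization shown for the NVIDIA GPU differs from what the driver reports. From the driver command output, the calculation is 4402/12288 = 36%.

![image](/attachments/a9f27bc7-5ba6-41fd-847d-ef00a787dbfd)

![image](/attachments/e4760c07-07f0-47e7-9ef3-f6203543498b)

The monitoring page data appears to match the figure shown below instead.

![fa10145bd26cfb58646e1b3789744f9](/attachments/f2f8d185-73e1-4f65-80d9-968ceac2d567)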
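The 4402/12288 calculation above can be reproduced with a short Python sketch. The helper name `mem_usage_percent` is hypothetical, and treating the two numbers as used and total memory in MiB is an assumption, since the issue does not state the unit or name the driver command.

```python
def mem_usage_percent(used: float, total: float) -> float:
    """Memory utilization as a percentage of total memory (same unit for both)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used / total * 100.0


# Values taken from the driver output quoted in this comment (unit assumed to be MiB).
print(f"{mem_usage_percent(4402, 12288):.1f}%")  # 35.8%, i.e. about 36%
```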
yangxzh1 closed this issue 11 months ago