panshaowu
  • Joined on Jul 06, 2023

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

Agreed.

8 months ago

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

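MSAdapter has to support both GPU and Ascend, whereas torch only supports GPU and therefore only has is_nccl_available. The corresponding Ascend interface is hccl, so a new API, is_hccl_available, was added. I suggest rewording "Torch does not have this function" so that it explains this reason.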

8 months ago
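
A minimal sketch of the is_nccl_available / is_hccl_available split described in the comment above. It is not MSAdapter's actual code: the `device_target` parameter and the `_BACKEND_BY_TARGET` table are assumptions made so the example runs without GPU or Ascend hardware, and the real functions presumably query the runtime instead of taking an argument.

```python
# Hypothetical sketch, not MSAdapter's implementation.
# Assumption: each device target maps to exactly one collective backend.
_BACKEND_BY_TARGET = {"GPU": "nccl", "Ascend": "hccl"}


def is_nccl_available(device_target: str) -> bool:
    """Mirror of the torch-style check: True only on a GPU target."""
    return _BACKEND_BY_TARGET.get(device_target) == "nccl"


def is_hccl_available(device_target: str) -> bool:
    """Ascend counterpart that torch has no reason to provide."""
    return _BACKEND_BY_TARGET.get(device_target) == "hccl"


if __name__ == "__main__":
    for target in ("GPU", "Ascend", "CPU"):
        print(target, is_nccl_available(target), is_hccl_available(target))
```

Keeping the two checks symmetric makes it clear in the docs that is_hccl_available exists because of the extra Ascend backend, not because of a gap in torch.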

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

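scatter cannot be supported for now. Peichen will add this communication operator in a few days, and the feature will be filled in some time after that. gather has been added. As for train, the parent class already has this function.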

9 months ago

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

Fixed.

9 months ago

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

Fixed.

9 months ago

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

Fixed.

9 months ago

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

Fixed.

9 months ago

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

The pynative or graph mode may only be set after DDP is constructed, so an INFO log is enough here.

9 months ago

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

Not needed.

9 months ago

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

torch only destroys the process group, whereas calling release in MindSpore frees all of the distributed resources.

9 months ago

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

Fixed.

9 months ago

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

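In theory, the functionality and call logic match torch, and the comments are essentially torch's documentation carried over. Any functionality that is not covered is described in one place, in ConstraintList.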

9 months ago

panshaowu commented on pull request OpenI/MSAdapter#621

add ddp

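The plan going forward is to initialize the communication operator classes inside the process group, so that other modules only need to construct the process group class and call its methods instead of passing the process group on every communication operator call. Few communication operators are used at the moment, so these methods are not in use yet.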

9 months ago
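
A rough sketch of the design planned in the comment above, where the process group owns its communication operators and callers only invoke methods on it. Everything here is hypothetical: the `ProcessGroup` class, its `all_reduce_sum` method, and the single-process simulation (one list element per rank) are illustrative stand-ins, not MSAdapter or MindSpore APIs.

```python
from typing import Callable, Dict, List, Sequence


class ProcessGroup:
    """Hypothetical process group that builds its operators once, at construction."""

    def __init__(self, name: str, ranks: Sequence[int]) -> None:
        self.name = name
        self.ranks = list(ranks)
        # Operators are created here and bound to this group, so callers
        # never have to pass the group on each communication call.
        self._ops: Dict[str, Callable[[List[float]], List[float]]] = {
            "all_reduce_sum": self._make_all_reduce_sum(),
        }

    def _make_all_reduce_sum(self) -> Callable[[List[float]], List[float]]:
        group_size = len(self.ranks)

        def op(per_rank_values: List[float]) -> List[float]:
            # Simulation only: one input value per rank, every rank gets the sum.
            if len(per_rank_values) != group_size:
                raise ValueError("expected one value per rank in the group")
            total = sum(per_rank_values)
            return [total] * group_size

        return op

    def all_reduce_sum(self, per_rank_values: List[float]) -> List[float]:
        return self._ops["all_reduce_sum"](per_rank_values)


if __name__ == "__main__":
    group = ProcessGroup("demo_group", ranks=[0, 1, 2])
    print(group.all_reduce_sum([1.0, 2.0, 3.0]))
```

With this shape, a module such as DDP would hold a group object and call `group.all_reduce_sum(...)` directly, which is the calling pattern the comment aims for.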