#895 update autograd.Function and distribute api

Merged
zoulq merged 37 commits from frelam/MSAdapter:master0319 into master 2 weeks ago
frelam commented 1 month ago
zoulq reviewed 1 month ago
@@ -29,3 +58,3 @@
super(Function, self).__init__()
self.ctx = FunctionCtx()

def apply(self, *args, **kwargs):
zoulq commented 1 month ago
Previously, calling this interface raised an error telling users to use the corresponding MindSpore interface. Now there is no such prompt, but an error is raised at the bprop argument stage instead, which users are unlikely to understand. So the custom operator chapter in the user documentation needs to be updated, and an example should be added to the FAQ.
frelam commented 3 weeks ago
By dynamically generating a function, bprop is now auto-generated during __init__, so the usage can now be the same as in PyTorch.
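For reference, a minimal sketch of the dynamic-bprop pattern described above (the Cell bprop signature and the stub FunctionCtx are illustrative assumptions, not the PR's actual code):

import mindspore as ms

class FunctionCtx:  # trivial stand-in for the real ctx object in function.py
    pass

class Function(ms.nn.Cell):
    def __init__(self):
        super().__init__()
        self.ctx = FunctionCtx()

        def _generated_bprop(*args):
            # MindSpore Cell bprop convention: (*inputs, out, dout);
            # route the incoming gradient dout to the user-defined static backward.
            dout = args[-1]
            return self.backward(self.ctx, dout)

        # attach the generated bprop so it is used for this instance
        self.bprop = _generated_bprop

    def construct(self, *args):
        return self.forward(self.ctx, *args)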
zoulq commented 2 weeks ago
This Cell capability will be optimized later.
zoulq reviewed 1 month ago
@@ -37,0 +70,4 @@
return input_ms, _origin_dtype

# should use before cast_to_adapter_tensor
def _recorver_dtype_on_ascend(output_ms, _origin_dtype):
zoulq commented 1 month ago
In which scenarios are these two dtype conversion interfaces used?
frelam commented 1 month ago
On Ascend, this dtype conversion is used when the dtype of the user's input tensor is not supported by the MindSpore communication operators.
zoulq reviewed 1 month ago
@@ -492,2 +556,3 @@
_bc_op = _get_cache_prim(ms.ops.Broadcast)(src, _group_name)
tensor.data = _bc_op((tensor,))[0]
if get_backend(group) == "hccl":
cast_tensor, _origin_dtype = _check_and_convert_dtype_on_ascend(tensor)
zoulq commented 1 month ago
In which scenario is the dtype conversion needed here? It looks like it affects performance.
frelam commented 3 weeks ago
On Ascend, if the input dtype is not among the types supported by the communication operators (int8, int32, float16, float32, bfloat16), the dtype conversion is triggered; in all other cases it is effectively a pass-through.
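A rough sketch of the check-and-restore flow described here (the supported dtype set is taken from the comment above; the function bodies are illustrative, not the PR's implementation):

import mindspore as ms

# dtypes accepted by the Ascend (HCCL) communication operators, per the comment above
_ASCEND_COMM_DTYPES = (ms.int8, ms.int32, ms.float16, ms.float32, ms.bfloat16)

def _check_and_convert_dtype_on_ascend(input_ms):
    # cast only when the dtype is unsupported; otherwise this is a pass-through
    _origin_dtype = input_ms.dtype
    if _origin_dtype not in _ASCEND_COMM_DTYPES:
        input_ms = input_ms.astype(ms.float32)
    return input_ms, _origin_dtype

def _recorver_dtype_on_ascend(output_ms, _origin_dtype):
    # cast back to the original dtype after the communication op, if it was changed
    if output_ms.dtype != _origin_dtype:
        output_ms = output_ms.astype(_origin_dtype)
    return output_ms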
zoulq reviewed 1 month ago
mindtorch/torch/optim/optimizer.py
@@ -255,6 +255,13 @@ class _Optimizer:
loss = None
if closure is not None:
loss = closure()
if grads is None:
zoulq commented 1 month ago
The parameter is currently mandatory; this check only makes sense if it is also changed to optional at the same time, right?
frelam commented 1 month ago
Yes. This can be fixed together in #890; the change there is correct.
zoulq reviewed 1 month ago
mindtorch/torch/optim/optimizer.py
@@ -258,0 +260,4 @@
for param_group in self.param_groups:
for param in param_group['param']:
_grad = param.grad if param.grad is not None else ms.ops.zeros_like(param)
grads.append(_grad)
zoulq commented 1 month ago
Can this grad-collection step be done ahead of time?
frelam commented 3 weeks ago
It cannot be done ahead of time, because grad is a new tensor object every time, so it has to be re-fetched on each call.
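Illustrative sketch only (not the PR's code): because each backward pass rebinds param.grad to a fresh Tensor, the gradient list has to be rebuilt inside every step() call rather than cached once:

import mindspore as ms

def _collect_grads(param_groups):
    # must run on every step(); a cached list would reference stale grad tensors
    grads = []
    for param_group in param_groups:
        for param in param_group['param']:  # key name follows the diff above
            grads.append(param.grad if param.grad is not None
                         else ms.ops.zeros_like(param))
    return grads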
zoulq reviewed 1 month ago
mindtorch/torch/autograd/function.py
@@ -42,0 +77,4 @@
def backward(ctx, *grad_outputs):
pass

def bprop(self, *args, **kwargs):
zoulq commented 1 month ago
Test cases need to be added.
frelam commented 3 weeks ago
Added. Two known limitations: 1. automatic reduction is not supported when the returned gradient shape differs from the parameter shape; 2. Function.apply is not supported in graph mode.
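A minimal PyTorch-style example along the lines of the added tests (illustrative: the import path and ctx method names are assumed to mirror the PyTorch API, gradient shapes match the input shapes, and gradients are taken with MindSpore's grad in PyNative mode so it stays within the two limitations above):

import mindspore as ms
import mindtorch.torch as ms_torch  # import path assumed

class MyMul(ms_torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, y):
        ctx.save_for_backward(x, y)
        return x * y

    @staticmethod
    def backward(ctx, grad_output):
        x, y = ctx.saved_tensors
        # gradient shapes equal the input shapes, so no automatic reduction is needed
        return grad_output * y, grad_output * x

def func(x, y):
    return MyMul.apply(x, y).sum()

x = ms_torch.tensor([1.0, 2.0])
y = ms_torch.tensor([3.0, 4.0])
grads = ms.grad(func, grad_position=(0, 1))(x, y)
print(grads)  # expected: ([3., 4.], [1., 2.])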
frelam reviewed 1 month ago
testing/ut/pytorch/cuda/test_stream.py
@@ -8,6 +8,8 @@ import mindspore as ms
import numpy as np
from mindspore import jit, grad

user_stream1 = ms_torch.cuda.Stream()
frelam commented 1 month ago
Reusing the stream saves device memory in the test cases. Verified that the test cases pass.
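Sketch of the reuse pattern (illustrative test bodies; the import path and stream context manager are assumed to mirror the PyTorch API):

import mindtorch.torch as ms_torch  # import path assumed

# created once at module level and shared by all test cases in the file
user_stream1 = ms_torch.cuda.Stream()

def test_stream_record():
    with ms_torch.cuda.stream(user_stream1):
        a = ms_torch.ones(4) * 2  # work launched on the shared stream

def test_stream_sync():
    with ms_torch.cuda.stream(user_stream1):
        b = ms_torch.zeros(4) + 1
    user_stream1.synchronize()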
frelam reviewed 1 month ago
mindtorch/torch/autograd/function.py
@@ -36,0 +80,4 @@
"your custom autograd.Function to use it with backward "
"mode AD.")

def bprop(self, *args, **kwargs):
frelam commented 1 month ago
Currently bprop does not support variable-length inputs.
zoulq commented 2 weeks ago
An optimization has been planned.
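To make the limitation concrete (illustrative only): a Function whose forward takes a variable number of tensors is the pattern that currently cannot be handled, e.g.:

import mindtorch.torch as ms_torch  # import path assumed

class SumAll(ms_torch.autograd.Function):
    @staticmethod
    def forward(ctx, *tensors):  # variable-length inputs: not yet supported by the generated bprop
        ctx.num_inputs = len(tensors)
        return sum(tensors)

    @staticmethod
    def backward(ctx, grad_output):
        return (grad_output,) * ctx.num_inputs

# SumAll.apply(a, b, c) is the usage that hits this limitation.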
frelam changed title from [WIP]update some api to update some api 3 weeks ago
frelam changed title from update some api to update autograd.Function and distribute api 3 weeks ago
Erpim reviewed 2 weeks ago
@@ -5,0 +25,4 @@
self.dirty_tensors = args

def mark_non_differentiable(self, *args):
self.non_differentiable = args
Erpim commented 2 weeks ago
Do these features actually have no effect?
frelam commented 2 weeks ago
Correct. Warnings or errors have been added according to the impact.
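A rough sketch of the behavior described (the warning text is illustrative, not the PR's exact wording):

import warnings

class FunctionCtx:
    def mark_dirty(self, *args):
        # recorded but not acted upon, so tell the user the call is currently a no-op
        warnings.warn("ctx.mark_dirty() currently has no effect in mindtorch.")
        self.dirty_tensors = args

    def mark_non_differentiable(self, *args):
        warnings.warn("ctx.mark_non_differentiable() currently has no effect in mindtorch.")
        self.non_differentiable = args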
zoulq merged commit 64f60e8971 into master 2 weeks ago