#419 nn.MultiheadAttention

Merged
zoulq merged 10 commits from lzh_multihead into master 1 year ago
lzh commented 1 year ago
lzh changed title from [WIP]nn.MultiheadAttention to nn.MultiheadAttention 1 year ago
zoulq reviewed 1 year ago
msadapter/pytorch/nn/modules/activation.py
@@ -452,2 +424,2 @@
if attn_mask:
_attn_mask = self._process_mask(attn_mask, _batch_size)
self.head_dim = embed_dim // num_heads
assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads"
zoulq commented 1 year ago
Use raise instead of assert.
lzh commented 1 year ago
Fixed.
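A minimal sketch of the requested change (the exact error message in the merged code may differ); note also that the explicit `is not None` check is the usual torch idiom for the mask, since truthiness of a multi-element tensor is ambiguous:

```python
if attn_mask is not None:  # explicit None check; tensor truthiness is ambiguous
    _attn_mask = self._process_mask(attn_mask, _batch_size)
self.head_dim = embed_dim // num_heads
if self.head_dim * num_heads != self.embed_dim:
    raise ValueError("embed_dim must be divisible by num_heads")
```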
zoulq reviewed 1 year ago
msadapter/pytorch/nn/modules/activation.py
@@ -454,0 +425,4 @@
assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads"

if self._qkv_same_embed_dim is False:
self.q_proj_weight = Parameter(ms.ops.zeros((embed_dim, embed_dim), dtype=dtype))
zoulq commented 1 year ago
Why is the ms.ops interface used here?
lzh commented 1 year ago
Fixed.
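Presumably the fix builds these weights through the adapter-level constructor rather than calling MindSpore directly; a sketch under that assumption, following the layout of torch's reference implementation:

```python
# Assumption: `empty` is the same adapter-level constructor used for
# in_proj_bias below, so each Parameter stays an adapter tensor rather
# than a raw MindSpore one; kdim/vdim follow torch's reference layout.
self.q_proj_weight = Parameter(empty((embed_dim, embed_dim), dtype=dtype))
self.k_proj_weight = Parameter(empty((embed_dim, self.kdim), dtype=dtype))
self.v_proj_weight = Parameter(empty((embed_dim, self.vdim), dtype=dtype))
```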
zoulq reviewed 1 year ago
testing/ut/pytorch/nn/test_activation.py
@@ -511,3 +511,3 @@

#TODO: multiheadattention need reconstruct
'''
zoulq commented 1 year ago
For functional completeness testing, you can directly reference test cases such as test_multihead_attn_add_zero_attn in https://github.com/pytorch/pytorch/blob/master/test/test_nn.py.
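For reference, a shape-level sketch in the spirit of the upstream test the reviewer points to, written against the torch-style API that MSAdapter mirrors (shapes here are illustrative, not taken from the upstream test):

```python
import torch
import torch.nn as nn

def test_multihead_attn_add_zero_attn():
    embed_dim, num_heads = 8, 2
    mha = nn.MultiheadAttention(embed_dim, num_heads, add_zero_attn=True)
    query = torch.rand(3, 1, embed_dim)        # (tgt_len, batch, embed_dim)
    key = value = torch.rand(5, 1, embed_dim)  # (src_len, batch, embed_dim)
    out, weights = mha(query, key, value)
    assert out.shape == (3, 1, embed_dim)
    # add_zero_attn appends one all-zero key/value slot, so the averaged
    # attention weights span src_len + 1 positions
    assert weights.shape == (1, 3, 5 + 1)
```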
lzh reviewed 1 year ago
@@ -510,8 +511,9 @@ def test_hardsigmoid():
assert ms_out.asnumpy().dtype == torch_out.numpy().dtype

#TODO: multiheadattention need reconstruct
lzh commented 1 year ago
Added the test cases from test/nn/test_multihead_attention.py.
zoulq reviewed 1 year ago
msadapter/pytorch/nn/modules/activation.py
@@ -514,0 +441,4 @@
self.in_proj_bias = Parameter(empty(3 * embed_dim, dtype=dtype))
else:
self.in_proj_bias = None
# TODO: NonDynamicallyQuantizableLinear
zoulq commented 1 year ago
What functionality is this TODO for?
lzh commented 1 year ago
![image](/attachments/ebaa1592-b05f-4290-8b0f-7a93eef9b80a)
lzh commented 1 year ago
NonDynamicallyQuantizableLinear was introduced in torch to work around an uncommon error; MindSpore has never implemented it, so a plain Linear is used as a direct replacement.
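In torch, NonDynamicallyQuantizableLinear is a thin subclass of Linear whose only purpose is to keep the output projection out of dynamic quantization, so the substitution preserves behavior. A sketch of what the adapter side presumably looks like (the import path is an assumption):

```python
from msadapter.pytorch.nn import Linear  # assumed adapter import path

# torch wraps out_proj in NonDynamicallyQuantizableLinear only to dodge a
# rare dynamic-quantization scripting error; MSAdapter has no quantization
# path, so a plain Linear computes the same affine map
self.out_proj = Linear(embed_dim, embed_dim, bias=bias, dtype=dtype)
```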
zoulq reviewed 1 year ago
msadapter/pytorch/nn/modules/activation.py
@@ -525,0 +521,4 @@
attn_output = attn_output.swapaxes(1, 0)
if need_weights:
return attn_output, attn_output_weights
return (attn_output,)
zoulq commented 1 year ago
If the multi_head_attention_forward called above is a MindSpore interface, then the attn_output here would be a MindSpore tensor, right?
lzh commented 1 year ago
multi_head_attention_forward is already in the supportedList: ![image](/attachments/982f6f91-d6f9-4a5c-ad5e-7964510485e9)
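For context, the quoted tail of forward follows the torch contract: swapaxes(1, 0) converts the (tgt_len, batch, embed_dim) result back to batch-first layout, and the weights are returned only when requested. A hypothetical call site, assuming the adapter mirrors torch's constructor arguments:

```python
import msadapter.pytorch as torch  # assumption: the adapter mirrors torch's API
from msadapter.pytorch.nn import MultiheadAttention

mha = MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
q = k = v = torch.rand(1, 3, 8)  # (batch, seq_len, embed_dim)

out, weights = mha(q, k, v, need_weights=True)  # weights averaged over heads
(out,) = mha(q, k, v, need_weights=False)       # 1-tuple, matching the diff above
```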
zoulq merged commit aeae9881d5 into master 1 year ago
lzh deleted branch lzh_multihead 1 year ago
The pull request has been merged as aeae9881d5.