#968 AvgPool2d:Kernel launch failed

Open
created 4 months ago by davislee · 2 comments
```python
import mindtorch.torch as torch
import mindtorch.torch.nn as nn
import mindspore as ms

ms.set_context(device_id=2, device_target="Ascend", ascend_config={"precision_mode": "allow_fp32_to_fp16"})

input_tensor = torch.randn(1, 3, 5, 5)
down_sample = nn.AvgPool2d(3, stride=2, padding=[1, 1], count_include_pad=False)
down_sample(input_tensor)
print("success")
```

Error:

```
[ERROR] KERNEL(1963854,ffff95f7a160,python):2024-09-13-11:40:41.488.104 [mindspore/ccsrc/plugin/device/ascend/kernel/acl/acl_kernel_mod.cc:260] Launch] Kernel launch failed, msg: Acl compile and execute failed, op_type_:AvgPool3D

----------------------------------------------------
- Ascend Error Message:
----------------------------------------------------
E40021: 2024-09-13-11:40:41.468.304 Failed to compile Op [AvgPool3D1]. (oppath: [Compile /home/shuziren/Ascend/ascend-toolkit/8.0.RC2/opp/built-in/op_impl/ai_core/tbe/impl/dynamic/avg_pool3d.py failed with errormsg/stack: File "/home/shuziren/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/impl/util/util_cube_dynamic.py", line 821, in check_dynamic_range_lower
    if tensor.get("range"):
AttributeError: 'NoneType' object has no attribute 'get'
], optype: [AvgPool3D])[THREAD:1964364]
Solution: See the host log for details, and then check the Python stack where the error log is reported.
TraceBack (most recent call last):
Compile op[AvgPool3D1] failed, oppath[/home/shuziren/Ascend/ascend-toolkit/8.0.RC2/opp/built-in/op_impl/ai_core/tbe/impl/dynamic/avg_pool3d.py], optype[AvgPool3D], taskID[8]. Please check op's compilation error message.[FUNC:ReportBuildErrMessage][FILE:fusion_manager.cc][LINE:748][THREAD:1964364]
[SubGraphOpt][Compile][ProcFailedCompTask] Thread[281464633549152] recompile single op[AvgPool3D1] failed[FUNC:ProcessAllFailedCompileTasks][FILE:tbe_op_store_adapter.cc][LINE:961][THREAD:1964364]
[SubGraphOpt][Compile][ParalCompOp] Thread[281464633549152] process fail task failed[FUNC:ParallelCompileOp][FILE:tbe_op_store_adapter.cc][LINE:1009][THREAD:1964364]
[SubGraphOpt][Compile][CompOpOnly] CompileOp failed.[FUNC:CompileOpOnly][FILE:op_compiler.cc][LINE:1112][THREAD:1964364]
[GraphOpt][FusedGraph][RunCompile] Failed to compile graph with compiler Normal mode Op Compiler[FUNC:SubGraphCompile][FILE:fe_graph_optimizer.cc][LINE:1420][THREAD:1964364]
Call OptimizeFusedGraph failed, ret:-1, engine_name:AIcoreEngine, graph_name:partition0_rank1_new_sub_graph1[FUNC:OptimizeSubGraph][FILE:graph_optimize.cc][LINE:119][THREAD:1964364]
subgraph 0 optimize failed[FUNC:OptimizeSubGraphWithMultiThreads][FILE:graph_manager.cc][LINE:1012][THREAD:1964238]
build graph failed, graph id:0, ret:-1[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1608][THREAD:1964238]
[Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161][THREAD:1964238]
[Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145][THREAD:1964238]
build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145][THREAD:1964238]

(Please search "CANN Common Error Analysis" at https://www.mindspore.cn for error code description)

----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/transform/acl_ir/acl_utils.cc:379 Run

[ERROR] DEVICE(1963854,ffff95f7a160,python):2024-09-13-11:40:41.488.434 [mindspore/ccsrc/plugin/device/ascend/hal/hardware/ge_kernel_executor.cc:1156] LaunchKernel] Launch kernel failed, kernel full name: Default/AvgPool3D-op0
Traceback (most recent call last):
  File "/workspace/IPLAP/bug5.py", line 9, in <module>
    down_sample(input_tensor)
  File "/home/shuziren/.conda/envs/iplap/lib/python3.10/site-packages/mindspore/nn/cell.py", line 721, in __call__
    raise err
  File "/home/shuziren/.conda/envs/iplap/lib/python3.10/site-packages/mindspore/nn/cell.py", line 718, in __call__
    _pynative_executor.end_graph(self, output, *args, **kwargs)
  File "/home/shuziren/.conda/envs/iplap/lib/python3.10/site-packages/mindspore/common/api.py", line 1464, in end_graph
    self._executor.end_graph(obj, output, *args, *(kwargs.values()))
RuntimeError: Launch kernel failed, name:Default/AvgPool3D-op0

----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/runtime/pynative/op_runner.cc:624 LaunchKernels

ERROR conda.cli.main_run:execute(125): `conda run python /workspace/IPLAP/bug5.py` failed. (See above for error)
```

Environment:

* mindtorch: latest master branch
* mindspore: 2.3.1
* CANN: Ascend-cann-kernels-910_8.0.RC2_linux, Ascend-cann-toolkit_8.0.RC2_linux-aarch64
* NPU: 910B
* Python: 3.10.14
* gcc: 7.3.0
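For reference, the result the failing call is expected to produce can be sketched in plain Python, independent of mindtorch, MindSpore, and CANN. This is only an illustration of `count_include_pad=False` semantics with the kernel size, stride, and padding from the script above. The function name `avg_pool2d_exclude_pad` is made up for this example and is not part of any library; it handles a single 2-D plane and assumes the default floor rounding of the output size (`ceil_mode=False`).

```python
def avg_pool2d_exclude_pad(plane, kernel=3, stride=2, padding=1):
    """Average-pool one 2-D plane (a list of rows), ignoring padded cells.

    Illustrative helper, not a library API. It mirrors count_include_pad=False:
    each window's sum is divided by the number of real (non-padding) elements
    the window covers, not by kernel * kernel.
    """
    height, width = len(plane), len(plane[0])
    # Output size with floor rounding (ceil_mode=False).
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    result = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Window bounds in input coordinates, clipped to the real data.
            top = max(i * stride - padding, 0)
            left = max(j * stride - padding, 0)
            bottom = min(i * stride - padding + kernel, height)
            right = min(j * stride - padding + kernel, width)
            values = [plane[r][c]
                      for r in range(top, bottom)
                      for c in range(left, right)]
            row.append(sum(values) / len(values))
        result.append(row)
    return result


# One 5x5 channel, matching the spatial size of torch.randn(1, 3, 5, 5).
plane = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
pooled = avg_pool2d_exclude_pad(plane)
print(len(pooled), len(pooled[0]))  # spatial size of the pooled plane
```

With a 5x5 input, kernel 3, stride 2, and padding 1, each channel should come out as 3x3, so the full output of the reproduction script would have shape (1, 3, 3, 3) once the kernel compiles.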
Erpim commented 4 months ago
Collaborator
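This error is raised by the underlying CANN layer. The issue has been forwarded to the relevant colleagues for investigation.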
Mr.Xiao commented 2 months ago
I ran into the same problem.

```
[ERROR] KERNEL(14,fffdf87680e0,python):2024-11-12-02:43:10.450.631 [mindspore/ccsrc/plugin/device/ascend/kernel/acl/acl_kernel_mod.cc:261] Launch] Kernel launch failed, msg: Acl compile and execute failed, op_type_:AvgPool3D

----------------------------------------------------
- Ascend Error Message:
----------------------------------------------------
E40021: 2024-11-12-02:43:10.435.142 Failed to compile Op [AvgPool3D45]. (oppath: [Compile /usr/local/Ascend/ascend-toolkit/8.0.RC1/opp/built-in/op_impl/ai_core/tbe/impl/dynamic/avg_pool3d.py failed with errormsg/stack: File "/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/impl/util/util_cube_dynamic.py", line 821, in check_dynamic_range_lower
    if tensor.get("range"):
AttributeError: 'NoneType' object has no attribute 'get'
], optype: [AvgPool3D])[THREAD:3101]
Solution: See the host log for details, and then check the Python stack where the error log is reported.
TraceBack (most recent call last):
Compile op[AvgPool3D45] failed, oppath[/usr/local/Ascend/ascend-toolkit/8.0.RC1/opp/built-in/op_impl/ai_core/tbe/impl/dynamic/avg_pool3d.py], optype[AvgPool3D], taskID[243]. Please check op's compilation error message.[FUNC:ReportBuildErrMessage][FILE:fusion_manager.cc][LINE:748][THREAD:3101]
[SubGraphOpt][Compile][ProcFailedCompTask] Thread[281472062623968] recompile single op[AvgPool3D45] failed[FUNC:ProcessAllFailedCompileTasks][FILE:tbe_op_store_adapter.cc][LINE:962][THREAD:3101]
[SubGraphOpt][Compile][ParalCompOp] Thread[281472062623968] process fail task failed[FUNC:ParallelCompileOp][FILE:tbe_op_store_adapter.cc][LINE:1010][THREAD:3101]
[SubGraphOpt][Compile][CompOpOnly] CompileOp failed.[FUNC:CompileOpOnly][FILE:op_compiler.cc][LINE:1119][THREAD:3101]
[GraphOpt][FusedGraph][RunCompile] Failed to compile graph with compiler Normal mode Op Compiler[FUNC:SubGraphCompile][FILE:fe_graph_optimizer.cc][LINE:1385][THREAD:3101]
Call OptimizeFusedGraph failed, ret:-1, engine_name:AIcoreEngine, graph_name:partition0_rank1_new_sub_graph1[FUNC:OptimizeSubGraph][FILE:graph_optimize.cc][LINE:126][THREAD:3101]
subgraph 0 optimize failed[FUNC:OptimizeSubGraphWithMultiThreads][FILE:graph_manager.cc][LINE:1021][THREAD:1189]
build graph failed, graph id:44, ret:-1[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1615][THREAD:1189]
[Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161][THREAD:1189]
[Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145][THREAD:1189]
build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145][THREAD:1189]

(Please search "CANN Common Error Analysis" at https://www.mindspore.cn for error code description)

----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/transform/acl_ir/acl_utils.cc:379 Run

[ERROR] DEVICE(14,fffdf87680e0,python):2024-11-12-02:43:10.450.687 [mindspore/ccsrc/plugin/device/ascend/hal/hardware/ge_kernel_executor.cc:1007] LaunchKernel] Launch kernel failed, kernel full name: Default/AvgPool3D-op0
Traceback (most recent call last):
  File "/tmp/code/testnerv/train_nerv.py", line 636, in <module>
    main()
  File "/tmp/code/testnerv/train_nerv.py", line 208, in main
    train(None, args)
  File "/tmp/code/testnerv/train_nerv.py", line 431, in train
    msssim_list.append(msssim_fn(output_list, target_list))
  File "/tmp/code/testnerv/utils.py", line 192, in msssim_fn
    msssim = ms_ssim(output.float().detach(),
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindtorch-0.3.0-py3.9.egg/mindtorch/torch/tensor.py", line 693, in detach
    output = ms.ops.stop_gradient(input_ms)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/ops/function/grad/grad_func.py", line 1398, in stop_gradient
    return P.StopGradient()(value)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/ops/primitive.py", line 392, in __call__
    return _run_op(self, self.name, args)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/ops/primitive.py", line 1009, in _run_op
    stub = _pynative_executor.run_op_async(obj, op_name, args)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/common/api.py", line 1234, in run_op_async
    return self._executor.run_op_async(*args)
RuntimeError: Launch kernel failed, name:Default/AvgPool3D-op0

----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/runtime/pynative/op_runner.cc:632 LaunchKernels failed
```