In the inference phase, the PanGu-α-13B and 2.6B models are compressed from 8 NPUs to 1 NPU with only about 2% performance fluctuation. To achieve this, several model compression techniques are applied and the MindSpore source code is adapted. When using PanGu-α-13B/2.6B, modify the policy file and model file paths accordingly. The implementation is based on the environment of PengCheng Cloud Brain II; for local deployment, modify the file paths and the code related to file replication.
## Quantization
By loading the model in low precision, most of the float32 parameters are converted to float16, and the resulting quantization noise is handled accordingly.
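The float32-to-float16 conversion can be sketched as below. This is an illustration in NumPy, not the actual MindSpore loading code; the layer shape and function names are hypothetical.

```python
import numpy as np

def quantize_to_fp16(weights_fp32):
    """Cast float32 weights to float16 and report the quantization noise."""
    weights_fp16 = weights_fp32.astype(np.float16)
    # Quantization noise: gap between the original values and the
    # float16 values cast back to float32.
    noise = weights_fp32 - weights_fp16.astype(np.float32)
    return weights_fp16, np.abs(noise).max()

rng = np.random.default_rng(0)
w = rng.standard_normal((2560, 2560)).astype(np.float32)  # hypothetical layer
w16, max_noise = quantize_to_fp16(w)
print(w16.dtype, w16.nbytes / w.nbytes)  # float16 weights at half the memory
```

For weights of moderate magnitude, the float16 rounding error stays small relative to the values, which is why accuracy degrades only slightly.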
## Parameter sharing
The model is adapted so that the output layer shares its parameters with the embedding layer.
With an embedding size of 2560 and a vocabulary size of 40,000, this saves 40,000 × 2,560 ≈ 102.4 million parameters.
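Weight sharing between the embedding and output layers (often called weight tying) can be sketched as follows. The NumPy code is illustrative; the real model uses MindSpore layers.

```python
import numpy as np

VOCAB_SIZE = 40_000
EMBED_SIZE = 2_560

rng = np.random.default_rng(0)
embedding = rng.standard_normal((VOCAB_SIZE, EMBED_SIZE)).astype(np.float32)

def embed(token_ids):
    """Look up token embeddings."""
    return embedding[token_ids]

def output_logits(hidden):
    """Project hidden states back onto the vocabulary by reusing the
    embedding matrix (transposed) instead of a separate output weight."""
    return hidden @ embedding.T

# Parameters saved by not storing a second vocabulary-sized matrix.
saved = VOCAB_SIZE * EMBED_SIZE
print(f"{saved:,} parameters saved")  # 102,400,000 parameters saved
```

Because the output projection reuses the same matrix, the single-NPU checkpoint omits one vocabulary-sized weight entirely.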
## MindSpore source code adaptation
The model parallelism strategies used during training and loading are inconsistent: semi-automatic model parallelism is used during training, while no model parallelism is used during loading.
In addition, the parameter types saved after training differ from the parameter types used during inference, so the underlying MindSpore support needs to be modified.
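Loading an 8-NPU model-parallel checkpoint onto a single NPU amounts to merging per-device parameter shards back into full tensors. A minimal sketch of the idea, with hypothetical shard layout and names (real checkpoints follow the MindSpore parallel strategy file):

```python
import numpy as np

def merge_shards(shards, axis):
    """Concatenate per-device parameter shards along their partition axis
    to recover the full single-device tensor."""
    return np.concatenate(shards, axis=axis)

# Example: an embedding table split row-wise across 8 devices.
full_rows, cols = 40_000, 2_560
shards = [np.zeros((full_rows // 8, cols), dtype=np.float32) for _ in range(8)]
merged = merge_shards(shards, axis=0)
print(merged.shape)  # full table restored on one device
```

Each parameter's partition axis must match the parallel strategy it was trained with, which is why the policy file path has to be set correctly.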
There are three main differences in the inference code:

- `pangu_dropout_recompute_eos_fp16.py`
- `mindspore_ascend-1.1.0-cp37-cp37m-linux_aarch64.whl`
```python
# Module name written with underscores here, since hyphens are not valid
# in Python import statements.
from eval_task_13b_fp16 import get_model

model = get_model(args)
```
| Model | Memory occupation | Inference speed |
|---|---|---|
| PanGu-α-13B (before compression) | 8 NPUs | ~150 ms |
| PanGu-α-13B (after compression) | 1 NPU | ~250 ms |
| Model (zero-shot) | WebQA.v1.0 (EM/F1) | CLUEWSC2020 (acc) |
|---|---|---|
| PanGu-α-13B (before compression) | 5.126/14.470 | 75.000 |
| PanGu-α-13B (after compression) | 5.060/14.466 | 73.684 |
Compress PanGu-α-13B/2.6B from 8 NPUs to 1 NPU for inference.