This project is compatible with both NVIDIA GPUs and Enflame GCUs: the same codebase can be used to run accelerated PanGu model training on either hardware platform.
Visit this link to download the WuDao dataset.
| Dataset file | Size |
|---|---|
| wudao_corpus_20GB.tar | 9.8GB |
After extracting the dataset, preprocess the data:
```bash
python tools/preprocess_data_pangu.py \
       --input=/path/to/wudao_corpus_20GB/allZh_1Mfile/*.json \
       --output-prefix /path/to/save/path/ \
       --vocab-file ./megatron/tokenizer/bpe_4w_pcl/vocab \
       --dataset-impl mmap \
       --append-eod
```
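With `--dataset-impl mmap`, Megatron-style preprocessing normally writes an indexed `.bin`/`.idx` pair under the given output prefix; the exact file names produced by `preprocess_data_pangu.py` may differ, so check the output directory and point the training scripts' data path at the generated prefix.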
Then launch pretraining with the script that matches your hardware:

```bash
# Pretrain the 2.6B PanGu model on Enflame GCU
bash examples/enflame/pretrain_pangu_distributed_2.6B_enflame.sh
# Pretrain the 2.6B PanGu model on NVIDIA GPU
bash examples/gpu/pretrain_pangu_distributed_2.6B.sh
```
To support both Enflame GCU and NVIDIA GPU hardware, the project makes the following modifications.

`torch_gcu` is introduced in the `megatron` package:
```python
import importlib.util

def is_torch_gcu_available():
    # torch_gcu is Enflame's PyTorch extension for GCU devices
    if importlib.util.find_spec("torch_gcu") is None:
        return False
    if importlib.util.find_spec("torch_gcu.core") is None:
        return False
    return importlib.util.find_spec("torch_gcu.core.model") is not None

if is_torch_gcu_available():
    import torch_gcu
    torch_gcu.set_scalar_cached_enable(False)
else:
    # Fall back to plain torch so later torch_gcu.* references still resolve on GPU
    import torch as torch_gcu
```
Compute-device selection for the GCU:
```python
if is_torch_gcu_available():
    # Pick the GCU that belongs to this local rank (index scaled by LEO_CLUSTER_NUM if set)
    device = torch_gcu.gcu_device(args.local_rank * int(os.getenv("LEO_CLUSTER_NUM", '1')))
else:
    device = torch.device("cpu")
```
Optimizer adaptation interface:
```python
if not is_torch_gcu_available():
    # Standard PyTorch optimizer update on GPU
    optimizer.step()
else:
    # On GCU, the update goes through torch_gcu's JIT-aware optimizer step
    torch_gcu.optimizer_step(optimizer, [loss], mode=torch_gcu.JitRunMode.SAFEASYNC, model=model)
```
`torch_gcu.distributed` is used in the code paths that involve distributed training.
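The switch itself is not shown here; below is a minimal sketch of how the same availability check could select the distributed module, mirroring the `import torch as torch_gcu` fallback above. It assumes `torch_gcu.distributed` exposes a `torch.distributed`-compatible interface, which is an assumption rather than something verified from this repository.

```python
# Sketch only: choose the distributed module that matches the hardware.
# Assumption: torch_gcu.distributed mirrors the torch.distributed API.
if is_torch_gcu_available():
    import torch_gcu.distributed as dist
else:
    import torch.distributed as dist

# Downstream code then uses the usual collective calls through `dist`,
# e.g. dist.init_process_group(...), dist.all_reduce(...), dist.barrier(...).
```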
Configuration changes in the launch scripts:
```bash
# Raise the Enflame runtime log level so only fatal errors are printed
export ENFLAME_LOG_LEVEL=FATAL
# Empty the MLIR IR-dump options (no IR printed before/after compiler passes)
export COMPILE_OPTIONS_MLIR_DBG="-print-ir-before= -print-ir-after="
# Map the local hostname (and its FQDN) to 127.0.0.1 so it resolves during launch
echo "127.0.0.1 "`python -c "import socket;print(socket.gethostname())"` >>/etc/hosts
echo "127.0.0.1 "`python -c "import socket;print(socket.getfqdn(socket.gethostname()))"` >>/etc/hosts
```
On each node, launch the 32-card fine-tuning script with the master node IP and the node rank:

```bash
bash finetune_pangu_distributed_32card_2.6B_enflame master_ip node_rank
```
The relevant launcher arguments inside the script are:

```bash
--master_addr=${master_addr} \  # master node IP
--master_port=${master_port} \  # master node port
--node_rank=$2 \                # 0 on the master node; 1, 2, 3 on the others (order does not matter)
--nnodes=4 \                    # 4 nodes in total
```
So that every node sees the dataset at the same path (assuming the dataset directory on the master node is exported over NFS), add a mount entry on each worker node:

```bash
vim /etc/fstab
```

fstab entry to add:

```
master_ip:/path/to/dataset /path/to/dataset nfs defaults 0 0
```
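After saving the entry, `mount -a` applies it without a reboot, and `df -h /path/to/dataset` confirms the NFS mount is in place; the dataset path here is a placeholder and should match the path used in the training scripts.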
This project takes PengCheng PanGu + GCU + PyTorch + Megatron tensor parallelism with multi-card training as its example, and gives an end-to-end introduction to training PengCheng PanGu on GCU.