#11 How to merge the 8 checkpoints saved by GPU training and then run model inference

Closed
created 1 year ago by abulice · 1 comment
abulice commented 1 year ago
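Following the GPU training instructions, I fine-tuned the 2.6B model on an 8-GPU machine and found that the saved checkpoint is split into 8 pieces. Which script can I use to merge the 8 files into one for inference?
abulice changed title from "How to merge the 8 checkpoints kept by GPU training and then run model inference" to "How to merge the 8 checkpoints saved by GPU training and then run model inference" 1 year ago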
taoht commented 1 year ago
Owner
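1. Load the 8 model-parameter shards in distributed mode

```
from mindspore.train.model import Model
from mindspore.train.serialization import load_distributed_checkpoint
# PanguAlphaModel and EvalNet come from this repository's model code; config, args_opt,
# ckpt_file_list and predict_layout are assumed to be set up earlier in the inference script.
pangu_alpha = PanguAlphaModel(config)
eval_net = EvalNet(pangu_alpha, pad_token=args_opt.padding_id)
eval_net.set_train(False)
model_predict = Model(eval_net)
load_distributed_checkpoint(eval_net, ckpt_file_list, predict_layout)
```

2. Integrate and save them as a single model file

```
from mindspore.train.serialization import save_checkpoint
save_checkpoint(pangu_alpha, "/path/to/mPanGu_integrated.ckpt", integrated_save=True)
```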
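The answer does not show how `ckpt_file_list` is built. As far as the MindSpore documentation describes, `load_distributed_checkpoint` expects the checkpoint file names ordered by rank id, so a minimal sketch of collecting them could look like the following. The directory layout (`rank_<id>/*.ckpt`) and the helper name `collect_rank_checkpoints` are assumptions for illustration, not something this repository is confirmed to use; adjust the pattern to wherever your training run wrote its files.

```python
import re
from pathlib import Path


def collect_rank_checkpoints(ckpt_dir, num_ranks=8):
    """Return one checkpoint path per rank, ordered by rank id.

    Hypothetical layout (an assumption): <ckpt_dir>/rank_<id>/<name>.ckpt
    """
    by_rank = {}
    for path in Path(ckpt_dir).glob("rank_*/*.ckpt"):
        match = re.fullmatch(r"rank_(\d+)", path.parent.name)
        if match is None:
            continue  # skip directories that only look similar, e.g. rank_old
        rank = int(match.group(1))
        # If a rank directory holds several files, keep the most recently modified one.
        current = by_rank.get(rank)
        if current is None or path.stat().st_mtime > current.stat().st_mtime:
            by_rank[rank] = path

    missing = [r for r in range(num_ranks) if r not in by_rank]
    if missing:
        raise FileNotFoundError(f"no checkpoint found for ranks {missing}")

    # Sort numerically by rank so that rank_10 comes after rank_9, not after rank_1.
    return [str(by_rank[r]) for r in range(num_ranks)]
```

With this helper, `ckpt_file_list = collect_rank_checkpoints("/path/to/checkpoints")` would produce the list passed to `load_distributed_checkpoint` in step 1, where `/path/to/checkpoints` is a placeholder. Failing early on a missing rank is a deliberate choice here: an incomplete list would otherwise only surface as an error while loading.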
abulice closed this issue 1 year ago