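Single-dataset training on the Qizhi cluster, multi-dataset training on the Qizhi cluster, and single-dataset training on the Zhisuan (intelligent computing) cluster are each used differently, so take care to tell them apart. The data loading and model definition logic is largely the same as in the handwritten digit recognition GPU version project, PytorchExample.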
Using single or multiple datasets on the Zhisuan cluster:
For example, the dataset MNISTDataset_torch.zip in this example is made available under /tmp/dataset/.
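As a minimal sketch of checking what the platform has placed under the dataset directory, the helper below lists every file beneath a root. The name `list_dataset_files` is hypothetical, and the default root `/tmp/dataset` is simply the path quoted above; the layout inside that directory is not described here, so the function assumes nothing about it.

```python
import os


def list_dataset_files(root="/tmp/dataset"):
    """Return sorted paths of all files under root, relative to root.

    Hypothetical helper: the default root comes from this README; the
    directory layout underneath it is not assumed.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)


if __name__ == "__main__":
    # Print whatever is present; a missing directory yields no output.
    for path in list_dataset_files():
        print(path)
```

Running this at the start of a training script is one way to confirm the dataset was unpacked where the script expects it before any loading code runs.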
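Training runs on GCU with the PyTorch framework. Datasets are uploaded and used in the same format as for GPU, and can be uploaded through the Dataset-GPU page. (This step is not needed for this example; you can select the public dataset MNISTDataset_torch.zip directly.)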
GCU initialization
import importlib.util

def is_torch_dtu_available():
    # Check each package level in turn so a missing parent package never raises.
    if importlib.util.find_spec("torch_dtu") is None:
        return False
    if importlib.util.find_spec("torch_dtu.core") is None:
        return False
    return importlib.util.find_spec("torch_dtu.core.dtu_model") is not None

# Import the GCU-specific modules only when torch_dtu is installed.
if is_torch_dtu_available():
    import torch_dtu
    import torch_dtu.distributed as dist
    import torch_dtu.core.dtu_model as dm
    from torch_dtu.nn.parallel import DistributedDataParallel as torchDDP
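The availability check above walks the package path one level at a time. One reason this matters is that `importlib.util.find_spec` imports parent packages and raises `ModuleNotFoundError` when a parent is missing, instead of returning `None`. A generic variant of the same guard might look like the sketch below; `is_module_available` is a hypothetical name and is not part of this repository.

```python
import importlib.util


def is_module_available(dotted_name):
    """Return True if dotted_name can be located without importing the leaf module."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages; a missing parent raises
        # instead of returning None.
        return False


if __name__ == "__main__":
    print(is_module_available("torch_dtu.core.dtu_model"))
```

With this helper the three-step check would reduce to a single call, at the cost of hiding which level of the package is missing.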
Specifying the compute device
if is_torch_dtu_available():
    device = dm.dtu_device()
else:
    device = torch.device("cpu")
Optimizer update interface
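from torch.optim import SGD

sgd = SGD(model.parameters(), lr=1e-1)
for _epoch in range(epoch):
    # ... forward pass and loss computation (see train_for_c2net.py) ...
    loss.backward()
    if is_torch_dtu_available():
        # On GCU, step through the torch_dtu interface instead of sgd.step().
        dm.optimizer_step(sgd, barrier=True)
    else:
        sgd.step()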
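If the same branch is needed in several places, it can be pulled into one function. The sketch below is only an illustration: `step_optimizer` and its `dtu_model` parameter are hypothetical names, with `dtu_model` standing for the `torch_dtu.core.dtu_model` module (imported as `dm` above) or `None` when GCU support is absent. The only torch_dtu call it uses is `optimizer_step(optimizer, barrier=True)`, copied from the loop above.

```python
def step_optimizer(optimizer, dtu_model=None):
    """Apply one optimizer update, going through torch_dtu when it is provided.

    Hypothetical helper: dtu_model stands for the torch_dtu.core.dtu_model
    module, or None when running without GCU support.
    """
    if dtu_model is not None:
        # Same call as in the training loop above.
        dtu_model.optimizer_step(optimizer, barrier=True)
    else:
        optimizer.step()
```

In the training loop this would be called as `step_optimizer(sgd, dm if is_torch_dtu_available() else None)`, which keeps the device-specific branch out of the loop body.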
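Once the data and the execution script are ready, create a training task to run the GCU-PyTorch script. First-time users can use this example code as a reference.
Select train_for_c2net.py as the startup script.
At present, training-task logs are produced by print statements in the code; see the relevant print calls in the example train_for_c2net.py.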
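Since the logs come from print output, a small formatting helper can keep the lines consistent. The following is a sketch under two assumptions: that the task log is collected from standard output, and that epoch, step, and loss are the values worth recording. The name `log_progress` and the field layout are illustrative, not what train_for_c2net.py actually prints.

```python
def log_progress(epoch, step, loss):
    """Print one progress line and return it."""
    line = f"epoch={epoch} step={step} loss={loss:.4f}"
    # flush=True pushes the line out immediately instead of waiting
    # for the output buffer to fill.
    print(line, flush=True)
    return line
```

Flushing on every call costs little at typical logging frequencies and avoids log lines appearing late when output is buffered.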