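Single-dataset training on the OpenI (启智) cluster, multi-dataset training on the OpenI cluster, and single-dataset training on the C2Net (智算) cluster each work differently, so take care to distinguish them. Data loading and model definition largely follow the GPU handwritten-digit recognition project (手写数字识别GPU版本_PytorchExample):
Single/multi-dataset usage on the C2Net cluster:
This project uses #LeNet5-MNIST-PyTorch as an example to briefly show how to run a training task with a GCU cluster + PyTorch on the OpenI AI collaboration platform. It is intended as an OpenI training example for AI developers.
Training task:
Startup file: run.py
Dataset: MnistDataset_Pytorch_GCU.zip. Select Dataset -> Public datasets -> search for "gcu" -> choose MnistDataset_Pytorch_GCU -> check MnistDataset_torch_GCU_GPU.zip
Image: gcu_ubuntu_18.04_2.1.52_pytorch_train
Resource specification: GCU: 1*ENFLAME-T20, CPU: 8, memory: 30G
Debug task: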
su root
cd /dataset && unzip MnistDataset_torch_GCU_GPU.zip
cd /code && unzip master.zip
cd /code/gcu_pytorch_mnist && bash run.sh
GCU initialization
import importlib.util

def is_torch_dtu_available():
    # Probe one package level at a time: find_spec on a submodule raises
    # if its parent package is missing, so check the parents first.
    if importlib.util.find_spec("torch_dtu") is None:
        return False
    if importlib.util.find_spec("torch_dtu.core") is None:
        return False
    return importlib.util.find_spec("torch_dtu.core.dtu_model") is not None

if is_torch_dtu_available():
    import torch_dtu
    import torch_dtu.distributed as dist
    import torch_dtu.core.dtu_model as dm
    from torch_dtu.nn.parallel import DistributedDataParallel as torchDDP
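The level-by-level check above can be generalized to any dotted module path. The following is a minimal sketch using only the standard library; the helper name `module_available` is hypothetical and is not part of torch_dtu or of this project.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if every level of a dotted module path can be located.

    Hypothetical helper for illustration; not part of torch_dtu.
    """
    parts = dotted_name.split(".")
    for i in range(1, len(parts) + 1):
        prefix = ".".join(parts[:i])
        try:
            # find_spec imports parent packages as a side effect, so guard
            # against errors raised while resolving them.
            if importlib.util.find_spec(prefix) is None:
                return False
        except (ImportError, ValueError):
            return False
    return True


if __name__ == "__main__":
    # A standard-library submodule, then the module probed in this README.
    print(module_available("json.decoder"))
    print(module_available("torch_dtu.core.dtu_model"))
```

Under these assumptions, `module_available("torch_dtu.core.dtu_model")` should give the same answer as `is_torch_dtu_available()`.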
Specifying the compute device
if is_torch_dtu_available():
    device = dm.dtu_device()
else:
    device = torch.device("cpu")
Optimizer update interface
sgd = SGD(model.parameters(), lr=1e-1)
for _epoch in range(epoch):
    # Forward pass and loss computation are omitted in this excerpt.
    loss.backward()
    if is_torch_dtu_available():
        dm.optimizer_step(sgd, barrier=True)
    else:
        sgd.step()
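To keep the GCU/CPU branch out of the training loop body, the update can be wrapped in one function. This is a sketch only: `step_optimizer` is a hypothetical name, and it assumes the `dtu_model` argument is the imported `torch_dtu.core.dtu_model` module (`dm` above), or `None` when torch_dtu is not installed.

```python
def step_optimizer(optimizer, dtu_model=None):
    """Apply one optimizer update, using the GCU path when it is available.

    Hypothetical wrapper for illustration only. `dtu_model` is assumed to be
    the `torch_dtu.core.dtu_model` module, or None on a machine without it.
    """
    if dtu_model is not None:
        # Same call as in the README snippet: optimizer_step with barrier=True.
        dtu_model.optimizer_step(optimizer, barrier=True)
    else:
        # Standard PyTorch optimizer update.
        optimizer.step()
```

In the loop above this would be called as `step_optimizer(sgd, dm if is_torch_dtu_available() else None)` right after `loss.backward()`.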
python3 run.py
Training-task logs are currently produced by print statements in the code; see the print calls in the example train_for_c2net.py.
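As an illustration of print-based logging, here is a minimal sketch. The function names and the log format are hypothetical and are not taken from train_for_c2net.py. It assumes that flushing each line helps the output reach the task log promptly, which may or may not matter on the platform.

```python
def format_epoch_log(epoch: int, total_epochs: int, loss: float, accuracy: float) -> str:
    """Build one log line for an epoch (hypothetical format)."""
    return f"epoch {epoch}/{total_epochs} loss={loss:.4f} accuracy={accuracy:.4f}"


def log_epoch(epoch: int, total_epochs: int, loss: float, accuracy: float) -> str:
    """Print the epoch summary immediately and return the printed line."""
    line = format_epoch_log(epoch, total_epochs, loss, accuracy)
    # flush=True avoids the line sitting in the stdout buffer.
    print(line, flush=True)
    return line
```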
Because of static compilation, please wait 2 to 3 minutes.
ls /pretrainmodel/
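This project uses #LeNet5-MNIST-PyTorch as an example to briefly show how to run a training task with single-card or multi-card GCU + PyTorch 1.10.0 on the OpenI AI collaboration platform. It is intended as an OpenI training example for AI developers.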