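- Forking a project creates a copy of an example code repository under your own account.
- Your fork and the original example repository are two independent projects. Changes in one do not affect the other unless you submit a code change request (pull request) to the original project.
In the upper-right corner of this page, select "Fork" (派生) -> fill in the form as prompted -> "Fork Project" (派生项目).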
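- Training task: in the upper-right corner, select "Cloudbrain" (云脑) -> "Training Task" (训练任务) -> "New Training Task" (新建训练任务) (see the figure below)
- Debug task: in the upper-right corner, select "Cloudbrain" (云脑) -> "Debug Task" (调试任务) -> "New Debug Task" (新建调试任务)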
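【Basic Information】
- Compute cluster: Intelligent Computing Network cluster (智算网络集群)
- Compute resource: Enflame GCU (燧原GCU)
- Internet access: Yes
- Resource specification: GCU: 1*ENFLAME-T20, CPU: 8, Memory: 30G
- Task name: GCU-Mnist (customizable)
- Task description (optional): GCU pytorch lenet mnist (customizable)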
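【Parameter Configuration】
- Code branch: master
- Model (optional): none
- Image: enflame_py3_train
- Startup file: train_for_c2net.py
- Dataset: MnistDataset_Pytorch_GCU.zip; select "Public Datasets" (公开数据集) -> "MNIST_PytorchExample_GPU" -> check "MnistDataset_torch.zip"
- Run parameters (optional): none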
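- On the configuration page, select "New Task" (新建任务). A task normally moves through the states WAITING -> RUNNING -> SUCCEEDED.
- Open the training task example to view its configuration, run summary, logs, resource usage, result downloads, and other information (see the figure below).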
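The normal status progression described above can be checked mechanically if you record the statuses you observe while a task runs. The following is a minimal sketch, not part of the platform: the helper name `is_normal_progression` is hypothetical, and it assumes the three states named above are the only ones in a successful run.

```python
# Assumption: a successful task shows exactly these states, in this order.
NORMAL_SEQUENCE = ["WAITING", "RUNNING", "SUCCEEDED"]


def is_normal_progression(observed):
    """Return True if `observed` follows WAITING -> RUNNING -> SUCCEEDED.

    Consecutive duplicates (e.g. from repeated polling) are collapsed first.
    A task that is still in progress matches a prefix of the sequence.
    """
    collapsed = []
    for status in observed:
        if not collapsed or collapsed[-1] != status:
            collapsed.append(status)
    return collapsed == NORMAL_SEQUENCE[:len(collapsed)]


# Statuses as they might be noted down while refreshing the task page.
ok = is_normal_progression(["WAITING", "WAITING", "RUNNING", "SUCCEEDED"])
# "FAILED" is a hypothetical status name used only for illustration.
bad = is_normal_progression(["WAITING", "FAILED"])
print(ok, bad)
```

Any status outside the expected sequence makes the check fail, which is a reasonable cue to open the task log.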
On the debug task page, select a task in the "RUNNING" state -> "Debug" (调试) -> "Terminal" to open the debug page:
su root
cd /tmp/code && unzip master.zip
cd /tmp/dataset && unzip MnistDataset_torch.zip
cd /tmp/code/gcu_pytorch_mnist
python3 train_for_c2net.py
Because of static compilation, startup may take 1 to 2 minutes. Part of the training log follows:
dtu is available: True
epoch:10, batchsize:256
…………
the 10 epoch_size begin
idx: 0, loss: 0.07400538772344589
idx: 10, loss: 0.06701870262622833
idx: 20, loss: 0.045509178191423416
idx: 30, loss: 0.03666653111577034
idx: 40, loss: 0.0712934136390686
idx: 50, loss: 0.05111538991332054
idx: 60, loss: 0.02955653704702854
idx: 70, loss: 0.04876096546649933
idx: 80, loss: 0.05822735279798508
idx: 90, loss: 0.02273925580084324
idx: 100, loss: 0.05779978632926941
idx: 110, loss: 0.03197643533349037
idx: 120, loss: 0.046534184366464615
idx: 130, loss: 0.030895164236426353
idx: 140, loss: 0.026784008368849754
idx: 150, loss: 0.04921684414148331
idx: 160, loss: 0.049655407667160034
idx: 170, loss: 0.05504291504621506
idx: 180, loss: 0.08287880569696426
idx: 190, loss: 0.00771808996796608
idx: 200, loss: 0.07163269817829132
idx: 210, loss: 0.02763640508055687
idx: 220, loss: 0.033682629466056824
idx: 230, loss: 0.0015385933220386505
accuracy: 0.98
root@q74c3b5eecfa479fb3d3616b59a80cdd-task0-0:/tmp/code/gcu_pytorch_mnist#
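To compare runs, it can help to reduce a log like the one above to a few numbers. The sketch below is an illustration only: it assumes the log lines have exactly the `idx: N, loss: X` and `accuracy: X` shapes shown above, and the function name `summarize_log` is hypothetical.

```python
import re

# Patterns match the line shapes seen in the log excerpt above.
LOSS_LINE = re.compile(r"^idx:\s*(\d+),\s*loss:\s*([0-9.eE+-]+)$")
ACC_LINE = re.compile(r"^accuracy:\s*([0-9.]+)$")


def summarize_log(text):
    """Collect (idx, loss) pairs and the last reported accuracy from a log."""
    losses = []
    accuracy = None
    for raw in text.splitlines():
        line = raw.strip()
        match = LOSS_LINE.match(line)
        if match:
            losses.append((int(match.group(1)), float(match.group(2))))
            continue
        match = ACC_LINE.match(line)
        if match:
            accuracy = float(match.group(1))
    mean_loss = sum(v for _, v in losses) / len(losses) if losses else None
    return {"steps": len(losses), "mean_loss": mean_loss, "accuracy": accuracy}


# A few lines copied from the log above.
sample = """the 10 epoch_size begin
idx: 0, loss: 0.07400538772344589
idx: 10, loss: 0.06701870262622833
accuracy: 0.98
"""
summary = summarize_log(sample)
print(summary)
```

Lines that match neither pattern, such as the shell prompt or the epoch banner, are ignored.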
import importlib.util

def is_torch_gcu_available():
    # Detect the torch_gcu package without importing it
    if importlib.util.find_spec("torch_gcu") is None:
        return False
    return True

if is_torch_gcu_available():
    import torch_gcu
import os

# log filter
os.environ['ENFLAME_LOG_LEVEL'] = 'FATAL'
os.environ['ENFLAME_LOG_DEBUG_MOD'] = ''
os.environ['ENFLAME_ENABLE_EFP'] = 'true'
if is_torch_gcu_available():
    device = torch_gcu.gcu_device(0)
else:
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
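The device selection above can be wrapped in one function so the rest of a script does not branch on the backend. This is a minimal sketch under stated assumptions: `module_available` and `pick_device` are hypothetical names, `torch_gcu.gcu_device(0)` is taken from the excerpt above, and the extra "none" case (no PyTorch installed) is added only so the sketch runs anywhere.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_device():
    """Return (backend_name, device), preferring GCU, then CUDA, then CPU.

    The device is None when PyTorch itself is not installed.
    """
    if module_available("torch_gcu"):
        import torch_gcu  # only present in Enflame GCU images
        return "gcu", torch_gcu.gcu_device(0)
    if module_available("torch"):
        import torch
        if torch.cuda.is_available():
            return "cuda", torch.device("cuda:0")
        return "cpu", torch.device("cpu")
    return "none", None


backend, device = pick_device()
print(backend, device)
```

Importing `torch_gcu` and `torch` inside the function keeps the module importable on machines that have neither package.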
optimizer = SGD(model.parameters(), lr=1e-1)
for _epoch in range(epoch):
    # (forward pass and loss computation omitted in this excerpt)
    loss.backward()
    if is_torch_gcu_available():
        torch_gcu.optimizer_step(optimizer)
    else:
        optimizer.step()
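The optimizer step above checks for `torch_gcu` on every iteration. One possible variation, sketched here as an assumption and not as the project's own code, evaluates the check once and hides the branch behind a helper. `HAS_TORCH_GCU`, `step_optimizer`, and `CountingOptimizer` are hypothetical names; `torch_gcu.optimizer_step` is the call used in the excerpt above.

```python
import importlib.util

# Evaluate the availability check once, at import time.
HAS_TORCH_GCU = importlib.util.find_spec("torch_gcu") is not None

if HAS_TORCH_GCU:
    import torch_gcu


def step_optimizer(optimizer):
    """Apply one optimizer update, using the GCU-specific step when present."""
    if HAS_TORCH_GCU:
        torch_gcu.optimizer_step(optimizer)
    else:
        optimizer.step()


class CountingOptimizer:
    """Stand-in optimizer (illustration only) that counts step() calls."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


opt = CountingOptimizer()
step_optimizer(opt)
print(HAS_TORCH_GCU, opt.steps)
```

In a training loop, `step_optimizer(optimizer)` would replace the `if`/`else` that follows `loss.backward()`.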
This project uses OpenI (启智) + GCU + PyTorch + LeNet5_MNIST as an example and focuses on how to create GCU training/debug tasks and how to adapt code for GCU.