Zagoruyko et al. proposed WideResNet on top of ResNet to address a problem of thin, deep networks: only a limited number of layers learn useful representations, while most layers contribute little to the final result. This problem is also known as diminishing feature reuse. The WideResNet authors widened the residual blocks, speeding up training several times over and clearly improving accuracy as well.

Below is an example of training WideResNet on the cifar10 dataset with MindSpore.

The overall network architecture of WideResNet: link

Dataset used: cifar10
└─train
  ├─data_batch_1.bin  # training set
  ├─data_batch_2.bin  # training set
  ├─data_batch_3.bin  # training set
  ├─data_batch_4.bin  # training set
  ├─data_batch_5.bin  # training set
  └─test_batch.bin    # evaluation set
└─eval
  └─test_batch.bin    # evaluation set
After installing MindSpore from the official website, you can run training and evaluation as follows:

# Distributed training
Usage:
cd scripts
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH] [MODELART]

# Standalone training
Usage:
cd scripts
bash run_standalone_train.sh [DATASET_PATH] [PRETRAINED_CKPT_PATH] [MODELART]

# Run the evaluation example
Usage:
cd scripts
bash run_eval.sh [DATASET_PATH] [CHECKPOINT_PATH] [MODELART]

If there is no [PRETRAINED_CKPT_PATH], pass "" for that argument when running the script.
└──wideresnet
  ├── README.md
  ├── scripts
  │   ├── run_distribute_train.sh   # launch Ascend distributed training (8 devices)
  │   ├── run_eval.sh               # launch Ascend evaluation
  │   └── run_standalone_train.sh   # launch Ascend standalone training (single device)
  ├── src
  │   ├── config.py                 # parameter configuration
  │   ├── dataset.py                # data preprocessing
  │   ├── cross_entropy_smooth.py   # loss definition for the cifar10 dataset
  │   ├── generator_lr.py           # generates the learning rate for each step
  │   ├── save_callback.py          # custom callback that saves the best ckpt
  │   └── wide_resnet.py            # WideResNet network structure
  ├── eval.py                       # evaluation network
  ├── export.py                     # export network
  └── train.py                      # training network
Both training and evaluation parameters can be configured in config.py.

"num_classes": 10,             # number of dataset classes
"batch_size": 32,              # batch size of the input tensor
"epoch_size": 300,             # number of training epochs
"save_checkpoint_path": "./",  # checkpoint save path, relative to the execution path
"repeat_num": 1,               # number of dataset repeats
"widen_factor": 10,            # network width
"depth": 40,                   # network depth
"lr_init": 0.1,                # initial learning rate
"weight_decay": 5e-4,          # weight decay
"momentum": 0.9,               # momentum for the optimizer
"loss_scale": 32,              # loss scale
"save_checkpoint": True,       # whether to save checkpoints
"save_checkpoint_epochs": 5,   # epoch interval between checkpoints; by default the last checkpoint is saved after the final epoch
"keep_checkpoint_max": 10,     # keep only the last keep_checkpoint_max checkpoints
"use_label_smooth": True,      # label smoothing
"label_smooth_factor": 0.1,    # label smoothing factor
"pretrain_epoch_size": 0,      # pretraining epochs
"warmup_epochs": 5,            # warmup epochs
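The `lr_init` and `warmup_epochs` settings feed the per-step schedule produced by generator_lr.py. The exact decay used there is not shown here; as a hedged sketch, a common pattern is a linear warmup over the first `warmup_epochs` followed by cosine decay to zero:

```python
import math

def lr_schedule(step, steps_per_epoch=195, total_epochs=300,
                warmup_epochs=5, lr_init=0.1):
    """Per-step learning rate: linear warmup for warmup_epochs, then cosine
    decay to zero. This is one common choice, not necessarily the exact
    formula in generator_lr.py."""
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch
    if step < warmup_steps:
        # ramp linearly from lr_init/warmup_steps up to lr_init
        return lr_init * (step + 1) / warmup_steps
    # cosine decay over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * lr_init * (1 + math.cos(math.pi * progress))
```

With `steps_per_epoch=195` (cifar10, batch 32, 8 devices) the warmup spans the first 975 steps, reaching `lr_init=0.1` at the end of epoch 5.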
# Distributed training
Usage:
cd scripts
bash run_distribute_train.sh [RANK_TABLE_FILE] [DATASET_PATH] [PRETRAINED_CKPT_PATH] [MODELART]

# Standalone training
Usage:
cd scripts
bash run_standalone_train.sh [DATASET_PATH] [PRETRAINED_CKPT_PATH] [MODELART]

If there is no [PRETRAINED_CKPT_PATH], pass "" for that argument when running the script.
Distributed training requires an HCCL configuration file in JSON format to be created in advance. For details, see the instructions in hccn_tools.

Training results are saved in the example path, in folders whose names start with "train" or "train_parallel". You can find the checkpoint files and results in the logs under this path, as shown below.
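For orientation only, a single-server 2-device rank table commonly follows the shape sketched below; all IDs and IP addresses here are placeholders, and the authoritative schema and generation procedure come from the hccn_tools documentation:

```json
{
  "version": "1.0",
  "server_count": "1",
  "server_list": [
    {
      "server_id": "10.0.0.1",
      "device": [
        {"device_id": "0", "device_ip": "192.168.100.101", "rank_id": "0"},
        {"device_id": "1", "device_ip": "192.168.100.102", "rank_id": "1"}
      ]
    }
  ],
  "status": "completed"
}
```

For the 8-device run launched by run_distribute_train.sh, the `device` list would cover ranks 0 through 7.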
# Distributed training result (8P)
epoch: 2 step: 195, loss is 1.4352043
epoch: 2 step: 195, loss is 1.4611206
epoch: 2 step: 195, loss is 1.2635705
epoch: 2 step: 195, loss is 1.3457444
epoch: 2 step: 195, loss is 1.4664338
epoch: 2 step: 195, loss is 1.3559061
epoch: 2 step: 195, loss is 1.5225968
epoch: 2 step: 195, loss is 1.246567
epoch: 3 step: 195, loss is 1.0763402
epoch: 3 step: 195, loss is 1.3007892
epoch: 3 step: 195, loss is 1.2473519
epoch: 3 step: 195, loss is 1.3249974
epoch: 3 step: 195, loss is 1.3388557
epoch: 3 step: 195, loss is 1.2402486
epoch: 3 step: 195, loss is 1.2878766
epoch: 3 step: 195, loss is 1.1507874
epoch: 4 step: 195, loss is 1.014946
epoch: 4 step: 195, loss is 1.1934564
epoch: 4 step: 195, loss is 0.9506259
epoch: 4 step: 195, loss is 1.2101849
epoch: 4 step: 195, loss is 1.0160742
epoch: 4 step: 195, loss is 1.2643425
epoch: 4 step: 195, loss is 1.3422029
epoch: 4 step: 195, loss is 1.221174
...
# Evaluation
Usage:
cd scripts
bash run_eval.sh [DATASET_PATH] [CHECKPOINT_PATH] [MODELART]

# Evaluation example
Usage:
cd scripts
bash run_eval.sh /cifar10 WideResNet_best.ckpt

Checkpoints can be generated during training.

Evaluation results are saved in the example path, in a folder named "eval". You can find the following result in the logs under this path:
result: {'top_1_accuracy': 0.9622395833333334}
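The `top_1_accuracy` metric reported above is simply the fraction of test samples whose highest-scoring class matches the label. A minimal stdlib-only sketch of the computation (eval.py itself uses MindSpore's built-in accuracy metric):

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose argmax prediction equals the label.

    logits: list of per-class score lists, one per sample.
    labels: list of integer class indices.
    """
    correct = sum(
        1 for scores, label in zip(logits, labels)
        if max(range(len(scores)), key=scores.__getitem__) == label
    )
    return correct / len(labels)
```

On the 10000-image cifar10 test set, the reported 0.9622 corresponds to roughly 9622 correctly classified images.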
| Parameters | Ascend 910 |
| --- | --- |
| Model version | WideResNet |
| Resources | Ascend 910; CPU 2.60GHz, 192 cores; memory 755G |
| Upload date | 2021-05-20 |
| MindSpore version | 1.2.1 |
| Dataset | cifar10 |
| Training parameters | epoch=300, steps per epoch=195, batch_size=32 |
| Optimizer | Momentum |
| Loss function | Softmax cross entropy |
| Output | probability |
| Loss | 0.545541 |
| Speed | 65.2 ms/step (8 devices) |
| Total duration | 70 minutes |
| Parameters (M) | 52.1 |
| Fine-tuning checkpoint | 426.49M (.ckpt file) |
| Script | link |
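The reported speed and duration figures are mutually consistent, as a quick back-of-the-envelope check shows: 65.2 ms/step over 195 steps/epoch and 300 epochs gives about 64 minutes of pure step time, with the remainder of the 70-minute total plausibly spent on data loading, compilation, and checkpointing.

```python
# Reported figures from the table above: 65.2 ms/step on 8 devices,
# 195 steps per epoch, 300 epochs.
ms_per_step = 65.2
steps_per_epoch = 195
epochs = 300

total_minutes = ms_per_step * steps_per_epoch * epochs / 1000 / 60
print(round(total_minutes, 1))  # ≈ 63.6 minutes of pure step time
```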
The seed inside the "create_dataset" function is set in dataset.py, and a random seed is also used in train.py.

Please visit the official homepage.

Check the ModelZoo FAQ first for answers to common questions.