ViT:全名Vision Transformer,不同于传统的基于CNN的网络结果,是基于Transformer结构的CV网络,2021年谷歌研究发表网络,在大数据集上表现了非常强的泛化能力。大数据任务(如:CLIP)基于该结构能有良好的效果。MindFormers提供的ViT权重及精度均是是基于MAE预训练ImageNet-1K数据集进行微调得到。
论文: Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. 2021.
config | task | Datasets | metric | score | train performance | prediction performance |
---|---|---|---|---|---|---|
vit_base_p16 | image_classification | ImageNet-1K | Top1-Accuracy | 0.8372 | 262.31 samples/s/p | 363.50 (fps) |
ViT
基于 MindFormers
实现,主要涉及的文件有:
模型具体实现:mindformers/models/vit
model
├── __init__.py
├── convert_weight.py # 权重转换脚本
├── vit.py # 模型实现
├── vit_config.py # 模型配置项
├── vit_modules.py # 模型所需模块
└── vit_processor.py # Model预处理
模型配置:configs/vit
model
└── run_vit_base_p16_224_100ep.yaml # vit_base模型启动配置
运行mindformers/tools/hccl_tools.py生成RANK_TABLE_FILE的json文件
# 运行如下命令,生成当前机器的RANK_TABLE_FILE的json文件
python ./mindformers/tools/hccl_tools.py --device_num "[0,8)"
注:若使用ModelArts的notebook环境,可从 /user/config/jobstart_hccl.json
路径下直接获取rank table,无需手动生成
RANK_TABLE_FILE 单机8卡参考样例:
{
"version": "1.0",
"server_count": "1",
"server_list": [
{
"server_id": "xx.xx.xx.xx",
"device": [
{"device_id": "0","device_ip": "192.1.27.6","rank_id": "0"},
{"device_id": "1","device_ip": "192.2.27.6","rank_id": "1"},
{"device_id": "2","device_ip": "192.3.27.6","rank_id": "2"},
{"device_id": "3","device_ip": "192.4.27.6","rank_id": "3"},
{"device_id": "4","device_ip": "192.1.27.7","rank_id": "4"},
{"device_id": "5","device_ip": "192.2.27.7","rank_id": "5"},
{"device_id": "6","device_ip": "192.3.27.7","rank_id": "6"},
{"device_id": "7","device_ip": "192.4.27.7","rank_id": "7"}],
"host_nic_ip": "reserve"
}
],
"status": "completed"
}
RANK_TABLE_FILE
文件,然后将不同机器上生成的RANK_TABLE_FILE
文件全部拷贝到同一台机器上。# 运行如下命令,生成当前机器的RANK_TABLE_FILE的json文件
python ./mindformers/tools/hccl_tools.py --device_num "[0,8)" --server_ip xx.xx.xx.xx
注:需要根据机器的ip地址指定 --server_ip,避免由于不同机器server_ip不同,导致多节点间通信失败。
RANK_TABLE_FILE
文件合并# 运行如下命令,合并每个机器上的RANK_TABLE_FILE的json文件。
python ./mindformers/tools/merge_hccl.py hccl*.json
RANK_TABLE_FILE
文件拷贝到所有机器中,保证不同机器上的RANK_TABLE_FILE
相同。RANK_TABLE_FILE 双机16卡参考样例:
{
"version": "1.0",
"server_count": "2",
"server_list": [
{
"server_id": "xx.xx.xx.xx",
"device": [
{
"device_id": "0", "device_ip": "192.168.0.0", "rank_id": "0"
},
{
"device_id": "1", "device_ip": "192.168.1.0", "rank_id": "1"
},
{
"device_id": "2", "device_ip": "192.168.2.0", "rank_id": "2"
},
{
"device_id": "3", "device_ip": "192.168.3.0", "rank_id": "3"
},
{
"device_id": "4", "device_ip": "192.168.0.1", "rank_id": "4"
},
{
"device_id": "5", "device_ip": "192.168.1.1", "rank_id": "5"
},
{
"device_id": "6", "device_ip": "192.168.2.1", "rank_id": "6"
},
{
"device_id": "7", "device_ip": "192.168.3.1", "rank_id": "7"
}
],
"host_nic_ip": "reserve"
},
{
"server_id": "xx.xx.xx.xx",
"device": [
{
"device_id": "0", "device_ip": "192.168.0.1", "rank_id": "8"
},
{
"device_id": "1", "device_ip": "192.168.1.1", "rank_id": "9"
},
{
"device_id": "2", "device_ip": "192.168.2.1", "rank_id": "10"
},
{
"device_id": "3", "device_ip": "192.168.3.1", "rank_id": "11"
},
{
"device_id": "4", "device_ip": "192.168.0.2", "rank_id": "12"
},
{
"device_id": "5", "device_ip": "192.168.1.2", "rank_id": "13"
},
{
"device_id": "6", "device_ip": "192.168.2.2", "rank_id": "14"
},
{
"device_id": "7", "device_ip": "192.168.3.2", "rank_id": "15"
}
],
"host_nic_ip": "reserve"
}
],
"status": "completed"
}
如果无需加载权重,或者使用from_pretrained功能自动下载,可以跳过此章节。
MindFormers提供高级接口from_pretrained功能直接下载MindFormerBook中的vit_base_p16.ckpt,无需手动转换。
本仓库中的vit_base_p16
来自于facebookresearch/mae的ViT-Base, 如需手动下载权重,可参考以下示例进行转换:
从上述链接中下载ViT-Base
的模型权重
执行转换脚本,得到转换后的输出文件vit_base_p16.ckpt
python mindformers/models/vit/convert_weight.py --torch_path "PATH OF ViT-Base.pth" --mindspore_path "SAVE PATH OF vit_base_p16.ckpt"
可以使用AutoClass接口,通过模型名称自动下载并加载权重
from_pretrained()
接口会自动从云上下载预训练的模型,存储路径:mindformers/checkpoint_download/vit
import mindspore
from mindformers import AutoModel, AutoConfig
from mindformers.tools.image_tools import load_image
from mindformers import ViTImageProcessor
# 指定图模式,指定使用训练卡id
mindspore.set_context(mode=0, device_id=0)
# 模型标志加载模型
model = AutoModel.from_pretrained("vit_base_p16")
#模型配置加载模型
config = AutoConfig.from_pretrained("vit_base_p16")
# {'patch_size': 16, 'in_chans': 3, 'embed_dim': 768, 'depth': 12, 'num_heads': 12, 'mlp_ratio': 4,
# ..., 'batch_size': 32, 'image_size': 224, 'num_classes': 1000}
model = AutoModel.from_config(config)
img = load_image("https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/XFormer_for_mindspore/clip/sunflower.png")
image_processor = ViTImageProcessor(size=224)
processed_img = image_processor(img)
predict_result = model(processed_img)
print(predict_result)
# output
# (Tensor(shape=[1, 1000], dtype=Float32, value=
# [[-5.38996577e-01, -2.30418444e-02, 2.06433788e-01 ... -6.59191251e-01, 8.57466936e-01, 6.56416774e-01]]), None)
import mindspore
from mindformers.trainer import Trainer
from mindformers.tools.image_tools import load_image
# 指定图模式,指定使用训练卡id
mindspore.set_context(mode=0, device_id=0)
# 初始化任务
vit_trainer = Trainer(
task='image_classification',
model='vit_base_p16',
train_dataset="imageNet-1k/train",
eval_dataset="imageNet-1k/val")
img = load_image("https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/XFormer_for_mindspore/clip/sunflower.png")
# 方式1:使用现有的预训练权重进行finetune, 并使用finetune获得的权重进行eval和推理
vit_trainer.train(resume_or_finetune_from_checkpoint="mae_vit_base_p16", do_finetune=True)
vit_trainer.evaluate(eval_checkpoint=True)
predict_result = vit_trainer.predict(predict_checkpoint=True, input_data=img, top_k=3)
print(predict_result)
# 方式2: 从新开始训练,并使用训练好的权重进行eval和推理
vit_trainer.train()
vit_trainer.evaluate(eval_checkpoint=True)
predict_result = vit_trainer.predict(predict_checkpoint=True, input_data=img, top_k=3)
print(predict_result)
# 方式3: 从obs下载训练好的权重并进行eval和推理
vit_trainer.evaluate()
predict_result = vit_trainer.predict(input_data=img, top_k=3)
print(predict_result)
# output
# - mindformers - INFO - output result is: [[{'score': 0.8880876, 'label': 'daisy'},
# {'score': 0.0049882396, 'label': 'bee'}, {'score': 0.0031068476, 'label': 'vase'}]]
import mindspore
from mindformers.pipeline import pipeline
from mindformers.tools.image_tools import load_image
# 指定图模式,指定使用训练卡id
mindspore.set_context(mode=0, device_id=0)
pipeline_task = pipeline("image_classification", model='vit_base_p16')
img = load_image("https://ascend-repo-modelzoo.obs.cn-east-2.myhuaweicloud.com/XFormer_for_mindspore/clip/sunflower.png")
pipeline_result = pipeline_task(img, top_k=3)
print(pipeline_result)
# output
# [[{'score': 0.8880876, 'label': 'daisy'}, {'score': 0.0049882396, 'label': 'bee'},
# {'score': 0.0031068476, 'label': 'vase'}]]
Trainer和pipeline接口默认支持的task和model关键入参
task(string) | model(string) |
---|---|
image_classification | vit_base_p16 |
使用的数据集:ImageNet2012
数据集目录格式
└─imageNet-1k
├─train # 训练数据集
└─val # 评估数据集
# pretrain
python run_mindformer.py --config ./configs/vit/run_vit_base_p16_224_100ep.yaml --run_mode train
多卡运行需要RANK_FILE_TABLE,请参考前期准备-生成RANK_TABLE_FILE
cd scripts
bash run_distribute.sh RANK_TABLE_FILE ../configs/vit/run_vit_base_p16_224_100ep.yaml [0,8] train 8
多机多卡运行需要合并不同机器的RANK_FILE_TABLE,参考前期准备-多机RANK_TABLE_FILE合并
在每台机器上启动bash run_distribute.sh
。
注:需要保证执行的节点和RANK_TABLE_FIEL的节点顺序保持一致,即rank_id匹配。
server_count=12
device_num=8*$server_count
# launch ranks in the 0th server
cd scripts
bash run_distribute.sh $RANK_TABLE_FILE ../configs/vit/run_vit_base_p16_224_100ep.yaml [0,8] train $device_num
# launch ranks in the 1-11 server via ssh
for idx in {1..11}
do
let rank_start=8*$idx
let rank_end=$rank_start+8
ssh ${IP_LIST[$idx]} "cd scripts; bash run_distribute.sh $RANK_TABLE_FILE ../configs/vit/run_vit_base_p16_224_100ep.yaml [$rank_start,$rank_end] train $device_num"
done
其中
RANK_TABLE_FILE
为上一步汇总并分发的总rank table文件;IP_LIST
为12台服务器的IP地址。如192.168.0.[0-11]IP_LIST=("192.168.0.0", "192.168.0.1", ..., "192.168.0.11")
# evaluate
python run_mindformer.py --config ./configs/vit/run_vit_base_p16_224_100ep.yaml --run_mode eval --eval_dataset_dir [DATASET_PATH]
# output
# ViT: Top1 Accuracy = {'Top1 Accuracy': 0.8371678937259923}
# predict
python run_mindformer.py --config ./configs/vit/run_vit_base_p16_224_100ep.yaml --run_mode predict --predict_data [PATH_TO_IMAGE]
Dear OpenI User
Thank you for your continuous support to the Openl Qizhi Community AI Collaboration Platform. In order to protect your usage rights and ensure network security, we updated the Openl Qizhi Community AI Collaboration Platform Usage Agreement in January 2024. The updated agreement specifies that users are prohibited from using intranet penetration tools. After you click "Agree and continue", you can continue to use our services. Thank you for your cooperation and understanding.
For more agreement content, please refer to the《Openl Qizhi Community AI Collaboration Platform Usage Agreement》