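Swin Transformer is a new vision Transformer that can serve as a general-purpose backbone for computer vision. Adapting Transformers from language to vision is challenging because of differences between the two domains, such as the large variation in the scale of visual entities and the high resolution of pixels in images compared with words in text.
To address these differences, the authors propose a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while still allowing cross-window connections. This hierarchical architecture can model at various scales and has computational complexity linear in image size.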
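The windowing scheme described above can be sketched in NumPy as follows. This is a minimal illustration, not the repository's implementation, and the helper names `window_partition`, `cyclic_shift`, `msa_flops` and `wmsa_flops` are hypothetical. It splits an H×W×C feature map into non-overlapping M×M windows, applies the half-window cyclic shift used between consecutive blocks (the attention mask Swin applies after the shift is omitted), and evaluates the cost formulas given in the Swin Transformer paper for global attention, 4hwC² + 2(hw)²C, and window attention, 4hwC² + 2M²hwC.

```python
import numpy as np


def window_partition(x: np.ndarray, m: int) -> np.ndarray:
    """Split an (H, W, C) feature map into (num_windows, m*m, C) token groups."""
    h, w, c = x.shape
    assert h % m == 0 and w % m == 0, "H and W must be multiples of the window size"
    x = x.reshape(h // m, m, w // m, m, c)
    # Bring the two window-grid axes together, then flatten each window.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, c)


def cyclic_shift(x: np.ndarray, m: int) -> np.ndarray:
    """Roll the map by half a window so the next partition straddles old window borders."""
    s = m // 2
    return np.roll(x, shift=(-s, -s), axis=(0, 1))


def msa_flops(h: int, w: int, c: int) -> int:
    """Cost of global multi-head self-attention: quadratic in the token count hw."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def wmsa_flops(h: int, w: int, c: int, m: int) -> int:
    """Cost of window self-attention with m x m windows: linear in hw."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


if __name__ == "__main__":
    feat = np.arange(8 * 8).reshape(8, 8, 1)
    print(window_partition(feat, m=4).shape)  # (4, 16, 1)
    print(window_partition(cyclic_shift(feat, m=4), m=4).shape)  # (4, 16, 1)
    # Doubling the image side quadruples hw: every W-MSA term grows 4x,
    # while the (hw)^2 term of global MSA grows 16x.
    print(wmsa_flops(112, 112, 96, 7) / wmsa_flops(56, 56, 96, 7))  # 4.0
```

Alternating plain and shifted partitions is what lets information cross window borders while each attention call stays local.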
Improvements of V2 over V1:
Post Normalization
Scaled Cosine Attention
Log-Spaced Continuous Position Bias
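The three V2 changes listed above can be sketched in NumPy as follows. This is a simplified illustration based on the Swin Transformer V2 paper, not the repository's code: the function names are hypothetical, the layer norm has no learnable scale or shift, and `sublayer` stands in for any attention or MLP sub-layer. `post_norm_block` normalizes the sub-layer output before the residual addition (V1's `pre_norm_block` normalizes the input instead); `scaled_cosine_attention` replaces the dot product with cosine similarity divided by a temperature `tau`, which the paper keeps above 0.01, plus a relative position bias `bias`; `log_spaced_coords` applies sign(Δ)·log(1+|Δ|) to relative offsets, the coordinate transform fed to the continuous position bias network.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Layer norm over the last axis, without learnable scale/shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def pre_norm_block(x: np.ndarray, sublayer) -> np.ndarray:
    """V1-style residual block: normalize the sub-layer input."""
    return x + sublayer(layer_norm(x))


def post_norm_block(x: np.ndarray, sublayer) -> np.ndarray:
    """V2-style residual post-norm: normalize the sub-layer output, then add it back."""
    return x + layer_norm(sublayer(x))


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def scaled_cosine_attention(q, k, v, bias, tau=0.1):
    """Attention with Sim(q_i, k_j) = cos(q_i, k_j) / tau + B_ij."""
    tau = max(tau, 0.01)  # the V2 paper keeps the temperature above 0.01
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    sim = qn @ kn.T / tau + bias
    return softmax(sim) @ v


def log_spaced_coords(delta: np.ndarray) -> np.ndarray:
    """Log-spaced relative coordinates: sign(d) * log(1 + |d|)."""
    return np.sign(delta) * np.log1p(np.abs(delta))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
    print(scaled_cosine_attention(q, k, v, bias=np.zeros((4, 4))).shape)  # (4, 8)
    print(log_spaced_coords(np.array([-7.0, 0.0, 7.0])))  # approx. [-2.079, 0, 2.079]
```

Because cosine similarity is bounded in [-1, 1], the attention logits cannot be dominated by a few very large dot products, which is the motivation the V2 paper gives; the log transform similarly compresses large offsets so that transferring to a bigger window requires less extrapolation of the position bias.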
Clone the repository
git clone https://openi.pcl.ac.cn/OpenModelZoo/mindcv_swintransformerv2
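Prepare the data (skip this step if you already have the dataset)
# Generate a JSON file that indexes the image paths, such as train_ids.json in the directory tree below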
python swintransformerv2/datasets/tools/get_image_ids.py --image IMAGE_PATH_DIR --file JSON_FILE_NAME
imagenet-1k/
├───train/
| ├───n01440764/
| | ├───n01440764_10026.JPEG
| | ├───n01440764_10027.JPEG
| | ├───...
| ├───n01443537/
| ├───...
├───val/
| ├───n01440764/
| | ├───n01440764_10026.JPEG
| | ├───n01440764_10027.JPEG
| | ├───...
| ├───n01443537/
| ├───...
└───train_ids.json # JSON file storing the list of image paths, e.g. ["imagenet-1k/train/n01440764/n01440764_10026.JPEG"]
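To show what such an index file contains, here is a hypothetical stand-in for `get_image_ids.py`; the real script's options and behavior may differ. It walks an image directory and writes a sorted JSON list of image paths in the format shown for `train_ids.json` above. The function name `build_image_index`, the set of accepted extensions, and keeping the directory prefix exactly as passed in are assumptions for illustration.

```python
import json
import os
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}  # assumed set of image extensions


def build_image_index(image_dir: str, json_file: str) -> list:
    """Write a sorted JSON list of image paths found under image_dir.

    Paths keep the image_dir prefix as passed in, so calling this with
    "imagenet-1k/train" yields entries such as
    "imagenet-1k/train/n01440764/n01440764_10026.JPEG".
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(image_dir):
        for name in filenames:
            if Path(name).suffix.lower() in IMAGE_SUFFIXES:
                # as_posix() keeps forward slashes on every platform
                paths.append(Path(dirpath, name).as_posix())
    paths.sort()
    with open(json_file, "w", encoding="utf-8") as f:
        json.dump(paths, f)
    return paths
```

For the layout above, a call might look like `build_image_index("imagenet-1k/train", "imagenet-1k/train_ids.json")`.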
Training on OpenI
# Single-device training
python train.py --config=CONFIG_PATH --use_parallel=False --device_num=1
# Distributed fine-tuning
python train.py --config=CONFIG_PATH --use_parallel=True --device_num=16
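The commands above pass booleans as strings such as `--use_parallel=False`. This README does not document how `train.py` parses them; the sketch below is a hypothetical illustration, assuming an `argparse`-based entry point, of why such a flag needs an explicit string-to-bool conversion: with `type=bool`, the non-empty string "False" would become `True`. The helper `str2bool` and the parser layout are assumptions, not the repository's code.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert 'True'/'False'-style strings to bool (bool('False') would be True)."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="hypothetical train.py arguments")
    parser.add_argument("--config", required=True, help="path to the config file")
    parser.add_argument("--use_parallel", type=str2bool, default=False)
    parser.add_argument("--device_num", type=int, default=1)
    return parser
```

For example, `build_parser().parse_args(["--config=CONFIG_PATH", "--use_parallel=True", "--device_num=16"])` yields `use_parallel=True` and `device_num=16`.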
Evaluation on OpenI
# Single-device evaluation
python eval.py --config=CONFIG_PATH --use_parallel=False --device_num=1
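This README does not state which metrics `eval.py` reports. ImageNet-1k classifiers are commonly compared by top-1 and top-5 accuracy, so here is a minimal NumPy sketch of that computation, assuming the model produces a `(num_samples, num_classes)` array of logits; `topk_accuracy` is a hypothetical helper, not part of the repository.

```python
import numpy as np


def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # argsort is ascending, so the last k columns hold the indices of the k largest logits
    topk = np.argsort(logits, axis=1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())


if __name__ == "__main__":
    logits = np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]])
    labels = np.array([1, 1])
    print(topk_accuracy(logits, labels, k=1))  # 0.5
    print(topk_accuracy(logits, labels, k=2))  # 1.0
```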