Swin Transformer V2

Swin Transformer V2: Scaling Up Capacity and Resolution

Introduction

This paper explores large-scale models in computer vision. The authors tackle three major issues in training and
applying large vision models: training instability, resolution gaps between pre-training and
fine-tuning, and the need for large amounts of labelled data. They propose three main techniques: 1) a residual-post-norm method combined
with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively
transfer models pre-trained on low-resolution images to downstream tasks with high-resolution inputs; 3) a
self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labelled images. The model set new performance
records on 4 representative vision tasks: ImageNet-V2 image classification, COCO object detection, ADE20K
semantic segmentation, and Kinetics-400 video action classification.[1]
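
As a rough illustration of techniques 1) and 2), the sketch below follows the formulas given in the paper [1]: an attention logit computed as the cosine similarity of query and key divided by a temperature, and relative offsets mapped to log space via sign(d) * log(1 + |d|). The function names and the default tau value are illustrative assumptions, and this is not MindCV's implementation.

```python
import math


def cosine_attention_logit(q, k, tau=0.1, bias=0.0):
    """Scaled cosine attention logit: cos(q, k) / tau + bias.

    tau is a learnable scalar in the paper; 0.1 here is only a placeholder.
    """
    dot = sum(a * b for a, b in zip(q, k))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_k = math.sqrt(sum(b * b for b in k))
    return dot / (norm_q * norm_k) / tau + bias


def log_spaced_coord(delta):
    """Map a linear relative offset to log space: sign(delta) * log(1 + |delta|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


# Growing the window from 8x8 to 16x16 widens linear offsets from [-7, 7]
# to [-15, 15]; the log-spaced range grows by a much smaller factor.
linear_growth = 15 / 7
log_growth = log_spaced_coord(15) / log_spaced_coord(7)
print(f"linear range grows {linear_growth:.2f}x, log-spaced range grows {log_growth:.2f}x")
```

Because the cosine ignores the magnitude of q and k, scaling either vector leaves the logit unchanged, which keeps attention values in a bounded range.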

Figure 1. Architecture of Swin Transformer V2 [1]

Results

Our reproduced model performance on ImageNet-1K is reported as follows.

Model	Context	Top-1 (%)	Top-5 (%)	Params (M)	Recipe	Download
swinv2_tiny_window8	D910x8-G	81.42	95.43	28.78	yaml	weights

Notes

Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
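
The Context notation above can be decoded mechanically. The following is a minimal sketch, assuming the string always has the exact shape {device}x{pieces}-{MS mode}; parse_context and TrainingContext are hypothetical helpers written for this example and are not part of MindCV.

```python
import re
from typing import NamedTuple


class TrainingContext(NamedTuple):
    device: str   # e.g. "D910" for Ascend 910
    pieces: int   # number of devices
    mode: str     # human-readable MindSpore mode


# Mode letters as described in the Notes: G = graph, F = pynative with ms function.
_MODES = {"G": "graph", "F": "pynative with ms function"}
_PATTERN = re.compile(r"^(?P<device>[A-Za-z0-9]+?)x(?P<pieces>\d+)-(?P<mode>[GF])$")


def parse_context(text):
    """Split a context string such as 'D910x8-G' into its three parts."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid context string: {text!r}")
    return TrainingContext(match["device"], int(match["pieces"]), _MODES[match["mode"]])


print(parse_context("D910x8-G"))
```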

Quick Start

Preparation

Installation

Please refer to the installation instructions in MindCV.

Dataset Preparation

Please download the ImageNet-1K dataset for model training and validation.

Training

Distributed Training

The reported results can be reproduced with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run:

# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/swintransformerv2/swinv2_tiny_window8_ascend.yaml --data_dir /path/to/imagenet

If the script is executed by the root user, the --allow-run-as-root parameter must be added to mpirun.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
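
When the launch is scripted, the command above can be assembled programmatically. This is a small sketch under the assumption that the launcher is invoked from Python; build_mpirun_command is a hypothetical helper, and it uses only the flags shown in this README.

```python
import shlex


def build_mpirun_command(num_devices, config, data_dir, as_root=False):
    """Return the distributed training command as an argument list."""
    cmd = ["mpirun"]
    if as_root:
        # Required by mpirun when the script is executed by the root user.
        cmd.append("--allow-run-as-root")
    cmd += ["-n", str(num_devices), "python", "train.py",
            "--config", config, "--data_dir", data_dir]
    return cmd


command = build_mpirun_command(
    8,
    "configs/swintransformerv2/swinv2_tiny_window8_ascend.yaml",
    "/path/to/imagenet",
    as_root=True,
)
print(shlex.join(command))
```

On Unix systems, os.geteuid() == 0 is one way to decide the value of as_root; the resulting list can be passed to subprocess.run.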

For a detailed description of all hyper-parameters, please refer to config.py.

Note: The global batch size (batch_size x num_devices) is an important hyper-parameter. To reproduce the results, keep the global batch size unchanged, or scale the learning rate linearly with the new global batch size.
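
The linear adjustment mentioned in the note can be written out as a short calculation. The numbers below are hypothetical and are not taken from the recipe file; scale_learning_rate is a helper defined only for this example.

```python
def scale_learning_rate(base_lr, base_batch_size, base_num_devices,
                        new_batch_size, new_num_devices):
    """Scale the learning rate in proportion to the global batch size."""
    base_global = base_batch_size * base_num_devices
    new_global = new_batch_size * new_num_devices
    return base_lr * new_global / base_global


# Hypothetical example: a recipe tuned for 128 x 8 devices, rerun on 4 devices.
# Halving the global batch size halves the learning rate.
new_lr = scale_learning_rate(
    base_lr=0.001, base_batch_size=128, base_num_devices=8,
    new_batch_size=128, new_num_devices=4,
)
print(new_lr)
```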

Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/swintransformerv2/swinv2_tiny_window8_ascend.yaml --data_dir /path/to/dataset --distribute False

Validation

To validate the accuracy of the trained model, use validate.py and pass the checkpoint path with --ckpt_path.

python validate.py -c configs/swintransformerv2/swinv2_tiny_window8_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
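
The Top-1 and Top-5 figures in the Results table are top-k accuracies. The sketch below shows how that metric is defined on plain Python lists; it is an illustration of the metric only and makes no claim about how validate.py computes or prints it. The scores and labels are toy data.

```python
def top_k_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy example with 4 samples and 3 classes.
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 1, 2, 2]
print(top_k_accuracy(scores, labels, k=1), top_k_accuracy(scores, labels, k=2))
```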

Deployment

Please refer to the deployment tutorial in MindCV.

References

[1] Liu Z, Hu H, Lin Y, et al. Swin Transformer V2: Scaling Up Capacity and Resolution[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 12009-12019.