
VOLO: Vision Outlooker for Visual Recognition

Introduction

Vision Outlooker (VOLO) is a simple and general architecture built on a novel outlook attention mechanism. Unlike self-attention, which focuses on global dependency modeling at a coarse level, outlook attention efficiently encodes finer-level features and contexts into tokens. This is shown to be critically beneficial to recognition performance but is largely ignored by self-attention. Five variants at different model scales are introduced, ranging from VOLO-D1 with 27M parameters to VOLO-D5 with 296M. Experiments show that the best one, VOLO-D5, achieves 87.1% top-1 accuracy on ImageNet-1K classification, making it the first model to exceed 87% accuracy on this competitive benchmark without using any extra training data.

Figure 1. Illustration of outlook attention. [1]

Results

Our reproduced model performance on ImageNet-1K is reported as follows.

Model	Context	Top-1 (%)	Top-5 (%)	Params (M)	Recipe	Weight
volo_d1	D910x8-G	82.59	95.99	27	yaml	weights
volo_d2	D910x8-G	82.95	96.13	59	yaml	weights
volo_d3	D910x8-G	83.38	96.28	87	yaml	weights
volo_d4	D910x8-G	82.5	95.86	193	yaml	weights

Notes

Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode is either G (graph mode) or F (pynative mode with ms function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
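The Context notation above can be decoded mechanically. The following is a minimal sketch, assuming device names consist of uppercase letters and digits (as in D910). The `parse_context` helper and the `TrainingContext` type are hypothetical names used for illustration and are not part of MindCV.

```python
import re
from typing import NamedTuple


class TrainingContext(NamedTuple):
    device: str  # device tag, e.g. "D910" for Ascend 910
    pieces: int  # number of devices used for training
    mode: str    # "graph" or "pynative"


# Pattern for {device}x{pieces}-{MS mode}; assumes uppercase device tags.
_CONTEXT_RE = re.compile(r"^(?P<device>[A-Z0-9]+)x(?P<pieces>\d+)-(?P<mode>[GF])$")
_MODES = {"G": "graph", "F": "pynative"}


def parse_context(context: str) -> TrainingContext:
    """Split a context string such as 'D910x8-G' into its three fields."""
    match = _CONTEXT_RE.match(context)
    if match is None:
        raise ValueError(f"unrecognized training context: {context!r}")
    return TrainingContext(
        device=match["device"],
        pieces=int(match["pieces"]),
        mode=_MODES[match["mode"]],
    )


print(parse_context("D910x8-G"))
```

For the contexts listed in the table, this yields the device tag D910, 8 pieces, and graph mode.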

Quick Start

Preparation

Installation

Please refer to the installation instructions in MindCV.

Dataset Preparation

Please download the ImageNet-1K dataset for model training and validation.

Training

Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run:

# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/volo/volo_d1_ascend.yaml --data_dir /path/to/imagenet

If the script is executed by the root user, the --allow-run-as-root parameter must be added to mpirun.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.

For a detailed description of all hyper-parameters, please refer to config.py.

Note: The global batch size (batch_size x num_devices) is an important hyper-parameter. To reproduce the reported results, it is recommended to keep the global batch size unchanged, or to adjust the learning rate linearly to the new global batch size.
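The linear adjustment mentioned in the note can be sketched as follows. This is an illustrative helper, not a MindCV function. The name `scale_learning_rate` and the numbers in the example (a base learning rate of 0.001 at a global batch size of 1024) are assumptions chosen for the arithmetic, not values taken from the VOLO recipes.

```python
def scale_learning_rate(
    base_lr: float,
    base_global_batch: int,
    batch_size: int,
    num_devices: int,
) -> float:
    """Scale the learning rate in proportion to the new global batch size."""
    if base_global_batch <= 0 or batch_size <= 0 or num_devices <= 0:
        raise ValueError("batch sizes and device count must be positive")
    # Global batch size = per-device batch size x number of devices.
    new_global_batch = batch_size * num_devices
    return base_lr * new_global_batch / base_global_batch


# Hypothetical example: a recipe tuned for 128 x 8 = 1024 samples per step,
# re-run on 4 devices with the same per-device batch size (128 x 4 = 512).
new_lr = scale_learning_rate(base_lr=0.001, base_global_batch=1024,
                             batch_size=128, num_devices=4)
print(new_lr)
```

Halving the global batch size halves the learning rate under this rule. If the global batch size is kept unchanged, the function returns the base learning rate.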

Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/volo/volo_d1_ascend.yaml --data_dir /path/to/dataset --distribute False

Validation

To validate the accuracy of the trained model, you can use validate.py and pass the checkpoint path with --ckpt_path.

python validate.py -c configs/volo/volo_d1_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
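For readers unfamiliar with the Top-1 and Top-5 metrics reported in the results table, the sketch below shows how top-k accuracy is commonly computed from per-class scores. It is a plain-Python illustration on toy data. The function name `topk_accuracy` is hypothetical, and this is not the implementation used by validate.py.

```python
from typing import Sequence


def topk_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> float:
    """Percentage of samples whose true label is among the k highest scores."""
    if len(scores) != len(labels) or len(labels) == 0:
        raise ValueError("scores and labels must be non-empty and equal in length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)


# Toy example: 3 samples, 3 classes.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
print(topk_accuracy(toy_scores, toy_labels, k=1))
print(topk_accuracy(toy_scores, toy_labels, k=2))
```

On the toy data, only the first sample is correct at k=1, while the first two are correct at k=2.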

Deployment

To deploy online inference services with the trained model efficiently, please refer to the deployment tutorial.

References

[1] Yuan L, Hou Q, Jiang Z, et al. VOLO: Vision Outlooker for Visual Recognition[J]. arXiv preprint arXiv:2106.13112, 2021.
