SKNet

Selective Kernel Networks

Introduction

The local receptive fields (RFs) of neurons in the primary visual cortex (V1) of cats [1] inspired the
construction of Convolutional Neural Networks (CNNs) [2] in the last century, and they continue to inspire modern CNN
architecture design. For instance, it is well known that in the visual cortex, the RF sizes of neurons in the
same area (e.g., the V1 region) differ, which enables the neurons to collect multi-scale spatial information in the
same processing stage. This mechanism has been widely adopted in recent CNNs.
A typical example is InceptionNets [3, 4, 5, 6], in which a simple concatenation aggregates
multi-scale information from, e.g., 3×3, 5×5, and 7×7 convolutional kernels inside the “inception” building block.

Figure 1. Selective Kernel Convolution.

Results

Our reproduced model performance on ImageNet-1K is reported as follows.

Model	Context	Top-1 (%)	Top-5 (%)	Params (M)	Recipe	Download
skresnet18	D910x8-G	73.09	91.20	11.97	yaml	weights
skresnet34	D910x8-G	76.71	93.10	22.31	yaml	weights
skresnext50_32x4d	D910x8-G	79.08	94.60	37.31	yaml	weights

Notes

Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs using graph mode.
Top-1 and Top-5: Accuracy reported on the validation set of ImageNet-1K.
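
As an illustration of the Context notation, the following minimal sketch splits a context string into its three parts. The function name `parse_context` and the `MODE_NAMES` table are hypothetical helpers written for this note; they are not part of MindCV.

```python
import re

# Mode letters as described in the note above (G and F only).
MODE_NAMES = {"G": "graph mode", "F": "pynative mode with ms function"}

# Hypothetical helper, not a MindCV API.
def parse_context(context):
    """Split a '{device}x{pieces}-{MS mode}' string into its three parts."""
    match = re.fullmatch(
        r"(?P<device>[A-Za-z0-9]+?)x(?P<pieces>\d+)-(?P<mode>[GF])", context
    )
    if match is None:
        raise ValueError(f"unrecognized training context: {context!r}")
    return {
        "device": match.group("device"),
        "pieces": int(match.group("pieces")),
        "mode": MODE_NAMES[match.group("mode")],
    }

info = parse_context("D910x8-G")
print(info["device"], info["pieces"], info["mode"])
```

For `D910x8-G`, the sketch yields the device code `D910`, a device count of 8, and graph mode, matching the example in the note.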

Quick Start

Preparation

Installation

Please refer to the installation instructions in MindCV.

Dataset Preparation

Please download the ImageNet-1K dataset for model training and validation.

Training

Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run:

# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config configs/sknet/skresnext50_32x4d_ascend.yaml --data_dir /path/to/imagenet

Similarly, you can train the model on multiple GPU devices with the above mpirun command.

For a detailed description of all hyper-parameters, please refer to config.py.

Note: As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction, or to scale the learning rate linearly with the new global batch size.
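
The linear adjustment mentioned in the note can be sketched as follows. This is only an illustration of the arithmetic: the function name `scale_learning_rate` and all numeric values below are assumptions chosen for the example, not values taken from the recipes in this directory.

```python
# Hypothetical helper illustrating linear learning-rate scaling; not part of MindCV.
def scale_learning_rate(base_lr, base_global_batch, batch_size, num_devices):
    """Scale the learning rate in proportion to the new global batch size."""
    if base_global_batch <= 0 or batch_size <= 0 or num_devices <= 0:
        raise ValueError("batch sizes and device count must be positive")
    new_global_batch = batch_size * num_devices
    return base_lr * new_global_batch / base_global_batch

# Assumed example: a recipe tuned for 8 devices x 32 samples (global batch 256)
# with a learning rate of 0.1, re-run on 4 devices with the same per-device batch.
new_lr = scale_learning_rate(base_lr=0.1, base_global_batch=256,
                             batch_size=32, num_devices=4)
print(new_lr)
```

Halving the global batch size halves the learning rate in this scheme; the per-device batch size and learning rate actually used for a model should be read from its yaml recipe.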

Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

# standalone training on a CPU/GPU/Ascend device
python train.py --config configs/sknet/skresnext50_32x4d_ascend.yaml --data_dir /path/to/dataset --distribute False

Validation

To validate the accuracy of the trained model, you can use validate.py and pass the checkpoint path with --ckpt_path.

python validate.py -c configs/sknet/skresnext50_32x4d_ascend.yaml --data_dir /path/to/imagenet --ckpt_path /path/to/ckpt
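
For readers unfamiliar with the Top-1 and Top-5 metrics reported in Results, the sketch below shows how top-k accuracy is conventionally computed from per-class scores. It is a plain-Python illustration under that assumption, not the implementation used by validate.py, and the function name `top_k_accuracy` and the toy scores are made up for the example.

```python
# Hypothetical helper for illustration; validate.py may compute metrics differently.
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and of equal length")
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)

# Toy example with 3 samples and 3 classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 2]
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

In the toy data, the second sample's true class is only the second-highest score, so it counts as a miss for k=1 and a hit for k=2.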

Deployment

To deploy online inference services with the trained model efficiently, please refer to the deployment tutorial.

References

[1] D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat’s visual
cortex. The Journal of Physiology, 1962.

[2] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation
applied to handwritten zip code recognition. Neural Computation, 1989.

[3] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In
CVPR, 2016.

[4] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift.
arXiv preprint arXiv:1502.03167, 2015.

[5] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In
CVPR, 2016.

[6] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, inception-resnet and the impact of residual
connections on learning. In AAAI, 2017.