Contents
NTS-Net, short for Navigator-Teacher-Scrutinizer Network, consists of a Navigator agent, a Teacher agent and a Scrutinizer agent. Exploiting the intrinsic consistency between the informativeness of a region and its probability of belonging to the ground-truth class, NTS-Net uses a novel training paradigm that enables the Navigator to detect the most informative regions under the guidance of the Teacher. The Scrutinizer then scrutinizes the regions proposed by the Navigator and makes predictions.
Paper: Z. Yang, T. Luo, D. Wang, Z. Hu, J. Gao, and L. Wang, Learning to navigate for fine-grained classification, in Proceedings of the European Conference on Computer Vision (ECCV), 2018.
NTS-Net consists of a Navigator agent, a Teacher agent and a Scrutinizer agent. The Navigator steers the model toward the most informative regions: for each region in the image, the Navigator predicts how informative it is, and these predictions are used to propose the most informative regions. The Teacher evaluates the regions proposed by the Navigator and provides feedback: for each proposed region, the Teacher estimates the probability that it belongs to the ground-truth class; these confidence estimates guide the Navigator to propose more informative regions through a novel ordering-consistent loss function. The Scrutinizer scrutinizes the proposed regions and makes fine-grained classifications: each proposed region is resized to the same size and the Scrutinizer extracts features from it; the features of the regions and of the whole image are processed jointly to make the final fine-grained classification.
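The ordering-consistent idea can be illustrated with a toy pairwise ranking loss in plain Python (a sketch of the concept, not the repo's MindSpore implementation): whenever the Teacher ranks region i above region j, the Navigator's informativeness score for i should exceed the score for j by at least a margin.

```python
def ranking_loss(nav_scores, teacher_confidences, margin=1.0):
    """Toy ordering-consistent (pairwise hinge ranking) loss.

    nav_scores: Navigator informativeness scores, one per region.
    teacher_confidences: Teacher's ground-truth-class confidences, one per region.
    """
    loss = 0.0
    n = len(nav_scores)
    for i in range(n):
        for j in range(n):
            if teacher_confidences[i] > teacher_confidences[j]:
                # Hinge penalty when the Navigator disagrees with the Teacher's order.
                loss += max(0.0, nav_scores[j] - nav_scores[i] + margin)
    return loss

# Scores that agree with the Teacher's ordering (with enough margin) incur zero loss.
print(ranking_loss([3.0, 2.0, 1.0], [0.9, 0.5, 0.1]))  # 0.0
# A mis-ordered pair is penalized.
print(ranking_loss([1.0, 2.0], [0.9, 0.1]))  # 2.0
```

Minimizing this kind of loss pushes the Navigator's ranking of regions to match the Teacher's ranking, which is the guidance mechanism described above.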
Note that you can run the scripts with the dataset mentioned in the original paper or one widely used in this domain/network architecture. The following sections describe how to run the scripts with the dataset below.
Dataset used: Caltech-UCSD Birds-200-2011
Please download the dataset [CUB_200_2011.tgz] and unzip it, then put all training images into a directory named "train" and all test images into a directory named "test".
The directory structure is as follows:

```text
├─resnet50.ckpt
└─cub_200_2011
  ├─train
  └─test
```
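The standard CUB-200-2011 archive ships its images under `CUB_200_2011/images/` together with the index files `images.txt` and `train_test_split.txt`. A minimal sketch (paths are assumptions, this helper is not part of the repo) that produces the layout above:

```python
import os
import shutil

def read_split(images_txt, split_txt):
    """Map each image's relative path to 'train' or 'test' using the index
    files shipped with the standard CUB-200-2011 release."""
    id_to_path = dict(line.split() for line in open(images_txt))
    id_to_flag = dict(line.split() for line in open(split_txt))
    # In train_test_split.txt, flag 1 marks a training image, 0 a test image.
    return {path: ('train' if id_to_flag[i] == '1' else 'test')
            for i, path in id_to_path.items()}

def prepare(root='CUB_200_2011', out='cub_200_2011'):
    """Copy every image into out/train or out/test according to the split."""
    split = read_split(os.path.join(root, 'images.txt'),
                       os.path.join(root, 'train_test_split.txt'))
    for rel_path, subset in split.items():
        dst = os.path.join(out, subset)
        os.makedirs(dst, exist_ok=True)
        shutil.copy(os.path.join(root, 'images', rel_path), dst)

if __name__ == '__main__':
    prepare()
```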
- Hardware (Ascend/GPU)
    - Prepare the hardware environment with Ascend processors or GPUs.
- Framework
- For more information, please check the resources below:
```text
.
└─ntsnet
  ├─README.md                  # README
  ├─scripts                    # shell scripts
    ├─run_standalone_train.sh  # training in standalone mode (1 device)
    ├─run_distribute_train.sh  # training in parallel mode (8 devices)
    └─run_eval.sh              # evaluation
  ├─src
    ├─config.py                # network configuration
    ├─dataset.py               # dataset utilities
    ├─lr_generator.py          # learning rate generator
    ├─network.py               # network definition for ntsnet
    └─resnet.py                # ResNet backbone
  ├─mindspore_hub_conf.py      # MindSpore Hub interface
  ├─export.py                  # script to export MINDIR model
  ├─eval.py                    # evaluation script
  └─train.py                   # training script
```
```shell
# distributed training
Usage: bash run_distribute_train.sh [RANK_TABLE_FILE] [DATA_URL] [TRAIN_URL]

# standalone training
Usage: bash run_standalone_train.sh [DATA_URL] [TRAIN_URL]
```
"img_width": 448, # width of the input images
"img_height": 448, # height of the input images
# anchor
"size": [48, 96, 192], #anchor base size
"scale": [1, 2 ** (1. / 3.), 2 ** (2. / 3.)], #anchor base scale
"aspect_ratio": [0.667, 1, 1.5], #anchor base aspect_ratio
"stride": [32, 64, 128], #anchor base stride
```python
# resnet
"resnet_block": [3, 4, 6, 3],                  # number of blocks in each layer
"resnet_in_channels": [64, 256, 512, 1024],    # in channel size for each layer
"resnet_out_channels": [256, 512, 1024, 2048], # out channel size for each layer
# LR
"base_lr": 0.001,                              # base learning rate
"base_step": 58633,                            # base step in lr generator
"total_epoch": 200,                            # total epochs in lr generator
"warmup_step": 4,                              # warm-up steps in lr generator
"sgd_momentum": 0.9,                           # momentum in optimizer
```
```python
# train
"batch_size": 8,                               # 16 for GPU
"weight_decay": 1e-4,                          # weight decay
"epoch_size": 200,                             # total epoch size
"save_checkpoint": True,                       # whether to save checkpoints
"save_checkpoint_epochs": 1,                   # checkpoint saving interval
"num_classes": 200,                            # number of classes
"lr_scheduler": "cosine",                      # lr scheduler, "cosine" or "step"
"optimizer": "momentum"                        # optimizer
```
- Set options in `config.py`, including the learning rate, output filename and network hyperparameters. See the dataset section above for more information about the dataset.
- Get the ResNet50 pretrained model from MindSpore Hub.
- Run `run_standalone_train_ascend.sh` for non-distributed training of the NTS-Net model on Ascend.

```shell
# standalone training on Ascend
bash run_standalone_train_ascend.sh [DATA_URL] [TRAIN_URL] [DEVICE_ID(optional)]
```
- Run `run_standalone_train_gpu.sh` for non-distributed training of the NTS-Net model on GPU.

```shell
# standalone training on GPU
bash run_standalone_train_gpu.sh [DATA_URL] [TRAIN_URL] [DEVICE_ID(optional)]
```
- Run `run_distribute_train_ascend.sh` for distributed training of the NTS-Net model on Ascend.

```shell
bash run_distribute_train_ascend.sh [RANK_TABLE_FILE] [DATA_URL] [TRAIN_URL]
```
- Run `run_distribute_train_gpu.sh` for distributed training of the NTS-Net model on GPU.

```shell
bash run_distribute_train_gpu.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [DATA_URL] [TRAIN_URL]
```
- hccl.json, which is specified by RANK_TABLE_FILE, is needed when you run a distributed task. You can generate it with the hccl_tools.
- As for PRETRAINED_MODEL, it should be a trained ResNet50 checkpoint; rename the pretrained weights to resnet50.ckpt and put the file in the dataset directory. See [Training Process](#training-process).
Training results will be stored in the train_url path. You can find checkpoint files together with results like the following in loss.log.
```text
# distribute training result (8p)
epoch: 1 step: 750 ,loss: 30.88018
epoch: 2 step: 750 ,loss: 26.73352
epoch: 3 step: 750 ,loss: 22.76208
epoch: 4 step: 750 ,loss: 20.52259
epoch: 5 step: 750 ,loss: 19.34843
epoch: 6 step: 750 ,loss: 17.74093
```
- Run `run_eval_ascend.sh` for evaluation.

```shell
# infer on Ascend
bash run_eval_ascend.sh [DATA_URL] [TRAIN_URL] [CKPT_FILENAME] [DEVICE_ID(optional)]
```
- Run `run_eval_gpu.sh` for evaluation.

```shell
# infer on GPU
bash run_eval_gpu.sh [DATA_URL] [TRAIN_URL] [CKPT_FILENAME] [DEVICE_ID(optional)]
```
The inference result will be stored in the train_url path; you can find results like the following in eval.log.
```text
ckpt file name: ntsnet-112_750.ckpt
accuracy: 0.876
```
Model Export
```shell
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
```

`EXPORT_FORMAT` should be "MINDIR".
Model Description
Performance
Evaluation Performance
| Parameters                 | Ascend                                           | Tesla V100-PCIE                                |
| -------------------------- | ------------------------------------------------ | ---------------------------------------------- |
| Model Version              | V1                                               | V1                                             |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G  | Tesla A100; CPU 2.3GHz, 40 cores; Memory 377G  |
| Uploaded Date              | 16/04/2021 (day/month/year)                      | 05/10/2021 (day/month/year)                    |
| MindSpore Version          | 1.1.1                                            | 1.5.0rc1                                       |
| Dataset                    | CUB-200-2011                                     | CUB-200-2011                                   |
| Training Parameters        | epoch=200, batch_size=8                          | epoch=200, batch_size=16                       |
| Optimizer                  | SGD                                              | Momentum                                       |
| Loss Function              | Softmax Cross Entropy                            | Softmax Cross Entropy                          |
| Output                     | predicted class                                  | predicted class                                |
| Loss                       | 10.9852                                          | 12.195317                                      |
| Speed                      | 1pc: 130 ms/step; 8pcs: 138 ms/step              | 1pc: 480 ms/step                               |
| Total time                 | 8pcs: 5.93 hours                                 |                                                |
| Accuracy (%)               | 87.6                                             | 87.5                                           |
| Checkpoint for Fine tuning | 333.07M (.ckpt file)                             | 222.03M (.ckpt file)                           |
| Scripts                    | ntsnet script                                    | ntsnet script                                  |
We set a random seed in train.py and eval.py for weight initialization.
Please check the official homepage.
FAQ
First, refer to the ModelZoo FAQ for answers to some common questions.
- Q: What should I do if a memory overflow occurs when using PYNATIVE_MODE? A: A memory overflow usually happens because PYNATIVE_MODE requires more memory. Setting the batch size to 2 reduces memory consumption enough for network training.