OSNet for Ascend
OSNet is an efficient and accurate neural network architecture for person re-identification. It is a novel CNN architecture designed for learning omni-scale feature representations: features are captured by multiple convolutional streams with different receptive field sizes and fused with channel-wise weights generated by a unified aggregation gate (AG). The method was proposed in the paper "Omni-Scale Feature Learning for Person Re-Identification", published in 2019.
Paper: Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, Tao Xiang (University of Surrey; Queen Mary University of London; Samsung AI Center, Cambridge). "Omni-Scale Feature Learning for Person Re-Identification", published in IEEE ICCV 2019.
The network structure can be decomposed into two parts: feature extraction and feature fusion. The feature extraction part uses multiple convolutional streams with different receptive field sizes to obtain multi-scale feature maps. In the feature fusion part, the resulting multi-scale feature maps are dynamically fused with channel-wise weights generated by a unified aggregation gate (AG).
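The AG fusion described above can be sketched numerically. The following is a minimal illustration, not the repository's src/osnet.py: the gate here is a single shared fully connected layer followed by a sigmoid (the paper uses a small FC mini-network), and all names and shapes are invented for the example.

```python
import numpy as np

def aggregation_gate(streams, w, b):
    """Fuse multi-scale feature maps with channel-wise gate weights.

    streams: list of feature maps, each (C, H, W), from convolutional
             streams with different receptive field sizes.
    w, b:    parameters of the shared gate (simplified here to a single
             FC layer; the paper uses a small FC mini-network).
    """
    fused = np.zeros_like(streams[0])
    for x in streams:
        desc = x.mean(axis=(1, 2))                    # global average pooling -> (C,)
        gate = 1.0 / (1.0 + np.exp(-(w @ desc + b)))  # sigmoid -> one weight per channel
        fused += gate[:, None, None] * x              # channel-wise reweighting, then sum
    return fused

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
streams = [rng.standard_normal((C, H, W)) for _ in range(3)]
w, b = 0.1 * rng.standard_normal((C, C)), np.zeros(C)
out = aggregation_gate(streams, w, b)  # fused map, shape (8, 4, 4)
```

Because the gate parameters are shared across all streams, the number of gate parameters stays constant as streams are added, which is part of what keeps OSNet lightweight.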
Dataset used Market1501
- Dataset size: 145.9MB, 32,217 images
- Train: 12,936 images
- Query: 3,368 images
- Gallery: 15,913 images
Dataset used DukeMTMC-reID
- Dataset size: 146.1MB, 36,411 images
- Train: 16,522 images
- Query: 2,228 images
- Gallery: 17,661 images
Dataset used CUHK03
- Dataset size: 1.8GB, 14,097 images
- Train: 7,365 images
- Query: 1,400 images
- Gallery: 5,332 images
Dataset used MSMT17, extraction code: yf3z
- Dataset size: 2.6GB, 124,068 images
- Train: 30,248 images
- Query: 11,659 images
- Gallery: 82,161 images
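Each of these datasets encodes the person ID and camera index in the image file name, which is what the query/gallery split relies on. As an illustration of how dataset preprocessing can recover them, here is a small parser for the Market-1501 naming scheme (e.g. 0002_c1s1_000451_03.jpg, where a person ID of -1 marks junk detections); the helper name is made up and is not part of src/dataset.py:

```python
import re

def parse_market1501_name(filename):
    """Extract (person_id, camera_id) from a Market-1501 image name.

    Names look like 0002_c1s1_000451_03.jpg: the first field is the
    person ID (-1 marks junk detections) and c<k> gives the camera index.
    """
    match = re.match(r'([-\d]+)_c(\d)', filename)
    if match is None:
        raise ValueError(f'unexpected file name: {filename}')
    return int(match.group(1)), int(match.group(2))

print(parse_market1501_name('0002_c1s1_000451_03.jpg'))  # (2, 1)
print(parse_market1501_name('-1_c3s2_000100_00.jpg'))    # (-1, 3)
```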
In this project, the file organization is recommended as below:
.
└──datasets
    ├──market1501
    │   └──Market-1501-v15.09.15
    │       ├──bounding_box_train
    │       ├──query
    │       └──bounding_box_test
    ├──dukemtmc-reid
    │   └──DukeMTMC-reID
    │       ├──bounding_box_train
    │       ├──query
    │       └──bounding_box_test
    ├──cuhk03
    │   └──cuhk03_release
    │       ├──cuhk-03.mat
    │       ├──cuhk03_new_protocol_config_labeled.mat
    │       └──cuhk03_new_protocol_config_detected.mat
    └──msmt17
        └──MSMT17_V1
            ├──train
            ├──test
            ├──list_val.txt
            ├──list_train.txt
            ├──list_query.txt
            └──list_gallery.txt
- Hardware (Ascend)
  - Prepare hardware environment with Ascend processor.
- Framework
  - For more information, please check the resources below:
.
└─osnet
  ├─README.md
  ├─scripts
  │ ├─run_train_standalone_ascend.sh    # launch standalone training on Ascend (1p)
  │ ├─run_train_distribute_ascend.sh    # launch distributed training on Ascend (8p)
  │ └─run_eval_ascend.sh                # launch evaluation on Ascend
  ├─src
  │ ├─cross_entropy_loss.py             # cross entropy loss
  │ ├─dataset.py                        # data preprocessing
  │ ├─dataset_define.py                 # dataset definition
  │ ├─lr_generator.py                   # learning rate scheduler
  │ └─osnet.py                          # network definition
  ├─eval.py                             # evaluate the network
  ├─export.py                           # export MINDIR for Ascend 310
  ├─preprocessing.py                    # preprocess data for Ascend 310
  ├─postprocessing.py                   # calculate metrics for Ascend 310
  └─train.py                            # train the network
Usage
# Set the dataset path, for example:
data_path: /home/osnet/datasets
# distribute training example(8p)
bash run_train_distribute_ascend.sh [RANK_TABLE_FILE] [DATASET] [PRETRAINED_CKPT_PATH](optional)
# example: bash run_train_distribute_ascend.sh ./hccl_8p.json market1501 /home/osnet/checkpoint/market1501/osnet-240_101.ckpt
# standalone training
bash run_train_standalone_ascend.sh [DATASET] [DEVICE_ID] [PRETRAINED_CKPT_PATH](optional)
# example: bash run_train_standalone_ascend.sh market1501 0 /home/osnet/checkpoint/market1501/osnet-240_101.ckpt
# evaluation:
bash run_eval_ascend.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID]
# example: bash run_eval_ascend.sh market1501 /home/osnet/scripts/output/checkpoint/market1501/osnet-240_101.ckpt 0
Notes:
For RANK_TABLE_FILE, refer to Link; the device_ip can be obtained as described in Link. For large models like InceptionV4, it is better to export the environment variable export HCCL_CONNECT_TIMEOUT=600 to extend the HCCL connection-check time from the default 120 seconds to 600 seconds; otherwise the connection could time out, since compile time increases with model size.
The taskset operations in scripts/run_train_distribute_ascend.sh bind processor cores according to device_num and the total number of processor cores. If you do not want this binding, remove the taskset operations from scripts/run_train_distribute_ascend.sh.
PRETRAINED_CKPT_PATH should be a checkpoint saved during a previous training run on Ascend; training will resume from this checkpoint and continue.
Launch
- Training needs to load parameters pre-trained on ImageNet. You can download the checkpoint file at link (extraction code: 1961). After downloading, put it in the ./model_utils folder. You can also download the .pth file pre-trained under PyTorch here and convert it to a .ckpt file through ./model_utils/pth_to_ckpt.py.
- Running on local server
  - Modify the dataset path data_path in osnet_config.yaml.
# training example
shell:
Ascend:
# distribute training example(8p)
bash run_train_distribute_ascend.sh [RANK_TABLE_FILE] [DATASET] [PRETRAINED_CKPT_PATH](optional)
# example: bash run_train_distribute_ascend.sh ./hccl_8p.json market1501 /home/osnet/checkpoint/market1501/osnet-240_101.ckpt
# standalone training
bash run_train_standalone_ascend.sh [DATASET] [DEVICE_ID] [PRETRAINED_CKPT_PATH](optional)
# example: bash run_train_standalone_ascend.sh market1501 0 /home/osnet/checkpoint/market1501/osnet-240_101.ckpt
- Running on ModelArts
  - To run on ModelArts, please check the official documentation of ModelArts; then you can start training as follows:
# Train 1p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on osnet_config.yaml file.
# Set "data_path='/cache/data'" on osnet_config.yaml file.
# Set "load_path='/cache/checkpoint/'" on osnet_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on osnet_config.yaml file.
# Set other parameters on osnet_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "data_path='/cache/data'" on the website UI interface.
# Add "load_path='/cache/checkpoint/'" on the website UI interface.
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code.
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Upload the original dataset to S3 bucket.
# (5) Set the code directory to "/path/osnet" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
# Train 8p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on osnet_config.yaml file.
# Set "run_distribute=True" on osnet_config.yaml file.
# Set "data_path='/cache/data'" on osnet_config.yaml file.
# Set "load_path='/cache/checkpoint/'" on osnet_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on osnet_config.yaml file.
# Set other parameters on osnet_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "run_distribute=True" on the website UI interface.
# Add "data_path='/cache/data'" on the website UI interface.
# Add "load_path='/cache/checkpoint/'" on the website UI interface.
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code.
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Upload the original dataset to S3 bucket.
# (5) Set the code directory to "/path/osnet" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
Result
Checkpoints will be stored in ./output/checkpoint by default, and the training log will be redirected to ./train.log.
(8p)
...
epoch: 90 step: 12, loss is 1.0532682
epoch time: 1779.959 ms, per step time: 148.330 ms
epoch: 91 step: 12, loss is 1.0837934
epoch time: 2229.157 ms, per step time: 185.763 ms
epoch: 92 step: 12, loss is 1.0674114
epoch time: 1607.048 ms, per step time: 133.921 ms
epoch: 93 step: 12, loss is 1.0512338
epoch time: 1764.129 ms, per step time: 147.011 ms
epoch: 94 step: 12, loss is 1.0647253
epoch time: 1782.682 ms, per step time: 148.557 ms
epoch: 95 step: 12, loss is 1.0884073
epoch time: 1755.473 ms, per step time: 146.289 ms
...
(1p)
...
epoch: 245 step: 129, loss is 1.0219252
epoch time: 23841.607 ms, per step time: 184.819 ms
epoch: 246 step: 129, loss is 1.0082468
epoch time: 23109.856 ms, per step time: 179.146 ms
epoch: 247 step: 129, loss is 1.0107011
epoch time: 24086.062 ms, per step time: 186.714 ms
epoch: 248 step: 129, loss is 1.0113524
epoch time: 22814.048 ms, per step time: 176.853 ms
epoch: 249 step: 129, loss is 1.0196884
epoch time: 23689.971 ms, per step time: 183.643 ms
epoch: 250 step: 129, loss is 1.0096855
epoch time: 24795.141 ms, per step time: 192.210 ms
...
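The per-step times in logs like the above can be averaged to estimate throughput. This is an illustrative snippet, assuming only the log format shown here:

```python
import re

def mean_step_time(log_lines):
    """Average the 'per step time' values (ms) found in training log lines."""
    times = [float(m.group(1)) for line in log_lines
             if (m := re.search(r'per step time: ([\d.]+) ms', line))]
    return sum(times) / len(times)
```

For example, feeding it the lines of ./train.log yields the average ms/step reported in the performance tables below.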
Usage
You can start evaluating using Python or shell scripts. The usage of the shell script is as follows:
bash run_eval_ascend.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID]
# example: bash run_eval_ascend.sh market1501 /home/osnet/scripts/output/checkpoint/market1501/osnet-240_101.ckpt 0
Launch
- Running on local server
  - Modify the dataset path data_path in osnet_config.yaml and run:
# eval example
shell:
Ascend:
bash run_eval_ascend.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID]
# example: bash run_eval_ascend.sh market1501 /home/osnet/scripts/output/checkpoint/market1501/osnet-240_101.ckpt 0
- Running on ModelArts
# Eval 1p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on osnet_config.yaml file.
# Set "data_path='/cache/data'" on osnet_config.yaml file.
# Set "checkpoint_url='s3://dir_to_your_trained_model/'" on osnet_config.yaml file.
# Set "checkpoint_file_path='/cache/checkpoint/model.ckpt'" on osnet_config.yaml file.
# Set other parameters you need on osnet_config.yaml file.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "data_path='/cache/data'" on the website UI interface.
# Add "checkpoint_url='s3://dir_to_your_trained_model/'" on the website UI interface.
# Add "checkpoint_file_path='/cache/checkpoint/model.ckpt'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code.
# (3) Upload or copy your trained model to S3 bucket.
# (4) Upload the original dataset to S3 bucket.
# (5) Set the code directory to "/path/osnet" on the website UI interface.
# (6) Set the startup file to "eval.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
The checkpoint can be produced during the training process.
Result
Evaluation results will be stored in the output path of the evaluation script; you can find results like the following in eval.log.
** Results **
ckpt=/data/osnet/osnet-240_202.ckpt
mAP: 77.6%
CMC curve
Rank-1 : 91.5%
Rank-5 : 94.8%
Rank-10 : 96.1%
Rank-20 : 96.8%
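The mAP and Rank-k values above are computed from a query-gallery distance matrix. Below is a simplified sketch of the standard Market-1501 evaluation protocol (gallery entries sharing both person ID and camera with the query are excluded); it is an illustration, not the repository's eval.py:

```python
import numpy as np

def evaluate_rank(distmat, q_pids, g_pids, q_camids, g_camids, max_rank=20):
    """Simplified CMC and mAP for person re-identification.

    distmat: (num_query, num_gallery) distances, smaller = more similar.
    """
    indices = np.argsort(distmat, axis=1)
    all_cmc, all_ap = [], []
    for qi in range(distmat.shape[0]):
        order = indices[qi]
        # Standard protocol: drop gallery images of the same identity
        # taken by the same camera as the query.
        keep = ~((g_pids[order] == q_pids[qi]) & (g_camids[order] == q_camids[qi]))
        matches = (g_pids[order] == q_pids[qi])[keep].astype(np.int32)
        if not matches.any():
            continue  # this query identity does not appear in the gallery
        hits = matches.cumsum()
        all_cmc.append((hits[:max_rank] > 0).astype(np.float32))
        # Average precision over the ranked gallery list.
        precision = hits / (np.arange(len(matches)) + 1.0)
        all_ap.append(float((precision * matches).sum() / matches.sum()))
    return np.mean(all_cmc, axis=0), float(np.mean(all_ap))
```

Rank-k is then cmc[k-1] and mAP is the mean of the per-query average precisions.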
Inference Process
Before exporting the model, you must modify the config file osnet_config.yaml. The config items you should modify are data_path, target, batch_size_test and ckpt_file.
Currently batch_size_test can only be set to 1.
python export.py --data_path [DATA_PATH] --target [TARGET] --batch_size_test 1 --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
The data_path, target, batch_size_test and ckpt_file parameters are required; file_name defaults to osnet, and file_format should be in ["AIR", "MINDIR"].
Infer on Ascend310
Before performing inference, the MINDIR file must be exported by the export.py script. We only provide an example of inference using the MINDIR model.
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATASET] [DATA_PATH] [DEVICE_ID]
Result
The inference result is saved in the current path; you can find results like the following in the acc.log file.
** Results **
Dataset:market1501
mAP: 83.7%
CMC curve
Rank-1 : 93.9%
Rank-5 : 95.8%
Rank-10 : 97.0%
Rank-20 : 97.4%
Training Performance
OSNet train on Market1501
| Parameters                 | OSNet |
| -------------------------- | ----- |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date              | 12/18/2021 (month/day/year) |
| MindSpore Version          | 1.5.0 |
| Dataset                    | Market1501 |
| Training Parameters        | epoch=250, batch_size=128, lr=0.001 |
| Optimizer                  | Adam |
| Loss Function              | Label Smoothing Cross Entropy Loss |
| Outputs                    | probability |
| Speed                      | 1pc: 175.741 ms/step; 8pcs: 181.027 ms/step |
| Checkpoint for Fine tuning | 29.4M (.ckpt file) |
OSNet train on DukeMTMC-reID
| Parameters                 | OSNet |
| -------------------------- | ----- |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date              | 12/18/2021 (month/day/year) |
| MindSpore Version          | 1.5.0 |
| Dataset                    | DukeMTMC-reID |
| Training Parameters        | epoch=250, batch_size=128, lr=0.001 |
| Optimizer                  | Adam |
| Loss Function              | Label Smoothing Cross Entropy Loss |
| Outputs                    | probability |
| Speed                      | 1pc: 175.904 ms/step; 8pcs: 180.340 ms/step |
| Checkpoint for Fine tuning | 29.11M (.ckpt file) |
OSNet train on MSMT17
| Parameters                 | OSNet |
| -------------------------- | ----- |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date              | 12/18/2021 (month/day/year) |
| MindSpore Version          | 1.5.0 |
| Dataset                    | MSMT17 |
| Training Parameters        | epoch=250, batch_size=128, lr=0.001 |
| Optimizer                  | Adam |
| Loss Function              | Label Smoothing Cross Entropy Loss |
| Outputs                    | probability |
| Speed                      | 1pc: 183.783 ms/step; 8pcs: 180.458 ms/step |
| Checkpoint for Fine tuning | 31.12M (.ckpt file) |
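The training tables above list Label Smoothing Cross Entropy Loss as the loss function. Here is a minimal NumPy sketch of label-smoothing cross entropy; eps=0.1 is a common default, and the exact value used by src/cross_entropy_loss.py is an assumption here:

```python
import numpy as np

def label_smoothing_ce(logits, label, num_classes, eps=0.1):
    """Cross entropy against a label-smoothed target distribution.

    The one-hot target is softened to 1 - eps on the true class plus
    eps / num_classes on every class (eps value is an assumption).
    """
    m = logits.max()
    log_probs = logits - (m + np.log(np.exp(logits - m).sum()))  # stable log-softmax
    target = np.full(num_classes, eps / num_classes)
    target[label] += 1.0 - eps
    return -(target * log_probs).sum()
```

With eps=0 this reduces to the standard cross entropy; smoothing spreads a little target mass over the wrong classes, which discourages over-confident predictions.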
Inference Performance
OSNet on Market1501
| Parameters        | Ascend |
| ----------------- | ------ |
| Resource          | Ascend 910; OS Euler2.8 |
| Uploaded Date     | 12/18/2021 (month/day/year) |
| MindSpore Version | 1.5.0 |
| Dataset           | Market1501 |
| batch_size        | 300 |
| Outputs           | probability |
| mAP               | 1pc: 82.4%; 8pcs: 83.7% |
| Rank-1            | 1pc: 93.3%; 8pcs: 93.9% |
OSNet on DukeMTMC-reID
| Parameters        | Ascend |
| ----------------- | ------ |
| Resource          | Ascend 910; OS Euler2.8 |
| Uploaded Date     | 12/18/2021 (month/day/year) |
| MindSpore Version | 1.5.0 |
| Dataset           | DukeMTMC-reID |
| batch_size        | 300 |
| Outputs           | probability |
| mAP               | 1pc: 69.8%; 8pcs: 74.6% |
| Rank-1            | 1pc: 86.2%; 8pcs: 89.2% |
OSNet on MSMT17
| Parameters        | Ascend |
| ----------------- | ------ |
| Resource          | Ascend 910; OS Euler2.8 |
| Uploaded Date     | 12/18/2021 (month/day/year) |
| MindSpore Version | 1.5.0 |
| Dataset           | MSMT17 |
| batch_size        | 300 |
| Outputs           | probability |
| mAP               | 1pc: 43.1%; 8pcs: 50.0% |
| Rank-1            | 1pc: 71.5%; 8pcs: 77.7% |
We set seed to 1 in train.py.
Please check the official homepage.