OSNet for Ascend
OSNet is an efficient and accurate neural network architecture for person re-identification. It is a novel CNN architecture designed for learning omni-scale feature representations: features are captured by multiple convolutional streams with different receptive field sizes and fused with channel-wise weights generated by a unified aggregation gate (AG). The method was proposed in the paper "Omni-Scale Feature Learning for Person Re-Identification", published in 2019.
Paper: Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, Tao Xiang (University of Surrey; Queen Mary University of London; Samsung AI Center, Cambridge). "Omni-Scale Feature Learning for Person Re-Identification", published in IEEE ICCV 2019.
The network structure can be decomposed into two parts: feature extraction and feature fusion. The feature extraction part uses multiple convolutional streams with different receptive field sizes to obtain multi-scale feature maps. In the feature fusion part, the resulting multi-scale feature maps are dynamically fused with channel-wise weights generated by a unified aggregation gate (AG).
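The AG fusion described above can be sketched numerically. The following is a minimal illustration, not the repository's src/osnet.py: the gate here is a single shared fully connected layer followed by a sigmoid (the paper uses a small FC mini-network), and all names and shapes are invented for the example.

```python
import numpy as np

def aggregation_gate(streams, w, b):
    """Fuse multi-scale feature maps with channel-wise gate weights.

    streams: list of feature maps, each (C, H, W), from convolutional
             streams with different receptive field sizes.
    w, b:    parameters of the shared gate (simplified here to a single
             FC layer; the paper uses a small FC mini-network).
    """
    fused = np.zeros_like(streams[0])
    for x in streams:
        desc = x.mean(axis=(1, 2))                    # global average pooling -> (C,)
        gate = 1.0 / (1.0 + np.exp(-(w @ desc + b)))  # sigmoid -> one weight per channel
        fused += gate[:, None, None] * x              # channel-wise reweighting, then sum
    return fused

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
streams = [rng.standard_normal((C, H, W)) for _ in range(3)]
w, b = 0.1 * rng.standard_normal((C, C)), np.zeros(C)
out = aggregation_gate(streams, w, b)  # fused map, shape (8, 4, 4)
```

Because the gate parameters are shared across all streams, the number of gate parameters stays constant as streams are added, which is part of what keeps OSNet lightweight.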
Dataset used Market1501
- Dataset size: 145.9MB, 32,217 images
- Train: 12,936 images
- Query: 3,368 images
- Gallery: 15,913 images
Dataset used DukeMTMC-reID
- Dataset size: 146.1MB, 36,411 images
- Train: 16,522 images
- Query: 2,228 images
- Gallery: 17,661 images
Dataset used CUHK03
- Dataset size: 1.8GB, 14,097 images
- Train: 7,365 images
- Query: 1,400 images
- Gallery: 5,332 images
Dataset used MSMT17, extraction code: yf3z
- Dataset size: 2.6GB, 124,068 images
- Train: 30,248 images
- Query: 11,659 images
- Gallery: 82,161 images
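Each of these datasets encodes the person ID and camera index in the image file name, which is what the query/gallery split relies on. As an illustration of how dataset preprocessing can recover them, here is a small parser for the Market-1501 naming scheme (e.g. 0002_c1s1_000451_03.jpg, where a person ID of -1 marks junk detections); the helper name is made up and is not part of src/dataset.py:

```python
import re

def parse_market1501_name(filename):
    """Extract (person_id, camera_id) from a Market-1501 image name.

    Names look like 0002_c1s1_000451_03.jpg: the first field is the
    person ID (-1 marks junk detections) and c<k> gives the camera index.
    """
    match = re.match(r'([-\d]+)_c(\d)', filename)
    if match is None:
        raise ValueError(f'unexpected file name: {filename}')
    return int(match.group(1)), int(match.group(2))

print(parse_market1501_name('0002_c1s1_000451_03.jpg'))  # (2, 1)
print(parse_market1501_name('-1_c3s2_000100_00.jpg'))    # (-1, 3)
```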
In this project, the file organization is recommended as below:
.
└──datasets
    ├──market1501
    │   └──Market-1501-v15.09.15
    │       ├──bounding_box_train
    │       ├──query
    │       └──bounding_box_test
    ├──dukemtmc-reid
    │   └──DukeMTMC-reID
    │       ├──bounding_box_train
    │       ├──query
    │       └──bounding_box_test
    ├──cuhk03
    │   └──cuhk03_release
    │       ├──cuhk-03.mat
    │       ├──cuhk03_new_protocol_config_labeled.mat
    │       └──cuhk03_new_protocol_config_detected.mat
    └──msmt17
        └──MSMT17_V1
            ├──train
            ├──test
            ├──list_val.txt
            ├──list_train.txt
            ├──list_query.txt
            └──list_gallery.txt
- Hardware (Ascend)
  - Prepare hardware environment with Ascend processor.
- Framework
  - For more information, please check the resources below:
.
└─osnet
  ├─README.md
  ├─scripts
  │ ├─run_train_standalone_ascend.sh    # launch standalone training on Ascend (1p)
  │ ├─run_train_distribute_ascend.sh    # launch distributed training on Ascend (8p)
  │ └─run_eval_ascend.sh                # launch evaluation on Ascend
  ├─src
  │ ├─cross_entropy_loss.py             # cross entropy loss
  │ ├─dataset.py                        # data preprocessing
  │ ├─dataset_define.py                 # dataset definition
  │ ├─lr_generator.py                   # learning rate scheduler
  │ └─osnet.py                          # network definition
  ├─eval.py                             # evaluate the network
  ├─export.py                           # export MINDIR for Ascend 310
  ├─preprocessing.py                    # preprocess data for Ascend 310
  ├─postprocessing.py                   # calculate metrics for Ascend 310
  └─train.py                            # train the network
Usage
# Set the dataset path, for example:
data_path: /home/osnet/datasets
# distribute training example(8p)
bash run_train_distribute_ascend.sh [RANK_TABLE_FILE] [DATASET] [PRETRAINED_CKPT_PATH](optional)
# example: bash run_train_distribute_ascend.sh ./hccl_8p.json market1501 /home/osnet/checkpoint/market1501/osnet-240_101.ckpt
# standalone training
bash run_train_standalone_ascend.sh [DATASET] [DEVICE_ID] [PRETRAINED_CKPT_PATH](optional)
# example: bash run_train_standalone_ascend.sh market1501 0 /home/osnet/checkpoint/market1501/osnet-240_101.ckpt
# evaluation:
bash run_eval_ascend.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID]
# example: bash run_eval_ascend.sh market1501 /home/osnet/scripts/output/checkpoint/market1501/osnet-240_101.ckpt 0
Notes:
For RANK_TABLE_FILE, refer to Link; the device_ip can be obtained as described in Link. For large models like InceptionV4, it is better to export the environment variable export HCCL_CONNECT_TIMEOUT=600 to extend the HCCL connection-check time from the default 120 seconds to 600 seconds; otherwise the connection could time out, since compile time increases with model size.
The taskset operations in scripts/run_train_distribute_ascend.sh bind processor cores according to device_num and the total number of processor cores. If you do not want this binding, remove the taskset operations from scripts/run_train_distribute_ascend.sh.
PRETRAINED_CKPT_PATH should be a checkpoint saved during a previous training run on Ascend; training will resume from this checkpoint and continue.
Launch
- Training needs to load parameters pre-trained on ImageNet. You can download the checkpoint file at link (extraction code: 1961). After downloading, put it in the ./model_utils folder. You can also download the .pth file pre-trained under PyTorch here and convert it to a .ckpt file through ./model_utils/pth_to_ckpt.py.
- Running on local server
  - Modify the dataset path data_path in osnet_config.yaml.
# training example
shell:
Ascend:
# distribute training example(8p)
bash run_train_distribute_ascend.sh [RANK_TABLE_FILE] [DATASET] [PRETRAINED_CKPT_PATH](optional)
# example: bash run_train_distribute_ascend.sh ./hccl_8p.json market1501 /home/osnet/checkpoint/market1501/osnet-240_101.ckpt
# standalone training
bash run_train_standalone_ascend.sh [DATASET] [DEVICE_ID] [PRETRAINED_CKPT_PATH](optional)
# example: bash run_train_standalone_ascend.sh market1501 0 /home/osnet/checkpoint/market1501/osnet-240_101.ckpt
- Running on ModelArts
  - To run on ModelArts, please check the official documentation of ModelArts; then you can start training as follows:
# Train 1p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on osnet_config.yaml file.
# Set "data_path='/cache/data'" on osnet_config.yaml file.
# Set "load_path='/cache/checkpoint/'" on osnet_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on osnet_config.yaml file.
# Set other parameters on osnet_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "data_path='/cache/data'" on the website UI interface.
# Add "load_path='/cache/checkpoint/'" on the website UI interface.
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code.
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Upload the original dataset to S3 bucket.
# (5) Set the code directory to "/path/osnet" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
# Train 8p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on osnet_config.yaml file.
# Set "run_distribute=True" on osnet_config.yaml file.
# Set "data_path='/cache/data'" on osnet_config.yaml file.
# Set "load_path='/cache/checkpoint/'" on osnet_config.yaml file.
# (optional)Set "checkpoint_url='s3://dir_to_your_pretrained/'" on osnet_config.yaml file.
# Set other parameters on osnet_config.yaml file you need.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "run_distribute=True" on the website UI interface.
# Add "data_path='/cache/data'" on the website UI interface.
# Add "load_path='/cache/checkpoint/'" on the website UI interface.
# (optional)Add "checkpoint_url='s3://dir_to_your_pretrained/'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code.
# (3) Upload or copy your pretrained model to S3 bucket if you want to finetune.
# (4) Upload the original dataset to S3 bucket.
# (5) Set the code directory to "/path/osnet" on the website UI interface.
# (6) Set the startup file to "train.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
Result
Checkpoints will be stored in ./output/checkpoint by default, and the training log will be redirected to ./train.log.
(8p)
...
epoch: 90 step: 12, loss is 1.0532682
epoch time: 1779.959 ms, per step time: 148.330 ms
epoch: 91 step: 12, loss is 1.0837934
epoch time: 2229.157 ms, per step time: 185.763 ms
epoch: 92 step: 12, loss is 1.0674114
epoch time: 1607.048 ms, per step time: 133.921 ms
epoch: 93 step: 12, loss is 1.0512338
epoch time: 1764.129 ms, per step time: 147.011 ms
epoch: 94 step: 12, loss is 1.0647253
epoch time: 1782.682 ms, per step time: 148.557 ms
epoch: 95 step: 12, loss is 1.0884073
epoch time: 1755.473 ms, per step time: 146.289 ms
...
(1p)
...
epoch: 245 step: 129, loss is 1.0219252
epoch time: 23841.607 ms, per step time: 184.819 ms
epoch: 246 step: 129, loss is 1.0082468
epoch time: 23109.856 ms, per step time: 179.146 ms
epoch: 247 step: 129, loss is 1.0107011
epoch time: 24086.062 ms, per step time: 186.714 ms
epoch: 248 step: 129, loss is 1.0113524
epoch time: 22814.048 ms, per step time: 176.853 ms
epoch: 249 step: 129, loss is 1.0196884
epoch time: 23689.971 ms, per step time: 183.643 ms
epoch: 250 step: 129, loss is 1.0096855
epoch time: 24795.141 ms, per step time: 192.210 ms
...
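The per-step times in logs like the above can be averaged to estimate throughput. This is an illustrative snippet, assuming only the log format shown here:

```python
import re

def mean_step_time(log_lines):
    """Average the 'per step time' values (ms) found in training log lines."""
    times = [float(m.group(1)) for line in log_lines
             if (m := re.search(r'per step time: ([\d.]+) ms', line))]
    return sum(times) / len(times)
```

For example, feeding it the lines of ./train.log yields the average ms/step reported in the performance tables below.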
Usage
You can start evaluating using Python or shell scripts. The usage of the shell script is as follows:
bash run_eval_ascend.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID]
# example: bash run_eval_ascend.sh market1501 /home/osnet/scripts/output/checkpoint/market1501/osnet-240_101.ckpt 0
Launch
- Running on local server
  - Modify the dataset path data_path in osnet_config.yaml and run:
# eval example
shell:
Ascend:
bash run_eval_ascend.sh [DATASET] [CHECKPOINT_PATH] [DEVICE_ID]
# example: bash run_eval_ascend.sh market1501 /home/osnet/scripts/output/checkpoint/market1501/osnet-240_101.ckpt 0
- Running on ModelArts
# Eval 1p with Ascend
# (1) Perform a or b.
# a. Set "enable_modelarts=True" on osnet_config.yaml file.
# Set "data_path='/cache/data'" on osnet_config.yaml file.
# Set "checkpoint_url='s3://dir_to_your_trained_model/'" on osnet_config.yaml file.
# Set "checkpoint_file_path='/cache/checkpoint/model.ckpt'" on osnet_config.yaml file.
# Set other parameters you need on osnet_config.yaml file.
# b. Add "enable_modelarts=True" on the website UI interface.
# Add "data_path='/cache/data'" on the website UI interface.
# Add "checkpoint_url='s3://dir_to_your_trained_model/'" on the website UI interface.
# Add "checkpoint_file_path='/cache/checkpoint/model.ckpt'" on the website UI interface.
# Add other parameters on the website UI interface.
# (2) Prepare model code.
# (3) Upload or copy your trained model to S3 bucket.
# (4) Upload the original dataset to S3 bucket.
# (5) Set the code directory to "/path/osnet" on the website UI interface.
# (6) Set the startup file to "eval.py" on the website UI interface.
# (7) Set the "Dataset path" and "Output file path" and "Job log path" to your path on the website UI interface.
# (8) Create your job.
The checkpoint can be produced during the training process.
Result
Evaluation results will be stored in the output path of the evaluation script; you can find results like the following in eval.log.
** Results **
ckpt=/data/osnet/osnet-240_202.ckpt
mAP: 77.6%
CMC curve
Rank-1 : 91.5%
Rank-5 : 94.8%
Rank-10 : 96.1%
Rank-20 : 96.8%
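The mAP and Rank-k values above are computed from a query-gallery distance matrix. Below is a simplified sketch of the standard Market-1501 evaluation protocol (gallery entries sharing both person ID and camera with the query are excluded); it is an illustration, not the repository's eval.py:

```python
import numpy as np

def evaluate_rank(distmat, q_pids, g_pids, q_camids, g_camids, max_rank=20):
    """Simplified CMC and mAP for person re-identification.

    distmat: (num_query, num_gallery) distances, smaller = more similar.
    """
    indices = np.argsort(distmat, axis=1)
    all_cmc, all_ap = [], []
    for qi in range(distmat.shape[0]):
        order = indices[qi]
        # Standard protocol: drop gallery images of the same identity
        # taken by the same camera as the query.
        keep = ~((g_pids[order] == q_pids[qi]) & (g_camids[order] == q_camids[qi]))
        matches = (g_pids[order] == q_pids[qi])[keep].astype(np.int32)
        if not matches.any():
            continue  # this query identity does not appear in the gallery
        hits = matches.cumsum()
        all_cmc.append((hits[:max_rank] > 0).astype(np.float32))
        # Average precision over the ranked gallery list.
        precision = hits / (np.arange(len(matches)) + 1.0)
        all_ap.append(float((precision * matches).sum() / matches.sum()))
    return np.mean(all_cmc, axis=0), float(np.mean(all_ap))
```

Rank-k is then cmc[k-1] and mAP is the mean of the per-query average precisions.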
Inference Process
Before exporting the model, you must modify the config file osnet_config.yaml. The config items you should modify are data_path, target, batch_size_test and ckpt_file.
Currently batch_size_test can only be set to 1.
python export.py --data_path [DATA_PATH] --target [TARGET] --batch_size_test 1 --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
The data_path, target, batch_size_test and ckpt_file parameters are required; file_name defaults to osnet, and file_format should be in ["AIR", "MINDIR"].
Infer on Ascend310
Before performing inference, the MINDIR file must be exported by the export.py script. We only provide an example of inference using the MINDIR model.
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATASET] [DATA_PATH] [DEVICE_ID]
Result
The inference result is saved in the current path; you can find results like the following in the acc.log file.
** Results **
Dataset:market1501
mAP: 83.7%
CMC curve
Rank-1 : 93.9%
Rank-5 : 95.8%
Rank-10 : 97.0%
Rank-20 : 97.4%
Training Performance
OSNet train on Market1501
| Parameters                 | OSNet |
| -------------------------- | ----- |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date              | 12/18/2021 (month/day/year) |
| MindSpore Version          | 1.5.0 |
| Dataset                    | Market1501 |
| Training Parameters        | epoch=250, batch_size=128, lr=0.001 |
| Optimizer                  | Adam |
| Loss Function              | Label Smoothing Cross Entropy Loss |
| Outputs                    | probability |
| Speed                      | 1pc: 175.741 ms/step; 8pcs: 181.027 ms/step |
| Checkpoint for Fine tuning | 29.4M (.ckpt file) |
OSNet train on DukeMTMC-reID
| Parameters                 | OSNet |
| -------------------------- | ----- |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date              | 12/18/2021 (month/day/year) |
| MindSpore Version          | 1.5.0 |
| Dataset                    | DukeMTMC-reID |
| Training Parameters        | epoch=250, batch_size=128, lr=0.001 |
| Optimizer                  | Adam |
| Loss Function              | Label Smoothing Cross Entropy Loss |
| Outputs                    | probability |
| Speed                      | 1pc: 175.904 ms/step; 8pcs: 180.340 ms/step |
| Checkpoint for Fine tuning | 29.11M (.ckpt file) |
OSNet train on MSMT17
| Parameters                 | OSNet |
| -------------------------- | ----- |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date              | 12/18/2021 (month/day/year) |
| MindSpore Version          | 1.5.0 |
| Dataset                    | MSMT17 |
| Training Parameters        | epoch=250, batch_size=128, lr=0.001 |
| Optimizer                  | Adam |
| Loss Function              | Label Smoothing Cross Entropy Loss |
| Outputs                    | probability |
| Speed                      | 1pc: 183.783 ms/step; 8pcs: 180.458 ms/step |
| Checkpoint for Fine tuning | 31.12M (.ckpt file) |
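The training tables above list Label Smoothing Cross Entropy Loss as the loss function. Here is a minimal NumPy sketch of label-smoothing cross entropy; eps=0.1 is a common default, and the exact value used by src/cross_entropy_loss.py is an assumption here:

```python
import numpy as np

def label_smoothing_ce(logits, label, num_classes, eps=0.1):
    """Cross entropy against a label-smoothed target distribution.

    The one-hot target is softened to 1 - eps on the true class plus
    eps / num_classes on every class (eps value is an assumption).
    """
    m = logits.max()
    log_probs = logits - (m + np.log(np.exp(logits - m).sum()))  # stable log-softmax
    target = np.full(num_classes, eps / num_classes)
    target[label] += 1.0 - eps
    return -(target * log_probs).sum()
```

With eps=0 this reduces to the standard cross entropy; smoothing spreads a little target mass over the wrong classes, which discourages over-confident predictions.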
Inference Performance
OSNet on Market1501
| Parameters        | Ascend |
| ----------------- | ------ |
| Resource          | Ascend 910; OS Euler2.8 |
| Uploaded Date     | 12/18/2021 (month/day/year) |
| MindSpore Version | 1.5.0 |
| Dataset           | Market1501 |
| batch_size        | 300 |
| Outputs           | probability |
| mAP               | 1pc: 82.4%; 8pcs: 83.7% |
| Rank-1            | 1pc: 93.3%; 8pcs: 93.9% |
OSNet on DukeMTMC-reID
| Parameters        | Ascend |
| ----------------- | ------ |
| Resource          | Ascend 910; OS Euler2.8 |
| Uploaded Date     | 12/18/2021 (month/day/year) |
| MindSpore Version | 1.5.0 |
| Dataset           | DukeMTMC-reID |
| batch_size        | 300 |
| Outputs           | probability |
| mAP               | 1pc: 69.8%; 8pcs: 74.6% |
| Rank-1            | 1pc: 86.2%; 8pcs: 89.2% |
OSNet on MSMT17
| Parameters        | Ascend |
| ----------------- | ------ |
| Resource          | Ascend 910; OS Euler2.8 |
| Uploaded Date     | 12/18/2021 (month/day/year) |
| MindSpore Version | 1.5.0 |
| Dataset           | MSMT17 |
| batch_size        | 300 |
| Outputs           | probability |
| mAP               | 1pc: 43.1%; 8pcs: 50.0% |
| Rank-1            | 1pc: 71.5%; 8pcs: 77.7% |
We set seed to 1 in train.py.
Please check the official homepage.