Shiyu Tang, Ting Sun, Juncai Peng, Guowei Chen, Yuying Hao, Manhui Lin, Zhihong Xiao, Jiangbin You, Yi Liu. PP-MobileSeg: Explore the Fast and Accurate Semantic Segmentation Model on Mobile Devices. https://arxiv.org/abs/2304.05152
With the success of transformers in computer vision, several attempts have been made to adapt transformers to mobile devices. However, their performance is not satisfactory for some real-world applications. Therefore, we propose PP-MobileSeg, a SOTA semantic segmentation model for mobile devices.
It is composed of three newly proposed parts: the StrideFormer backbone, the Aggregated Attention Module (AAM), and the Valid Interpolate Module (VIM).
Extensive experiments show that PP-MobileSeg achieves a superior params-accuracy-latency tradeoff compared to other SOTA methods.
Model | Backbone | Training Iters | Batchsize | Train Resolution | mIoU(%) | latency(ms)* | params(M) | Links |
---|---|---|---|---|---|---|---|---|
PP-MobileSeg-Base | StrideFormer-Base | 80000 | 32 | 512x512 | 41.57 | 265.5 | 5.62 | config, model, log, vdl, exported model |
PP-MobileSeg-Tiny | StrideFormer-Tiny | 80000 | 32 | 512x512 | 36.39 | 215.3 | 1.61 | config, model, log, vdl, exported model |
Model | Backbone | mIoU(%) | latency(ms)* | params(M) |
---|---|---|---|---|
LR-ASPP | MobileNetV3_large_x1_0 | 33.10 | 730.9 | 3.20 |
MobileSeg-Base | MobileNetV3_large_x1_0 | 33.26 | 391.5 | 2.85 |
TopFormer-Tiny | TopTransformer-Tiny | 32.46 | 490.3 | 1.41 |
SeaFormer-Tiny | SeaFormer-Tiny | 35.00 | 459.0 | 1.61 |
PP-MobileSeg-Tiny | StrideFormer-Tiny | 36.39 | 215.3 | 1.44 |
TopFormer-Base | TopTransformer-Base | 38.28 | 480.6 | 5.13 |
SeaFormer-Base | SeaFormer-Base | 40.07** | 465.4 | 8.64 |
PP-MobileSeg-Base | StrideFormer-Base | 41.57 | 265.5 | 5.62 |
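To make the params-accuracy-latency tradeoff concrete, the differences between the PP-MobileSeg and SeaFormer rows can be computed directly from the table above. The following is a small sketch: the numbers are copied from the table, while the helper names `relative_change` and `compare` are ours and purely illustrative.

```python
# (mIoU %, latency ms, params M), copied from the comparison table above.
ROWS = {
    "SeaFormer-Tiny": (35.00, 459.0, 1.61),
    "PP-MobileSeg-Tiny": (36.39, 215.3, 1.44),
    "SeaFormer-Base": (40.07, 465.4, 8.64),
    "PP-MobileSeg-Base": (41.57, 265.5, 5.62),
}


def relative_change(new: float, old: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


def compare(ours: str, baseline: str) -> dict:
    """Compare two table rows: absolute mIoU gain, relative latency and params change."""
    miou, latency, params = ROWS[ours]
    b_miou, b_latency, b_params = ROWS[baseline]
    return {
        "miou_gain_points": round(miou - b_miou, 2),
        "latency_change_pct": round(relative_change(latency, b_latency), 1),
        "params_change_pct": round(relative_change(params, b_params), 1),
    }


if __name__ == "__main__":
    for ours, baseline in [("PP-MobileSeg-Base", "SeaFormer-Base"),
                           ("PP-MobileSeg-Tiny", "SeaFormer-Tiny")]:
        print(ours, "vs", baseline, compare(ours, baseline))
```

For the Base models this works out to a gain of about 1.5 mIoU points with roughly 43% lower latency and 35% fewer parameters.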
Model | Backbone | Train Resolution | mIoU(%) | latency(ms)* | params(M) | Links |
---|---|---|---|---|---|---|
baseline | SeaFormer-Base | 512x512 | 40.00 | 465.6 | 8.27 | model, log, vdl, exported model |
+VIM | SeaFormer-Base | 512x512 | 40.07 | 234.6 | 8.17 | model, log, vdl, exported model |
+VIM+StrideFormer | StrideFormer-Base | 512x512 | 40.98 | 235.1 | 5.54 | model, log, vdl, exported model |
+VIM+StrideFormer+AAM | StrideFormer-Base | 512x512 | 41.57 | 265.5 | 5.62 | model, log, vdl, exported model |
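The contribution of each component in the ablation table is easier to read as a row-to-row difference. A minimal sketch, using only the values in the table above; the function name `step_deltas` is ours.

```python
# (name, mIoU %, latency ms, params M) for each ablation step, copied from the table above.
STEPS = [
    ("baseline", 40.00, 465.6, 8.27),
    ("+VIM", 40.07, 234.6, 8.17),
    ("+VIM+StrideFormer", 40.98, 235.1, 5.54),
    ("+VIM+StrideFormer+AAM", 41.57, 265.5, 5.62),
]


def step_deltas(steps):
    """Return (name, mIoU delta, latency delta, params delta) relative to the previous row."""
    deltas = []
    for prev, cur in zip(steps, steps[1:]):
        deltas.append((
            cur[0],
            round(cur[1] - prev[1], 2),   # mIoU points
            round(cur[2] - prev[2], 1),   # milliseconds
            round(cur[3] - prev[3], 2),   # millions of parameters
        ))
    return deltas


if __name__ == "__main__":
    for name, d_miou, d_latency, d_params in step_deltas(STEPS):
        print(f"{name}: mIoU {d_miou:+.2f}, latency {d_latency:+.1f} ms, params {d_params:+.2f} M")
```

Read this way, VIM alone roughly halves latency (465.6 ms to 234.6 ms) at nearly unchanged accuracy, StrideFormer removes about 2.6M parameters while adding 0.91 mIoU, and AAM adds a further 0.59 mIoU at a cost of about 30 ms.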
* Latency is tested with the final argmax operator included, using PaddleLite on a Xiaomi 9 (Snapdragon 855 CPU) with a single thread and a 512x512 input shape. The model output is therefore a single-channel segmentation result rather than probability logits. Because the final argmax operator greatly increases the overall latency, we designed VIM to significantly reduce it.
** This accuracy is from our own reproduction of the model.
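The first note above attributes much of the latency to the final argmax over full-resolution outputs. The NumPy sketch below illustrates one way such a cost can be reduced, based on our reading of the paper: upsample only the classes that actually win somewhere in the low-resolution prediction. It is an illustration only, not PaddleSeg's implementation; the function names are hypothetical, and nearest-neighbour upsampling stands in for whatever interpolation the real model uses.

```python
import numpy as np


def upsample_nearest(x: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(scale, axis=1).repeat(scale, axis=2)


def full_argmax(logits: np.ndarray, scale: int) -> np.ndarray:
    """Baseline: upsample every class channel, then take argmax over classes."""
    return upsample_nearest(logits, scale).argmax(axis=0)


def valid_only_argmax(logits: np.ndarray, scale: int) -> np.ndarray:
    """Upsample only the classes present in the low-resolution prediction."""
    valid = np.unique(logits.argmax(axis=0))       # class ids that win at some pixel
    up = upsample_nearest(logits[valid], scale)    # (len(valid), H*scale, W*scale)
    return valid[up.argmax(axis=0)]                # map channel index back to class id


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(150, 8, 8))          # 150 classes, toy 8x8 feature map
    logits[[3, 17, 42]] += 10.0                    # let a few classes dominate (assumption)
    a = full_argmax(logits, scale=8)
    b = valid_only_argmax(logits, scale=8)
    print("same result:", np.array_equal(a, b))
    print("channels upsampled:", len(np.unique(logits.argmax(axis=0))), "of", logits.shape[0])
```

With nearest-neighbour upsampling the two paths give identical label maps, while the second one upsamples far fewer channels when only a few classes appear in the image. With other interpolation modes the two paths are not guaranteed to agree pixel for pixel.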
PaddleSeg/data
├── ADEChallengeData2016
│ ├── ade20k_150_embedding_42.npy
│ ├── annotations
│ ├── annotations_detectron2
│ ├── images
│ ├── objectInfo150.txt
│ └── sceneCategories.txt
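Before training, it can help to confirm that the dataset directory matches the layout above. A minimal sketch; the entry names are taken from the tree, while the function name `missing_entries` and the default path are illustrative assumptions.

```python
from pathlib import Path

# Entries expected under ADEChallengeData2016, taken from the tree above.
EXPECTED = [
    "ade20k_150_embedding_42.npy",
    "annotations",
    "annotations_detectron2",
    "images",
    "objectInfo150.txt",
    "sceneCategories.txt",
]


def missing_entries(data_root: str) -> list:
    """Return the expected entries that are absent under data_root/ADEChallengeData2016."""
    base = Path(data_root) / "ADEChallengeData2016"
    return [name for name in EXPECTED if not (base / name).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains PaddleSeg/.
    missing = missing_entries("PaddleSeg/data")
    print("dataset layout OK" if not missing else f"missing: {missing}")
```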
You can start training by passing a config file to tools/train.py; the config files are under PaddleSeg/configs/pp_mobileseg. Details about training are in the training guide. The trained models are saved under PaddleSeg/save/dir/best_model/model.pdparams
export CUDA_VISIBLE_DEVICES=0,1
python3 -m paddle.distributed.launch tools/train.py \
--config configs/pp_mobileseg/pp_mobileseg_base_ade20k_512x512_80k.yml \
--save_dir output/pp_mobileseg_base \
--save_interval 1000 \
--num_workers 4 \
--log_iters 100 \
--use_ema \
--do_eval \
--use_vdl
With the trained model in hand, you can verify its accuracy through evaluation. Details about evaluation are in the evaluation guide.
python -m paddle.distributed.launch tools/val.py \
--config configs/pp_mobileseg/pp_mobileseg_base_ade20k_512x512_80k.yml \
--model_path output/pp_mobileseg_base/best_model/model.pdparams
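For orientation, the mIoU(%) figure reported in the tables is the mean of the per-class intersection-over-union. The NumPy sketch below shows that metric in its common form; it is not PaddleSeg's evaluation code, and the handling of ignored labels (`ignore_index=255` here) and of absent classes is an assumption.

```python
import numpy as np


def mean_iou(pred: np.ndarray, label: np.ndarray, num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over the classes that occur in the prediction or the label."""
    keep = label != ignore_index
    pred = pred[keep].astype(np.int64)
    label = label[keep].astype(np.int64)
    # Confusion matrix: rows are ground-truth classes, columns are predicted classes.
    cm = np.bincount(label * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    present = union > 0                      # skip classes absent from both arrays
    return float((inter[present] / union[present]).mean())


if __name__ == "__main__":
    pred = np.array([[0, 0], [1, 1]])
    label = np.array([[0, 1], [1, 255]])     # the last pixel is ignored
    print(mean_iou(pred, label, num_classes=2))
```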
We deploy the model on mobile devices for inference. To do that, we need to export the model and use PaddleLite to run inference on the device. You can also refer to the lite deploy guide for details of PaddleLite deployment.
Run the following command to make sure you are ready:
adb devices
# The following information will show if you are good to go:
List of devices attached
017QXM19C1000664 device
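The same check can be scripted. The sketch below parses the output of `adb devices` and keeps only serials whose state is `device`; the function names are ours, and the script assumes `adb` is on the PATH when run directly.

```python
def parse_adb_devices(output: str) -> list:
    """Parse `adb devices` output into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and daemon status lines that start with '*'.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def ready_devices(output: str) -> list:
    """Return serials whose state is exactly 'device' (connected and authorized)."""
    return [serial for serial, state in parse_adb_devices(output) if state == "device"]


if __name__ == "__main__":
    import subprocess
    result = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    print(ready_devices(result.stdout))
```

A device listed as `unauthorized` or `offline` is filtered out, since the benchmark cannot run on it until the connection is fixed.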
The model needs to be converted from a dynamic graph to a static graph for PaddleLite inference. In this step, we can use VIM to speed the model up: you only need to change model::upsample to vim in the config file. The exported model can be found under PaddleSeg/save/dir
# --input_shape: the model is exported to infer one image with this input shape; adjust it to suit your dataset.
# --output_op: if you do not use VIM, set this to argmax to get the final prediction rather than logits.
python tools/export.py \
--config configs/pp_mobileseg/pp_mobileseg_base_ade20k_512x512_80k.yml \
--save_dir output/pp_mobileseg_base \
--input_shape 1 3 512 512 \
--output_op none
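We read the `model::upsample` notation as the `upsample` key nested under `model` in the config. The sketch below applies that change to a config that has already been loaded into a Python dict; the other keys and the original value are placeholders, not the contents of the real config file, and the function name `use_vim` is hypothetical.

```python
import copy


def use_vim(cfg: dict) -> dict:
    """Return a copy of cfg with the nested model -> upsample key set to 'vim'."""
    new_cfg = copy.deepcopy(cfg)
    new_cfg.setdefault("model", {})["upsample"] = "vim"
    return new_cfg


if __name__ == "__main__":
    # Placeholder config; only the model/upsample nesting mirrors the text above.
    cfg = {"model": {"type": "<model type>", "upsample": "<original value>"}}
    print(use_vim(cfg)["model"]["upsample"])
```

In practice the same one-line change can simply be made by hand in the .yml file before running the export command.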
Speed_test_dir
├── models_dir
│ ├── pp_mobileseg_base # Files under this directory are generated by the export step
│ │ ├── model.pdmodel
│ │ ├── model.pdiparams
│ │ ├── model.pdiparams.info
│ │ └── deploy.yaml
│ ├── pp_mobileseg_tiny
│ │ ├── model.pdmodel
│ │ ├── model.pdiparams
│ │ ├── model.pdiparams.info
│ │ └── deploy.yaml
├── benchmark_bin # The compiled PaddleLite test script, included in the tool zip file
├── image1.txt # The txt file that stores the values of the resized and normalized image
└── gen_val_txt.py # You can use this script to generate image1.txt for your test image
sh benchmark.sh benchmark_bin models_dir test_result.txt image1.txt
The test result for our PP-MobileSeg-Base is as follows:
-----------------Model=MV3_4stage_AAMSx8_valid_0321 Threads=1-------------------------
Delete previous optimized model: /data/local/tmp/seg_benchmark/models_0321/MV3_4stage_AAMSx8_valid_0321/opt.nb
---------- Opt Info ----------
Load paddle model from /data/local/tmp/seg_benchmark/models_0321/MV3_4stage_AAMSx8_valid_0321/model.pdmodel and /data/local/tmp/seg_benchmark/models_0321/MV3_4stage_AAMSx8_valid_0321/model.pdiparams
Save optimized model to /data/local/tmp/seg_benchmark/models_0321/MV3_4stage_AAMSx8_valid_0321/opt.nb
---------- Device Info ----------
Brand: Xiaomi
Device: cepheus
Model: MI 9
Android Version: 9
Android API Level: 28
---------- Model Info ----------
optimized_model_file: /data/local/tmp/seg_benchmark/models_0321/MV3_4stage_AAMSx8_valid_0321/opt.nb
input_data_path: /data/local/tmp/seg_benchmark/image1_norm.txt
input_shape: 1,3,512,512
output tensor num: 1
--- output tensor 0 ---
output shape(NCHW): 1 512 512
output tensor 0 elem num: 262144
output tensor 0 mean value: 1.18468e-44
output tensor 0 standard deviation: 2.52949e-44
---------- Runtime Info ----------
benchmark_bin version: e79b4b6
threads: 1
power_mode: 0
warmup: 20
repeats: 50
result_path:
---------- Backend Info ----------
backend: arm
cpu precision: fp32
---------- Perf Info ----------
Time(unit: ms):
init = 33.071
first = 314.619
min = 265.450
max = 271.217
avg = 267.246
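When benchmarking several models, the timing fields can be pulled out of logs like the one above with a short script. A minimal sketch, assuming the Perf Info lines keep the `name = value` form shown here; the function name `parse_perf` is ours.

```python
import re

# Matches lines such as "min = 265.450" in the Perf Info section.
_PERF_LINE = re.compile(r"^\s*(init|first|min|max|avg)\s*=\s*([0-9.]+)\s*$", re.MULTILINE)


def parse_perf(log: str) -> dict:
    """Extract the timing fields (in ms) from a benchmark log."""
    return {name: float(value) for name, value in _PERF_LINE.findall(log)}


if __name__ == "__main__":
    sample = """---------- Perf Info ----------
Time(unit: ms):
init = 33.071
first = 314.619
min = 265.450
max = 271.217
avg = 267.246
"""
    perf = parse_perf(sample)
    print(perf["min"], perf["avg"])
```

The `min` value in this log, 265.450 ms, appears to correspond to the 265.5 ms latency reported for PP-MobileSeg-Base in the tables.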