Juncai Peng, Yi Liu, Shiyu Tang, Yuying Hao, Lutao Chu, Guowei Chen, Zewu Wu, Zeyu Chen, Zhiliang Yu, Yuning Du, Qingqing Dang, Baohua Lai, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma. PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model. https://arxiv.org/abs/2204.02681
We propose PP-LiteSeg, a novel lightweight model for the real-time semantic segmentation task. Specifically, we present a Flexible and Lightweight Decoder (FLD) to reduce the computation overhead of previous decoders. To strengthen feature representations, we propose a Unified Attention Fusion Module (UAFM), which takes advantage of spatial and channel attention to produce a weight and then fuses the input features with the weight. Moreover, a Simple Pyramid Pooling Module (SPPM) is proposed to aggregate global context at low computation cost.
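As a rough illustration of the fusion step described above, the sketch below derives a per-position weight from simple channel statistics of two feature maps and blends the maps with it. It is a simplified stand-in, not the PaddleSeg implementation. The function name `attention_fuse`, the `[channels][positions]` nested-list layout, and the fixed `mix` coefficients (which take the place of a learned convolution) are assumptions made for this example. The complementary blend `alpha * high + (1 - alpha) * low` is one reading of "fuses the input features with the weight".

```python
import math

def attention_fuse(high, low, mix=(0.25, 0.25, 0.25, 0.25)):
    """Blend two feature maps with a spatial attention weight.

    high, low: nested lists shaped [channels][positions] (hypothetical layout).
    mix: fixed coefficients standing in for a learned convolution (assumption).
    """
    channels, positions = len(high), len(high[0])
    fused = [[0.0] * positions for _ in range(channels)]
    for p in range(positions):
        h = [high[c][p] for c in range(channels)]
        l = [low[c][p] for c in range(channels)]
        # Channel-wise mean and max of each input summarize this position.
        stats = (sum(h) / channels, max(h), sum(l) / channels, max(l))
        logit = sum(w * s for w, s in zip(mix, stats))
        alpha = 1.0 / (1.0 + math.exp(-logit))  # sigmoid, so 0 < alpha < 1
        for c in range(channels):
            # The weight scales one input and its complement scales the other.
            fused[c][p] = alpha * h[c] + (1.0 - alpha) * l[c]
    return fused
```

With all `mix` coefficients set to zero the weight is 0.5 everywhere, so the result is the plain average of the two inputs.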
Prepare:
Place the datasets (Cityscapes, CamVid) under PaddleSeg/data, laid out as follows:
PaddleSeg/data
├── cityscapes
│ ├── gtFine
│ ├── infer.list
│ ├── leftImg8bit
│ ├── test.list
│ ├── train.list
│ ├── trainval.list
│ └── val.list
├── camvid
│ ├── annot
│ ├── images
│ ├── README.md
│ ├── test.txt
│ ├── train.txt
│ └── val.txt
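Before starting a long training run, it can help to confirm that the data directory matches the tree above. The following is a minimal sketch: the helper name `missing_entries` and the `EXPECTED` table are hypothetical and simply mirror the listed entries (the CamVid `README.md` is left out because it is not a data file).

```python
from pathlib import Path

# Entries copied from the directory tree above, relative to PaddleSeg/data.
EXPECTED = {
    "cityscapes": ["gtFine", "leftImg8bit", "train.list", "val.list",
                   "trainval.list", "test.list", "infer.list"],
    "camvid": ["annot", "images", "train.txt", "val.txt", "test.txt"],
}

def missing_entries(data_root, dataset):
    """Return the expected entries of `dataset` that are absent under data_root."""
    base = Path(data_root) / dataset
    return [name for name in EXPECTED[dataset] if not (base / name).exists()]
```

Calling `missing_entries("data", "cityscapes")` from the PaddleSeg root returns an empty list when every listed entry is present.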
Training:
The config files of PP-LiteSeg are under PaddleSeg/configs/pp_liteseg/.
Based on the train.py script, we set the config file and start training the model.
export CUDA_VISIBLE_DEVICES=0,1,2,3
export model=pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k # test resolution is 1024x512
# export model=pp_liteseg_stdc1_cityscapes_1024x512_scale0.75_160k # test resolution is 1536x768
# export model=pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k # test resolution is 2048x1024
# export model=pp_liteseg_stdc2_cityscapes_1024x512_scale0.5_160k
# export model=pp_liteseg_stdc2_cityscapes_1024x512_scale0.75_160k
# export model=pp_liteseg_stdc2_cityscapes_1024x512_scale1.0_160k
# export model=pp_liteseg_stdc1_camvid_960x720_10k
# export model=pp_liteseg_stdc2_camvid_960x720_10k
python -m paddle.distributed.launch tools/train.py \
--config configs/pp_liteseg/${model}.yml \
--save_dir output/${model} \
--save_interval 1000 \
--num_workers 3 \
--do_eval \
--use_vdl
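The test resolutions noted in the comments above follow from applying the `scale` value in the config name to a full-size Cityscapes frame (2048x1024). A small sketch of that arithmetic, where the helper name `scaled_resolution` is hypothetical:

```python
# Cityscapes frames are 2048x1024 (width x height).
CITYSCAPES_SIZE = (2048, 1024)

def scaled_resolution(scale, full_size=CITYSCAPES_SIZE):
    """Return the (width, height) obtained by scaling the full-size frame."""
    width, height = full_size
    return int(width * scale), int(height * scale)

for scale in (0.5, 0.75, 1.0):
    w, h = scaled_resolution(scale)
    print(f"scale{scale}: {w}x{h}")
```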
After the training, the weights are saved in PaddleSeg/output/xxx/best_model/model.pdparams.
Refer to doc for the detailed usage of training.
Evaluation:
With the config file and trained weights, we use the val.py script to evaluate the model.
Refer to doc for the detailed usage of evaluation.
export CUDA_VISIBLE_DEVICES=0
export model=pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k
# export other model
python tools/val.py \
--config configs/pp_liteseg/${model}.yml \
--model_path output/${model}/best_model/model.pdparams \
--num_workers 3
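The evaluation reports mIoU (mean intersection over union). For readers unfamiliar with the metric, here is a minimal pure-Python sketch on flat lists of class ids. It is not the code used by val.py. The function name `mean_iou`, the `ignore_index=255` default, and the choice to skip classes that appear in neither prediction nor label are assumptions of this example.

```python
def mean_iou(pred, label, num_classes, ignore_index=255):
    """Compute mIoU over flat sequences of predicted and ground-truth class ids."""
    intersect = [0] * num_classes
    pred_area = [0] * num_classes
    label_area = [0] * num_classes
    for p, t in zip(pred, label):
        if t == ignore_index:
            continue  # pixels without a valid label do not count
        pred_area[p] += 1
        label_area[t] += 1
        if p == t:
            intersect[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_area[c] + label_area[c] - intersect[c]
        if union > 0:  # skip classes absent from both prediction and label
            ious.append(intersect[c] / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with `pred=[0, 0, 1, 1]` and `label=[0, 1, 1, 255]`, the last pixel is ignored and both classes have an IoU of 1/2.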
Using ONNX+TRT
Prepare:
pip install TensorRT-7.1.3.4/python/xx.whl
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:TensorRT-7.1.3.4/lib
pip install 'pycuda>=2019.1.1'
pip install paddle2onnx onnx onnxruntime
We measure the inference speed with infer_onnx_trt.py, which first exports the Paddle model to ONNX and then runs inference on the ONNX model with TRT.
Sometimes the adaptive average pooling op cannot be converted to ONNX. To work around this, set the width and height of the model input to multiples of 128.
python deploy/python/infer_onnx_trt.py \
--config configs/pp_liteseg/pp_liteseg_xxx.yml \
--width 1024 \
--height 512
Please refer to infer_onnx_trt.py for the detailed usage.
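For the multiple-of-128 workaround mentioned above, a small helper can pick the nearest valid size for `--width` and `--height`. This is only a sketch: the name `round_up_to_multiple` is hypothetical, and rounding up rather than down is an assumption, since the text does not say which direction to prefer.

```python
def round_up_to_multiple(value, base=128):
    """Round value up to the nearest multiple of base."""
    return ((value + base - 1) // base) * base

# 960x720 is the CamVid resolution used in the config names.
width, height = 960, 720
print(round_up_to_multiple(width), round_up_to_multiple(height))
```

Sizes that are already multiples of 128, such as 1024 and 512, are returned unchanged.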
Using PaddleInference
Export the trained model as an inference model (doc).
Use PaddleInference to deploy the inference model on Nvidia GPU and X86 CPU (python api doc, cpp api doc).
Performance on Cityscapes:

Model | Backbone | Training Iters | Train Resolution | Test Resolution | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
---|---|---|---|---|---|---|---|---|
PP-LiteSeg-T | STDC1 | 160000 | 1024x512 | 1024x512 | 73.10% | 73.89% | - | config|model|log|vdl |
PP-LiteSeg-T | STDC1 | 160000 | 1024x512 | 1536x768 | 76.03% | 76.74% | - | config|model|log|vdl |
PP-LiteSeg-T | STDC1 | 160000 | 1024x512 | 2048x1024 | 77.04% | 77.73% | 77.46% | config|model|log|vdl |
PP-LiteSeg-B | STDC2 | 160000 | 1024x512 | 1024x512 | 75.25% | 75.65% | - | config|model|log|vdl |
PP-LiteSeg-B | STDC2 | 160000 | 1024x512 | 1536x768 | 78.75% | 79.23% | - | config|model|log|vdl |
PP-LiteSeg-B | STDC2 | 160000 | 1024x512 | 2048x1024 | 79.04% | 79.52% | 79.85% | config|model|log|vdl |
Note that:
The comparison with state-of-the-art real-time methods on Cityscapes is as follows.
Model | Encoder | Resolution | mIoU(Val) | mIoU(Test) | FPS |
---|---|---|---|---|---|
ENet | - | 512x1024 | - | 58.3 | 76.9 |
ICNet | PSPNet50 | 1024x2048 | - | 69.5 | 30.3 |
ESPNet | ESPNet | 512x1024 | - | 60.3 | 112.9 |
ESPNetV2 | ESPNetV2 | 512x1024 | 66.4 | 66.2 | - |
SwiftNet | ResNet18 | 1024x2048 | 75.4 | 75.5 | 39.9 |
BiSeNetV1 | Xception39 | 768x1536 | 69.0 | 68.4 | 105.8 |
BiSeNetV1-L | ResNet18 | 768x1536 | 74.8 | 74.7 | 65.5 |
BiSeNetV2 | - | 512x1024 | 73.4 | 72.6 | 156 |
BiSeNetV2-L | - | 512x1024 | 75.8 | 75.3 | 47.3 |
FasterSeg | - | 1024x2048 | 73.1 | 71.5 | 163.9 |
SFNet | DF1 | 1024x2048 | - | 74.5 | 121 |
STDC1-Seg50 | STDC1 | 512x1024 | 72.2 | 71.9 | 250.4 |
STDC2-Seg50 | STDC2 | 512x1024 | 74.2 | 73.4 | 188.6 |
STDC1-Seg75 | STDC1 | 768x1536 | 74.5 | 75.3 | 126.7 |
STDC2-Seg75 | STDC2 | 768x1536 | 77.0 | 76.8 | 97.0 |
PP-LiteSeg-T1 | STDC1 | 512x1024 | 73.1 | 72.0 | 273.6 |
PP-LiteSeg-B1 | STDC2 | 512x1024 | 75.3 | 73.9 | 195.3 |
PP-LiteSeg-T2 | STDC1 | 768x1536 | 76.0 | 74.9 | 143.6 |
PP-LiteSeg-B2 | STDC2 | 768x1536 | 78.2 | 77.5 | 102.6 |
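The FPS column above is a throughput figure. The table does not restate how each method was timed, so the sketch below only shows a generic way to turn repeated calls into an FPS number. The function name `measure_fps` and the warm-up and run counts are hypothetical, and a real GPU benchmark would also need to synchronize the device before reading the clock.

```python
import time

def measure_fps(run_once, warmup=10, runs=100):
    """Time run_once() over `runs` calls after `warmup` untimed calls."""
    for _ in range(warmup):
        run_once()  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    elapsed = time.perf_counter() - start
    return runs / elapsed
```

Here `run_once` would wrap a single forward pass on one input image at the resolution of interest.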
Performance on CamVid:

Model | Backbone | Training Iters | Train Resolution | Test Resolution | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
---|---|---|---|---|---|---|---|---|
PP-LiteSeg-T | STDC1 | 10000 | 960x720 | 960x720 | 73.30% | 73.89% | 73.66% | config|model|log|vdl |
PP-LiteSeg-B | STDC2 | 10000 | 960x720 | 960x720 | 75.10% | 75.85% | 75.48% | config|model|log|vdl |
Note: