
PP-YOLOE


Latest News

Released the PP-YOLOE+ model (2022.08):
- Pre-trained on the large-scale Objects365 dataset
- Added an alpha parameter to the block branch of the backbone
- Optimized end-to-end inference speed and improved training convergence speed

Legacy model

Please refer to PP-YOLOE 2022.03 for details.

Introduction

PP-YOLOE is a high-performance single-stage, anchor-free model built on PP-YOLOv2 that outperforms a variety of popular YOLO models. PP-YOLOE comes as a series of models, named s/m/l/x, which are configured through a width multiplier and a depth multiplier. PP-YOLOE avoids special operators such as Deformable Convolution and Matrix NMS so that it can be deployed easily on a wide range of hardware. For more details, please refer to our report.
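As a rough illustration of how a width multiplier and a depth multiplier turn one backbone definition into the s/m/l/x variants, the sketch below scales a base channel list and per-stage block counts. The base layout (channels [64, 128, 256, 512, 1024], blocks [3, 6, 6, 3]), the per-variant multipliers, and the rounding rule are all assumptions for illustration; check the `depth_mult` / `width_mult` values in the actual `.yml` configs before relying on them.

```python
# Assumed (depth_mult, width_mult) per variant; verify against the PP-YOLOE .yml configs.
MULTIPLIERS = {
    "s": (0.33, 0.50),
    "m": (0.67, 0.75),
    "l": (1.00, 1.00),
    "x": (1.33, 1.25),
}

# Assumed l-sized backbone layout: stem + 4 stage channel widths, and blocks per stage.
BASE_CHANNELS = [64, 128, 256, 512, 1024]
BASE_BLOCKS = [3, 6, 6, 3]


def scale_backbone(variant):
    """Return (channels, blocks) after applying the width and depth multipliers.

    Rounding to the nearest integer (minimum 1) is an assumed rule for this sketch.
    """
    depth_mult, width_mult = MULTIPLIERS[variant]
    channels = [max(round(c * width_mult), 1) for c in BASE_CHANNELS]
    blocks = [max(round(b * depth_mult), 1) for b in BASE_BLOCKS]
    return channels, blocks


if __name__ == "__main__":
    for name in MULTIPLIERS:
        print(name, scale_backbone(name))
```

Under these assumptions the s model halves every channel width and keeps roughly a third of the blocks, which is why Params and FLOPs shrink much faster than the depth multiplier alone would suggest.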

PP-YOLOE+_l achieves 53.3 mAP on the COCO test-dev2017 dataset at 78.1 FPS on a Tesla V100. With TensorRT FP16, PP-YOLOE+_l can be further accelerated to 149.2 FPS. PP-YOLOE+_s/m/x also offer excellent accuracy and speed; see the Model Zoo for details.

PP-YOLOE is composed of the following methods:

Scalable backbone and neck
Task Alignment Learning
Efficient Task-aligned head with DFL and VFL
SiLU (Swish) activation function
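Task Alignment Learning ranks candidate anchors for each ground-truth box by a metric that combines the predicted class score s and the IoU u between the predicted and ground-truth boxes, t = s^α · u^β. The plain-Python sketch below shows only that ranking step. The values α = 1.0, β = 6.0, and k = 13 are assumed defaults; check the assigner settings in the config. A real assigner also restricts candidates to anchors whose centers fall inside the ground-truth box, which is omitted here.

```python
def alignment_metric(score, iou, alpha=1.0, beta=6.0):
    """Task-alignment metric t = score**alpha * iou**beta for one anchor."""
    return (score ** alpha) * (iou ** beta)


def select_topk_anchors(scores, ious, k=13, alpha=1.0, beta=6.0):
    """Return indices of the k anchors with the highest alignment metric.

    scores: predicted score for the ground-truth class, one per candidate anchor.
    ious:   IoU between each anchor's predicted box and the ground-truth box.
    """
    metrics = [alignment_metric(s, u, alpha, beta) for s, u in zip(scores, ious)]
    ranked = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # A confident anchor with a poorly localized box ranks below well-aligned ones.
    print(select_topk_anchors([0.9, 0.5, 0.8], [0.5, 0.9, 0.8], k=2))
```

Because β is much larger than α, localization quality dominates the ranking: the anchor with score 0.9 but IoU 0.5 is the one left out.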

Model Zoo

Model	Epoch	GPU number	images/GPU	backbone	input shape	Box AP^val 0.5:0.95	Box AP^test 0.5:0.95	Params(M)	FLOPs(G)	V100 FP32(FPS)	V100 TensorRT FP16(FPS)	download	config
PP-YOLOE+_s	80	8	8	cspresnet-s	640	43.7	43.9	7.93	17.36	208.3	333.3	model	config
PP-YOLOE+_m	80	8	8	cspresnet-m	640	49.8	50.0	23.43	49.91	123.4	208.3	model	config
PP-YOLOE+_l	80	8	8	cspresnet-l	640	52.9	53.3	52.20	110.07	78.1	149.2	model	config
PP-YOLOE+_x	80	8	8	cspresnet-x	640	54.7	54.9	98.42	206.59	45.0	95.2	model	config

Comprehensive Metrics

Model	Epoch	AP^0.5:0.95	AP^0.5	AP^0.75	AP^small	AP^medium	AP^large	AR^small	AR^medium	AR^large
PP-YOLOE+_s	80	43.7	60.6	47.9	26.5	47.5	59.0	46.7	71.4	81.7
PP-YOLOE+_m	80	49.8	67.1	54.5	31.8	53.9	66.2	53.3	75.0	84.6
PP-YOLOE+_l	80	52.9	70.1	57.9	35.2	57.5	69.1	56.0	77.9	86.9
PP-YOLOE+_x	80	54.7	72.0	59.9	37.9	59.3	70.4	57.0	78.7	87.2

End-to-end Speed

Model	AP^0.5:0.95	TRT-FP32(fps)	TRT-FP16(fps)
PP-YOLOE+_s	43.7	44.44	47.85
PP-YOLOE+_m	49.8	39.06	43.86
PP-YOLOE+_l	52.9	34.01	42.02
PP-YOLOE+_x	54.7	26.88	36.76

Notes:

PP-YOLOE is trained on the COCO train2017 dataset and evaluated on the val2017 and test-dev2017 datasets.
The model weights in the Comprehensive Metrics table are the same as those in the Model Zoo, and are evaluated on val2017.
PP-YOLOE is trained with mixed precision on 8 GPUs. If you change the number of GPUs or the mini-batch size, adjust the learning rate according to the formula lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default).
PP-YOLOE inference speed is tested on a single Tesla V100 with batch size 1, CUDA 10.2, cuDNN 7.6.5, and TensorRT 6.0.1.8 in TensorRT mode.
Refer to the Speed testing section to reproduce the speed testing results of PP-YOLOE.
If you set --run_benchmark=True, first install these dependencies: pip install pynvml psutil GPUtil.
The end-to-end speed test includes pre-processing, inference, and post-processing (including NMS) time, measured on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with a single Tesla V100, CUDA 11.2, cuDNN 8.2.0, and TensorRT 8.0.1.6.
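The learning-rate rule in the notes is a linear scaling by total batch size. A minimal sketch, assuming the Model Zoo setup of 8 GPUs × 8 images/GPU as the default; the 0.001 base learning rate in the usage line is a placeholder, so read the real value from your optimizer config.

```python
def scale_lr(lr_default, batch_size_new, gpus_new, batch_size_default=8, gpus_default=8):
    """Linear scaling rule:
    lr_new = lr_default * (batch_size_new * gpus_new) / (batch_size_default * gpus_default)

    Defaults follow the Model Zoo training setup (8 GPUs, 8 images per GPU).
    """
    total_new = batch_size_new * gpus_new
    total_default = batch_size_default * gpus_default
    return lr_default * total_new / total_default


if __name__ == "__main__":
    # Placeholder base LR; halving the number of GPUs halves the learning rate.
    print(scale_lr(0.001, batch_size_new=8, gpus_new=4))
```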

Featured Models

The PaddleDetection team provides configs and weights for various featured detection models based on PP-YOLOE, which users can download and use:

Scenarios	Related Datasets	Links
Pedestrian Detection	CrowdHuman	pphuman
Vehicle Detection	BDD100K, UA-DETRAC	ppvehicle
Small Object Detection	VisDrone	visdrone

Getting Started

Training

Train PP-YOLOE+ on 8 GPUs with the following command:

python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml --eval --amp

Notes:

To evaluate during training, add --eval.
PP-YOLOE+ supports mixed precision training; add --amp to enable it.
PaddleDetection supports multi-machine distributed training; refer to the DistributedTraining tutorial.
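If you script training runs (for example, to try different GPU counts), the launch command above can be assembled as an argument list. This is a minimal sketch: `build_train_cmd` is a hypothetical helper, and it only uses the flags that appear in the command above. Remember to rescale the learning rate when the GPU count changes, as described in the Model Zoo notes.

```python
import shlex


def build_train_cmd(config, gpu_ids, eval_during_training=True, amp=True):
    """Hypothetical helper: assemble the multi-GPU training command as an argument list."""
    cmd = [
        "python", "-m", "paddle.distributed.launch",
        "--gpus", ",".join(str(i) for i in gpu_ids),
        "tools/train.py", "-c", config,
    ]
    if eval_during_training:
        cmd.append("--eval")  # evaluate while training
    if amp:
        cmd.append("--amp")   # mixed precision training
    return cmd


if __name__ == "__main__":
    cmd = build_train_cmd("configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml", range(4))
    print(shlex.join(cmd))
    # To launch it: subprocess.run(cmd, check=True)
```

Passing a list (rather than a single shell string) to `subprocess.run` avoids quoting problems with config paths.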

Evaluation

Evaluate PP-YOLOE+ on the COCO val2017 dataset on a single GPU with the following command:

CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams

To evaluate on the COCO test-dev2017 dataset, download test-dev2017 from the COCO dataset download page, decompress it into the COCO dataset directory, and configure EvalDataset as in configs/ppyolo/ppyolo_test.yml.

Inference

Run inference on a single GPU with the following commands. Use --infer_img to run inference on a single image, or --infer_dir to run inference on all images in a directory.

# inference single image
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams --infer_img=demo/000000014439_640x640.jpg

# inference all images in the directory
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams --infer_dir=demo

Exporting models

For deployment on GPU or for speed testing, the model should first be exported to an inference model using tools/export_model.py.

To export PP-YOLOE+ for Paddle Inference without TensorRT, use the following command:

python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams

To export PP-YOLOE+ for Paddle Inference with TensorRT for better performance, use the following command with the extra trt=True setting:

python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams trt=True

To export the PP-YOLOE model to ONNX format, use the following commands; see the PaddleDetection Model Export as ONNX Format Tutorial for details.

# export inference model
python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml --output_dir=output_inference -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams

# install paddle2onnx
pip install paddle2onnx

# convert to onnx
paddle2onnx --model_dir output_inference/ppyoloe_plus_crn_l_80e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 11 --save_file ppyoloe_plus_crn_l_80e_coco.onnx

Note: the ONNX model currently only supports batch_size=1.
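The exported model takes two inputs, `image` and `scale_factor` (the names used in the trtexec `--shapes` arguments in the Speed testing section). Below is a minimal sketch of how `scale_factor` could be computed for a plain, non-aspect-preserving resize to 640×640, and how boxes in resized-image coordinates map back to the original image. It assumes `scale_factor` is ordered [scale_y, scale_x] and that boxes are divided by it during post-processing; confirm both against your reader config and exported model.

```python
def compute_scale_factor(orig_h, orig_w, target_h=640, target_w=640):
    """Per-axis resize ratios, ordered [scale_y, scale_x] (assumed ordering)."""
    return [target_h / orig_h, target_w / orig_w]


def boxes_to_original(boxes, scale_factor):
    """Map [x1, y1, x2, y2] boxes from resized-image coordinates back to the original image."""
    scale_y, scale_x = scale_factor
    return [
        [x1 / scale_x, y1 / scale_y, x2 / scale_x, y2 / scale_y]
        for x1, y1, x2, y2 in boxes
    ]


if __name__ == "__main__":
    # A 480x1280 (HxW) image squeezed into 640x640.
    sf = compute_scale_factor(orig_h=480, orig_w=1280)
    print(sf)
    print(boxes_to_original([[64.0, 160.0, 320.0, 480.0]], sf))
```

For a batch of N images, `scale_factor` would have shape N×2, matching the `scale_factor:1x2` and `scale_factor:32x2` shapes passed to trtexec.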

Speed testing

For a fair comparison, the speeds in the Model Zoo do not include the time for data reading and post-processing (NMS), which matches the testing method of YOLOv4 (AlexeyAB). Therefore, export the model with the extra exclude_nms=True setting.
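With exclude_nms=True, NMS is left out of the timed model, so it has to run separately when you use such an export for real detections. For reference, here is a plain-Python sketch of generic single-class greedy NMS. It is not PaddleDetection's own multiclass NMS operator, and the 0.6 IoU threshold is an arbitrary example value.

```python
def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.6, score_threshold=0.0):
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box is <= iou_threshold.
    """
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]]
    print(greedy_nms(boxes, [0.9, 0.8, 0.7]))
```

The second box overlaps the first with IoU 0.81, so it is suppressed; the third box does not overlap anything and is kept.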

To test speed with Paddle Inference without TensorRT, run the following commands:

# export inference model
python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams exclude_nms=True

# speed testing with run_benchmark=True
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=paddle --device=gpu --run_benchmark=True

To test speed with Paddle Inference with TensorRT, run the following commands:

# export inference model with trt=True
python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams exclude_nms=True trt=True

# speed testing with run_benchmark=True,run_mode=trt_fp32/trt_fp16
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_file=demo/000000014439_640x640.jpg --run_mode=trt_fp16 --device=gpu --run_benchmark=True

To test speed with TensorRT using the ONNX model, run the following commands:

# export inference model with trt=True
python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams exclude_nms=True trt=True

# convert to onnx
paddle2onnx --model_dir output_inference/ppyoloe_plus_crn_s_80e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_plus_crn_s_80e_coco.onnx

# trt inference using fp16 and batch_size=1
trtexec --onnx=./ppyoloe_plus_crn_s_80e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16

# trt inference using fp16 and batch_size=32
trtexec --onnx=./ppyoloe_plus_crn_s_80e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16

# Using the above commands on a T4 GPU with TensorRT 7.2, the speed of the PP-YOLOE-s model is as follows:

# batch_size=1, 2.80ms, 357fps
# batch_size=32, 67.69ms, 472fps
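The fps figures above follow from the measured latency per batch: throughput = batch_size × 1000 / latency_ms. A quick check of that arithmetic on the reported T4 numbers (the reported fps values appear to be truncated to whole numbers):

```python
def throughput_fps(batch_size, latency_ms):
    """Images per second when one batch of batch_size images takes latency_ms milliseconds."""
    return batch_size * 1000.0 / latency_ms


if __name__ == "__main__":
    print(f"{throughput_fps(1, 2.80):.1f}")    # 357.1
    print(f"{throughput_fps(32, 67.69):.1f}")  # 472.7
```

Per-image latency grows less than linearly with batch size here, which is why batch_size=32 gives higher throughput even though each batch takes much longer.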

Deployment

PP-YOLOE can be deployed using several approaches.

The following describes how to deploy PP-YOLOE models with Paddle Inference in TensorRT FP16 mode.

First, refer to the Paddle Inference Docs to download and install the package that matches your CUDA, cuDNN, and TensorRT versions.

Then, export PP-YOLOE for Paddle Inference with TensorRT using the following command:

python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams trt=True

Finally, run inference in TensorRT FP16 mode:

# inference single image
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_mode=trt_fp16

# inference all images in the directory
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_dir=demo/ --device=gpu --run_mode=trt_fp16

Notes:

TensorRT optimizes the network for the current hardware platform, generates an inference engine, and serializes it to a file. This engine is only valid for the current hardware platform. If your hardware and software platform has not changed, you can set use_static=True in enable_tensorrt_engine; the serialized file will then be saved in the output_inference folder and loaded the next time TensorRT is run.
PaddleDetection release/2.4 and later support running NMS with TensorRT, which requires PaddlePaddle release/2.3 or later.
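`use_static` is an argument of `enable_tensorrt_engine` on the Paddle Inference `Config` object. The sketch below shows where it goes when building an FP16 TensorRT predictor in Python. It assumes the `model.pdmodel` / `model.pdiparams` file names used in the paddle2onnx command above; the workspace size, `min_subgraph_size`, and GPU memory pool values are illustrative rather than tuned, and `trt_engine_kwargs` / `build_fp16_trt_predictor` are hypothetical helpers. Running the predictor requires a GPU build of PaddlePaddle with TensorRT.

```python
import os


def trt_engine_kwargs(use_static=True, max_batch_size=1):
    """Hypothetical helper: keyword arguments for Config.enable_tensorrt_engine (precision set separately)."""
    return {
        "workspace_size": 1 << 30,        # illustrative TensorRT workspace, in bytes
        "max_batch_size": max_batch_size,
        "min_subgraph_size": 3,           # illustrative; smaller subgraphs stay in Paddle
        "use_static": use_static,         # serialize the engine and reuse it on later runs
        "use_calib_mode": False,          # no INT8 calibration in FP16 mode
    }


def build_fp16_trt_predictor(model_dir, use_static=True):
    """Create a Paddle Inference predictor that runs TensorRT in FP16 mode."""
    # Imported lazily so the helper above can be used without paddle installed.
    from paddle.inference import Config, PrecisionType, create_predictor

    config = Config(os.path.join(model_dir, "model.pdmodel"),
                    os.path.join(model_dir, "model.pdiparams"))
    config.enable_use_gpu(200, 0)  # 200 MB initial GPU memory pool on device 0
    config.enable_tensorrt_engine(precision_mode=PrecisionType.Half,
                                  **trt_engine_kwargs(use_static=use_static))
    return create_predictor(config)
```

Usage would look like `build_fp16_trt_predictor("output_inference/ppyoloe_plus_crn_l_80e_coco")`; the first run builds and serializes the engine, and later runs with the same hardware and software can load it instead of rebuilding.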

Other Datasets

Model	AP	AP₅₀
YOLOX	22.6	37.5
YOLOv5	26.0	42.7
PP-YOLOE	30.5	46.4

Notes

Here, we use the VisDrone dataset to detect 9 object categories: person, bicycle, car, van, truck, tricycle, awning-tricycle, bus, and motor.
The above models were trained with the official default configs and initialized from COCO-pretrained weights.
Due to limited time, more validation results will be added in the future. Contributions to PP-YOLOE are welcome.

Appendix

Ablation experiments of PP-YOLOE.

NO.	Model	Box AP^val	Params(M)	FLOPs(G)	V100 FP32 FPS
A	PP-YOLOv2	49.1	54.58	115.77	68.9
B	A + Anchor-free	48.8	54.27	114.78	69.8
C	B + CSPRepResNet	49.5	47.42	101.87	85.5
D	C + TAL	50.4	48.32	104.75	84.0
E	D + ET-Head	50.9	52.20	110.07	78.1
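Reading the ablation table as cumulative steps, each component's contribution is the Box AP difference from the previous row. This small script just does that subtraction on the published numbers:

```python
# Box AP^val for each cumulative ablation step, copied from the table above.
ABLATION = [
    ("A: PP-YOLOv2", 49.1),
    ("B: A + Anchor-free", 48.8),
    ("C: B + CSPRepResNet", 49.5),
    ("D: C + TAL", 50.4),
    ("E: D + ET-Head", 50.9),
]


def step_gains(rows):
    """AP change contributed by each step relative to the previous row, rounded to 0.1."""
    return [(rows[i][0], round(rows[i][1] - rows[i - 1][1], 1)) for i in range(1, len(rows))]


if __name__ == "__main__":
    for name, gain in step_gains(ABLATION):
        print(f"{name}: {gain:+.1f} AP")
    print(f"total: {round(ABLATION[-1][1] - ABLATION[0][1], 1):+.1f} AP")
```

Switching to anchor-free alone costs 0.3 AP; the CSPRepResNet backbone, TAL, and ET-Head together recover it and add a net +1.8 AP over PP-YOLOv2.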
