A TensorRT implementation of ReID models. The whole network is built with the TensorRT network definition API, so no parsers are used here.
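For context, here is a minimal sketch of what the network-definition style looks like, assuming a `weightMap` of named tensors loaded from the `.wts` file (illustrative only, not FastRT's actual code; `addConvBlock` is a hypothetical helper):

```cpp
#include <NvInfer.h>
#include <map>
#include <string>

// Add a conv + ReLU block by hand with the TensorRT network definition API,
// instead of importing the graph through an ONNX/Caffe/UFF parser.
nvinfer1::ILayer* addConvBlock(nvinfer1::INetworkDefinition* network,
                               nvinfer1::ITensor& input,
                               std::map<std::string, nvinfer1::Weights>& weightMap,
                               int outChannels) {
    auto* conv = network->addConvolutionNd(input, outChannels,
                                           nvinfer1::DimsHW{3, 3},
                                           weightMap["conv.weight"],
                                           weightMap["conv.bias"]);
    conv->setStrideNd(nvinfer1::DimsHW{1, 1});
    conv->setPaddingNd(nvinfer1::DimsHW{1, 1});
    // Activations, pooling, etc. are added the same way, layer by layer.
    return network->addActivation(*conv->getOutput(0),
                                  nvinfer1::ActivationType::kRELU);
}
```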
### How to Run

1. Generate a `.wts` file from PyTorch using `model_best.pth` (see `How_to_Generate.md`); a sketch of the resulting file format follows below.
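As background for step 1, a sketch of how a tensorrtx-style `.wts` text file (a tensor count, then one `<name> <size> <hex values...>` line per tensor) is typically parsed on the C++ side; the exact format FastRT expects is defined by `How_to_Generate.md`, so treat this as an assumption:

```cpp
#include <NvInfer.h>
#include <cstdint>
#include <fstream>
#include <map>
#include <string>

// Read every named tensor from a '.wts' text file into TensorRT Weights.
std::map<std::string, nvinfer1::Weights> loadWeights(const std::string& file) {
    std::map<std::string, nvinfer1::Weights> weightMap;
    std::ifstream input(file);
    int32_t count;
    input >> count;                         // number of tensors in the file
    while (count--) {
        std::string name;
        uint32_t size;
        input >> name >> std::dec >> size;
        auto* values = new uint32_t[size];  // raw fp32 bit patterns; must stay
        for (uint32_t i = 0; i < size; ++i) // alive until the engine is built
            input >> std::hex >> values[i];
        weightMap[name] = nvinfer1::Weights{nvinfer1::DataType::kFLOAT, values,
                                            static_cast<int64_t>(size)};
    }
    return weightMap;
}
```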
2. Config your model (see the *Tensorrt Model Config* section below).
3. Build:

```bash
mkdir build
cd build
cmake -DBUILD_FASTRT_ENGINE=ON \
      -DBUILD_DEMO=ON \
      -DUSE_CNUMPY=ON ..
make
```
4. Put `model_best.wts` into `FastRT/`.
5. Run:

```bash
./demo/fastrt -s   # serialize the model and save it as an 'xxx.engine' file
./demo/fastrt -d   # deserialize the 'xxx.engine' file and run inference
```
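Under the hood, `-s` and `-d` correspond to the standard TensorRT serialize/deserialize calls; a rough sketch (FastRT wraps this in its own classes, so the function names here are illustrative):

```cpp
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <string>

// -s: serialize a built engine and save it as an '.engine' file.
void saveEngine(nvinfer1::ICudaEngine* engine, const std::string& path) {
    nvinfer1::IHostMemory* blob = engine->serialize();
    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(blob->data()), blob->size());
    blob->destroy();
}

// -d: read the file back and deserialize it for inference.
nvinfer1::ICudaEngine* loadEngine(nvinfer1::IRuntime* runtime, const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::string blob((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    return runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
}
```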
6. Verify the output against PyTorch.
7. (Optional) Once you have verified the result, you can enable FP16 for a speed-up; a sketch of what the flag does follows below:

```bash
mkdir build
cd build
cmake -DBUILD_FASTRT_ENGINE=ON \
      -DBUILD_DEMO=ON \
      -DBUILD_FP16=ON ..
make
```

Then go to step 5.
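The `BUILD_FP16` switch amounts to setting TensorRT's FP16 builder flag; a sketch of the assumed wiring:

```cpp
#include <NvInfer.h>

// Allow TensorRT to pick fp16 kernels when the GPU supports them.
void enableFp16(nvinfer1::IBuilder* builder, nvinfer1::IBuilderConfig* config) {
    if (builder->platformHasFastFp16()) {
        config->setFlag(nvinfer1::BuilderFlag::kFP16);
    }
}
```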
8. (Optional) You can use INT8 quantization for a further speed-up. Prepare a calibration dataset and set its path via cmake (the path must end with `/`):

```bash
mkdir build
cd build
cmake -DBUILD_FASTRT_ENGINE=ON \
      -DBUILD_DEMO=ON \
      -DBUILD_INT8=ON \
      -DINT8_CALIBRATE_DATASET_PATH="/data/Market-1501-v15.09.15/bounding_box_test/" ..
make
```

Then go to step 5.
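INT8 mode additionally needs a calibrator that streams image batches from `INT8_CALIBRATE_DATASET_PATH` so TensorRT can measure activation ranges; a sketch of the assumed wiring (the calibrator would be an implementation of `nvinfer1::IInt8EntropyCalibrator2`):

```cpp
#include <NvInfer.h>

// Enable INT8 quantization; `calibrator` feeds calibration batches read from
// the dataset path configured at cmake time.
void enableInt8(nvinfer1::IBuilder* builder, nvinfer1::IBuilderConfig* config,
                nvinfer1::IInt8Calibrator* calibrator) {
    if (builder->platformHasFastInt8()) {
        config->setFlag(nvinfer1::BuilderFlag::kINT8);
        config->setInt8Calibrator(calibrator);
    }
}
```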
9. (Optional) Build the TensorRT model as shared libs:

```bash
mkdir build
cd build
cmake -DBUILD_FASTRT_ENGINE=ON \
      -DBUILD_DEMO=OFF \
      -DBUILD_FP16=ON ..
make
make install
```

You should find the libs in `FastRT/libs/FastRTEngine/`. Now build your application executable:

```bash
cmake -DBUILD_FASTRT_ENGINE=OFF -DBUILD_DEMO=ON ..
make
```

Then go to step 5.
10. (Optional) Build the TensorRT model with the Python interface, so you can use the FastRT model from Python:

```bash
mkdir build
cd build
cmake -DBUILD_FASTRT_ENGINE=ON \
      -DBUILD_DEMO=ON \
      -DBUILD_PYTHON_INTERFACE=ON ..
make
```

You should get a shared object such as `FastRT/build/pybind_interface/ReID.cpython-37m-x86_64-linux-gnu.so`. Then go to step 5 to create the engine file. After that you can import the `.so` file in Python and deserialize the engine file to run inference there. Usage examples can be found in `pybind_interface/test.py` and `pybind_interface/market_benchmark.py`:
```python
import numpy as np  # needed for np.array below

from PATH_TO_SO_FILE import ReID  # the compiled pybind11 module

model = ReID(GPU_ID)                  # bind the model to a GPU
model.build(PATH_TO_YOUR_ENGINEFILE)  # deserialize the .engine file
numpy_feature = np.array([model.infer(CV2_FRAME)])  # CV2_FRAME: a BGR image from cv2
```
For `pybind_interface/test.py`, use `pybind_interface/docker/trt7cu100/Dockerfile` (without PyTorch installed). For `pybind_interface/market_benchmark.py`, use `pybind_interface/docker/trt7cu102_torch160/Dockerfile` (with PyTorch installed).

### Tensorrt Model Config
Edit `FastRT/demo/inference.cpp` according to your model config; the settings correspond to the export process described in `How_to_Generate.md`.
#### sbs_R50-ibn

```cpp
static const std::string WEIGHTS_PATH = "../sbs_R50-ibn.wts";
static const std::string ENGINE_PATH = "./sbs_R50-ibn.engine";
static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 2048;
static const int DEVICE_ID = 0;
static const FastreidBackboneType BACKBONE = FastreidBackboneType::r50;
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = true;
static const bool WITH_NL = true;
static const int EMBEDDING_DIM = 0;
```
#### sbs_R50

```cpp
static const std::string WEIGHTS_PATH = "../sbs_R50.wts";
static const std::string ENGINE_PATH = "./sbs_R50.engine";
static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 2048;
static const int DEVICE_ID = 0;
static const FastreidBackboneType BACKBONE = FastreidBackboneType::r50;
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false;
static const bool WITH_NL = true;
static const int EMBEDDING_DIM = 0;
```
#### sbs_r34_distill

```cpp
static const std::string WEIGHTS_PATH = "../sbs_r34_distill.wts";
static const std::string ENGINE_PATH = "./sbs_r34_distill.engine";
static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 0;
static const FastreidBackboneType BACKBONE = FastreidBackboneType::r34_distill;
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false;
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0;
```
#### kd-r34-r101_ibn

```cpp
static const std::string WEIGHTS_PATH = "../kd_r34_distill.wts";
static const std::string ENGINE_PATH = "./kd_r34_distill.engine";
static const int MAX_BATCH_SIZE = 4;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 0;
static const FastreidBackboneType BACKBONE = FastreidBackboneType::r34_distill;
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = false;
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0;
```
#### kd-r18-r101_ibn

```cpp
static const std::string WEIGHTS_PATH = "../kd-r18-r101_ibn.wts";
static const std::string ENGINE_PATH = "./kd_r18_distill.engine";
static const int MAX_BATCH_SIZE = 16;
static const int INPUT_H = 384;
static const int INPUT_W = 128;
static const int OUTPUT_SIZE = 512;
static const int DEVICE_ID = 1;
static const FastreidBackboneType BACKBONE = FastreidBackboneType::r18_distill;
static const FastreidHeadType HEAD = FastreidHeadType::EmbeddingHead;
static const FastreidPoolingType HEAD_POOLING = FastreidPoolingType::gempoolP;
static const int LAST_STRIDE = 1;
static const bool WITH_IBNA = true;
static const bool WITH_NL = false;
static const int EMBEDDING_DIM = 0;
```
### Benchmark

Model | Engine | Batch size | Image size | Embedding dim | Time |
---|---|---|---|---|---|
Vanilla R34 | Python/Pytorch1.6 fp32 | 1 | 256x128 | 512 | 6.49ms |
Vanilla R34 | Python/Pytorch1.6 fp32 | 4 | 256x128 | 512 | 7.16ms |
Vanilla R34 | C++/trt7 fp32 | 1 | 256x128 | 512 | 2.34ms |
Vanilla R34 | C++/trt7 fp32 | 4 | 256x128 | 512 | 3.99ms |
Vanilla R34 | C++/trt7 fp16 | 1 | 256x128 | 512 | 1.83ms |
Vanilla R34 | C++/trt7 fp16 | 4 | 256x128 | 512 | 2.38ms |
Distill R34 | Python/Pytorch1.6 fp32 | 1 | 256x128 | 512 | 5.68ms |
Distill R34 | Python/Pytorch1.6 fp32 | 4 | 256x128 | 512 | 6.26ms |
Distill R34 | C++/trt7 fp32 | 1 | 256x128 | 512 | 2.36ms |
Distill R34 | C++/trt7 fp32 | 4 | 256x128 | 512 | 4.05ms |
Distill R34 | C++/trt7 fp16 | 1 | 256x128 | 512 | 1.86ms |
Distill R34 | C++/trt7 fp16 | 4 | 256x128 | 512 | 2.68ms |
R50-NL-IBN | Python/Pytorch1.6 fp32 | 1 | 256x128 | 2048 | 14.86ms |
R50-NL-IBN | Python/Pytorch1.6 fp32 | 4 | 256x128 | 2048 | 15.14ms |
R50-NL-IBN | C++/trt7 fp32 | 1 | 256x128 | 2048 | 4.67ms |
R50-NL-IBN | C++/trt7 fp32 | 4 | 256x128 | 2048 | 6.15ms |
R50-NL-IBN | C++/trt7 fp16 | 1 | 256x128 | 2048 | 2.87ms |
R50-NL-IBN | C++/trt7 fp16 | 4 | 256x128 | 2048 | 3.81ms |
Test environments:

- fastreid v1.0.0 / 2080TI / Ubuntu18.04 / Nvidia driver 435 / cuda10.0 / cudnn7.6.5 / trt7.0.0 / nvinfer7.0.0 / opencv3.2
- fastreid v1.0.0 / 2080TI / Ubuntu18.04 / Nvidia driver 450 / cuda10.2 / cudnn7.6.5 / trt7.0.0 / nvinfer7.0.0 / opencv3.2
### Set up with Docker
For cuda10.0:

```bash
cd docker/trt7cu100
sudo docker build -t trt7:cuda100 .
sudo docker run --gpus all -it --name fastrt -v /home/YOURID/workspace:/workspace -d trt7:cuda100
# then put the repo into /home/YOURID/workspace/ before you enter the container
```
For cuda10.2:

```bash
cd docker/trt7cu102
sudo docker build -t trt7:cuda102 .
sudo docker run --gpus all -it --name fastrt -v /home/YOURID/workspace:/workspace -d trt7:cuda102
# then put the repo into /home/YOURID/workspace/ before you enter the container
```
For reading/writing numpy arrays from C++ (cnpy):

```bash
cd third_party/cnpy
cmake -DCMAKE_INSTALL_PREFIX=../../libs/cnpy -DENABLE_STATIC=OFF . && make -j4 && make install
```
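cnpy gives C++ code numpy-compatible file I/O, which is handy for comparing TensorRT output against PyTorch features (step 6). A minimal sketch of its save/load API; the file name and shape are just examples:

```cpp
#include "cnpy.h"
#include <vector>

int main() {
    std::vector<float> feat(2048, 0.f);  // e.g. one 2048-d embedding from the demo
    // Write a '.npy' file that numpy can open with np.load("feat.npy").
    cnpy::npy_save("feat.npy", feat.data(), {feat.size()}, "w");
    // Read it back in C++.
    cnpy::NpyArray arr = cnpy::npy_load("feat.npy");
    const float* data = arr.data<float>();
    return data == nullptr;  // trivial use so the read isn't optimized away
}
```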
This algorithm fuses sample clustering and feature learning into a single end-to-end network framework, improving the model's cross-domain ability. Trained on Market and tested on DukeMTMC, the model reaches 82.0% accuracy; trained on DukeMTMC and tested on Market, it reaches 92.2% accuracy.