English | 中文
PSENet: Shape Robust Text Detection With Progressive Scale Expansion Network
PSENet is a text detection algorithm based on semantic segmentation. It can precisely locate text instances of arbitrary shape, which most anchor-based algorithms cannot do. In addition, two text instances that lie close to each other may cause a model to make wrong predictions. To address both problems, PSENet proposes the Progressive Scale Expansion (PSE) algorithm, which can successfully identify adjacent text instances [1].
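The expansion idea can be illustrated with a minimal sketch. It assumes the segmentation output has already been binarized into kernel maps (nested lists of 0/1) ordered from smallest to largest; the function names are hypothetical and this is not the compiled post-processing code shipped with MindOCR.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def label_seeds(kernel):
    """Give every connected region of the smallest kernel its own label."""
    h, w = len(kernel), len(kernel[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if kernel[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and kernel[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Expand seed labels through kernels ordered from smallest to largest."""
    labels = label_seeds(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        # Start the breadth-first search from every pixel labelled so far.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                # A contested pixel goes to whichever instance reaches it first.
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    labels[ny][nx] = labels[y][x]
                    queue.append((ny, nx))
    return labels
```

Because the smallest kernels of two neighbouring words do not touch, each word keeps its own label while the regions grow, even when the largest kernels merge into one blob.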
Figure 1. Overall PSENet architecture
The overall architecture of PSENet is presented in Figure 1. It consists of multiple stages:
Model | Context | Backbone | Pretrained | Recall | Precision | F-score | Train T. | ms/step | Throughput | Recipe | Download |
---|---|---|---|---|---|---|---|---|---|---|---|
PSENet | D910x8-MS2.0-G | ResNet-152 | ImageNet | 79.39% | 84.91% | 82.06% | 11.544 s/epoch | 769.6 | 83.16 img/s | yaml | ckpt, mindir |
PSENet | D910x8-MS2.0-G | ResNet-50 | ImageNet | 76.75% | 86.58% | 81.37% | 4.562 s/epoch | 304.138 | 210.43 img/s | yaml | ckpt, mindir |
PSENet | D910x8-MS2.0-G | MobileNetV3 | ImageNet | 73.52% | 67.84% | 70.56% | 2.604 s/epoch | 173.604 | 368.66 img/s | yaml | ckpt, mindir |
Model | Context | Backbone | Pretrained | Recall | Precision | F-score | Train T. | ms/step | Throughput | Recipe | Download |
---|---|---|---|---|---|---|---|---|---|---|---|
PSENet | D910x8-MS2.0-G | ResNet-152 | ImageNet | 73.69% | 74.38% | 74.04% | 67 s/epoch | 4466.67 | 14.33 img/s | yaml | ckpt, mindir |
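The columns in the tables above are related by simple arithmetic, which the following sketch makes explicit. The F-score is the harmonic mean of recall and precision. The throughput formula assumes a global batch of 64 images per step; that number is not stated in this README and is only inferred from the ms/step and img/s columns.

```python
def f_score(recall, precision):
    """Harmonic mean of recall and precision (both given in percent)."""
    return 2 * recall * precision / (recall + precision)


def throughput(images_per_step, ms_per_step):
    """Images processed per second for a given step time in milliseconds."""
    return images_per_step / (ms_per_step / 1000.0)


# ResNet-152 row of the first table.
print(round(f_score(79.39, 84.91), 2))
# images_per_step=64 is an assumption, not a documented setting.
print(round(throughput(64, 769.6), 2))
```

Recomputed F-scores may differ from the tabulated ones in the last digit, presumably because the published recall and precision are already rounded.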
The input_shapes of the exported MindIR models trained on ICDAR2015 are (1,3,1472,2624) for the ResNet-152 backbone and (1,3,736,1312) for the ResNet-50 or MobileNetV3 backbone.
(1,3,1024,1024).
Please refer to the installation instruction in MindOCR.
Please download ICDAR2015 dataset, and convert the labels to the desired format referring to dataset_converters.
The prepared dataset file structure should be:
.
├── test
│   ├── images
│   │   ├── img_1.jpg
│   │   ├── img_2.jpg
│   │   └── ...
│   └── test_det_gt.txt
└── train
    ├── images
    │   ├── img_1.jpg
    │   ├── img_2.jpg
    │   └── ...
    └── train_det_gt.txt
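Before training, it can help to confirm that the converted dataset matches this layout. The helper below is a hypothetical convenience script, not part of MindOCR; it only checks that the four expected entries exist under a given root.

```python
from pathlib import Path

# Entries taken from the ICDAR2015 layout shown above.
EXPECTED = (
    "train/images",
    "train/train_det_gt.txt",
    "test/images",
    "test/test_det_gt.txt",
)


def missing_entries(dataset_root):
    """Return the expected relative paths that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_entries("dir/to/dataset")  # placeholder path
    print("layout OK" if not missing else f"missing: {missing}")
```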
Please download SCUT-CTW1500 dataset and convert the labels to the desired format referring to dataset_converters.
The prepared dataset file structure should be:
ctw1500
├── test_images
│   ├── 1001.jpg
│   ├── 1002.jpg
│   ├── ...
├── train_images
│   ├── 0001.jpg
│   ├── 0002.jpg
│   ├── ...
├── test_det_gt.txt
├── train_det_gt.txt
Update the configs/det/psenet/pse_r152_icdar15.yaml configuration file with the data paths, specifically the following parts. The dataset_root will be concatenated with data_dir and label_file respectively to form the complete dataset directory and label file path.
...
train:
  ckpt_save_dir: './tmp_det'
  dataset_sink_mode: False
  dataset:
    type: DetDataset
    dataset_root: dir/to/dataset          <--- Update
    data_dir: train/images                <--- Update
    label_file: train/train_det_gt.txt    <--- Update
...
eval:
  dataset_sink_mode: False
  dataset:
    type: DetDataset
    dataset_root: dir/to/dataset          <--- Update
    data_dir: test/images                 <--- Update
    label_file: test/test_det_gt.txt      <--- Update
...
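The concatenation of dataset_root with data_dir and label_file can be pictured as a plain path join. The sketch below only mirrors that description; the function name is hypothetical and the code is not MindOCR's actual dataset loader.

```python
from pathlib import PurePosixPath


def resolve_dataset_paths(dataset_cfg):
    """Join dataset_root with data_dir and label_file, as described above."""
    root = PurePosixPath(dataset_cfg["dataset_root"])
    image_dir = str(root / dataset_cfg["data_dir"])
    label_path = str(root / dataset_cfg["label_file"])
    return image_dir, label_path


# Example values; /data/icdar15 is a placeholder for your own dataset_root.
train_cfg = {
    "dataset_root": "/data/icdar15",
    "data_dir": "train/images",
    "label_file": "train/train_det_gt.txt",
}
print(resolve_dataset_paths(train_cfg))
```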
Optionally, change num_workers according to the number of CPU cores.
PSENet consists of 3 parts: backbone, neck, and head. Specifically:
model:
  type: det
  transform: null
  backbone:
    name: det_resnet152
    pretrained: True    # Whether to use weights pretrained on ImageNet
  neck:
    name: PSEFPN         # FPN part of the PSENet
    out_channels: 128
  head:
    name: PSEHead
    hidden_size: 256
    out_channels: 7      # number of kernels
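The head's out_channels value is the number of kernels n. In the PSENet paper [1], the i-th kernel is the text region shrunk with a scale ratio r_i = 1 - (1 - m)(n - i)/(n - 1), where m is the minimum scale. The sketch below evaluates that formula for n = 7; the value m = 0.4 is an illustrative assumption and is not taken from this configuration file.

```python
def kernel_scale_ratios(num_kernels, min_scale):
    """Scale ratios r_1..r_n, from the smallest kernel up to the full text region."""
    n = num_kernels
    return [1.0 - (1.0 - min_scale) * (n - i) / (n - 1) for i in range(1, n + 1)]


# num_kernels matches out_channels above; min_scale=0.4 is assumed.
for index, ratio in enumerate(kernel_scale_ratios(7, 0.4), start=1):
    print(f"kernel {index}: scale ratio {ratio:.2f}")
```

The ratios grow linearly from m to 1, so the last kernel corresponds to the complete text instance.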
Before training, please make sure to compile the post-processing code in the /mindocr/postprocess/pse directory as follows:
python3 setup.py build_ext --inplace
Please set distribute in the yaml config file to False.
# train psenet on ic15 dataset
python tools/train.py --config configs/det/psenet/pse_r152_icdar15.yaml
Please set distribute in the yaml config file to True.
# n is the number of GPUs/NPUs
mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/det/psenet/pse_r152_icdar15.yaml
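The two launch modes differ only in the mpirun prefix, which the following sketch captures. The helper name is hypothetical; the commands it assembles are the ones shown above, and the distribute flag in the yaml file still has to be set to match.

```python
import subprocess


def build_train_command(config_path, num_devices=1):
    """Assemble the training command for standalone or distributed runs."""
    command = ["python", "tools/train.py", "--config", config_path]
    if num_devices > 1:
        # Distributed run: one process per GPU/NPU, launched through mpirun.
        command = ["mpirun", "--allow-run-as-root", "-n", str(num_devices)] + command
    return command


if __name__ == "__main__":
    cmd = build_train_command("configs/det/psenet/pse_r152_icdar15.yaml", num_devices=8)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually start training
```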
The training results (including checkpoints, per-epoch performance, and curves) will be saved in the directory given by the ckpt_save_dir argument in the yaml config file. The default directory is ./tmp_det.
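To pick a checkpoint from that directory for later evaluation, a small helper such as the one below may be useful. It assumes checkpoints are saved with a .ckpt extension directly inside ckpt_save_dir; the exact file names MindOCR writes are not listed here, so treat this as a sketch.

```python
from pathlib import Path


def latest_checkpoint(ckpt_save_dir="./tmp_det"):
    """Return the most recently modified *.ckpt file, or None if there is none."""
    checkpoints = sorted(
        Path(ckpt_save_dir).glob("*.ckpt"),
        key=lambda path: path.stat().st_mtime,
    )
    return checkpoints[-1] if checkpoints else None


if __name__ == "__main__":
    print(latest_checkpoint())
```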
To evaluate the accuracy of the trained model, you can use eval.py. Please set the checkpoint path in the ckpt_load_path argument in the eval section of the yaml config file, set distribute to False, and then run:
python tools/eval.py --config configs/det/psenet/pse_r152_icdar15.yaml
Please refer to the tutorial MindOCR Inference for model inference based on MindSpore Lite on Ascend 310, including the following steps:
Please download the exported MindIR file first, or refer to the Model Export tutorial and use the following command to export the trained ckpt model to MindIR file:
python tools/export.py --model_name_or_config psenet_resnet152 --data_shape 1472 2624 --local_ckpt_path /path/to/local_ckpt.ckpt
# or
python tools/export.py --model_name_or_config configs/det/psenet/pse_r152_icdar15.yaml --data_shape 1472 2624 --local_ckpt_path /path/to/local_ckpt.ckpt
The data_shape is the model input shape (height and width) for the MindIR file. The shape values of the MindIR files in the download links can be found in Notes.
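The relation between data_shape and the four-dimensional input shape of the exported model can be sketched as follows. The helper names are hypothetical; the command-line flags are the ones used in the export commands above, and the batch size of 1 with 3 channels follows the input shapes quoted in this README.

```python
def to_input_shape(height, width, batch=1, channels=3):
    """Expand a (height, width) data_shape into an NCHW input shape."""
    return (batch, channels, height, width)


def build_export_command(model_name_or_config, height, width, ckpt_path):
    """Assemble the tools/export.py call for a given data_shape."""
    return [
        "python", "tools/export.py",
        "--model_name_or_config", model_name_or_config,
        "--data_shape", str(height), str(width),
        "--local_ckpt_path", ckpt_path,
    ]


print(to_input_shape(1472, 2624))
print(" ".join(build_export_command(
    "configs/det/psenet/pse_r152_icdar15.yaml", 1472, 2624,
    "/path/to/local_ckpt.ckpt")))
```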
Please refer to Environment Installation tutorial to configure the MindSpore Lite inference environment.
Please refer to Model Conversion, and use the converter_lite tool for offline conversion of the MindIR file.
Before inference, please ensure that the post-processing part of PSENet has been compiled (refer to the post-processing part of the Training chapter).
Assuming that you obtain output.mindir after model conversion, go to the deploy/py_infer directory, and use the following command for inference:
python infer.py \
--input_images_dir=/your_path_to/test_images \
--det_model_path=your_path_to/output.mindir \
--det_model_name_or_config=../../configs/det/psenet/pse_r152_icdar15.yaml \
--res_save_dir=results_dir
[1] Wang, Wenhai, et al. "Shape robust text detection with progressive scale expansion network." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.