# Real-time Scene Text Detection with Differentiable Binarization
DBNet is a segmentation-based scene text detection method. Segmentation-based methods are gaining popularity for scene text detection because they can more accurately describe scene text of various shapes, such as curved text. The drawback of current segmentation-based SOTA methods is the post-processing step of binarization (converting probability maps into text bounding boxes), which typically requires a manually set threshold (reducing prediction accuracy) and complex pixel-grouping algorithms (incurring a considerable time cost during inference).

To eliminate these problems, DBNet integrates an adaptive threshold called Differentiable Binarization (DB) into the architecture. DB simplifies post-processing and enhances the performance of text detection. Moreover, it can be removed at the inference stage without sacrificing performance. [1]
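The DB operation from the paper [1] replaces the hard threshold with a steep, differentiable approximation, B ≈ 1 / (1 + exp(-k(P - T))), where P is the probability map, T is the learned threshold map, and k is an amplifying factor (50 in this implementation). The sketch below contrasts the two; the function names and sample values are illustrative, not taken from the MindOCR code:

```python
import numpy as np

def hard_binarize(prob_map, threshold=0.3):
    """Standard (non-differentiable) binarization with a fixed, manually set threshold."""
    return (prob_map >= threshold).astype(np.float32)

def differentiable_binarize(prob_map, threshold_map, k=50):
    """DB: a steep sigmoid around a learned per-pixel threshold, usable in training."""
    return 1.0 / (1.0 + np.exp(-k * (prob_map - threshold_map)))

prob = np.array([0.1, 0.29, 0.31, 0.9])
thresh = np.full_like(prob, 0.3)  # a constant threshold map, for illustration only

print(hard_binarize(prob))                    # [0. 0. 1. 1.]
print(differentiable_binarize(prob, thresh))  # values near 0 or 1 away from the threshold
```

Because the sigmoid is differentiable everywhere, the threshold map can be learned jointly with the probability map; at inference the approximation can be dropped in favor of the plain probability map.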
Figure 1. Overall DBNet architecture
The overall architecture of DBNet is presented in Figure 1. It consists of a backbone that extracts image features, an FPN-style neck that fuses features across scales, and a head that predicts the probability and threshold maps combined by the DB module.
| Model | Backbone | Pretrained | Recall | Precision | F-score | Recipe | Download |
|-------|----------|------------|--------|-----------|---------|--------|----------|
| DBNet | ResNet-50 | ImageNet | 81.70% | 85.84% | 83.72% | yaml | weights |
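The F-score reported in the table is the harmonic mean of precision and recall, which can be checked directly:

```python
# F-score (F1) is the harmonic mean of precision and recall.
precision, recall = 0.8584, 0.8170  # DBNet ResNet-50 on ICDAR2015, from the table above

f_score = 2 * precision * recall / (precision + recall)
print(f"{f_score:.2%}")  # 83.72%
```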
Please refer to the installation instructions in MindOCR.
First, the dataset labels need to be converted. To do so, download ICDAR2015 and extract the images and labels to a preferred folder. Then use the following command to generate the training labels:

```shell
python tools/dataset_converters/convert.py --dataset_name=ic15 --task=det --image_dir=IMAGES_DIR --label_dir=LABELS_DIR --output_path=OUTPUT_PATH
```

Repeat this step with the test images and labels to generate the test labels.
After the label files are generated, update the configs/det/db_r50_icdar15.yaml configuration file with the data paths, specifically the following parts:
```yaml
...
train:
  ckpt_save_dir: './tmp_det'
  dataset_sink_mode: True
  dataset:
    type: DetDataset
    dataset_root: /data/ocr_datasets                                  # <--- HERE
    data_dir: ic15/text_localization/train                            # <--- HERE
    label_file: ic15/text_localization/train/train_icdar15_label.txt  # <--- HERE
...
eval:
  dataset_sink_mode: False
  dataset:
    type: DetDataset
    dataset_root: /data/ocr_datasets                                  # <--- HERE
    data_dir: ic15/text_localization/test                             # <--- HERE
    label_file: ic15/text_localization/test/test_icdar2015_label.txt  # <--- HERE
...
```
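In these entries, `dataset_root` is prepended to `data_dir` and `label_file`, so when the dataset moves only the root needs to change. A small sketch of the resulting paths, assuming the loader joins them in the usual `os.path.join` fashion (the values mirror the config fragment above and are illustrative):

```python
import os

dataset_root = "/data/ocr_datasets"
data_dir = "ic15/text_localization/train"
label_file = "ic15/text_localization/train/train_icdar15_label.txt"

image_dir = os.path.join(dataset_root, data_dir)
label_path = os.path.join(dataset_root, label_file)

print(image_dir)   # /data/ocr_datasets/ic15/text_localization/train
print(label_path)  # /data/ocr_datasets/ic15/text_localization/train/train_icdar15_label.txt
```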
DBNet consists of 3 parts: the backbone, the neck, and the head. Specifically:
```yaml
model:
  type: det
  transform: null
  backbone:
    name: det_resnet50  # Only ResNet-50 is supported at the moment
    pretrained: True    # Whether to use weights pretrained on ImageNet
  neck:
    name: DBFPN         # FPN part of DBNet
    out_channels: 256
    bias: False
    use_asf: False      # Adaptive Scale Fusion module from DBNet++ (use it for DBNet++ only)
  head:
    name: DBHead
    k: 50               # Amplifying factor for Differentiable Binarization
    bias: False
    adaptive: True      # True for training, False for inference
```
After preparing a dataset and setting the configuration, training can be started as follows:
```shell
python tools/train.py -c=configs/det/db_r50_icdar15.yaml
```
To evaluate the accuracy of the trained model, use tools/eval.py. Add the ckpt_load_path parameter to the eval section of the configuration file, set it to the path of the model checkpoint, and then run:

```shell
python tools/eval.py -c=configs/det/db_r50_icdar15.yaml
```
[1] Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai. Real-time Scene Text Detection with Differentiable Binarization. arXiv:1911.08947, 2019
This is forked from https://github.com/mindspore-lab/mindocr