# Real-time Scene Text Detection with Differentiable Binarization
DBNet is a segmentation-based scene text detection method. Segmentation-based methods are gaining popularity for scene text detection because they can more accurately describe scene text of various shapes, such as curved text. The drawback of current segmentation-based SOTA methods is the post-processing step of binarization (converting probability maps into text bounding boxes), which typically requires a manually set threshold (reducing prediction accuracy) and complex pixel-grouping algorithms (incurring a considerable time cost during inference).

To eliminate these problems, DBNet integrates an adaptive threshold called Differentiable Binarization (DB) into the architecture. DB simplifies post-processing and enhances the performance of text detection. Moreover, it can be removed at the inference stage without sacrificing performance. [1]
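The DB operation from the paper [1] replaces the hard threshold with a steep, differentiable approximation, B ≈ 1 / (1 + exp(-k(P - T))), where P is the probability map, T is the learned threshold map, and k is an amplifying factor (50 in this implementation). The sketch below contrasts the two; the function names and sample values are illustrative, not taken from the MindOCR code:

```python
import numpy as np

def hard_binarize(prob_map, threshold=0.3):
    """Standard (non-differentiable) binarization with a fixed, manually set threshold."""
    return (prob_map >= threshold).astype(np.float32)

def differentiable_binarize(prob_map, threshold_map, k=50):
    """DB: a steep sigmoid around a learned per-pixel threshold, usable in training."""
    return 1.0 / (1.0 + np.exp(-k * (prob_map - threshold_map)))

prob = np.array([0.1, 0.29, 0.31, 0.9])
thresh = np.full_like(prob, 0.3)  # a constant threshold map, for illustration only

print(hard_binarize(prob))                    # [0. 0. 1. 1.]
print(differentiable_binarize(prob, thresh))  # values near 0 or 1 away from the threshold
```

Because the sigmoid is differentiable everywhere, the threshold map can be learned jointly with the probability map; at inference the approximation can be dropped in favor of the plain probability map.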
Figure 1. Overall DBNet architecture
The overall architecture of DBNet is presented in Figure 1. It consists of a backbone that extracts image features, an FPN-style neck that fuses features across scales, and a head that predicts the probability and threshold maps combined by the DB module.
| Model | Backbone | Pretrained | Recall | Precision | F-score | Recipe | Download |
|-------|----------|------------|--------|-----------|---------|--------|----------|
| DBNet | ResNet-50 | ImageNet | 81.70% | 85.84% | 83.72% | yaml | weights |
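The F-score reported in the table is the harmonic mean of precision and recall, which can be checked directly:

```python
# F-score (F1) is the harmonic mean of precision and recall.
precision, recall = 0.8584, 0.8170  # DBNet ResNet-50 on ICDAR2015, from the table above

f_score = 2 * precision * recall / (precision + recall)
print(f"{f_score:.2%}")  # 83.72%
```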
Please refer to the installation instructions in MindOCR.
First, the dataset labels need to be converted. To do so, download ICDAR2015 and extract the images and labels to a preferred folder. Then use the following command to generate the training labels:

```shell
python tools/dataset_converters/convert.py --dataset_name=ic15 --task=det --image_dir=IMAGES_DIR --label_dir=LABELS_DIR --output_path=OUTPUT_PATH
```

Repeat this step with the test images and labels to generate the test labels.
After the label files are generated, update the configs/det/db_r50_icdar15.yaml configuration file with the data paths, specifically the following parts:
```yaml
...
train:
  ckpt_save_dir: './tmp_det'
  dataset_sink_mode: True
  dataset:
    type: DetDataset
    dataset_root: /data/ocr_datasets                                  # <--- HERE
    data_dir: ic15/text_localization/train                            # <--- HERE
    label_file: ic15/text_localization/train/train_icdar15_label.txt  # <--- HERE
...
eval:
  dataset_sink_mode: False
  dataset:
    type: DetDataset
    dataset_root: /data/ocr_datasets                                  # <--- HERE
    data_dir: ic15/text_localization/test                             # <--- HERE
    label_file: ic15/text_localization/test/test_icdar2015_label.txt  # <--- HERE
...
```
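In these entries, `dataset_root` is prepended to `data_dir` and `label_file`, so when the dataset moves only the root needs to change. A small sketch of the resulting paths, assuming the loader joins them in the usual `os.path.join` fashion (the values mirror the config fragment above and are illustrative):

```python
import os

dataset_root = "/data/ocr_datasets"
data_dir = "ic15/text_localization/train"
label_file = "ic15/text_localization/train/train_icdar15_label.txt"

image_dir = os.path.join(dataset_root, data_dir)
label_path = os.path.join(dataset_root, label_file)

print(image_dir)   # /data/ocr_datasets/ic15/text_localization/train
print(label_path)  # /data/ocr_datasets/ic15/text_localization/train/train_icdar15_label.txt
```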
DBNet consists of 3 parts: the backbone, the neck, and the head. Specifically:
```yaml
model:
  type: det
  transform: null
  backbone:
    name: det_resnet50  # Only ResNet-50 is supported at the moment
    pretrained: True    # Whether to use weights pretrained on ImageNet
  neck:
    name: DBFPN         # FPN part of DBNet
    out_channels: 256
    bias: False
    use_asf: False      # Adaptive Scale Fusion module from DBNet++ (use it for DBNet++ only)
  head:
    name: DBHead
    k: 50               # Amplifying factor for Differentiable Binarization
    bias: False
    adaptive: True      # True for training, False for inference
```
After preparing a dataset and setting the configuration, training can be started as follows:
```shell
python tools/train.py -c=configs/det/db_r50_icdar15.yaml
```
To evaluate the accuracy of the trained model, use tools/eval.py. Add the ckpt_load_path parameter to the eval section of the configuration file, set it to the path of the model checkpoint, and then run:

```shell
python tools/eval.py -c=configs/det/db_r50_icdar15.yaml
```
[1] Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai. Real-time Scene Text Detection with Differentiable Binarization. arXiv:1911.08947, 2019
This is forked from https://github.com/mindspore-lab/mindocr