MindOCR
Introduction | Installation | Quick Start | Model List | Notes
Introduction
MindOCR is an open-source toolbox for OCR development and application based on MindSpore. It helps users train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfill image-text understanding needs.
Major Features
- Modular design: We decouple the OCR task into several configurable modules. Users can easily set up the training and evaluation pipelines for customized data and models with a few lines of modification.
- High performance: MindOCR provides pretrained weights and the training recipes used, which reach competitive performance on OCR tasks.
- Low cost to apply: We provide easy-to-use tools to run text detection and recognition on real-world data. (coming soon)
Installation
Dependency
To install the dependency, please run
pip install -r requirements.txt
Additionally, please install MindSpore (>=1.8.1) following the official instructions for the best fit for your machine. To enable training in distributed mode, please also install OpenMPI.
Install with PyPI
Coming soon
Install from Source
The latest version of MindOCR can be installed as follows:
pip install git+https://github.com/mindspore-lab/mindocr.git
Notes: MindOCR is currently tested only with MindSpore>=1.8.1 on Linux with GPU/Ascend devices.
Quick Start
Text Detection Model Training
We will use the DBNet model and the ICDAR2015 dataset for illustration, although other models and datasets are also supported.
1. Data Preparation
Please download the ICDAR2015 dataset from this website, then format the dataset annotations referring to dataset_convert.
After preparation, the data structure should look like:
.
├── test
│   ├── images
│   │   ├── img_1.jpg
│   │   ├── img_2.jpg
│   │   └── ...
│   └── det_gt.txt
└── train
    ├── images
    │   ├── img_1.jpg
    │   ├── img_2.jpg
    │   └── ...
    └── det_gt.txt
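As a quick sanity check after conversion, each line of det_gt.txt is expected to pair an image name with a JSON list of boxes. The exact layout depends on dataset_convert, so treat the tab-separated format below as an assumption, not a guarantee. A minimal parser sketch:

```python
import json

# Assumed annotation format (one line per image):
#   img_1.jpg\t[{"transcription": "text", "points": [[x1, y1], ..., [x4, y4]]}, ...]
def parse_det_gt_line(line):
    """Split an annotation line into (image_name, list of box dicts)."""
    img_name, ann = line.rstrip("\n").split("\t", 1)
    return img_name, json.loads(ann)

sample = 'img_1.jpg\t[{"transcription": "hello", "points": [[0, 0], [10, 0], [10, 10], [0, 10]]}]'
name, boxes = parse_det_gt_line(sample)
print(name, len(boxes), boxes[0]["transcription"])  # → img_1.jpg 1 hello
```

Running it over every line of det_gt.txt before training is a cheap way to catch malformed annotations early.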
2. Configure Yaml
Please choose a yaml config file containing the target pre-defined model and data pipeline that you want to re-use from configs/det. Here we choose configs/det/db_r50_icdar15.yaml.
Then change the data config args accordingly:
train:
  dataset:
    data_dir: PATH/TO/TRAIN_IMAGES_DIR
    label_file: PATH/TO/TRAIN_LABELS.txt
eval:
  dataset:
    data_dir: PATH/TO/TEST_IMAGES_DIR
    label_file: PATH/TO/TEST_LABELS.txt
Optionally, change num_workers according to the number of CPU cores, and set distribute to True if you are going to train in distributed mode.
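Once parsed with a YAML library (e.g. yaml.safe_load from PyYAML), the config is a nested dict, so the same paths can also be overridden programmatically. A minimal sketch with a hypothetical helper set_data_paths (not part of MindOCR); a plain dict stands in for the parsed file:

```python
def set_data_paths(cfg, train_dir, train_label, eval_dir, eval_label):
    """Override dataset paths in a parsed yaml config (a nested dict)."""
    cfg["train"]["dataset"]["data_dir"] = train_dir
    cfg["train"]["dataset"]["label_file"] = train_label
    cfg["eval"]["dataset"]["data_dir"] = eval_dir
    cfg["eval"]["dataset"]["label_file"] = eval_label
    return cfg

# In practice you would round-trip the file with yaml.safe_load / yaml.safe_dump.
cfg = {"train": {"dataset": {}}, "eval": {"dataset": {}}}
cfg = set_data_paths(cfg, "data/train/images", "data/train/det_gt.txt",
                     "data/test/images", "data/test/det_gt.txt")
print(cfg["train"]["dataset"]["data_dir"])  # → data/train/images
```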
3. Training
To train the model, please run
# train dbnet on ic15 dataset
python tools/train.py --config configs/det/db_r50_icdar15.yaml
To train in distributed mode, please run
# n is the number of GPUs/NPUs
mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/det/db_r50_icdar15.yaml
Notes: please ensure the arg distribute in the yaml file is set to True.
The training result (including checkpoints, per-epoch performance, and curves) will be saved in the directory specified by the arg ckpt_save_dir.
4. Evaluation
To evaluate, please set the checkpoint path in the arg ckpt_load_path in the yaml config file and run
python tools/eval.py --config configs/det/db_r50_icdar15.yaml
Text Recognition Model Training
We will use the CRNN model and the LMDB dataset for illustration, although other models and datasets are also supported.
1. Data Preparation
Please download the LMDB dataset from here (ref: deep-text-recognition-benchmark).
There are several .zip data files:
- data_lmdb_release.zip contains the entire datasets, including training, validation, and evaluation data.
- validation.zip is the union dataset for validation.
- evaluation.zip contains several benchmarking datasets.
Unzip the data. After preparation, the data structure should look like:
.
├── train
│   ├── MJ
│   │   ├── data.mdb
│   │   └── lock.mdb
│   └── ST
│       ├── data.mdb
│       └── lock.mdb
├── validation
│   ├── data.mdb
│   └── lock.mdb
└── evaluation
    ├── IC03
    │   ├── data.mdb
    │   └── lock.mdb
    ├── IC13
    │   ├── data.mdb
    │   └── lock.mdb
    └── ...
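Each split above is just a directory holding a data.mdb/lock.mdb pair, so the available LMDB datasets can be listed with a short stdlib walk before training. The helper below is our own sketch, not a MindOCR utility; the demo builds a throwaway tree mimicking the layout above:

```python
import os
import tempfile

def find_lmdb_dirs(root):
    """Return paths (relative to root) of directories that hold an LMDB env."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        if "data.mdb" in files:
            found.append(os.path.relpath(dirpath, root))
    return sorted(found)

# Demonstrate on a temporary tree with empty placeholder files.
root = tempfile.mkdtemp()
for sub in ["train/MJ", "train/ST", "validation", "evaluation/IC13"]:
    os.makedirs(os.path.join(root, sub))
    open(os.path.join(root, sub, "data.mdb"), "w").close()
print(find_lmdb_dirs(root))  # → ['evaluation/IC13', 'train/MJ', 'train/ST', 'validation']
```

Pointing it at your real unzipped directory confirms that the data_dir values you put in the yaml config actually contain LMDB environments.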
2. Configure Yaml
Please choose a yaml config file containing the target pre-defined model and data pipeline that you want to re-use from configs/rec. Here we choose configs/rec/vgg7_bilstm_ctc.yaml.
Please change the data config args accordingly, such as:
train:
  dataset:
    type: LMDBDataset
    data_dir: lmdb_data/rec/train/
eval:
  dataset:
    type: LMDBDataset
    data_dir: lmdb_data/rec/validation/
Optionally, change num_workers according to the number of CPU cores, and set distribute to True if you are going to train in distributed mode.
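A common heuristic for num_workers is to use most of the available cores while leaving a couple for the main process. The helper below is our suggestion, not a MindOCR default:

```python
import os

def suggest_num_workers(reserved=2):
    """Suggest a dataloader worker count: total CPU cores minus a small reserve."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cores - reserved)

print(suggest_num_workers())
```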
3. Training
To train the model, please run
# train crnn on MJ+ST dataset
python tools/train.py --config configs/rec/vgg7_bilstm_ctc.yaml
To train in distributed mode, please run
# n is the number of GPUs/NPUs
mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/rec/vgg7_bilstm_ctc.yaml
Notes: please ensure the arg distribute in the yaml file is set to True.
The training result (including checkpoints, per-epoch performance, and curves) will be saved in the directory specified by the arg ckpt_save_dir.
4. Evaluation
To evaluate, please set the checkpoint path in the arg ckpt_load_path in the yaml config file and run
python tools/eval.py --config /path/to/config.yaml
Inference and Deployment
Inference with MX Engine
Please refer to mx_infer
Inference with Lite
Coming soon
Inference with native MindSpore
Coming soon
Supported Models and Performance
Text Detection
The supported detection models and their performance on the test set of ICDAR2015 are as follows.

| Model   | Backbone  | Pretrained | Recall | Precision | F-score | Config |
|---------|-----------|------------|--------|-----------|---------|--------|
| DBNet   | ResNet-50 | ImageNet   | 81.97% | 86.05%    | 83.96%  | YAML   |
| DBNet++ | ResNet-50 | ImageNet   | 82.02% | 87.38%    | 84.62%  | YAML   |
Text Recognition
The supported recognition models and their overall performance on the public benchmarking datasets (IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) are as follows.

| Model | Backbone    | Avg Acc | Config |
|-------|-------------|---------|--------|
| CRNN  | VGG7        | 80.98%  | YAML   |
| CRNN  | Resnet34_vd | 84.64%  | YAML   |
Notes
Change Log
- Arg names changed: output_keys -> output_columns, num_keys_to_net -> num_columns_to_net
- Data pipeline updated
- Add system test and CI workflow.
- Add modelarts adapter to allow training on the OpenI platform. To train on OpenI:
  i) Create a new training task on the OpenI cloud platform.
  ii) Link the dataset (e.g., ic15_mindocr) on the webpage.
  iii) Add the run parameter `config` and fill in the yaml file path in the website UI, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'.
  iv) Add the run parameter `enable_modelarts` and set it to True in the website UI.
  v) Fill in the other fields and launch.
- Add evaluation script with arg ckpt_load_path
- Arg ckpt_save_dir is moved from system to train in yaml.
- Add drop_overflow_update control
How to Contribute
We appreciate all kinds of contributions, including issues and PRs, to make MindOCR better.
Please refer to CONTRIBUTING.md for the contributing guidelines. Please follow the Model Template and Guideline when contributing a model, so that it fits the overall interface :)
License
This project follows the Apache License 2.0 open-source license.
Citation
If you find this project useful in your research, please consider citing:
@misc{MindSpore_OCR_2023,
  title={{MindSpore OCR}: MindSpore OCR Toolbox},
  author={MindSpore Team},
  howpublished={\url{https://github.com/mindspore-lab/mindocr/}},
  year={2023}
}