MindOCR is an open-source toolbox for OCR development and application based on MindSpore. It helps users to train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfill image-text understanding needs.
To install the dependencies, please run:

```shell
pip install -r requirements.txt
```
Additionally, please install MindSpore (>=1.9) following the official installation instructions for the best fit for your machine.
For distributed training, please install OpenMPI 4.0.3.
| Environment | Version |
|---|---|
| MindSpore | >=1.9 |
| Python | >=3.7 |
Notes:
- If you use the MX Engine for inference, the Python version should be 3.9.
- If `scikit_image` cannot be imported, you can set the environment variable `$LD_PRELOAD` with the following command. Change `path/to` to your directory:

```shell
export LD_PRELOAD=path/to/scikit_image.libs/libgomp-d22c30c5.so.1.0.0:$LD_PRELOAD
```
Coming soon
The latest version of MindOCR can be installed as follows:
pip install git+https://github.com/mindspore-lab/mindocr.git
Note: MindOCR is currently only tested with MindSpore>=1.9 on Linux with GPU/Ascend devices.
We will take the DBNet model and the ICDAR2015 dataset as an example to illustrate how to configure the training process with a few lines of modification in the yaml file.
Please refer to DBNet readme for detailed instructions.
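As a sketch of the "few lines of modification" workflow, a config can also be tweaked programmatically with PyYAML before launching training. The fragment and the keys shown (`train.batch_size`, `train.dataset.data_dir`) are illustrative assumptions, not the exact DBNet schema; check the actual yaml files under `configs/det/dbnet/` for the real field names.

```python
import yaml  # PyYAML

# Illustrative config fragment; real field names live in
# configs/det/dbnet/*.yaml and may differ.
cfg_text = """
train:
  batch_size: 16
  dataset:
    data_dir: /path/to/icdar2015
"""

cfg = yaml.safe_load(cfg_text)
cfg["train"]["batch_size"] = 8  # e.g. shrink the batch for a single GPU

# Write the tweaked config; pass it to tools/train.py with -c.
with open("db_custom.yaml", "w") as f:
    yaml.safe_dump(cfg, f)

print(cfg["train"]["batch_size"])
```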
We will take the CRNN model and the LMDB dataset as an illustration of how to easily configure and launch the training process.
Detailed instructions can be viewed in CRNN readme.
Note:
The training pipeline is fully extendable. To train other text detection/recognition models on a new dataset, please configure the model architecture (backbone, neck, head) and the data pipeline in the yaml file, then launch the training script with:

```shell
python tools/train.py -c /path/to/yaml_config
```
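For orientation, the model section of such a yaml file typically declares the three components. The fragment below follows the general convention of the DBNet configs, but the component names are illustrative assumptions; consult the shipped configs for the exact values:

```yaml
model:
  type: det
  backbone:
    name: det_resnet50      # feature extractor
    pretrained: True
  neck:
    name: DBFPN             # feature fusion
    out_channels: 256
  head:
    name: DBHead            # prediction head
```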
MX, which is short for MindX, allows efficient model inference and deployment on Ascend devices.
MindOCR supports OCR model inference with MX Engine. Please refer to mx_infer for detailed illustrations.
Coming soon
Coming soon
For the detailed performance of the trained models, please refer to configs.
For detailed inference performance using the MX Engine, please refer to mx inference performance.
- Parameters can be excluded from weight decay via `grouping_strategy` or `no_weight_decay_params`.
- To enable dynamic loss scaling, set the `type` arg of `loss_scale` as `dynamic`. A YAML example can be viewed in `configs/rec/crnn/crnn_icdar15.yaml`.
- Arg names changed: `output_keys` -> `output_columns`, `num_keys_to_net` -> `num_columns_to_net`.
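As a hedged sketch, a dynamic loss scale section of the yaml might look like the following; the exact key names and default values are assumptions and should be checked against `configs/rec/crnn/crnn_icdar15.yaml`:

```yaml
loss_scale:
  type: dynamic        # enable dynamic loss scaling
  loss_scale_value: 512
  scale_factor: 2.0
  scale_window: 1000
```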
i) Create a new training task on the openi cloud platform.
ii) Link the dataset (e.g., ic15_mindocr) on the webpage.
iii) Add the run parameter `config` and write the yaml file path in the website UI, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'.
iv) Add the run parameter `enable_modelarts` and set it to True in the website UI.
v) Fill in other blanks and launch.
The `ckpt_load_path` and `ckpt_save_dir` args are moved from `system` to `train` in the yaml file.

We appreciate all kinds of contributions, including issues and PRs, to make MindOCR better.
Please refer to CONTRIBUTING.md for the contribution guidelines. Please follow the Model Template and Guideline when contributing a model that fits the overall interface :)
This project follows the Apache License 2.0 open-source license.
If you find this project useful in your research, please consider citing:
@misc{MindSporeOCR2023,
  title={{MindSpore OCR}: MindSpore OCR Toolbox},
  author={MindSpore Team},
  howpublished={\url{https://github.com/mindspore-lab/mindocr/}},
  year={2023}
}