MindOCR
Introduction | Installation | Quick Start | Model List | Notes
Introduction
MindOCR is an open-source toolbox for OCR development and application based on MindSpore. It helps users train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfill image-text understanding needs.
Major Features
- Modular design: We decouple the OCR task into several configurable modules. Users can easily set up the training and evaluation pipelines for customized data and models with a few lines of modification.
- High performance: MindOCR provides pretrained weights and the training recipes used, which reach competitive performance on OCR tasks.
- Low cost to apply: We provide easy-to-use tools to run text detection and recognition on real-world data. (coming soon)
Installation
Dependency
To install the dependency, please run
pip install -r requirements.txt
Additionally, please install MindSpore (>=1.8.1) following the official instructions for the best fit for your machine. To enable training in distributed mode, please also install OpenMPI.
Install with PyPI
Coming soon
Install from Source
The latest version of MindOCR can be installed as follows:
pip install git+https://github.com/mindspore-lab/mindocr.git
Notes: MindOCR is currently tested only with MindSpore>=1.8.1 on Linux with GPU/Ascend devices.
Quick Start
Text Detection Model Training
We will use the DBNet model and the ICDAR2015 dataset for illustration, although other models and datasets are also supported.
1. Data Preparation
Please download the ICDAR2015 dataset from this website, then format the dataset annotations referring to dataset_convert.
After preparation, the data structure should look like:
.
├── test
│   ├── images
│   │   ├── img_1.jpg
│   │   ├── img_2.jpg
│   │   └── ...
│   └── det_gt.txt
└── train
    ├── images
    │   ├── img_1.jpg
    │   ├── img_2.jpg
    │   └── ...
    └── det_gt.txt
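As a quick sanity check after conversion, each line of det_gt.txt is expected to pair an image name with a JSON list of boxes. The exact layout depends on dataset_convert, so treat the tab-separated format below as an assumption, not a guarantee. A minimal parser sketch:

```python
import json

# Assumed annotation format (one line per image):
#   img_1.jpg\t[{"transcription": "text", "points": [[x1, y1], ..., [x4, y4]]}, ...]
def parse_det_gt_line(line):
    """Split an annotation line into (image_name, list of box dicts)."""
    img_name, ann = line.rstrip("\n").split("\t", 1)
    return img_name, json.loads(ann)

sample = 'img_1.jpg\t[{"transcription": "hello", "points": [[0, 0], [10, 0], [10, 10], [0, 10]]}]'
name, boxes = parse_det_gt_line(sample)
print(name, len(boxes), boxes[0]["transcription"])  # → img_1.jpg 1 hello
```

Running it over every line of det_gt.txt before training is a cheap way to catch malformed annotations early.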
2. Configure Yaml
Please choose a yaml config file containing the target pre-defined model and data pipeline that you want to re-use from configs/det. Here we choose configs/det/db_r50_icdar15.yaml.
Then change the data config args accordingly:
train:
  dataset:
    data_dir: PATH/TO/TRAIN_IMAGES_DIR
    label_file: PATH/TO/TRAIN_LABELS.txt
eval:
  dataset:
    data_dir: PATH/TO/TEST_IMAGES_DIR
    label_file: PATH/TO/TEST_LABELS.txt
Optionally, change num_workers according to the number of CPU cores, and set distribute to True if you are going to train in distributed mode.
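Once parsed with a YAML library (e.g. yaml.safe_load from PyYAML), the config is a nested dict, so the same paths can also be overridden programmatically. A minimal sketch with a hypothetical helper set_data_paths (not part of MindOCR); a plain dict stands in for the parsed file:

```python
def set_data_paths(cfg, train_dir, train_label, eval_dir, eval_label):
    """Override dataset paths in a parsed yaml config (a nested dict)."""
    cfg["train"]["dataset"]["data_dir"] = train_dir
    cfg["train"]["dataset"]["label_file"] = train_label
    cfg["eval"]["dataset"]["data_dir"] = eval_dir
    cfg["eval"]["dataset"]["label_file"] = eval_label
    return cfg

# In practice you would round-trip the file with yaml.safe_load / yaml.safe_dump.
cfg = {"train": {"dataset": {}}, "eval": {"dataset": {}}}
cfg = set_data_paths(cfg, "data/train/images", "data/train/det_gt.txt",
                     "data/test/images", "data/test/det_gt.txt")
print(cfg["train"]["dataset"]["data_dir"])  # → data/train/images
```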
3. Training
To train the model, please run
# train dbnet on ic15 dataset
python tools/train.py --config configs/det/db_r50_icdar15.yaml
To train in distributed mode, please run
# n is the number of GPUs/NPUs
mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/det/db_r50_icdar15.yaml
Notes: please ensure the arg distribute in the yaml file is set to True.
The training result (including checkpoints, per-epoch performance, and curves) will be saved in the directory specified by the arg ckpt_save_dir.
4. Evaluation
To evaluate, please set the checkpoint path in the arg ckpt_load_path in the yaml config file and run
python tools/eval.py --config configs/det/db_r50_icdar15.yaml
Text Recognition Model Training
We will use the CRNN model and the LMDB dataset for illustration, although other models and datasets are also supported.
1. Data Preparation
Please download the LMDB dataset from here (ref: deep-text-recognition-benchmark).
There are several .zip data files:
- data_lmdb_release.zip contains the entire datasets, including training, validation, and evaluation data.
- validation.zip is the union dataset for validation.
- evaluation.zip contains several benchmarking datasets.
Unzip the data. After preparation, the data structure should look like:
.
├── train
│   ├── MJ
│   │   ├── data.mdb
│   │   └── lock.mdb
│   └── ST
│       ├── data.mdb
│       └── lock.mdb
├── validation
│   ├── data.mdb
│   └── lock.mdb
└── evaluation
    ├── IC03
    │   ├── data.mdb
    │   └── lock.mdb
    ├── IC13
    │   ├── data.mdb
    │   └── lock.mdb
    └── ...
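Each split above is just a directory holding a data.mdb/lock.mdb pair, so the available LMDB datasets can be listed with a short stdlib walk before training. The helper below is our own sketch, not a MindOCR utility; the demo builds a throwaway tree mimicking the layout above:

```python
import os
import tempfile

def find_lmdb_dirs(root):
    """Return paths (relative to root) of directories that hold an LMDB env."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        if "data.mdb" in files:
            found.append(os.path.relpath(dirpath, root))
    return sorted(found)

# Demonstrate on a temporary tree with empty placeholder files.
root = tempfile.mkdtemp()
for sub in ["train/MJ", "train/ST", "validation", "evaluation/IC13"]:
    os.makedirs(os.path.join(root, sub))
    open(os.path.join(root, sub, "data.mdb"), "w").close()
print(find_lmdb_dirs(root))  # → ['evaluation/IC13', 'train/MJ', 'train/ST', 'validation']
```

Pointing it at your real unzipped directory confirms that the data_dir values you put in the yaml config actually contain LMDB environments.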
2. Configure Yaml
Please choose a yaml config file containing the target pre-defined model and data pipeline that you want to re-use from configs/rec. Here we choose configs/rec/vgg7_bilstm_ctc.yaml.
Please change the data config args accordingly, such as:
train:
  dataset:
    type: LMDBDataset
    data_dir: lmdb_data/rec/train/
eval:
  dataset:
    type: LMDBDataset
    data_dir: lmdb_data/rec/validation/
Optionally, change num_workers according to the number of CPU cores, and set distribute to True if you are going to train in distributed mode.
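A common heuristic for num_workers is to use most of the available cores while leaving a couple for the main process. The helper below is our suggestion, not a MindOCR default:

```python
import os

def suggest_num_workers(reserved=2):
    """Suggest a dataloader worker count: total CPU cores minus a small reserve."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cores - reserved)

print(suggest_num_workers())
```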
3. Training
To train the model, please run
# train crnn on MJ+ST dataset
python tools/train.py --config configs/rec/vgg7_bilstm_ctc.yaml
To train in distributed mode, please run
# n is the number of GPUs/NPUs
mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/rec/vgg7_bilstm_ctc.yaml
Notes: please ensure the arg distribute in the yaml file is set to True.
The training result (including checkpoints, per-epoch performance, and curves) will be saved in the directory specified by the arg ckpt_save_dir.
4. Evaluation
To evaluate, please set the checkpoint path in the arg ckpt_load_path in the yaml config file and run
python tools/eval.py --config /path/to/config.yaml
Inference and Deployment
Inference with MX Engine
Please refer to mx_infer
Inference with Lite
Coming soon
Inference with native MindSpore
Coming soon
Supported Models and Performance
Text Detection
The supported detection models and their performance on the test set of ICDAR2015 are as follows.

| Model   | Backbone  | Pretrained | Recall | Precision | F-score | Config |
|---------|-----------|------------|--------|-----------|---------|--------|
| DBNet   | ResNet-50 | ImageNet   | 81.97% | 86.05%    | 83.96%  | YAML   |
| DBNet++ | ResNet-50 | ImageNet   | 82.02% | 87.38%    | 84.62%  | YAML   |
Text Recognition
The supported recognition models and their overall performance on the public benchmarking datasets (IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) are as follows.

| Model | Backbone    | Avg Acc | Config |
|-------|-------------|---------|--------|
| CRNN  | VGG7        | 80.98%  | YAML   |
| CRNN  | Resnet34_vd | 84.64%  | YAML   |
Notes
Change Log
- Arg names changed: output_keys -> output_columns, num_keys_to_net -> num_columns_to_net
- Data pipeline updated
- Add system test and CI workflow.
- Add modelarts adapter to allow training on the OpenI platform. To train on OpenI:
  i) Create a new training task on the OpenI cloud platform.
  ii) Link the dataset (e.g., ic15_mindocr) on the webpage.
  iii) Add the run parameter `config` and fill in the yaml file path in the website UI, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'.
  iv) Add the run parameter `enable_modelarts` and set it to True in the website UI.
  v) Fill in the other fields and launch.
- Add evaluation script with arg ckpt_load_path
- Arg ckpt_save_dir is moved from system to train in yaml.
- Add drop_overflow_update control
How to Contribute
We appreciate all kinds of contributions, including issues and PRs, to make MindOCR better.
Please refer to CONTRIBUTING.md for the contributing guidelines. Please follow the Model Template and Guideline when contributing a model, so that it fits the overall interface :)
License
This project follows the Apache License 2.0 open-source license.
Citation
If you find this project useful in your research, please consider citing:
@misc{MindSpore_OCR_2023,
  title={{MindSpore OCR}: MindSpore OCR Toolbox},
  author={MindSpore Team},
  howpublished={\url{https://github.com/mindspore-lab/mindocr/}},
  year={2023}
}