- Introduction
- Installation
- Quick Start
- Model List
- Notes
MindOCR is an open-source toolbox for OCR development and application based on MindSpore. It helps users train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfill image-text understanding needs.
To install the dependency, please run
pip install -r requirements.txt
Additionally, please install MindSpore (>=1.8.1) following the official instructions to get the best fit for your machine. To enable training in distributed mode, please also install OpenMPI.
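Because MindSpore builds are platform-specific, it can help to check that the installed version meets the minimum before training. Below is a minimal, stdlib-only sketch; the helper name `meets_minimum` is ours, not part of MindOCR, and it assumes plain `X.Y.Z` version strings (in practice you would pass `mindspore.__version__`):

```python
def meets_minimum(version: str, minimum: str = "1.8.1") -> bool:
    """Compare dotted version strings numerically (assumes plain X.Y.Z versions)."""
    as_tuple = lambda v: tuple(int(p) for p in v.split(".")[:3])
    return as_tuple(version) >= as_tuple(minimum)

print(meets_minimum("2.0.0"))  # True
print(meets_minimum("1.7.0"))  # False
```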
Coming soon
The latest version of MindOCR can be installed as follows:
pip install git+https://github.com/mindspore-lab/mindocr.git
Notes: MindOCR is currently tested only on MindSpore>=1.8.1, on Linux with GPU/Ascend devices.
We will use the DBNet model and the ICDAR2015 dataset for illustration, although other models and datasets are also supported.
Please download the ICDAR2015 dataset from this website, then convert the dataset annotations by referring to dataset_convert.
After preparation, the data structure should look like this:
.
├── test
│ ├── images
│ │ ├── img_1.jpg
│ │ ├── img_2.jpg
│ │ └── ...
│ └── det_gt.txt
└── train
├── images
│ ├── img_1.jpg
│ ├── img_2.jpg
│ └── ...
└── det_gt.txt
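Each line of the converted `det_gt.txt` pairs an image name with its annotations. Assuming the common tab-separated format (image name, then a JSON list of `{transcription, points}` objects — verify the exact layout against dataset_convert, this is an assumption), a line can be parsed like this:

```python
import json

def parse_det_gt_line(line: str):
    """Split one annotation line into (image_name, list of box dicts)."""
    img_name, ann_json = line.rstrip("\n").split("\t", 1)
    return img_name, json.loads(ann_json)

sample = 'img_1.jpg\t[{"transcription": "Genaxis Theatre", "points": [[377,117],[463,117],[465,130],[378,130]]}]'
name, boxes = parse_det_gt_line(sample)
print(name, boxes[0]["transcription"])  # img_1.jpg Genaxis Theatre
```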
Please choose a yaml config file containing the target pre-defined model and data pipeline that you want to re-use from configs/det. Here we choose configs/det/db_r50_icdar15.yaml.

Then change the data config args according to:
train:
dataset:
data_dir: PATH/TO/TRAIN_IMAGES_DIR
label_file: PATH/TO/TRAIN_LABELS.txt
eval:
dataset:
data_dir: PATH/TO/TEST_IMAGES_DIR
label_file: PATH/TO/TEST_LABELS.txt
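If you prefer to patch the config programmatically rather than by hand, here is a minimal stdlib-only sketch using plain string replacement (so yaml comments and formatting survive); the real paths on the right are hypothetical examples, and the placeholders mirror the snippet above:

```python
# Hypothetical placeholder -> real path mapping; adjust to your own layout.
replacements = {
    "PATH/TO/TRAIN_IMAGES_DIR": "data/ic15/train/images",
    "PATH/TO/TRAIN_LABELS.txt": "data/ic15/train/det_gt.txt",
    "PATH/TO/TEST_IMAGES_DIR": "data/ic15/test/images",
    "PATH/TO/TEST_LABELS.txt": "data/ic15/test/det_gt.txt",
}

def patch_config(text: str) -> str:
    """Replace every placeholder path occurring in the yaml text."""
    for placeholder, real in replacements.items():
        text = text.replace(placeholder, real)
    return text
```

Usage would be reading the yaml with `pathlib.Path(...).read_text()`, passing it through `patch_config`, and writing it back.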
Optionally, change num_workers according to the number of CPU cores, and set distribute to True if you want to train in distributed mode.
To train the model, please run
# train dbnet on ic15 dataset
python tools/train.py --config configs/det/db_r50_icdar15.yaml
To train in distributed mode, please run
# n is the number of GPUs/NPUs
mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/det/db_r50_icdar15.yaml
Notes: please ensure the arg distribute in the yaml file is set to True.
The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg ckpt_save_dir.
To evaluate, please set the checkpoint path in the arg ckpt_load_path in the yaml config file and run:
python tools/eval.py --config configs/det/db_r50_icdar15.yaml
We will use the CRNN model and the LMDB dataset for illustration, although other models and datasets are also supported.
Please download the LMDB dataset from here (ref: deep-text-recognition-benchmark).
There are several .zip data files:

- `data_lmdb_release.zip` contains the entire datasets, including training, validation and evaluation data.
- `validation.zip` is the union dataset for validation.
- `evaluation.zip` contains several benchmarking datasets.

Unzip the data. After preparation, the data structure should look like this:
.
├── train
│ ├── MJ
│ │ ├── data.mdb
│ │ ├── lock.mdb
│ ├── ST
│ │ ├── data.mdb
│ │ ├── lock.mdb
├── validation
│ ├── data.mdb
│ ├── lock.mdb
└── evaluation
├── IC03
│ ├── data.mdb
│ ├── lock.mdb
├── IC13
│ ├── data.mdb
│ ├── lock.mdb
└── ...
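In the deep-text-recognition-benchmark LMDB layout (an assumption carried over from the linked benchmark — verify against your data), each sample is stored under 1-based, 9-digit zero-padded keys, with a `num-samples` entry recording the total count. A sketch of the key convention:

```python
def lmdb_sample_keys(index: int):
    """Keys for the index-th sample (1-based, 9-digit zero-padded,
    following the deep-text-recognition-benchmark convention)."""
    return f"image-{index:09d}".encode(), f"label-{index:09d}".encode()

img_key, label_key = lmdb_sample_keys(1)
print(img_key)  # b'image-000000001'
```

With the `lmdb` package you would open the environment read-only and fetch these keys inside a transaction, e.g. `txn.get(label_key)`.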
Please choose a yaml config file containing the target pre-defined model and data pipeline that you want to re-use from configs/rec. Here we choose configs/rec/vgg7_bilstm_ctc.yaml.

Please change the data config args accordingly, such as:
train:
dataset:
type: LMDBDataset
data_dir: lmdb_data/rec/train/
eval:
dataset:
type: LMDBDataset
data_dir: lmdb_data/rec/validation/
Optionally, change num_workers according to the number of CPU cores, and set distribute to True if you want to train in distributed mode.
To train the model, please run
# train crnn on MJ+ST dataset
python tools/train.py --config configs/rec/vgg7_bilstm_ctc.py
To train in distributed mode, please run
# n is the number of GPUs/NPUs
mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/rec/vgg7_bilstm_ctc.yaml
Notes: please ensure the arg distribute in the yaml file is set to True.
The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg ckpt_save_dir.
To evaluate, please set the checkpoint path in the arg ckpt_load_path in the yaml config file and run:
python tools/eval.py --config /path/to/config.yaml
Please refer to mx_infer.
Coming soon
Coming soon
The supported detection models and their performance on the test set of ICDAR2015 are as follows.
Model | Backbone | Pretrained | Recall | Precision | F-score | Config |
---|---|---|---|---|---|---|
DBNet | ResNet-50 | ImageNet | 81.97% | 86.05% | 83.96% | YAML |
DBNet++ | ResNet-50 | ImageNet | 82.02% | 87.38% | 84.62% | YAML |
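The F-score column is the harmonic mean of recall and precision; for example, DBNet's 81.97% recall and 86.05% precision reproduce the reported 83.96%:

```python
def f_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision, both given in percent."""
    return 2 * recall * precision / (recall + precision)

print(round(f_score(81.97, 86.05), 2))  # 83.96
```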
The supported recognition models and their overall performance on the public benchmarking datasets (IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) are as follows.
Model | Backbone | Avg Acc | Config |
---|---|---|---|
CRNN | VGG7 | 80.98% | YAML |
CRNN | Resnet34_vd | 84.64% | YAML |
`output_keys` -> `output_columns`, `num_keys_to_net` -> `num_columns_to_net`
i) Create a new training task on the openi cloud platform.
ii) Link the dataset (e.g., ic15_mindocr) on the webpage.
iii) Add run parameter `config` and set the yaml file path in the website UI, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'
iv) Add run parameter `enable_modelarts` and set it to True in the website UI.
v) Fill in other blanks and launch.
- `ckpt_load_path`
- `ckpt_save_dir` is moved from `system` to `train` in yaml.

We appreciate all kinds of contributions, including issues and PRs, to make MindOCR better.
Please refer to CONTRIBUTING.md for the contributing guideline. Please follow the Model Template and Guideline for contributing a model that fits the overall interface :)
This project follows the Apache License 2.0 open-source license.
If you find this project useful in your research, please consider citing:
@misc{mindocr2023,
    title={{MindSpore OCR}: MindSpore OCR Toolbox},
    author={MindSpore Team},
    howpublished={\url{https://github.com/mindspore-lab/mindocr/}},
    year={2023}
}
This is forked from https://github.com/mindspore-lab/mindocr