📝Introduction |
🔨Installation |
🚀Quick Start |
📚Tutorials |
🎁Model List |
📰Dataset List |
🎉Notes
MindOCR is an open-source toolbox for OCR development and application based on MindSpore. It integrates a series of mainstream text detection and recognition algorithms and models, and provides easy-to-use training and inference tools. It can accelerate the development and deployment of SoTA text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, in real-world applications, and help meet the need for image-text understanding.
MindOCR is built on the MindSpore AI framework, which supports CPU/GPU/NPU devices.
MindOCR is compatible with the following framework versions. For details and installation guidelines, please refer to the installation links shown below.
pip install -r requirements.txt
Tips:
If scikit_image cannot be imported, set the environment variable `$LD_PRELOAD` as follows (see the related opencv issue):
export LD_PRELOAD=path/to/scikit_image.libs/libgomp-d22c30c5.so.1.0.0:$LD_PRELOAD
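The hash in the libgomp file name may differ between scikit-image builds, so the exact path in the command above may not exist on your machine. The following is a minimal sketch for locating the file, assuming the library is vendored in a `scikit_image.libs` directory inside the Python package directory. `find_vendored_libgomp` is a hypothetical helper name used for illustration; it is not part of MindOCR.

```python
import glob
import os
import sysconfig


def find_vendored_libgomp(site_packages=None):
    """Return sorted paths of libgomp copies found under scikit_image.libs.

    Assumption: the library sits in <site-packages>/scikit_image.libs/.
    """
    if site_packages is None:
        # Directory where platform-specific packages are installed.
        site_packages = sysconfig.get_paths()["platlib"]
    pattern = os.path.join(site_packages, "scikit_image.libs", "libgomp-*.so*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Print a ready-to-paste export line for each match (often just one).
    for path in find_vendored_libgomp():
        print(f"export LD_PRELOAD={path}:$LD_PRELOAD")
```

If nothing is printed, the library is probably stored elsewhere in your environment and the path has to be found manually.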
git clone https://github.com/mindspore-lab/mindocr.git
cd mindocr
pip install -e .
Using `-e` for "editable" mode can help resolve potential module import issues.
pip install mindocr
As this project is under active development, the version currently available on PyPI is out of date (it will be updated soon).
After installing MindOCR, we can run text detection and recognition on an arbitrary image easily as follows.
python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
--det_algorithm DB++ \
--rec_algorithm CRNN
After running, the results will be saved in `./inference_results` by default. Here is an example result.
Visualization of text detection and recognition result
We can see that all texts on the image are detected and recognized accurately. For more usage, please refer to the inference section in tutorials.
It is easy to train your OCR model with the `tools/train.py` script, which supports both text detection and recognition model training.
python tools/train.py --config {path/to/model_config.yaml}
The `--config` argument specifies the path to a yaml file that defines the model to be trained and the training strategy, including the data processing pipeline, optimizer, lr scheduler, etc.
MindOCR provides SoTA OCR models with their training strategies in the `configs` folder.
You may adapt them to your task/dataset, for example, by running
# train text detection model DBNet++ on icdar15 dataset
python tools/train.py --config configs/det/dbnet/db++_r50_icdar15.yaml
# train text recognition model CRNN on icdar15 dataset
python tools/train.py --config configs/rec/crnn/crnn_icdar15.yaml
Similarly, it is simple to evaluate the trained model with the `tools/eval.py` script.
python tools/eval.py \
--config {path/to/model_config.yaml} \
--opt eval.dataset_root={path/to/your_dataset} eval.ckpt_load_path={path/to/ckpt_file}
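The `--opt` flag in the command above takes dotted `key=value` pairs that override entries of the yaml config. The following is a minimal sketch of how such overrides could be merged into a nested dictionary. It only illustrates the idea and is not MindOCR's actual parser; `apply_overrides` is a hypothetical name, and values are kept as plain strings.

```python
def apply_overrides(config, overrides):
    """Merge dotted key=value strings into a nested dict (in place).

    Illustrative only: values stay strings, and intermediate keys are
    created as empty dicts when they are missing.
    """
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep or not dotted_key:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Walk down the tree, creating sub-dicts on the way.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


if __name__ == "__main__":
    cfg = {"eval": {"dataset_root": "old_root", "ckpt_load_path": ""}}
    apply_overrides(cfg, [
        "eval.dataset_root=/data/ic15",
        "eval.ckpt_load_path=/ckpt/best.ckpt",
    ])
    print(cfg)
```

Keeping the overrides as dotted strings lets one config file serve several datasets and checkpoints without being edited.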
For more illustration and usage, please refer to the model training section in Tutorials.
For the detailed performance of the trained models, please refer to configs.
For detailed support for MindSpore Lite and ACL inference models, please refer to MindOCR Models Support List and Third-Party Models Support List.
MindOCR provides a dataset conversion tool for OCR datasets in different formats and supports user-customized datasets. We have validated the following public OCR datasets in model training/evaluation.
We will include more datasets for training and evaluation. This list will be continuously updated.
- `resume` parameter under the `model` field in the yaml config, e.g., `resume: True`, load and resume training from {ckpt_save_dir}/train_resume.ckpt, or `resume: /path/to/train_resume.ckpt`, load and resume training from the given path.
- `eval.dataset.output_columns` list.
- `pred_cast_fp32` for ctcloss in AMP training, fix error when invalid polygons exist.
- `model-pretrained` with checkpoint url or local path in yaml.
- `train-ema` (default: False) and `train-ema_decay` in the yaml config.
- `num_columns_to_net` -> `net_input_column_index`: change the column number fed into the network to the column index.
- `num_columns_of_labels` -> `label_column_index`: change the column number corresponding to the label to the column index.
- `grouping_strategy` argument in yaml config to select a predefined grouping strategy, or use the `no_weight_decay_params` argument to pick layers to exclude from weight decay (e.g., bias, norm). An example can be found in configs/rec/crnn/crnn_icdar15.yaml
- `gradient_accumulation_steps` in yaml config; the global batch size = batch_size * devices * gradient_accumulation_steps. An example can be found in configs/rec/crnn/crnn_icdar15.yaml
- `grad_clip` as True in yaml config.
- `type` of `loss_scale` as `dynamic`. A YAML example can be viewed in configs/rec/crnn/crnn_icdar15.yaml
- `output_keys` -> `output_columns`, `num_keys_to_net` -> `num_columns_to_net`
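Two of the settings noted above lend themselves to a short illustration: how a `resume` value maps to a checkpoint path, and how the global batch size follows from `gradient_accumulation_steps`. The following is a minimal sketch based only on the descriptions in these notes. The function names are hypothetical and do not come from the MindOCR code base.

```python
import os


def resolve_resume_path(resume, ckpt_save_dir):
    """Map the `resume` setting to a checkpoint path, or None if disabled.

    resume: True  -> {ckpt_save_dir}/train_resume.ckpt
    resume: "..." -> the given path
    """
    if resume is True:
        return os.path.join(ckpt_save_dir, "train_resume.ckpt")
    if isinstance(resume, str) and resume:
        return resume
    return None


def global_batch_size(batch_size, devices, gradient_accumulation_steps=1):
    """global batch size = batch_size * devices * gradient_accumulation_steps"""
    return batch_size * devices * gradient_accumulation_steps


if __name__ == "__main__":
    print(resolve_resume_path(True, "./tmp_rec"))
    print(global_batch_size(batch_size=64, devices=8, gradient_accumulation_steps=2))
```

For example, a per-device batch size of 64 on 8 devices with 2 accumulation steps gives a global batch size of 1024.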
i) Create a new training task on the openi cloud platform.
ii) Link the dataset (e.g., ic15_mindocr) on the webpage.
iii) Add the run parameter `config` and enter the yaml file path in the web UI, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'
iv) Add the run parameter `enable_modelarts` and set it to True in the web UI.
v) Fill in other blanks and launch.
We appreciate all kinds of contributions including issues and PRs to make MindOCR better.
Please refer to CONTRIBUTING.md for the contributing guideline. Please follow the Model Template and Guideline for contributing a model that fits the overall interface :)
This project follows the Apache License 2.0 open-source license.
If you find this project useful in your research, please consider citing:
@misc{MindSpore OCR 2023,
title={{MindSpore OCR }:MindSpore OCR Toolbox},
author={MindSpore Team},
howpublished = {\url{https://github.com/mindspore-lab/mindocr/}},
year={2023}
}