MindOCR inference supports Ascend310/Ascend310P devices and the MindSpore Lite and ACL inference backends. It integrates text detection, angle classification, and text recognition into an end-to-end OCR inference process, and uses pipeline parallelism to optimize inference performance.
The overall process of MindOCR Lite inference is as follows:

```mermaid
graph LR;
    A[MindOCR models] -- export --> B[MindIR] -- converter_lite --> C[MindSpore Lite MindIR];
    D[ThirdParty models] -- xx2onnx --> E[ONNX] -- converter_lite --> C;
    C -- input --> F[MindOCR Infer] -- outputs --> G[Evaluation];
    H[images] -- input --> F[MindOCR Infer];
```
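The pipeline parallelism mentioned above can be sketched in simplified form with standard-library queues and threads. The three stage functions below are illustrative placeholders, not MindOCR APIs; the point is that each stage works on a different image concurrently, which is what `parallel_num` scales up.

```python
import queue
import threading

# Placeholder stage functions standing in for real det/cls/rec inference.
def detect(img):
    return f"{img}:det"

def classify(item):
    return f"{item}:cls"

def recognize(item):
    return f"{item}:rec"

def stage(fn, q_in, q_out):
    # Each stage pulls from its input queue, processes the item, and pushes
    # the result downstream; a None sentinel shuts the stage down and is
    # propagated so every downstream stage also terminates.
    while True:
        item = q_in.get()
        if item is None:
            q_out.put(None)
            break
        q_out.put(fn(item))

q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
workers = [
    threading.Thread(target=stage, args=(detect, q0, q1)),
    threading.Thread(target=stage, args=(classify, q1, q2)),
    threading.Thread(target=stage, args=(recognize, q2, q3)),
]
for w in workers:
    w.start()

for img in ["img_1.jpg", "img_2.jpg"]:
    q0.put(img)
q0.put(None)  # end-of-input sentinel

results = []
while (item := q3.get()) is not None:
    results.append(item)
for w in workers:
    w.join()
print(results)
```

Because every stage is a single FIFO worker here, output order matches input order; with more workers per stage, results may need reordering.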
Please refer to the environment installation to configure the inference runtime environment for MindOCR, and take care to select the ACL or Lite environment according to the model.

MindOCR inference supports not only models exported from trained ckpt files but also third-party models, as listed in the MindOCR Models Support List and the Third-party Models Support List (PaddleOCR, MMOCR, etc.). Please refer to the Conversion Tutorial to convert them into a model format supported by MindOCR inference.
Enter the inference directory: `cd deploy/py_infer`.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
    --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
    --cls_model_path=/path/to/mindir/cls_mv3.mindir \
    --cls_model_name_or_config=ch_pp_mobile_cls_v2.0 \
    --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=det_cls_rec \
    --vis_pipeline_save_dir=det_cls_rec
```
The visualization images are stored in `det_cls_rec`, as shown in the picture below.
Visualization of text detection and recognition result
The results are saved in `det_cls_rec/pipeline_results.txt` in the following format:

```text
img_182.jpg	[{"transcription": "cocoa", "points": [[14.0, 284.0], [222.0, 274.0], [225.0, 325.0], [17.0, 335.0]]}, {...}]
```
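Each line of `pipeline_results.txt` is an image name followed by a JSON list, so it can be parsed with the standard library alone. A minimal sketch, assuming the name and the JSON payload are separated by whitespace (the sample line is taken from above):

```python
import json

line = 'img_182.jpg\t[{"transcription": "cocoa", "points": [[14.0, 284.0], [222.0, 274.0], [225.0, 325.0], [17.0, 335.0]]}]'

# Split only once, at the first whitespace run, so spaces inside
# the JSON payload are preserved.
name, payload = line.split(None, 1)
items = json.loads(payload)

print(name)                       # img_182.jpg
print(items[0]["transcription"])  # cocoa
```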
If the classification-related parameters are not provided, the classification step is skipped and only detection + recognition is performed.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
    --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
    --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=det_rec \
    --vis_pipeline_save_dir=det_rec
```
The visualization images are stored in `det_rec`, as shown in the picture below.
Visualization of text detection and recognition result
The recognition results are saved in `det_rec/pipeline_results.txt` in the following format:

```text
img_498.jpg	[{"transcription": "keep", "points": [[819.0, 71.0], [888.0, 67.0], [891.0, 104.0], [822.0, 108.0]]}, {...}]
```
Run text detection alone.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
    --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
    --res_save_dir=det \
    --vis_det_save_dir=det
```
The visualization results are stored in the `det` folder, as shown in the picture below.
Visualization of text detection result
The detection results are saved in the `det/det_results.txt` file in the following format:

```text
img_108.jpg	[[[226.0, 442.0], [402.0, 416.0], [404.0, 433.0], [228.0, 459.0]], [...]]
```
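Entries in `det_results.txt` follow the same name-plus-JSON layout, with each box given as four corner points. As an illustration of working with these quadrilaterals (this filtering step is not part of MindOCR), the polygon area of each box can be computed with the shoelace formula, e.g. to drop tiny detections:

```python
import json

line = "img_108.jpg\t[[[226.0, 442.0], [402.0, 416.0], [404.0, 433.0], [228.0, 459.0]]]"
name, payload = line.split(None, 1)
boxes = json.loads(payload)

def shoelace_area(pts):
    # Polygon area via the shoelace formula (absolute value, any winding order).
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

areas = [shoelace_area(b) for b in boxes]
# Keep only boxes larger than an illustrative 100-pixel threshold.
large = [b for b, a in zip(boxes, areas) if a > 100]
print(name, areas)
```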
Run text angle classification alone.
```shell
# cls_mv3.mindir is converted from ppocr
python infer.py \
    --input_images_dir=/path/to/images \
    --cls_model_path=/path/to/mindir/cls_mv3.mindir \
    --cls_model_name_or_config=ch_pp_mobile_cls_v2.0 \
    --res_save_dir=cls
```
The results will be saved in `cls/cls_results.txt`, with the following format:

```text
word_867.png ["180", 0.5176]
word_1679.png ["180", 0.6226]
word_1189.png ["0", 0.9360]
```
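Each `cls_results.txt` line carries the predicted angle label and its confidence. A common post-processing step is to rotate a crop 180° only when the classifier is confident enough; the sketch below uses an illustrative threshold of 0.9 (the labels and scores are the sample lines from above):

```python
import json

lines = [
    'word_867.png\t["180", 0.5176]',
    'word_1679.png\t["180", 0.6226]',
    'word_1189.png\t["0", 0.9360]',
]

CLS_THRESH = 0.9  # illustrative confidence threshold, not a MindOCR constant

decisions = {}
for line in lines:
    name, payload = line.split(None, 1)
    label, score = json.loads(payload)
    # Rotate only when the model confidently predicts "180".
    decisions[name] = (label == "180" and score >= CLS_THRESH)

print(decisions)
```

With these sample scores, none of the three crops would be rotated: the two "180" predictions are below the threshold, and the third is already upright.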
Run text recognition alone.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=rec
```
The results will be saved in `rec/rec_results.txt`, with the following format:

```text
word_421.png "under"
word_1657.png "candy"
word_1814.png "cathay"
```
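`rec_results.txt` pairs each cropped image with its predicted text in quotes; since a quoted string is valid JSON, parsing it into a dict makes comparison against labels straightforward. A minimal sketch, with a hypothetical ground-truth mapping for illustration only:

```python
import json

lines = [
    'word_421.png\t"under"',
    'word_1657.png\t"candy"',
    'word_1814.png\t"cathay"',
]

preds = {}
for line in lines:
    name, payload = line.split(None, 1)
    preds[name] = json.loads(payload)  # the quoted text is valid JSON

# Hypothetical ground-truth labels, purely for illustration.
gt = {"word_421.png": "under", "word_1657.png": "candy", "word_1814.png": "cathy"}
acc = sum(preds[k] == v for k, v in gt.items()) / len(gt)
print(f"accuracy: {acc:.2f}")
```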
name | type | default | description |
---|---|---|---|
input_images_dir | str | None | Image or folder path for inference |
device | str | Ascend | Device type; supports Ascend |
device_id | int | 0 | Device id |
backend | str | lite | Inference backend; supports acl, lite |
parallel_num | int | 1 | Number of parallel workers in each stage of the pipeline |
precision_mode | str | None | Precision mode; currently it can only be set during Model Conversion and has no effect here |
name | type | default | description |
---|---|---|---|
res_save_dir | str | inference_results | Saving dir for inference results |
vis_det_save_dir | str | None | Saving dir for images with detection boxes |
vis_pipeline_save_dir | str | None | Saving dir for images with detection boxes and recognized text |
vis_font_path | str | None | Font path for drawing text |
crop_save_dir | str | None | Saving path for cropped images after detection |
show_log | bool | False | Whether to print logs during inference |
save_log_dir | str | None | Log saving dir |
name | type | default | description |
---|---|---|---|
det_model_path | str | None | Model path for text detection |
det_model_name_or_config | str | None | Model name or YAML config file path for text detection |
name | type | default | description |
---|---|---|---|
cls_model_path | str | None | Model path for text angle classification |
cls_model_name_or_config | str | None | Model name or YAML config file path for text angle classification |
name | type | default | description |
---|---|---|---|
rec_model_path | str | None | Model path for text recognition |
rec_model_name_or_config | str | None | Model name or YAML config file path for text recognition |
character_dict_path | str | None | Dict file for text recognition; by default only numbers and lowercase letters are supported |
Notes:

`*_model_name_or_config` can be a model name or a YAML config file path; please refer to the MindOCR Models Support List and the Third-party Models Support List (PaddleOCR, MMOCR, etc.).

Currently, only the Chinese DBNet, CRNN, and SVTR models in the PP-OCR series are supported.
Enter the inference directory (`cd deploy/cpp_infer`), then execute the compilation script with `bash build.sh`. Once the build is complete, an executable named `infer` will be generated in the `dist` directory under the current path.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
    --cls_model_path /path/to/mindir/crnn \
    --rec_model_path /path/to/mindir/crnn_resnet34.mindir \
    --character_dict_path /path/to/ppocr_keys_v1.txt \
    --res_save_dir det_cls_rec
```
The results will be saved in `det_cls_rec/pipeline_results.txt`, with the following format:

```text
img_478.jpg	[{"transcription": "spa", "points": [[1114, 35], [1200, 0], [1234, 52], [1148, 97]]}, {...}]
```
If the classification-related parameters are not provided, the classification step is skipped and only detection + recognition is performed.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
    --rec_model_path /path/to/mindir/crnn_resnet34.mindir \
    --character_dict_path /path/to/ppocr_keys_v1.txt \
    --res_save_dir det_rec
```
The results will be saved in `det_rec/pipeline_results.txt`, with the following format:

```text
img_478.jpg	[{"transcription": "spa", "points": [[1114, 35], [1200, 0], [1234, 52], [1148, 97]]}, {...}]
```
Run text detection alone.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
    --res_save_dir det
```
The results will be saved in `det/det_results.txt`, with the following format:

```text
img_478.jpg	[[[1114, 35], [1200, 0], [1234, 52], [1148, 97]], [...]]
```
Run text angle classification alone.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --cls_model_path /path/to/mindir/crnn \
    --res_save_dir cls
```
The results will be saved in `cls/cls_results.txt`, with the following format:

```text
word_867.png ["180", 0.5176]
word_1679.png ["180", 0.6226]
word_1189.png ["0", 0.9360]
```
name | type | default | description |
---|---|---|---|
input_images_dir | str | None | Image or folder path for inference |
device | str | Ascend | Device type; supports Ascend |
device_id | int | 0 | Device id |
backend | str | acl | Inference backend; supports acl, lite |
parallel_num | int | 1 | Number of parallel workers in each stage of the pipeline |
name | type | default | description |
---|---|---|---|
res_save_dir | str | inference_results | Saving dir for inference results |
name | type | default | description |
---|---|---|---|
det_model_path | str | None | Model path for text detection |
name | type | default | description |
---|---|---|---|
cls_model_path | str | None | Model path for text angle classification |
name | type | default | description |
---|---|---|---|
rec_model_path | str | None | Model path for text recognition |
rec_config_path | str | None | Config file for text recognition |
character_dict_path | str | None | Dict file for text recognition; by default only numbers and lowercase letters are supported |