# Inference - Tutorial

## 1. Introduction
MindOCR inference supports Ascend310/Ascend310P devices and the MindSpore Lite and ACL inference backends. It integrates text detection, angle classification, and text recognition into an end-to-end OCR inference process, and uses pipeline parallelism to optimize inference performance.
The overall process of MindOCR Lite inference is as follows:

```mermaid
graph LR;
    A[MindOCR models] -- export --> B[MindIR] -- converter_lite --> C[MindSpore Lite MindIR];
    D[ThirdParty models] -- xx2onnx --> E[ONNX] -- converter_lite --> C;
    C -- input --> F[MindOCR Infer] -- outputs --> G[Evaluation];
    H[images] -- input --> F[MindOCR Infer];
```
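The pipeline parallelism mentioned above can be sketched with standard-library queues: each stage runs in its own worker thread and streams results to the next stage, so detection of image N+1 can overlap with recognition of image N. The stage functions below are trivial placeholders, not MindOCR's actual inference API:

```python
import queue
import threading

def run_stage(fn, in_q, out_q):
    """Consume items from in_q, apply fn, and push results to out_q."""
    while True:
        item = in_q.get()
        if item is None:       # sentinel: propagate shutdown downstream
            out_q.put(None)
            break
        out_q.put(fn(item))

# Placeholder stage functions standing in for real model inference.
detect = lambda img: f"{img}:det"
classify = lambda x: f"{x}:cls"
recognize = lambda x: f"{x}:rec"

def run_pipeline(images):
    q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
    stages = [(detect, q0, q1), (classify, q1, q2), (recognize, q2, q3)]
    threads = [threading.Thread(target=run_stage, args=s) for s in stages]
    for t in threads:
        t.start()
    for img in images:
        q0.put(img)
    q0.put(None)               # signal end of input
    results = []
    while (item := q3.get()) is not None:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Because each stage has exactly one worker here, output order matches input order; the `parallel_num` option described later raises the worker count per stage.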
## 2. Environment

Please refer to the environment installation guide to configure the inference runtime environment for MindOCR, and note that the ACL/Lite environment must be selected to match the model.
## 3. Model conversion

MindOCR inference supports not only models exported from trained ckpt files but also third-party models, as listed in the MindOCR Models Support List and the Third-party Models Support List (PaddleOCR, MMOCR, etc.). Please refer to the Conversion Tutorial to convert them into a model format supported by MindOCR inference.
## 4. Inference (Python)

Enter the inference directory: `cd deploy/py_infer`.
### 4.1 Command example
- detection + classification + recognition
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
    --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
    --cls_model_path=/path/to/mindir/cls_mv3.mindir \
    --cls_model_name_or_config=ch_pp_mobile_cls_v2.0 \
    --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=det_cls_rec \
    --vis_pipeline_save_dir=det_cls_rec
```
The visualization images are stored in `det_cls_rec`, as shown in the picture.

Visualization of text detection and recognition result

The results are saved in `det_cls_rec/pipeline_results.txt` in the following format:

```text
img_182.jpg [{"transcription": "cocoa", "points": [[14.0, 284.0], [222.0, 274.0], [225.0, 325.0], [17.0, 335.0]]}, {...}]
```
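Since each line of `pipeline_results.txt` pairs an image name with a JSON list, it can be read back with the standard library alone. The helper below is an illustrative sketch, not part of MindOCR:

```python
import json

def parse_pipeline_result(line):
    """Split an 'image_name [json]' line into (name, list of dicts)."""
    name, _, payload = line.strip().partition(" ")
    return name, json.loads(payload)

# One complete example entry in the documented format.
line = ('img_182.jpg [{"transcription": "cocoa", "points": '
        '[[14.0, 284.0], [222.0, 274.0], [225.0, 325.0], [17.0, 335.0]]}]')
name, items = parse_pipeline_result(line)
```

Each item then exposes its `transcription` string and the four `points` of the detected quadrilateral.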
If you don't pass the classification-related parameters, the classification step is skipped and only detection + recognition is performed.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
    --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
    --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=det_rec \
    --vis_pipeline_save_dir=det_rec
```
The visualization images are stored in `det_rec`, as shown in the picture.

Visualization of text detection and recognition result

The recognition results are saved in `det_rec/pipeline_results.txt` in the following format:

```text
img_498.jpg [{"transcription": "keep", "points": [[819.0, 71.0], [888.0, 67.0], [891.0, 104.0], [822.0, 108.0]]}, {...}]
```
Run text detection alone.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
    --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
    --res_save_dir=det \
    --vis_det_save_dir=det
```
The visualization results are stored in the `det` folder, as shown in the picture.

Visualization of text detection result

The detection results are saved in the `det/det_results.txt` file in the following format:

```text
img_108.jpg [[[226.0, 442.0], [402.0, 416.0], [404.0, 433.0], [228.0, 459.0]], [...]]
```
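The detection output can be post-processed the same way. As a sketch (not part of MindOCR), the snippet below parses a `det_results.txt` line and computes each box's area with the shoelace formula, e.g. to filter out tiny boxes:

```python
import json

def box_area(points):
    """Polygon area via the shoelace formula."""
    n = len(points)
    s = sum(points[i][0] * points[(i + 1) % n][1]
            - points[(i + 1) % n][0] * points[i][1] for i in range(n))
    return abs(s) / 2.0

def parse_det_result(line):
    """Split an 'image_name [[...boxes...]]' line into (name, boxes)."""
    name, _, payload = line.strip().partition(" ")
    return name, json.loads(payload)

# One complete example box in the documented format.
line = ("img_108.jpg [[[226.0, 442.0], [402.0, 416.0], "
        "[404.0, 433.0], [228.0, 459.0]]]")
name, boxes = parse_det_result(line)
areas = [box_area(b) for b in boxes]  # area of each detected quadrilateral
```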
Run text angle classification alone.
```shell
# cls_mv3.mindir is converted from ppocr
python infer.py \
    --input_images_dir=/path/to/images \
    --cls_model_path=/path/to/mindir/cls_mv3.mindir \
    --cls_model_name_or_config=ch_pp_mobile_cls_v2.0 \
    --res_save_dir=cls
```
The results will be saved in `cls/cls_results.txt`, with the following format:

```text
word_867.png ["180", 0.5176]
word_1679.png ["180", 0.6226]
word_1189.png ["0", 0.9360]
```
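Because the angle and score on each line form a JSON list, the classification output is easy to filter. The 0.6 confidence threshold below is an arbitrary value chosen for illustration:

```python
import json

def parse_cls_result(line):
    """Split an 'image_name ["angle", score]' line into (name, angle, score)."""
    name, _, payload = line.strip().partition(" ")
    angle, score = json.loads(payload)
    return name, angle, score

lines = [
    'word_867.png ["180", 0.5176]',
    'word_1679.png ["180", 0.6226]',
    'word_1189.png ["0", 0.9360]',
]
parsed = [parse_cls_result(l) for l in lines]
# Images confidently predicted as upside-down (hypothetical 0.6 threshold).
rotated = [n for n, angle, score in parsed if angle == "180" and score > 0.6]
```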
Run text recognition alone.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=rec
```
The results will be saved in `rec/rec_results.txt`, with the following format:

```text
word_421.png "under"
word_1657.png "candy"
word_1814.png "cathay"
```
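The recognition output can likewise be parsed and scored against ground-truth labels. Both the parsing helper and the label dictionary below are hypothetical, for illustration only:

```python
import json

def parse_rec_result(line):
    """Split an 'image_name "text"' line into (name, text)."""
    name, _, payload = line.strip().partition(" ")
    return name, json.loads(payload)  # the quoted text is a JSON string

preds = dict(parse_rec_result(l) for l in [
    'word_421.png "under"',
    'word_1657.png "candy"',
    'word_1814.png "cathay"',
])
# Hypothetical ground-truth labels, for illustration only.
labels = {"word_421.png": "under", "word_1657.png": "candy",
          "word_1814.png": "cathy"}
# Exact-match accuracy over the labeled images.
accuracy = sum(preds[k] == v for k, v in labels.items()) / len(labels)
```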
### 4.2 Details of inference parameters

- Basic

| name | type | default | description |
|:-----|:-----|:--------|:------------|
| input_images_dir | str | None | Image or folder path for inference |
| device | str | Ascend | Device type; Ascend is supported |
| device_id | int | 0 | Device id |
| backend | str | lite | Inference backend; acl and lite are supported |
| parallel_num | int | 1 | Number of parallel workers in each stage of the pipeline |
| precision_mode | str | None | Precision mode; currently it can only be set during model conversion and takes no effect here |

- Result saving

| name | type | default | description |
|:-----|:-----|:--------|:------------|
| res_save_dir | str | inference_results | Saving dir for inference results |
| vis_det_save_dir | str | None | Saving dir for images with detection boxes |
| vis_pipeline_save_dir | str | None | Saving dir for images with detection boxes and recognized text |
| vis_font_path | str | None | Font path for drawing text |
| crop_save_dir | str | None | Saving dir for images cropped after detection |
| show_log | bool | False | Whether to show logs during inference |
| save_log_dir | str | None | Log saving dir |

- Text detection

| name | type | default | description |
|:-----|:-----|:--------|:------------|
| det_model_path | str | None | Model path for text detection |
| det_model_name_or_config | str | None | Model name or YAML config file path for text detection |
- Text angle classification
| name | type | default | description |
|:-----|:-----|:--------|:------------|
| cls_model_path | str | None | Model path for text angle classification |
| cls_model_name_or_config | str | None | Model name or YAML config file path for text angle classification |
- Text recognition

| name | type | default | description |
|:-----|:-----|:--------|:------------|
| rec_model_path | str | None | Model path for text recognition |
| rec_model_name_or_config | str | None | Model name or YAML config file path for text recognition |
| character_dict_path | str | None | Character dict file for text recognition; by default only digits and lowercase letters are supported |
Notes: `*_model_name_or_config` can be the model name or a YAML config file path; please refer to the MindOCR Models Support List and the Third-party Models Support List (PaddleOCR, MMOCR, etc.).
## 5. Inference (C++)

Currently, only the Chinese DBNet, CRNN, and SVTR models in the PP-OCR series are supported.

Enter the inference directory: `cd deploy/cpp_infer`, then execute the compilation script: `bash build.sh`. Once the build completes, an executable file named `infer` is generated in the `dist` directory under the current path.
### 5.1 Command example
- detection + classification + recognition
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
    --cls_model_path /path/to/mindir/crnn \
    --rec_model_path /path/to/mindir/crnn_resnet34.mindir \
    --character_dict_path /path/to/ppocr_keys_v1.txt \
    --res_save_dir det_cls_rec
```
The results will be saved in `det_cls_rec/pipeline_results.txt`, with the following format:

```text
img_478.jpg [{"transcription": "spa", "points": [[1114, 35], [1200, 0], [1234, 52], [1148, 97]]}, {...}]
```
If you don't pass the classification-related parameters, the classification step is skipped and only detection + recognition is performed.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
    --rec_model_path /path/to/mindir/crnn_resnet34.mindir \
    --character_dict_path /path/to/ppocr_keys_v1.txt \
    --res_save_dir det_rec
```
The results will be saved in `det_rec/pipeline_results.txt`, with the following format:

```text
img_478.jpg [{"transcription": "spa", "points": [[1114, 35], [1200, 0], [1234, 52], [1148, 97]]}, {...}]
```
Run text detection alone.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
    --res_save_dir det
```

The results will be saved in `det/det_results.txt`, with the following format:

```text
img_478.jpg [[[1114, 35], [1200, 0], [1234, 52], [1148, 97]], [...]]
```
Run text angle classification alone.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --cls_model_path /path/to/mindir/crnn \
    --res_save_dir cls
```

The results will be saved in `cls/cls_results.txt`, with the following format:

```text
word_867.png ["180", 0.5176]
word_1679.png ["180", 0.6226]
word_1189.png ["0", 0.9360]
```
### 5.2 Details of inference parameters

- Basic

| name | type | default | description |
|:-----|:-----|:--------|:------------|
| input_images_dir | str | None | Image or folder path for inference |
| device | str | Ascend | Device type; Ascend is supported |
| device_id | int | 0 | Device id |
| backend | str | acl | Inference backend; acl and lite are supported |
| parallel_num | int | 1 | Number of parallel workers in each stage of the pipeline |

- Result saving

| name | type | default | description |
|:-----|:-----|:--------|:------------|
| res_save_dir | str | inference_results | Saving dir for inference results |

- Text detection

| name | type | default | description |
|:-----|:-----|:--------|:------------|
| det_model_path | str | None | Model path for text detection |
- Text angle classification
| name | type | default | description |
|:-----|:-----|:--------|:------------|
| cls_model_path | str | None | Model path for text angle classification |
- Text recognition

| name | type | default | description |
|:-----|:-----|:--------|:------------|
| rec_model_path | str | None | Model path for text recognition |
| rec_config_path | str | None | Config file path for text recognition |
| character_dict_path | str | None | Character dict file for text recognition; by default only digits and lowercase letters are supported |