MindOCR inference supports Ascend310/Ascend310P devices and the MindSpore Lite and ACL inference backends. It integrates text detection, angle classification, and text recognition into an end-to-end OCR inference process, and uses pipeline parallelism to optimize inference performance.
The overall process of MindOCR Lite inference is as follows:

```mermaid
graph LR;
    A[MindOCR models] -- export --> B[MindIR] -- converter_lite --> C[MindSpore Lite MindIR];
    D[ThirdParty models] -- xx2onnx --> E[ONNX] -- converter_lite --> C;
    C -- input --> F[MindOCR Infer] -- outputs --> G[Evaluation];
    H[images] -- input --> F[MindOCR Infer];
```
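The pipeline parallelism mentioned above can be sketched in simplified form with standard-library queues and threads. The three stage functions below are illustrative placeholders, not MindOCR APIs; the point is that each stage works on a different image concurrently, which is what `parallel_num` scales up.

```python
import queue
import threading

# Placeholder stage functions standing in for real det/cls/rec inference.
def detect(img):
    return f"{img}:det"

def classify(item):
    return f"{item}:cls"

def recognize(item):
    return f"{item}:rec"

def stage(fn, q_in, q_out):
    # Each stage pulls from its input queue, processes the item, and pushes
    # the result downstream; a None sentinel shuts the stage down and is
    # propagated so every downstream stage also terminates.
    while True:
        item = q_in.get()
        if item is None:
            q_out.put(None)
            break
        q_out.put(fn(item))

q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
workers = [
    threading.Thread(target=stage, args=(detect, q0, q1)),
    threading.Thread(target=stage, args=(classify, q1, q2)),
    threading.Thread(target=stage, args=(recognize, q2, q3)),
]
for w in workers:
    w.start()

for img in ["img_1.jpg", "img_2.jpg"]:
    q0.put(img)
q0.put(None)  # end-of-input sentinel

results = []
while (item := q3.get()) is not None:
    results.append(item)
for w in workers:
    w.join()
print(results)
```

Because every stage is a single FIFO worker here, output order matches input order; with more workers per stage, results may need reordering.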
Please refer to the environment installation to configure the inference runtime environment for MindOCR, and take care to select the ACL or Lite environment according to the model.

MindOCR inference supports not only models exported from trained ckpt files but also third-party models, as listed in the MindOCR Models Support List and the Third-party Models Support List (PaddleOCR, MMOCR, etc.). Please refer to the Conversion Tutorial to convert them into a model format supported by MindOCR inference.
Enter the inference directory: `cd deploy/py_infer`.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
    --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
    --cls_model_path=/path/to/mindir/cls_mv3.mindir \
    --cls_model_name_or_config=ch_pp_mobile_cls_v2.0 \
    --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=det_cls_rec \
    --vis_pipeline_save_dir=det_cls_rec
```
The visualization images are stored in `det_cls_rec`, as shown in the picture below.
Visualization of text detection and recognition result
The results are saved in `det_cls_rec/pipeline_results.txt` in the following format:

```text
img_182.jpg	[{"transcription": "cocoa", "points": [[14.0, 284.0], [222.0, 274.0], [225.0, 325.0], [17.0, 335.0]]}, {...}]
```
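Each line of `pipeline_results.txt` is an image name followed by a JSON list, so it can be parsed with the standard library alone. A minimal sketch, assuming the name and the JSON payload are separated by whitespace (the sample line is taken from above):

```python
import json

line = 'img_182.jpg\t[{"transcription": "cocoa", "points": [[14.0, 284.0], [222.0, 274.0], [225.0, 325.0], [17.0, 335.0]]}]'

# Split only once, at the first whitespace run, so spaces inside
# the JSON payload are preserved.
name, payload = line.split(None, 1)
items = json.loads(payload)

print(name)                       # img_182.jpg
print(items[0]["transcription"])  # cocoa
```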
If the classification-related parameters are not provided, the classification step is skipped and only detection + recognition is performed.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
    --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
    --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=det_rec \
    --vis_pipeline_save_dir=det_rec
```
The visualization images are stored in `det_rec`, as shown in the picture below.
Visualization of text detection and recognition result
The recognition results are saved in `det_rec/pipeline_results.txt` in the following format:

```text
img_498.jpg	[{"transcription": "keep", "points": [[819.0, 71.0], [888.0, 67.0], [891.0, 104.0], [822.0, 108.0]]}, {...}]
```
Run text detection alone.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
    --det_model_name_or_config=../../configs/det/dbnet/db_r50_icdar15.yaml \
    --res_save_dir=det \
    --vis_det_save_dir=det
```
The visualization results are stored in the `det` folder, as shown in the picture below.
Visualization of text detection result
The detection results are saved in the `det/det_results.txt` file in the following format:

```text
img_108.jpg	[[[226.0, 442.0], [402.0, 416.0], [404.0, 433.0], [228.0, 459.0]], [...]]
```
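Entries in `det_results.txt` follow the same name-plus-JSON layout, with each box given as four corner points. As an illustration of working with these quadrilaterals (this filtering step is not part of MindOCR), the polygon area of each box can be computed with the shoelace formula, e.g. to drop tiny detections:

```python
import json

line = "img_108.jpg\t[[[226.0, 442.0], [402.0, 416.0], [404.0, 433.0], [228.0, 459.0]]]"
name, payload = line.split(None, 1)
boxes = json.loads(payload)

def shoelace_area(pts):
    # Polygon area via the shoelace formula (absolute value, any winding order).
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

areas = [shoelace_area(b) for b in boxes]
# Keep only boxes larger than an illustrative 100-pixel threshold.
large = [b for b, a in zip(boxes, areas) if a > 100]
print(name, areas)
```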
Run text angle classification alone.
```shell
# cls_mv3.mindir is converted from ppocr
python infer.py \
    --input_images_dir=/path/to/images \
    --cls_model_path=/path/to/mindir/cls_mv3.mindir \
    --cls_model_name_or_config=ch_pp_mobile_cls_v2.0 \
    --res_save_dir=cls
```
The results will be saved in `cls/cls_results.txt`, with the following format:

```text
word_867.png ["180", 0.5176]
word_1679.png ["180", 0.6226]
word_1189.png ["0", 0.9360]
```
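Each `cls_results.txt` line carries the predicted angle label and its confidence. A common post-processing step is to rotate a crop 180° only when the classifier is confident enough; the sketch below uses an illustrative threshold of 0.9 (the labels and scores are the sample lines from above):

```python
import json

lines = [
    'word_867.png\t["180", 0.5176]',
    'word_1679.png\t["180", 0.6226]',
    'word_1189.png\t["0", 0.9360]',
]

CLS_THRESH = 0.9  # illustrative confidence threshold, not a MindOCR constant

decisions = {}
for line in lines:
    name, payload = line.split(None, 1)
    label, score = json.loads(payload)
    # Rotate only when the model confidently predicts "180".
    decisions[name] = (label == "180" and score >= CLS_THRESH)

print(decisions)
```

With these sample scores, none of the three crops would be rotated: the two "180" predictions are below the threshold, and the third is already upright.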
Run text recognition alone.
```shell
python infer.py \
    --input_images_dir=/path/to/images \
    --rec_model_path=/path/to/mindir/crnn_resnet34.mindir \
    --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
    --res_save_dir=rec
```
The results will be saved in `rec/rec_results.txt`, with the following format:

```text
word_421.png "under"
word_1657.png "candy"
word_1814.png "cathay"
```
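`rec_results.txt` pairs each cropped image with its predicted text in quotes; since a quoted string is valid JSON, parsing it into a dict makes comparison against labels straightforward. A minimal sketch, with a hypothetical ground-truth mapping for illustration only:

```python
import json

lines = [
    'word_421.png\t"under"',
    'word_1657.png\t"candy"',
    'word_1814.png\t"cathay"',
]

preds = {}
for line in lines:
    name, payload = line.split(None, 1)
    preds[name] = json.loads(payload)  # the quoted text is valid JSON

# Hypothetical ground-truth labels, purely for illustration.
gt = {"word_421.png": "under", "word_1657.png": "candy", "word_1814.png": "cathy"}
acc = sum(preds[k] == v for k, v in gt.items()) / len(gt)
print(f"accuracy: {acc:.2f}")
```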
name | type | default | description |
---|---|---|---|
input_images_dir | str | None | Image or folder path for inference |
device | str | Ascend | Device type; supports Ascend |
device_id | int | 0 | Device id |
backend | str | lite | Inference backend; supports acl, lite |
parallel_num | int | 1 | Number of parallel workers in each stage of the pipeline |
precision_mode | str | None | Precision mode; currently it can only be set during Model Conversion and has no effect here |
name | type | default | description |
---|---|---|---|
res_save_dir | str | inference_results | Saving dir for inference results |
vis_det_save_dir | str | None | Saving dir for images with detection boxes |
vis_pipeline_save_dir | str | None | Saving dir for images with detection boxes and recognized text |
vis_font_path | str | None | Font path for drawing text |
crop_save_dir | str | None | Saving path for cropped images after detection |
show_log | bool | False | Whether to print logs during inference |
save_log_dir | str | None | Log saving dir |
name | type | default | description |
---|---|---|---|
det_model_path | str | None | Model path for text detection |
det_model_name_or_config | str | None | Model name or YAML config file path for text detection |
name | type | default | description |
---|---|---|---|
cls_model_path | str | None | Model path for text angle classification |
cls_model_name_or_config | str | None | Model name or YAML config file path for text angle classification |
name | type | default | description |
---|---|---|---|
rec_model_path | str | None | Model path for text recognition |
rec_model_name_or_config | str | None | Model name or YAML config file path for text recognition |
character_dict_path | str | None | Dict file for text recognition; by default only numbers and lowercase letters are supported |
Notes:

`*_model_name_or_config` can be a model name or a YAML config file path; please refer to the MindOCR Models Support List and the Third-party Models Support List (PaddleOCR, MMOCR, etc.).

Currently, only the Chinese DBNet, CRNN, and SVTR models in the PP-OCR series are supported.
Enter the inference directory (`cd deploy/cpp_infer`), then execute the compilation script with `bash build.sh`. Once the build is complete, an executable named `infer` will be generated in the `dist` directory under the current path.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
    --cls_model_path /path/to/mindir/crnn \
    --rec_model_path /path/to/mindir/crnn_resnet34.mindir \
    --character_dict_path /path/to/ppocr_keys_v1.txt \
    --res_save_dir det_cls_rec
```
The results will be saved in `det_cls_rec/pipeline_results.txt`, with the following format:

```text
img_478.jpg	[{"transcription": "spa", "points": [[1114, 35], [1200, 0], [1234, 52], [1148, 97]]}, {...}]
```
If the classification-related parameters are not provided, the classification step is skipped and only detection + recognition is performed.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
    --rec_model_path /path/to/mindir/crnn_resnet34.mindir \
    --character_dict_path /path/to/ppocr_keys_v1.txt \
    --res_save_dir det_rec
```
The results will be saved in `det_rec/pipeline_results.txt`, with the following format:

```text
img_478.jpg	[{"transcription": "spa", "points": [[1114, 35], [1200, 0], [1234, 52], [1148, 97]]}, {...}]
```
Run text detection alone.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --det_model_path /path/to/mindir/dbnet_resnet50.mindir \
    --res_save_dir det
```
The results will be saved in `det/det_results.txt`, with the following format:

```text
img_478.jpg	[[[1114, 35], [1200, 0], [1234, 52], [1148, 97]], [...]]
```
Run text angle classification alone.
```shell
./dist/infer \
    --input_images_dir /path/to/images \
    --backend lite \
    --cls_model_path /path/to/mindir/crnn \
    --res_save_dir cls
```
The results will be saved in `cls/cls_results.txt`, with the following format:

```text
word_867.png ["180", 0.5176]
word_1679.png ["180", 0.6226]
word_1189.png ["0", 0.9360]
```
name | type | default | description |
---|---|---|---|
input_images_dir | str | None | Image or folder path for inference |
device | str | Ascend | Device type; supports Ascend |
device_id | int | 0 | Device id |
backend | str | acl | Inference backend; supports acl, lite |
parallel_num | int | 1 | Number of parallel workers in each stage of the pipeline |
name | type | default | description |
---|---|---|---|
res_save_dir | str | inference_results | Saving dir for inference results |
name | type | default | description |
---|---|---|---|
det_model_path | str | None | Model path for text detection |
name | type | default | description |
---|---|---|---|
cls_model_path | str | None | Model path for text angle classification |
name | type | default | description |
---|---|---|---|
rec_model_path | str | None | Model path for text recognition |
rec_config_path | str | None | Config file for text recognition |
character_dict_path | str | None | Dict file for text recognition; by default only numbers and lowercase letters are supported |