# Inference - MindOCR Models

## 1. MindOCR Model Support List
### 1.1 Text detection

| Model   | Backbone    | Language | Dataset     | F-score(%) | FPS   | data shape (NCHW) | Config | Download       |
|---------|-------------|----------|-------------|------------|-------|-------------------|--------|----------------|
| DBNet   | MobileNetV3 | en       | IC15        | 76.96      | 26.19 | (1,3,736,1280)    | yaml   | ckpt \| mindir |
| DBNet   | ResNet-18   | en       | IC15        | 81.73      | 24.04 | (1,3,736,1280)    | yaml   | ckpt \| mindir |
| DBNet   | ResNet-50   | en       | IC15        | 85.00      | 21.69 | (1,3,736,1280)    | yaml   | ckpt \| mindir |
| DBNet   | ResNet-50   | ch + en  | 12 Datasets | 83.41      | 21.69 | (1,3,736,1280)    | yaml   | ckpt \| mindir |
| DBNet++ | ResNet-50   | en       | IC15        | 86.79      | 8.46  | (1,3,1152,2048)   | yaml   | ckpt \| mindir |
| DBNet++ | ResNet-50   | ch + en  | 12 Datasets | 84.30      | 8.46  | (1,3,1152,2048)   | yaml   | ckpt \| mindir |
| EAST    | ResNet-50   | en       | IC15        | 86.86      | 6.72  | (1,3,720,1280)    | yaml   | ckpt \| mindir |
| EAST    | MobileNetV3 | en       | IC15        | 75.32      | 26.77 | (1,3,720,1280)    | yaml   | ckpt \| mindir |
| PSENet  | ResNet-152  | en       | IC15        | 82.50      | 2.52  | (1,3,1472,2624)   | yaml   | ckpt \| mindir |
| PSENet  | ResNet-50   | en       | IC15        | 81.37      | 10.16 | (1,3,736,1312)    | yaml   | ckpt \| mindir |
| PSENet  | MobileNetV3 | en       | IC15        | 70.56      | 10.38 | (1,3,736,1312)    | yaml   | ckpt \| mindir |
| FCENet  | ResNet50    | en       | IC15        | 78.94      | 14.59 | (1,3,736,1280)    | yaml   | ckpt \| mindir |
### 1.2 Text recognition

| Model         | Backbone    | Dict File     | Dataset | Acc(%) | FPS    | data shape (NCHW) | Config   | Download               |
|---------------|-------------|---------------|---------|--------|--------|-------------------|----------|------------------------|
| CRNN          | VGG7        | Default       | IC15    | 66.01  | 465.64 | (1,3,32,100)      | yaml     | ckpt \| mindir         |
| CRNN          | ResNet34_vd | Default       | IC15    | 69.67  | 397.29 | (1,3,32,100)      | yaml     | ckpt \| mindir         |
| CRNN          | ResNet34_vd | ch_dict.txt   | /       | /      | /      | (1,3,32,320)      | yaml     | ckpt \| mindir         |
| SVTR          | Tiny        | Default       | IC15    | 79.92  | 338.04 | (1,3,64,256)      | yaml     | ckpt \| mindir         |
| Rare          | ResNet34_vd | Default       | IC15    | 69.47  | 273.23 | (1,3,32,100)      | yaml     | ckpt \| mindir         |
| Rare          | ResNet34_vd | ch_dict.txt   | /       | /      | /      | (1,3,32,320)      | yaml     | ckpt \| mindir         |
| RobustScanner | ResNet-31   | en_dict90.txt | IC15    | 73.71  | 22.30  | (1,3,48,160)      | yaml     | ckpt \| mindir         |
| VisionLAN     | ResNet-45   | Default       | IC15    | 80.07  | 321.37 | (1,3,64,256)      | yaml(LA) | ckpt(LA) \| mindir(LA) |
## 2. Overview of MindOCR Inference

```mermaid
graph LR;
    subgraph Step 1
        A[ckpt] -- export.py --> B[MindIR]
    end
    subgraph Step 2
        B -- converter_lite --> C[MindSpore Lite MindIR];
    end
    subgraph Step 3
        C -- input --> D[infer.py];
    end
    subgraph Step 4
        D -- outputs --> E[eval_rec.py/eval_det.py];
    end
    F[images] -- input --> D;
```
As shown in the figure above, the inference process is divided into the following steps:

1. Use `tools/export.py` to export the ckpt model to a MindIR model;
2. Download and configure the model converter (converter_lite), and use it to convert the MindIR model to a MindSpore Lite MindIR model;
3. After preparing the MindSpore Lite MindIR model and the input images, use `deploy/py_infer/infer.py` to perform inference;
4. Depending on the type of model, use `deploy/eval_utils/eval_det.py` to evaluate the inference results of text detection models, or `deploy/eval_utils/eval_rec.py` for text recognition models.

Note: Step 1 runs on Ascend910, GPU or CPU. Steps 2, 3 and 4 run on Ascend310 or 310P.
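The four steps above can be sketched as a single command sequence. A minimal Python driver that only assembles the command strings (all paths, the `results` directory, and the ckpt filename are placeholders, not real files):

```python
# Sketch of the end-to-end pipeline as a list of commands. Paths are
# placeholders; step 1 runs on Ascend910/GPU/CPU, steps 2-4 on Ascend310/310P.
model = "dbnet_resnet50"

commands = [
    # Step 1: ckpt -> MindIR (tools/export.py)
    f"python tools/export.py --model_name_or_config {model} "
    f"--data_shape 736 1280 --local_ckpt_path /path/to/{model}.ckpt",
    # Step 2: MindIR -> MindSpore Lite MindIR (converter_lite)
    f"converter_lite --saveType=MINDIR --fmk=MINDIR --optimize=ascend_oriented "
    f"--modelFile={model}.mindir --outputFile={model}_lite",
    # Step 3: inference (deploy/py_infer/infer.py)
    f"python deploy/py_infer/infer.py --input_images_dir=/path/to/images "
    f"--det_model_path={model}_lite.mindir "
    f"--det_model_name_or_config=en_ms_det_{model} --res_save_dir=results",
    # Step 4: evaluation (deploy/eval_utils/eval_det.py)
    "python deploy/eval_utils/eval_det.py --gt_path=/path/to/gt.txt "
    "--pred_path=results/det_results.txt",
]

for cmd in commands:
    print(cmd)
```

Each string mirrors the commands shown in section 3 below; in practice you would run them one at a time on the appropriate device rather than from a single script.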
## 3. MindOCR Inference Methods

### 3.1 Text Detection

Let's take `DBNet ResNet-50 en` from the model support list as an example to introduce the inference method:
- Download the ckpt file from the model support list and export it to MindIR with the following command, or directly download the exported MindIR file from the model support list:

  ```shell
  # Use the local ckpt file to export the MindIR of the `DBNet ResNet-50 en` model
  # For more parameter usage details, please execute `python tools/export.py -h`
  python tools/export.py --model_name_or_config dbnet_resnet50 --data_shape 736 1280 --local_ckpt_path /path/to/dbnet.ckpt
  ```
  In the above command:

  - `--model_name_or_config` is the model name in MindOCR, or a path to a YAML config file (for example, `--model_name_or_config configs/rec/crnn/crnn_resnet34.yaml`);
  - `--data_shape 736 1280` indicates that the size of the model input image is [736, 1280]; each MindOCR model corresponds to a fixed export data shape, listed in the data shape column of the model support list;
  - `--local_ckpt_path /path/to/dbnet.ckpt` indicates that the model file to be exported is `/path/to/dbnet.ckpt`.
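Since each model exports at a fixed data shape, it can help to keep the shapes from the support table next to the export call. A small sketch of building the export command (only `dbnet_resnet50` is a model name confirmed by this page; the other dictionary keys are illustrative, not verified MindOCR identifiers):

```python
# Export data shapes (H, W) for some text detection models, taken from the
# support table above. Keys other than "dbnet_resnet50" are illustrative names.
EXPORT_SHAPES = {
    "dbnet_resnet50": (736, 1280),
    "east_resnet50": (720, 1280),
    "psenet_resnet152": (1472, 2624),
}

def export_command(name: str, ckpt_path: str) -> str:
    """Build the tools/export.py command line for a given model."""
    h, w = EXPORT_SHAPES[name]
    return (f"python tools/export.py --model_name_or_config {name} "
            f"--data_shape {h} {w} --local_ckpt_path {ckpt_path}")

print(export_command("dbnet_resnet50", "/path/to/dbnet.ckpt"))
```

Exporting with a shape that does not match the table can produce a model that fails or degrades at inference time, so looking the shape up rather than typing it by hand avoids a common mistake.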
- On Ascend310 or 310P, use the converter_lite tool to convert the MindIR model to a MindSpore Lite MindIR model by running the following command:
  ```shell
  converter_lite \
      --saveType=MINDIR \
      --fmk=MINDIR \
      --optimize=ascend_oriented \
      --modelFile=dbnet_resnet50-c3a4aa24-fbf95c82.mindir \
      --outputFile=dbnet_resnet50
  ```
  In the above command:

  - `--fmk=MINDIR` indicates that the original format of the input model is MindIR; the `--fmk` parameter also supports ONNX, etc.;
  - `--saveType=MINDIR` indicates that the output model is saved in MindIR format;
  - `--optimize=ascend_oriented` indicates that the model is optimized for Ascend devices;
  - `--modelFile=dbnet_resnet50-c3a4aa24-fbf95c82.mindir` indicates the path of the model to be converted, here `dbnet_resnet50-c3a4aa24-fbf95c82.mindir`;
  - `--outputFile=dbnet_resnet50` indicates the path of the output model, `dbnet_resnet50`; the `.mindir` suffix is appended automatically and does not need to be added.

  After the above command is executed, the `dbnet_resnet50.mindir` model file will be generated.
Learn more about converter_lite
Learn more about Model Conversion Tutorial
- Perform inference using `deploy/py_infer/infer.py` and the `dbnet_resnet50.mindir` file:
  ```shell
  python infer.py \
      --input_images_dir=/path/to/ic15/ch4_test_images \
      --det_model_path=/path/to/mindir/dbnet_resnet50.mindir \
      --det_model_name_or_config=en_ms_det_dbnet_resnet50 \
      --res_save_dir=/path/to/dbnet_resnet50_results
  ```
  After the execution is completed, the prediction file `det_results.txt` will be generated in the directory pointed to by the parameter `--res_save_dir`. During inference, you can use the `--vis_det_save_dir` parameter to visualize the results:
Visualization of text detection results
Learn more about infer.py inference parameters
- Evaluate the results with the following command:
  ```shell
  python deploy/eval_utils/eval_det.py \
      --gt_path=/path/to/ic15/test_det_gt.txt \
      --pred_path=/path/to/dbnet_resnet50_results/det_results.txt
  ```
  The result is: `{'recall': 0.8348579682233991, 'precision': 0.8657014478282576, 'f-score': 0.85}`
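The reported f-score is the harmonic mean of precision and recall, which you can verify directly from the numbers above:

```python
# f-score as the harmonic mean of precision and recall,
# using the values reported by eval_det.py above
recall = 0.8348579682233991
precision = 0.8657014478282576
f_score = 2 * precision * recall / (precision + recall)
print(round(f_score, 2))  # 0.85
```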
### 3.2 Text Recognition

Let's take `CRNN ResNet34_vd` from the model support list as an example to introduce the inference method:

- Download the exported MindIR file from the model support list, then on Ascend310 or 310P convert it to a MindSpore Lite MindIR model with the converter_lite tool by running the following command:
  ```shell
  converter_lite \
      --saveType=MINDIR \
      --fmk=MINDIR \
      --optimize=ascend_oriented \
      --modelFile=crnn_resnet34-83f37f07-eb10a0c9.mindir \
      --outputFile=crnn_resnet34vd
  ```
  After the above command is executed, the `crnn_resnet34vd.mindir` model file will be generated. For a brief description of the converter_lite parameters, see the text detection example above.
Learn more about converter_lite
Learn more about Model Conversion Tutorial
- Perform inference using `deploy/py_infer/infer.py` and the `crnn_resnet34vd.mindir` file:
  ```shell
  python infer.py \
      --input_images_dir=/path/to/ic15/ch4_test_word_images \
      --rec_model_path=/path/to/mindir/crnn_resnet34vd.mindir \
      --rec_model_name_or_config=../../configs/rec/crnn/crnn_resnet34.yaml \
      --res_save_dir=/path/to/rec_infer_results
  ```
  After the execution is completed, the prediction file `rec_results.txt` will be generated in the directory pointed to by the parameter `--res_save_dir`.
Learn more about infer.py inference parameters
- Evaluate the results with the following command:
  ```shell
  python deploy/eval_utils/eval_rec.py \
      --gt_path=/path/to/ic15/rec_gt.txt \
      --pred_path=/path/to/rec_infer_results/rec_results.txt
  ```
  The result is: `{'acc': 0.6966779232025146, 'norm_edit_distance': 0.8627135157585144}`
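The `norm_edit_distance` value is an edit-distance-based similarity averaged over the test set. A minimal sketch of the per-sample metric, assuming the common `1 - distance / max(len)` normalization (the exact normalization and averaging inside `eval_rec.py` may differ):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    dp = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[-1]

def norm_edit_similarity(gt: str, pred: str) -> float:
    """Similarity in [0, 1] for one ground-truth/prediction pair."""
    return 1 - levenshtein(gt, pred) / max(len(gt), len(pred), 1)

print(norm_edit_similarity("hello", "hallo"))  # 0.8
```

A value of 1.0 means the prediction matches the ground truth exactly, so `norm_edit_distance` being higher than `acc` is expected: partially correct predictions still earn partial credit.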