CRNN
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
1. Introduction
Convolutional Recurrent Neural Network (CRNN) integrates CNN-based feature extraction, RNN-based sequence modeling, and transcription into a unified framework.
As shown in the architecture diagram (Figure 1), CRNN first extracts a feature sequence from the input image via convolutional layers. The image is then represented by a sequence of extracted feature vectors, where each vector is associated with a receptive field on the input image. To further process these features, CRNN adopts recurrent layers to predict a label distribution for each frame. Finally, a transcription layer translates the per-frame predictions into the final label sequence. [1]
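The downsampling behavior of the convolutional layers determines how many frames the recurrent layers receive. As a rough sketch (the pooling strides below are assumptions based on the standard CRNN VGG backbone, not values read from this repo's configs), the width of a 100-pixel-wide input collapses to 25 frames:

```python
def feature_sequence_length(input_width: int) -> int:
    """Compute the number of feature frames produced by a CRNN-style CNN.

    Assumes a standard CRNN VGG-style backbone: two 2x2 max-pools that halve
    the width, followed by pools that only reduce the height. Width is
    therefore downsampled by a factor of 4 overall.
    """
    width = input_width
    for width_stride in (2, 2, 1, 1):  # width stride of each pooling stage
        width //= width_stride
    return width

# Each of the resulting frames maps to a receptive field on the input image.
print(feature_sequence_length(100))  # 25
```

Each frame then receives one label distribution from the recurrent layers before transcription.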
Figure 1. Architecture of CRNN [1]
2. Results
According to our experiments, the evaluation results on public benchmark datasets (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE) are as follows:
| Model | Context | Backbone | Avg Accuracy | Train T. (s/epoch) | Recipe | Download |
| --- | --- | --- | --- | --- | --- | --- |
| CRNN (ours) | D910x8-MS1.8-G | VGG7 | 82.03% | 2445 | yaml | weights |
| CRNN (ours) | D910x8-MS1.8-G | ResNet34_vd | 84.45% | 2118 | yaml | weights |
| CRNN (PaddleOCR) | - | ResNet34_vd | 83.99% | - | - | - |
Notes:
- Context: Training context denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms_function). For example, D910x8-MS1.8-G means training on 8 Ascend 910 NPUs in graph mode based on MindSpore version 1.8.
- To reproduce the result on other contexts, please ensure the global batch size is the same.
- Both VGG and ResNet models are trained from scratch without any pre-training.
- The above models are trained with MJSynth (MJ) and SynthText (ST) datasets. For more data details, please refer to Dataset Preparation section.
- Evaluations are tested individually on each benchmark dataset, and Avg Accuracy is the average of accuracies across all sub-datasets.
- For the PaddleOCR version of CRNN, the performance is reported on the trained model provided on their GitHub.
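As noted above, Avg Accuracy is a plain mean over the per-benchmark accuracies. A minimal sketch of that aggregation (the per-dataset numbers below are purely illustrative, not our reported results):

```python
def average_accuracy(per_dataset_acc: dict[str, float]) -> float:
    """Average word accuracy across benchmark sub-datasets (equal weight)."""
    return sum(per_dataset_acc.values()) / len(per_dataset_acc)

# Illustrative numbers only; see the table above for the reported averages.
accs = {"IC03": 0.95, "IC13": 0.93, "IC15": 0.77,
        "IIIT": 0.95, "SVT": 0.92, "SVTP": 0.80, "CUTE": 0.84}
print(f"{average_accuracy(accs):.4f}")
```

Note that each sub-dataset gets equal weight regardless of its size.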
3. Quick Start
3.1 Preparation
3.1.1 Installation
Please refer to the installation instructions in MindOCR.
3.1.2 Dataset Preparation
Please download the lmdb dataset for training and evaluation from here (ref: deep-text-recognition-benchmark). There are several zip files:
- `data_lmdb_release.zip` contains the entire dataset, including `training.zip`, `validation.zip` and `evaluation.zip`.
- `validation.zip` is the union dataset for validation.
- `evaluation.zip` contains several benchmark datasets.
After unzipping the data, the directory structure should look like:
```text
.
├── training
│   ├── MJ
│   │   ├── data.mdb
│   │   └── lock.mdb
│   └── ST
│       ├── data.mdb
│       └── lock.mdb
├── validation
│   ├── data.mdb
│   └── lock.mdb
└── evaluation
    ├── IC03
    │   ├── data.mdb
    │   └── lock.mdb
    ├── IC13
    │   ├── data.mdb
    │   └── lock.mdb
    └── ...
```
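Before training, it can be worth confirming the unzipped layout matches the tree above. The helper below is a hypothetical self-check, not part of MindOCR; it only verifies that each required LMDB directory contains `data.mdb` and `lock.mdb`:

```python
from pathlib import Path

# Sub-directories the training/validation pipeline expects (see tree above).
REQUIRED = ["training/MJ", "training/ST", "validation"]

def missing_lmdb_dirs(root: str) -> list[str]:
    """Return required sub-directories missing data.mdb or lock.mdb."""
    missing = []
    for sub in REQUIRED:
        d = Path(root) / sub
        if not ((d / "data.mdb").is_file() and (d / "lock.mdb").is_file()):
            missing.append(sub)
    return missing

if __name__ == "__main__":
    # Example path; point it at your unzipped data_lmdb_release directory.
    print(missing_lmdb_dirs("data_lmdb_release"))
```

An empty list means the training and validation directories are in place; evaluation sub-datasets can be checked the same way.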
3.1.3 Check YAML Config Files
Please check the following important args: `system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`, `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`. Explanations of these important args:
```yaml
system:
  distribute: True                        # `True` for distributed training, `False` for standalone training
  amp_level: 'O3'
  seed: 42
  val_while_train: True                   # Validate while training
  drop_overflow_update: False
common:
  ...
  batch_size: &batch_size 64              # Batch size for training
  ...
train:
  ckpt_save_dir: './tmp_rec'              # The training result (including checkpoints, per-epoch performance and curves) saving directory
  dataset_sink_mode: False
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/data_lmdb_release/  # Root dir of training dataset
    data_dir: training/                      # Dir of training dataset, concatenated with `dataset_root` to be the complete dir of training dataset
    # label_file:                            # Path of training label file, concatenated with `dataset_root` to be the complete path of training label file, not required when using LMDBDataset
  ...
eval:
  ckpt_load_path: './tmp_rec/best.ckpt'   # checkpoint file path
  dataset_sink_mode: False
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/data_lmdb_release/  # Root dir of validation/evaluation dataset
    data_dir: validation/                    # Dir of validation/evaluation dataset, concatenated with `dataset_root` to be the complete dir of validation/evaluation dataset
    # label_file:                            # Path of validation/evaluation label file, concatenated with `dataset_root` to be the complete path of validation/evaluation label file, not required when using LMDBDataset
    ...
  loader:
    shuffle: False
    batch_size: 64                        # Batch size for validation/evaluation
    ...
```
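The `dataset_root`/`data_dir` concatenation described in the comments above can be sketched as follows (a minimal illustration, not MindOCR's actual loader code):

```python
import os

def resolve_data_dir(dataset_root: str, data_dir: str) -> str:
    """Join dataset_root and data_dir into the complete dataset directory,
    mirroring the concatenation described in the config comments."""
    return os.path.join(dataset_root, data_dir)

print(resolve_data_dir("dir/to/data_lmdb_release/", "training/"))
# dir/to/data_lmdb_release/training/
```

The same rule applies to `label_file` when a dataset type other than LMDBDataset requires it.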
Notes:
- As the global batch size (batch_size x num_devices) is important for reproducing the result, please adjust `batch_size` accordingly to keep the global batch size unchanged for a different number of GPUs/NPUs, or adjust the learning rate linearly to the new global batch size.
- When performing Model Training, if `system.val_while_train` is True, validation is performed while training. In this case, you should set `eval.dataset.dataset_root` to the root dir of the validation dataset, set `eval.dataset.data_dir` to the dir of the validation dataset (e.g., `validation/`), and set `eval.dataset.label_file` to the path of the validation label file (`label_file` is not required when using LMDBDataset).
- When performing Model Evaluation, you should set `eval.dataset.dataset_root` to the root dir of the evaluation dataset, set `eval.dataset.data_dir` to the dir of the evaluation dataset (e.g., `evaluation/DATASET_NAME/`), and set `eval.dataset.label_file` to the path of the evaluation label file (`label_file` is not required when using LMDBDataset).
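The batch-size note above can be made concrete. A sketch of keeping the global batch size fixed, or linearly rescaling the learning rate when it changes (the base learning rate below is an illustrative assumption, not a value from the recipe):

```python
def scaled_lr(base_lr: float, base_global_bs: int, new_global_bs: int) -> float:
    """Linearly rescale the learning rate for a new global batch size."""
    return base_lr * new_global_bs / base_global_bs

# Reference recipe: batch_size 64 per device on 8 devices -> global 512.
base_global = 64 * 8
# On 4 devices, option 1: keep the global batch size by raising batch_size.
per_device_on_4 = base_global // 4
# Option 2: keep batch_size 64 (global 256) and rescale an assumed base LR.
new_lr = scaled_lr(1e-3, base_global, 64 * 4)
print(per_device_on_4, new_lr)
```

Either option preserves the effective optimization dynamics well enough to reproduce the reported results.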
3.2 Model Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please modify the configuration parameter `distribute` to `True` and run:
```shell
# distributed training on multiple GPU/Ascend devices
mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
```
If you want to train or finetune the model on a smaller dataset without distributed training, please modify the configuration parameter `distribute` to `False` and run:
```shell
# standalone training on a CPU/GPU/Ascend device
python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
```
The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `ckpt_save_dir`. The default directory is `./tmp_rec`.
3.3 Model Evaluation
To evaluate the accuracy of the trained model, you can use `eval.py`. Please set the checkpoint path in the arg `ckpt_load_path` in the `eval` section of the yaml config file, set `distribute` to `False`, and then run:
```shell
python tools/eval.py --config configs/rec/crnn/crnn_resnet34.yaml
```
References
[1] Baoguang Shi, Xiang Bai, Cong Yao. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. arXiv preprint arXiv:1507.05717, 2015.