Chinese-Text-Recognition
Data Downloading
Following the setup in Benchmarking-Chinese-Text-Recognition, we use the same training, validation, and evaluation data as described in the Datasets section.
Please download the LMDB files as described in the Downloads section.
Data Structure
After downloading the files, put all training data in a folder named training, all validation data in a folder named validation, and all evaluation data in a folder named evaluation.
The resulting directory structure should look like this:
chinese-text-recognition/
├── evaluation
│   ├── document_test
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── handwriting_test
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── scene_test
│   │   ├── data.mdb
│   │   └── lock.mdb
│   └── web_test
│       ├── data.mdb
│       └── lock.mdb
├── training
│   ├── document_train
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── handwriting_train
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── scene_train
│   │   ├── data.mdb
│   │   └── lock.mdb
│   └── web_train
│       ├── data.mdb
│       └── lock.mdb
└── validation
    ├── document_val
    │   ├── data.mdb
    │   └── lock.mdb
    ├── handwriting_val
    │   ├── data.mdb
    │   └── lock.mdb
    ├── scene_val
    │   ├── data.mdb
    │   └── lock.mdb
    └── web_val
        ├── data.mdb
        └── lock.mdb
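Before editing the configuration, it can help to confirm that the files landed where the tree above expects them. The following is a minimal sketch, not part of the toolkit: the function name `find_missing_lmdb_files` and the root path are hypothetical, and the folder names are taken directly from the tree above.

```python
from pathlib import Path

# Split folders and subset suffixes, copied from the directory tree above.
SPLITS = {"training": "train", "validation": "val", "evaluation": "test"}
SUBSETS = ("document", "handwriting", "scene", "web")


def find_missing_lmdb_files(root):
    """Return the expected LMDB files that do not exist under `root`.

    Hypothetical helper for illustration only.
    """
    root = Path(root)
    missing = []
    for split_dir, suffix in SPLITS.items():
        for subset in SUBSETS:
            lmdb_dir = root / split_dir / f"{subset}_{suffix}"
            for name in ("data.mdb", "lock.mdb"):
                path = lmdb_dir / name
                if not path.is_file():
                    missing.append(path)
    return missing


if __name__ == "__main__":
    # Placeholder path; replace it with your own dataset root.
    for path in find_missing_lmdb_files("dir/to/chinese-text-recognition"):
        print(f"missing: {path}")
```

An empty result means every `data.mdb` and `lock.mdb` file shown in the tree is present; it does not check that the LMDB contents themselves are valid.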
Data Configuration
To use the datasets, specify them in the configuration file as follows.
Model Training
...
train:
  ...
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/chinese-text-recognition/  # Root dir of training dataset
    data_dir: training/                             # Dir of training dataset, concatenated with `dataset_root` to be the complete dir of training dataset
...
eval:
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/chinese-text-recognition/  # Root dir of validation dataset
    data_dir: validation/                           # Dir of validation dataset, concatenated with `dataset_root` to be the complete dir of validation dataset
  ...
Model Evaluation
...
train:
  # NO NEED TO CHANGE ANYTHING IN TRAIN SINCE IT IS NOT USED
...
eval:
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/chinese-text-recognition/  # Root dir of evaluation dataset
    data_dir: evaluation/                           # Dir of evaluation dataset, concatenated with `dataset_root` to be the complete dir of evaluation dataset
  ...
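The configuration comments state that `data_dir` is concatenated with `dataset_root` to form the complete dataset directory. The sketch below illustrates that relationship with a plain path join. The helper `resolve_data_dir` is hypothetical, and the assumption that the join behaves like `os.path.join` is ours; the toolkit's own loader may resolve paths differently.

```python
import os


def resolve_data_dir(dataset_root, data_dir):
    """Join `dataset_root` and `data_dir` into one normalized directory path.

    Hypothetical helper: it assumes the concatenation described in the
    configuration comments is equivalent to a plain path join.
    """
    return os.path.normpath(os.path.join(dataset_root, data_dir))


# Values mirror the configuration fragments for training and evaluation.
root = "dir/to/chinese-text-recognition/"
for data_dir in ("training/", "validation/", "evaluation/"):
    print(resolve_data_dir(root, data_dir))
```

Printing the resolved paths this way is a quick check that each one points at the folder holding the four LMDB subsets for that split.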