Following the setup in Benchmarking-Chinese-Text-Recognition, we use the same training, validation and evaluation data as described in the Datasets section.
Please download the LMDB files as introduced in the Downloads section.
After downloading the files, please put all training files under the same folder `training`, all validation data under the `validation` folder, and all evaluation data under `evaluation`.
The directory structure should look like this:
chinese-text-recognition/
├── evaluation
│   ├── document_test
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── handwriting_test
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── scene_test
│   │   ├── data.mdb
│   │   └── lock.mdb
│   └── web_test
│       ├── data.mdb
│       └── lock.mdb
├── training
│   ├── document_train
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── handwriting_train
│   │   ├── data.mdb
│   │   └── lock.mdb
│   ├── scene_train
│   │   ├── data.mdb
│   │   └── lock.mdb
│   └── web_train
│       ├── data.mdb
│       └── lock.mdb
└── validation
    ├── document_val
    │   ├── data.mdb
    │   └── lock.mdb
    ├── handwriting_val
    │   ├── data.mdb
    │   └── lock.mdb
    ├── scene_val
    │   ├── data.mdb
    │   └── lock.mdb
    └── web_val
        ├── data.mdb
        └── lock.mdb
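Before training, it can help to confirm that every LMDB folder is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_lmdb_files` and the constants are hypothetical, and the expected folder names are taken only from the layout above.

```python
from pathlib import Path

# Split folder -> suffix of its LMDB sub-folders, as in the layout above.
SPLITS = {"training": "train", "validation": "val", "evaluation": "test"}
DOMAINS = ("document", "handwriting", "scene", "web")
LMDB_FILES = ("data.mdb", "lock.mdb")


def missing_lmdb_files(root):
    """Return the expected LMDB files that are absent under `root`."""
    root = Path(root)
    missing = []
    for split, suffix in SPLITS.items():
        for domain in DOMAINS:
            for name in LMDB_FILES:
                path = root / split / f"{domain}_{suffix}" / name
                if not path.is_file():
                    missing.append(path)
    return missing
```

Calling `missing_lmdb_files("dir/to/chinese-text-recognition")` returns an empty list when the layout is complete, and otherwise lists each file that still needs to be downloaded or moved.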
To use the datasets, specify them in the configuration file as follows:
...
train:
  ...
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/chinese-text-recognition/  # Root dir of training dataset
    data_dir: training/                             # Dir of training dataset, concatenated with `dataset_root` to be the complete dir of training dataset
  ...
eval:
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/chinese-text-recognition/  # Root dir of validation dataset
    data_dir: validation/                           # Dir of validation dataset, concatenated with `dataset_root` to be the complete dir of validation dataset
  ...
To evaluate on the evaluation data instead, point `data_dir` in the `eval` section to `evaluation/`:
...
train:
  # NO NEED TO CHANGE ANYTHING IN TRAIN SINCE IT IS NOT USED
  ...
eval:
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/chinese-text-recognition/  # Root dir of evaluation dataset
    data_dir: evaluation/                           # Dir of evaluation dataset, concatenated with `dataset_root` to be the complete dir of evaluation dataset
  ...
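The config comments state that `data_dir` is concatenated with `dataset_root` to form the complete dataset directory. As a rough illustration, the resolved path can be previewed like this. The function `resolve_data_dir` is hypothetical, and using `os.path.join` is an assumption about how the concatenation behaves, not the library's confirmed implementation.

```python
import os


def resolve_data_dir(dataset_root, data_dir):
    """Join `dataset_root` and `data_dir` into one normalized path.

    Assumption: the loader performs a plain path join of the two values.
    """
    return os.path.normpath(os.path.join(dataset_root, data_dir))


# Preview the directories for the three splits used in this section.
for split in ("training/", "validation/", "evaluation/"):
    print(resolve_data_dir("dir/to/chinese-text-recognition/", split))
```

Under that assumption, `data_dir` should be written relative to `dataset_root`: `os.path.join` discards the root entirely if the second argument is an absolute path.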