The ICPR MTWI-2018 dataset Official Website | Download Link
Note: Please register an account to download this dataset.
The ICPR MTWI dataset defines three tasks: Text Line (Column) Recognition of Web Images, Text Detection of Web Images, and End-to-End Text Detection and Recognition of Web Images. All three tasks share the same training data, mtwi_train.zip. For test data, task 1 uses mtwi_task1.zip, while tasks 2 and 3 share mtwi_task2_3.zip. For now, we only consider and download the training data, mtwi_train.zip.
After downloading the dataset, unzip the file. The resulting directory structure should look as follows (archive files omitted):
MTWI-2018
|--- image_train
| |--- <image_name>.jpg
| |--- <image_name>.jpg
| |--- ...
|--- txt_train
| |--- <image_name>.txt
| |--- <image_name>.txt
| |--- ...
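Before converting, it can help to check that every image has a matching label file. The sketch below pairs files by name stem, assuming the layout above: image_train/<image_name>.jpg and txt_train/<image_name>.txt. The function names `unpaired` and `find_unpaired` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path


def unpaired(image_files, label_files):
    """Compare file stems and return (images without a label, labels without an image)."""
    # Path.stem drops only the last suffix, so "x.jpg.jpg" and "x.jpg.txt" both map to "x.jpg".
    images = {Path(f).stem for f in image_files}
    labels = {Path(f).stem for f in label_files}
    return sorted(images - labels), sorted(labels - images)


def find_unpaired(root):
    """Check an unzipped MTWI-2018 folder laid out as image_train/*.jpg and txt_train/*.txt."""
    root = Path(root)
    return unpaired(
        (root / "image_train").glob("*.jpg"),
        (root / "txt_train").glob("*.txt"),
    )
```

For example, `missing_labels, missing_images = find_unpaired("path/to/MTWI-2018")` should return two empty lists for a complete download.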
To prepare the data for text detection, run the following command (--label_dir points to the txt_train/ folder of per-image label files):
python tools/dataset_converters/convert.py \
--dataset_name mtwi2018 --task det \
--image_dir path/to/MTWI-2018/image_train/ \
--label_dir path/to/MTWI-2018/txt_train/ \
--output_path path/to/MTWI-2018/det_gt.txt
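The internals of convert.py are not described here. The hypothetical sketch below shows roughly what such a conversion involves, under two assumptions:

- Each line of a txt_train label file has the form x1,y1,x2,y2,x3,y3,x4,y4,transcription: four corner points followed by the text, with ### commonly marking illegible text.
- det_gt.txt holds one line per image: the image file name, a tab, and a JSON list of {"transcription", "points"} objects.

The function names are illustrative only. Use the official converter for real data.

```python
import json
from pathlib import Path


def parse_mtwi_line(line):
    """Parse 'x1,y1,x2,y2,x3,y3,x4,y4,text' (assumed format) into one polygon annotation."""
    # maxsplit=8 keeps any commas that appear inside the transcription itself.
    parts = line.rstrip("\r\n").split(",", 8)
    if len(parts) != 9:
        raise ValueError(f"expected 8 coordinates and a transcription: {line!r}")
    coords = [float(v) for v in parts[:8]]
    points = [[coords[i], coords[i + 1]] for i in range(0, 8, 2)]
    return {"transcription": parts[8], "points": points}


def to_det_gt_line(image_name, label_lines):
    """Build one '<image name>\t<JSON list>' line, skipping blank label lines."""
    annotations = [parse_mtwi_line(l) for l in label_lines if l.strip()]
    return image_name + "\t" + json.dumps(annotations, ensure_ascii=False)


def convert(image_dir, label_dir, output_path):
    """Hypothetical stand-in for the det conversion: one output line per labelled image."""
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    with open(output_path, "w", encoding="utf-8") as out:
        for image_path in sorted(image_dir.glob("*.jpg")):
            label_path = label_dir / (image_path.stem + ".txt")
            if not label_path.exists():
                continue  # skip images that have no label file
            # utf-8-sig also tolerates a leading byte-order mark, if present
            lines = label_path.read_text(encoding="utf-8-sig").splitlines()
            out.write(to_det_gt_line(image_path.name, lines) + "\n")
```

`ensure_ascii=False` keeps Chinese transcriptions readable in the output file rather than escaping them as \uXXXX sequences.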
With the --output_path above, the generated standard annotation file det_gt.txt is written to the MTWI-2018/ folder.
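To sanity-check the result, the sketch below counts images and boxes in det_gt.txt. It assumes each line is "<image name>\t<JSON list of {"transcription", "points"}>" and that ### marks ignored (illegible) regions. Both of these are assumptions about the output format, and `summarize` is a hypothetical helper.

```python
import json


def summarize(lines, ignore_tag="###"):
    """Return (images, boxes, ignored boxes) for det_gt.txt-style lines (assumed format)."""
    n_images = n_boxes = n_ignored = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # tolerate blank lines
        _image_name, payload = line.split("\t", 1)
        n_images += 1
        for ann in json.loads(payload):
            n_boxes += 1
            if ann["transcription"] == ignore_tag:
                n_ignored += 1
    return n_images, n_boxes, n_ignored
```

Usage: `with open("path/to/MTWI-2018/det_gt.txt", encoding="utf-8") as f: print(summarize(f))`.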