The official code of ABINet (CVPR 2021, Oral).
ABINet uses a vision model and an explicit language model, trained end to end, to recognize text in the wild. The language model (BCN) achieves bidirectional language representation by simulating a cloze test, and additionally applies an iterative correction strategy.
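As a rough, hypothetical illustration of the iterative correction idea only (not the actual BCN): a lexicon lookup stands in for the language model, and the corrected prediction is fed back in until it stops changing.

```python
from difflib import get_close_matches

# Hypothetical stand-in for the language model: snap a noisy prediction
# to the closest word in a tiny lexicon.
LEXICON = ["showing", "reading", "humans"]

def refine(pred):
    match = get_close_matches(pred, LEXICON, n=1, cutoff=0.6)
    return match[0] if match else pred

def iterative_correct(pred, max_iters=3):
    # Feed the corrected prediction back in until it reaches a fixed
    # point, loosely mirroring ABINet's iterative correction loop.
    for _ in range(max_iters):
        new_pred = refine(pred)
        if new_pred == pred:
            break
        pred = new_pred
    return pred

print(iterative_correct("sh0wing"))  # a simulated noisy vision-model output
```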
We provide a pre-built Docker image, built from docker/Dockerfile.
Running in Docker
$ git clone git@github.com:FangShancheng/ABINet.git
$ docker run --gpus all --rm -ti --ipc=host -v "$(pwd)"/ABINet:/app fangshancheng/fastai:torch1.1 /bin/bash
(Untested) Alternatively, install the dependencies directly:
pip install -r requirements.txt
Training datasets:
- Use `tools/create_lmdb_dataset.py` to convert images into an LMDB dataset.
- Use `tools/crop_by_word_bb.py` to crop images from the original SynthText dataset, then convert the crops into an LMDB dataset with `tools/create_lmdb_dataset.py`.
- Use `notebooks/prepare_wikitext103.ipynb` to convert text into CSV format.

Evaluation datasets: LMDB datasets can be downloaded from BaiduNetdisk (passwd:1dbv) or GoogleDrive.
The structure of the `data` directory is:
data
├── charset_36.txt
├── evaluation
│ ├── CUTE80
│ ├── IC13_857
│ ├── IC15_1811
│ ├── IIIT5k_3000
│ ├── SVT
│ └── SVTP
├── training
│ ├── MJ
│ │ ├── MJ_test
│ │ ├── MJ_train
│ │ └── MJ_valid
│ └── ST
├── WikiText-103.csv
└── WikiText-103_eval_d1.csv
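A small stdlib sketch to sanity-check that a dataset root matches the layout above (`data` is assumed to be the directory you unpacked everything into):

```python
from pathlib import Path

# Expected sub-paths, mirroring the tree shown above.
EXPECTED = [
    "charset_36.txt",
    "evaluation/CUTE80",
    "evaluation/IC13_857",
    "evaluation/IC15_1811",
    "evaluation/IIIT5k_3000",
    "evaluation/SVT",
    "evaluation/SVTP",
    "training/MJ/MJ_train",
    "training/MJ/MJ_valid",
    "training/MJ/MJ_test",
    "training/ST",
    "WikiText-103.csv",
    "WikiText-103_eval_d1.csv",
]

def missing_paths(root):
    # Return every expected entry that is absent under `root`.
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

print(missing_paths("data"))  # empty list once everything is in place
```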
Get the pretrained models from BaiduNetdisk (passwd:kwck) or GoogleDrive. Performances of the pretrained models are summarized as follows:
Model | IC13 | SVT | IIIT | IC15 | SVTP | CUTE | AVG |
---|---|---|---|---|---|---|---|
ABINet-SV | 97.1 | 92.7 | 95.2 | 84.0 | 86.7 | 88.5 | 91.4 |
ABINet-LV | 97.0 | 93.4 | 96.4 | 85.9 | 89.5 | 89.2 | 92.7 |
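The AVG column is consistent with an average weighted by benchmark size rather than a plain mean. The IC13/IC15/IIIT counts below come from the directory names above (e.g. IC13_857); the SVT = 647, SVTP = 645 and CUTE = 288 sizes are the standard benchmark sizes and are stated here as an assumption:

```python
# Benchmark sample counts (IC13/IC15/IIIT from the directory names in
# the data tree; SVT, SVTP and CUTE sizes assumed to be the standard
# benchmark sizes).
SIZES = {"IC13": 857, "SVT": 647, "IIIT": 3000,
         "IC15": 1811, "SVTP": 645, "CUTE": 288}

def weighted_avg(scores):
    # Accuracy averaged over all test samples, not over benchmarks.
    total = sum(SIZES.values())
    return sum(scores[k] * SIZES[k] for k in SIZES) / total

abinet_lv = {"IC13": 97.0, "SVT": 93.4, "IIIT": 96.4,
             "IC15": 85.9, "SVTP": 89.5, "CUTE": 89.2}
print(round(weighted_avg(abinet_lv), 1))  # reproduces the 92.7 in the table
```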
Pre-train the vision model:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_vision_model.yaml
Pre-train the language model:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_language_model.yaml
Train ABINet:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/train_abinet.yaml
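A hypothetical configuration fragment showing where pretrained checkpoints could be pointed at for the vision and language branches; the key names and paths here are illustrative assumptions, so take the actual ones from the files under configs/:

```yaml
# Hypothetical fragment: point each branch at its pretrained weights,
# or use a null value to train that branch from scratch.
model:
  vision:
    checkpoint: workdir/pretrain-vision-model/best.pth
  language:
    checkpoint: workdir/pretrain-language-model/best.pth
```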
Note: set the `checkpoint` path for the vision and language models separately to start from a specific pretrained model, or set it to None to train from scratch.

Evaluation:
CUDA_VISIBLE_DEVICES=0 python main.py --config=configs/train_abinet.yaml --phase test --image_only
Additional flags:
- `--checkpoint /path/to/checkpoint` set the path of the evaluation model
- `--test_root /path/to/dataset` set the path of the evaluation dataset
- `--model_eval [alignment|vision]` which sub-model to evaluate
- `--image_only` disable dumping visualization of attention masks

Integrated into Huggingface Spaces 🤗 using Gradio. Try out the Web Demo:
python demo.py --config=configs/train_abinet.yaml --input=figs/test
Additional flags:
- `--config /path/to/config` set the path of the configuration file
- `--input /path/to/image-directory` set the path of an image directory or a wildcard path, e.g. `--input='figs/test/*.png'`
- `--checkpoint /path/to/checkpoint` set the path of the trained model
- `--cuda [-1|0|1|2|3...]` set the CUDA device id; the default -1 stands for CPU
- `--model_eval [alignment|vision]` which sub-model to use
- `--image_only` disable dumping visualization of attention masks

Successful and failure cases on low-quality images:
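Regarding the `--input` wildcard flag above, a quick stdlib sketch of how such a pattern expands into a file list (the demo itself may resolve it differently):

```python
import glob
import os
import tempfile

# Create a throwaway directory with a few fake images to expand against.
root = tempfile.mkdtemp()
for name in ["a.png", "b.png", "notes.txt"]:
    open(os.path.join(root, name), "w").close()

# Equivalent of passing --input='<dir>/*.png': only the PNGs match.
matches = sorted(glob.glob(os.path.join(root, "*.png")))
print([os.path.basename(p) for p in matches])  # ['a.png', 'b.png']
```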
If you find our method useful for your research, please cite:
@inproceedings{fang2021read,
title={Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition},
author={Fang, Shancheng and Xie, Hongtao and Wang, Yuxin and Mao, Zhendong and Zhang, Yongdong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2021}
}
This project is only free for academic research purposes, licensed under the 2-clause BSD License - see the LICENSE file for details.
Feel free to contact fangsc@ustc.edu.cn if you have any questions.