torch
transformers
datasets
loguru
example: examples/t5/demo.py
from pycorrector import T5Corrector
m = T5Corrector()
print(m.correct_batch(['今天新情很好', '你找到你最喜欢的工作,我也很高心。']))
output:
[{'source': '今天新情很好', 'target': '今天心情很好', 'errors': [('新', '心', 2)]},
{'source': '你找到你最喜欢的工作,我也很高心。', 'target': '你找到你最喜欢的工作,我也很高兴。', 'errors': [('心', '兴', 15)]}]
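Each entry in `errors` above is a (wrong character, corrected character, position) triple. As a rough illustration of how such triples relate `source` to `target`, the sketch below compares two strings character by character. `diff_errors` is a hypothetical helper written for this example, not pycorrector's actual implementation, and it assumes same-length strings that differ only by substitutions.

```python
from typing import List, Tuple


def diff_errors(source: str, target: str) -> List[Tuple[str, str, int]]:
    """Return (wrong_char, right_char, index) for every position where the strings differ.

    Hypothetical helper for illustration only; assumes substitution-only edits.
    """
    if len(source) != len(target):
        raise ValueError("this sketch only handles same-length substitutions")
    return [(s, t, i) for i, (s, t) in enumerate(zip(source, target)) if s != t]


# Positions are 0-based character offsets into the source string.
print(diff_errors('今天新情很好', '今天心情很好'))
```

For the first sentence this yields `[('新', '心', 2)]`, matching the output shown above.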
SIGHAN 2015 Chinese spelling correction data (2k samples): examples/data/sighan_2015/train.tsv
data format:
你说的是对,跟那些失业的人比起来你也算是辛运的。 你说的是对,跟那些失业的人比起来你也算是幸运的。
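Each line of the data pairs an erroneous sentence with its corrected version. A minimal sketch of reading such pairs follows; it assumes the two sentences are separated by a single tab (inferred from the `.tsv` extension, not confirmed by the source), and `read_pairs` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, Tuple


def read_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from tab-separated lines.

    Assumption: one pair per line, separated by a single tab character.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed format
        yield parts[0], parts[1]


# In-memory sample in the assumed format; a real run would iterate over an open file.
sample = ["你说的是对,跟那些失业的人比起来你也算是辛运的。\t你说的是对,跟那些失业的人比起来你也算是幸运的。\n"]
for src, tgt in read_pairs(sample):
    print(src, "->", tgt)
```

To read the real file, pass an open file handle instead of `sample`, for example `open("examples/data/sighan_2015/train.tsv", encoding="utf-8")`.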
Dataset | Corpus | Download link | Archive size |
---|---|---|---|
SIGHAN+Wang271K Chinese correction dataset | SIGHAN+Wang271K (270k samples) | Baidu Netdisk (password: 01b9) | 106M |
Download the SIGHAN+Wang271K Chinese correction dataset. After extraction, the data is in JSON format.
run train:
python train.py --do_train --do_eval --model_name_or_path output/mengzi-t5-base-chinese-correction/ --train_path ./output/train.json --test_path output/test.json
python predict.py
output:
[{'source': '今天新情很好', 'target': '今天心情很好', 'errors': [('新', '心', 2)]},
{'source': '你找到你最喜欢的工作,我也很高心。', 'target': '你找到你最喜欢的工作,我也很高兴。', 'errors': [('心', '兴', 15)]}]
A T5 model trained on the SIGHAN+Wang271K Chinese correction dataset has been released on HuggingFace models: shibing624/mengzi-t5-base-chinese-correction
Evaluation dataset: SIGHAN2015 test set
GPU: Tesla V100, 32 GB memory
Model | Backbone | GPU | Precision | Recall | F1 | QPS |
---|---|---|---|---|---|---|
T5 | byt5-small | GPU | 0.5220 | 0.3941 | 0.4491 | 111 |
mengzi-t5-base-chinese-correction | mengzi-t5-base | GPU | 0.8321 | 0.6390 | 0.7229 | 214 |
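The F1 column is the harmonic mean of precision and recall. The short check below recomputes it from the precision and recall values in the table; `f1_score` is a helper defined here for illustration, not part of pycorrector.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (backbone, precision, recall) taken from the table above.
rows = [
    ("byt5-small", 0.5220, 0.3941),
    ("mengzi-t5-base", 0.8321, 0.6390),
]
for backbone, p, r in rows:
    print(backbone, round(f1_score(p, r), 4))
```

Rounded to four decimals, the results agree with the reported F1 values of 0.4491 and 0.7229.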