Are you sure you want to delete this task? Once this task is deleted, it cannot be recovered.
shibing624 dd213edba8 | 5 months ago | |
---|---|---|
.. | ||
README.md | 6 months ago | |
demo.py | 6 months ago | |
training_chatglm_demo.py | 5 months ago | |
training_llama_demo.py | 5 months ago | |
use_origin_transformers_demo.py | 6 months ago |
中文文本纠错任务是一项NLP基础任务,其输入是一个可能含有语法错误的中文句子,输出是一个正确的中文句子。
语法错误类型很多,有多字、少字、错别字等,目前最常见的错误类型是错别字
。大部分研究工作围绕错别字这一类型进行研究。
本项目基于LLaMA实现了中文拼写纠错和语法纠错。
运行命令:
pip install transformers peft -U
example: examples/gpt/demo.py
from pycorrector import GptCorrector
m = GptCorrector()
print(m.correct_batch(['今天新情很好', '你找到你最喜欢的工作,我也很高心。']))
output:
[{'source': '今天新情很好', 'target': '今天心情很好', 'errors': [('新', '心', 2)]},
{'source': '你找到你最喜欢的工作,我也很高心。', 'target': '你找到你最喜欢的工作,我也很高兴。', 'errors': [('心', '兴', 15)]}]
中文语法纠错数据(1k条):examples/data/grammar/train_sharegpt.jsonl
data format:
{"conversations":[{"from":"human","value":"对这个句子语法纠错\n\n这件事对我们大家当时震动很大。"},{"from":"gpt","value":"这件事当时对我们大家震动很大。"}]}
run train:
cd examples/gpt
python train_chatglm_demo.py --do_train --do_predict
output:
input : 这块名表带带相传
predict: 这块名表代代相传
Text Python other
Dear OpenI User
Thank you for your continuous support to the Openl Qizhi Community AI Collaboration Platform. In order to protect your usage rights and ensure network security, we updated the Openl Qizhi Community AI Collaboration Platform Usage Agreement in January 2024. The updated agreement specifies that users are prohibited from using intranet penetration tools. After you click "Agree and continue", you can continue to use our services. Thank you for your cooperation and understanding.
For more agreement content, please refer to the《Openl Qizhi Community AI Collaboration Platform Usage Agreement》