tal e5ce6d9b1c add MSSHEM 1 year ago
data add MSSHEM 1 year ago
src add MSSHEM 1 year ago
3500_frequent_chars.txt add MSSHEM 1 year ago
README.md add MSSHEM 1 year ago
synonym.json add MSSHEM 1 year ago

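This algorithm proposes a novel, robust training framework that improves model performance on downstream tasks that consume OCR-recognized text. The framework: 1) uses a simple yet effective method to simulate natural OCR noise directly from clean text; 2) repeatedly mines hard samples from the large pool of simulated samples to obtain the best performance; and 3) applies a stability loss so that the model learns representations that are unaffected by noise. On the Metaphor dataset it reaches 87.7% accuracy and 87.1% F1.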
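The three components described above can be illustrated with a minimal, standard-library sketch. It is not the code in `src`. The confusion table `CONFUSIONS`, the function names, the choice of substitution and deletion as noise operations, and the use of KL divergence as the stability loss are all assumptions made for illustration; the actual project may define each of these differently.

```python
import math
import random

# Hypothetical table of look-alike characters (an assumption for this sketch).
CONFUSIONS = {"0": "O", "O": "0", "1": "l", "l": "1", "m": "rn", "e": "c"}


def simulate_ocr_noise(text, rate=0.1, rng=None):
    """Return a noisy copy of `text`.

    Each character is corrupted with probability `rate`: it is replaced by a
    look-alike if one is listed in CONFUSIONS, otherwise it is dropped.
    """
    rng = rng or random.Random()
    out = []
    for ch in text:
        if rng.random() < rate:
            if ch in CONFUSIONS:
                out.append(CONFUSIONS[ch])  # substitution error
            # no look-alike known: simulate a missed glyph by dropping it
            continue
        out.append(ch)
    return "".join(out)


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def stability_loss(clean_logits, noisy_logits):
    """KL(p_clean || p_noisy): penalizes predictions that shift under noise."""
    lp = log_softmax(clean_logits)
    lq = log_softmax(noisy_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def mine_hard_samples(text, score_fn, n_candidates=20, keep=3, rate=0.1, seed=0):
    """Generate noisy candidates and keep the `keep` highest-scoring ones.

    `score_fn` maps a noisy string to a difficulty score, for example the
    model's loss on that string (supplied by the caller).
    """
    rng = random.Random(seed)
    candidates = [simulate_ocr_noise(text, rate, rng) for _ in range(n_candidates)]
    return sorted(candidates, key=score_fn, reverse=True)[:keep]
```

In a training loop, one plausible arrangement is to add `stability_loss` (weighted by a hyperparameter) to the task loss, and to rerun `mine_hard_samples` periodically with the current model's loss as `score_fn`, so the selected hard samples track what the model currently finds difficult.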

