Source code and data for Robust Learning for Text Classification with Multisource Noise Simulation and Hard Example Mining.
step1: run the scripts in the noise_simulation folder to generate simulated samples
step2: run python mining.py with the correct data path
step3: run python inference.py
pytorch>=1.5.0
python>=3.5.5
numpy
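This algorithm proposes a novel, robust training framework that improves model performance on downstream tasks that consume OCR-recognized text. The framework: 1) uses a simple yet effective method to simulate natural OCR noise directly from clean text; 2) repeatedly mines hard examples from the large pool of simulated samples to obtain the best performance; 3) uses a stability loss so that the model learns representations that are not affected by noise. On the Metaphor dataset it reaches 87.7% accuracy and 87.1% F1.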
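The three components above can be illustrated with a minimal, standard-library sketch. It is not the code in this repository: the confusion table, the function names (`simulate_ocr_noise`, `mine_hard_examples`, `stability_loss`), and the choice of KL divergence as the stability term are all assumptions made for illustration.

```python
import math
import random

# Hypothetical confusion table of characters an OCR engine might mix up.
# Illustrative only; not taken from the repository's data files.
CONFUSIONS = {"0": "O", "O": "0", "1": "l", "l": "1", "5": "S", "S": "5"}


def simulate_ocr_noise(text, rate, rng):
    """Return a noisy copy of `text`.

    Each character is corrupted with probability `rate`: it is substituted
    if it has an entry in CONFUSIONS, otherwise it is deleted.
    """
    out = []
    for ch in text:
        if rng.random() < rate:
            if ch in CONFUSIONS:
                out.append(CONFUSIONS[ch])  # substitution error
            # no entry: deletion error, the character is dropped
        else:
            out.append(ch)
    return "".join(out)


def mine_hard_examples(samples, loss_fn, k):
    """Keep the k samples with the highest loss under the current model."""
    return sorted(samples, key=loss_fn, reverse=True)[:k]


def stability_loss(p_clean, p_noisy, eps=1e-12):
    """KL(p_clean || p_noisy) between two class-probability vectors.

    Assumed form of the stability term: it is zero when the predictions on
    the clean and noisy inputs agree and grows as they diverge.
    """
    return sum(
        p * math.log((p + eps) / (q + eps)) for p, q in zip(p_clean, p_noisy)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    clean = "Invoice 1050 total"
    # Generate a pool of simulated noisy samples from one clean sentence.
    pool = [simulate_ocr_noise(clean, 0.2, rng) for _ in range(20)]
    # Stand-in "loss": number of characters lost relative to the clean text.
    hard = mine_hard_examples(pool, lambda s: len(clean) - len(s), k=5)
    print(hard)
    print(stability_loss([0.9, 0.1], [0.6, 0.4]))
```

In a real training loop the loss function passed to `mine_hard_examples` would be the model's own loss on each simulated sample, and the mining step would be repeated as the model improves.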