PCL-Tongyan is a multilingual machine translation model built on the M2M-100 architecture. A single model translates between Chinese and 17 minority languages, and it can also translate between any two of its supported languages. Through parameter reuse and incremental training, the parameter count was increased from 1.2B to 13.2B, which substantially improves translation quality for many of these minority languages. Using a lifelong-learning approach based on dynamic replay, PCL-Tongyan can keep learning to translate new languages without forgetting the old ones. More details are given in the PPT (通言-PPT.pptx).
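The README does not explain how the dynamic replay ("dynamic playback") is scheduled. As a generic illustration of replay-based continual learning, and not the PCL-Tongyan recipe, the sketch below fills each training batch with new-language examples plus examples replayed from earlier language pairs, giving more replay weight to old pairs whose validation loss has risen the most. The functions `replay_weights` and `build_mixed_batch`, the `replay_ratio` parameter, and the loss-based weighting are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (source sentence, target sentence)


def replay_weights(base_loss: Dict[str, float], current_loss: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical 'dynamic' weighting: old language pairs whose validation
    loss increased the most (i.e. are being forgotten) are replayed more often."""
    # Forgetting signal per pair, with a small floor so every pair keeps some weight.
    drift = {pair: max(current_loss[pair] - base_loss[pair], 0.0) + 1e-3 for pair in base_loss}
    total = sum(drift.values())
    return {pair: d / total for pair, d in drift.items()}


def build_mixed_batch(
    new_data: List[Example],
    replay_buffer: Dict[str, List[Example]],
    weights: Dict[str, float],
    batch_size: int,
    replay_ratio: float,
    rng: random.Random,
) -> List[Example]:
    """Fill one batch with new-language examples plus replayed old-language examples."""
    n_replay = int(batch_size * replay_ratio)
    batch = rng.sample(new_data, batch_size - n_replay)
    pairs = list(weights)
    # Pick the old language pair for each replay slot in proportion to its weight.
    chosen = rng.choices(pairs, weights=[weights[p] for p in pairs], k=n_replay)
    batch.extend(rng.choice(replay_buffer[p]) for p in chosen)
    rng.shuffle(batch)
    return batch
```

For example, `replay_weights({"zh-it": 2.0, "zh-de": 2.0}, {"zh-it": 2.5, "zh-de": 2.0})` puts almost all replay weight on `zh-it`, whose loss has drifted upward, so most replayed examples in the next batches come from that pair.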
Datasets: https://git.pcl.ac.cn/PCMachineTranslation/PCMT/src/branch/master/datasets
Data statistics are given in the Excel file (数据统计信息.xlsx).
Convert the dense (non-MoE) 1.2B model into a 16-expert MoE model:
python Change_1.2B_To_16Moe_Version.py
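`Change_1.2B_To_16Moe_Version.py` is not reproduced here. One common way to turn a dense checkpoint into an MoE checkpoint while reusing its parameters is to copy each feed-forward (FFN) block into every expert, so all experts start from the dense weights. The sketch below does this on a plain state dict. The `fc1`/`fc2` names follow fairseq's transformer layer naming, but the expert key layout (`moe.experts.{i}.`) and the helper names are hypothetical, and the newly needed router (gate) weights are not shown.

```python
import copy
import re
from typing import Any, Dict

# fairseq transformer layers name their FFN projections fc1/fc2,
# e.g. "encoder.layers.3.fc1.weight".
FFN_KEY = re.compile(r"^(?P<prefix>(?:encoder|decoder)\.layers\.\d+\.)(?P<name>fc[12]\.(?:weight|bias))$")


def upcycle_to_moe(dense_state: Dict[str, Any], num_experts: int = 16) -> Dict[str, Any]:
    """Copy every dense FFN parameter into `num_experts` experts (hypothetical key layout)."""
    moe_state: Dict[str, Any] = {}
    for key, value in dense_state.items():
        match = FFN_KEY.match(key)
        if match is None:
            # Attention, embeddings, layer norms, etc. stay shared and are copied once.
            moe_state[key] = value
            continue
        for i in range(num_experts):
            # Independent copies, so later updates to one expert do not alias the others.
            new_key = f"{match['prefix']}moe.experts.{i}.{match['name']}"
            moe_state[new_key] = copy.deepcopy(value)
    return moe_state


def moe_param_count(dense_total: float, ffn_total: float, num_experts: int) -> float:
    """Parameter count after replacing each FFN with `num_experts` copies (gate ignored)."""
    return dense_total + (num_experts - 1) * ffn_total
```

As a plausibility check: if every FFN block were replicated into 16 experts, growing from 1.2B to 13.2B parameters implies about (13.2 - 1.2) / 15 = 0.8B FFN parameters in the dense model. A 24+24-layer transformer with d_model = 1024 and FFN size 8192, which we believe is the M2M-100 1.2B configuration, has about 48 × 2 × 1024 × 8192 ≈ 0.81B FFN weights, so the reported sizes are consistent with this kind of upcycling.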
Merge the distributed MoE checkpoint into a single-GPU checkpoint for deployment:
python Comerge_16To1.py
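`Comerge_16To1.py` is not shown. Assuming training used expert parallelism with one expert per GPU (16 ranks, one checkpoint shard each), each shard holds a replica of the shared parameters plus its local expert, so merging means keeping one copy of the shared weights and renumbering each rank's local expert to a global index. This is a sketch under those assumptions; the key layout (`moe.experts.{i}.`) and the function name are hypothetical, and real FastMoE checkpoints may store experts differently.

```python
import re
from typing import Any, Dict, List

# Hypothetical key layout: "<layer prefix>moe.experts.<local index>.<param>".
EXPERT_KEY = re.compile(r"^(?P<prefix>.*moe\.experts\.)(?P<idx>\d+)(?P<rest>\..+)$")


def merge_expert_shards(shards: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge per-rank state dicts into one single-device state dict.

    Assumes every rank stores the same number of local experts, indexed from 0,
    and identical copies of all non-expert (shared) parameters.
    """
    merged: Dict[str, Any] = {}
    for rank, shard in enumerate(shards):
        # Number of distinct local expert indices on this rank.
        local = {int(m["idx"]) for m in map(EXPERT_KEY.match, shard) if m}
        experts_per_rank = len(local)
        for key, value in shard.items():
            match = EXPERT_KEY.match(key)
            if match is None:
                # Shared parameters are replicated on every rank; keep rank 0's copy.
                if rank == 0:
                    merged[key] = value
                continue
            global_idx = rank * experts_per_rank + int(match["idx"])
            merged[f"{match['prefix']}{global_idx}{match['rest']}"] = value
    return merged
```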
Fine-tune on the multilingual translation task:
bash sh_dir/Train-16moe-SiLu-Inhert.sh 16 GShardGate 2
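The positional arguments of the training script are not documented here; we only guess that `16` relates to the number of experts and `2` to the number of experts each token is routed to. FastMoE (listed in the requirements) provides a `GShardGate` gate class for GShard-style top-2 routing. To show what top-k gating does, here is a minimal pure-Python version of softmax top-k routing for a single token. It omits GShard's expert capacity limits, load-balancing loss, and random second-expert dispatch, so it is a simplification, not FastMoE's implementation; `top_k_gate` and `moe_forward` are illustrative names.

```python
import math
from typing import Callable, List, Tuple


def top_k_gate(logits: List[float], k: int = 2) -> List[Tuple[int, float]]:
    """Return (expert index, combine weight) for the k highest-scoring experts.

    Weights are the softmax probabilities of the chosen experts,
    renormalised to sum to 1, as in top-2 MoE routing.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Numerically stable softmax over all experts.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts (ties broken by lower index).
    top = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x: float, experts: List[Callable[[float], float]], logits: List[float], k: int = 2) -> float:
    """Weighted sum of the selected experts' outputs for one token."""
    return sum(w * experts[i](x) for i, w in top_k_gate(logits, k))
```

Only k of the 16 experts run for each token, which is why the MoE model can hold far more parameters than the dense model at a similar per-token compute cost.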
Evaluate BLEU in the xx->zh and zh->xx directions:
bash sh_dir/Test-16Moe-multi-silu.sh 0 xx
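The evaluation script and its arguments (`0 xx`) are not documented here; `xx` presumably stands for a language code. For reference, corpus-level BLEU combines the clipped n-gram precisions for n = 1..4 (geometric mean) with a brevity penalty. The sketch below computes it on pre-tokenized text with a single reference and no smoothing. It is not a substitute for a standard tool such as sacreBLEU, whose tokenizers (including its `zh` tokenizer for Chinese targets) determine reported scores.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]], max_n: int = 4) -> float:
    """Corpus BLEU (single reference, no smoothing) on a 0-100 scale."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)
```

For a zh target, a simple choice for this sketch is character-level tokens, e.g. `list("今天天气很好")`; for space-delimited languages, `sentence.split()` works.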
Preprocess the data:
bash sh_dir/process.sh
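`process.sh` itself is not shown. In fairseq, binarizing a parallel corpus against an existing shared dictionary (M2M-100 uses the same dictionary for source and target) is typically done with `fairseq-preprocess`. The helper below only builds such a command line for one language pair; the function name, file prefixes, and the dictionary path are placeholders, and the real script may also apply SentencePiece encoding before this step.

```python
import shlex
from typing import List, Optional


def build_preprocess_cmd(
    src: str,
    tgt: str,
    dict_path: str,
    destdir: str,
    trainpref: Optional[str] = None,
    validpref: Optional[str] = None,
    testpref: Optional[str] = None,
    workers: int = 8,
) -> List[str]:
    """Build a fairseq-preprocess command that binarizes one language pair
    against a shared (source = target) dictionary."""
    cmd = [
        "fairseq-preprocess",
        "--source-lang", src,
        "--target-lang", tgt,
        "--srcdict", dict_path,
        "--tgtdict", dict_path,
        "--destdir", destdir,
        "--workers", str(workers),
    ]
    # Pass only the splits that were given; at least one is required.
    splits = {"--trainpref": trainpref, "--validpref": validpref, "--testpref": testpref}
    given = [(flag, pref) for flag, pref in splits.items() if pref]
    if not given:
        raise ValueError("at least one of trainpref/validpref/testpref is required")
    for flag, pref in given:
        cmd += [flag, pref]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; once they exist, run with subprocess.run(cmd, check=True).
    cmd = build_preprocess_cmd("zh", "pt", "dict_dir/dict.txt", "data-bin", trainpref="spm.train.zh-pt")
    print(shlex.join(cmd))
```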
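Example client for the deployed translation service (REST API):
import requests


def Tongyan_Translate(sentences=None, direction=None,
                      PyTorch_REST_API_URL='http://192.168.202.124:5000/predict'):
    """Send `sentences` to the PCL-Tongyan service and translate them in `direction`, e.g. "zh-pt"."""
    # Display names (used in the error message) and the matching language codes; keep both lists aligned.
    c_lgs = ['Chinese (zh)', 'Italian (it)', 'German (de)', 'Czech (cs)', 'Dutch (nl)', 'Portuguese (pt)',
             'Indonesian (id)', 'Bulgarian (bg)', 'Bosnian (bs)', 'Greek (el)', 'Persian (fa)',
             'Croatian (hr)', 'Hungarian (hu)', 'Estonian (et)', 'Hebrew (he)', 'Slovenian (sl)',
             'Polish (pl)', 'Turkish (tr)', 'Urdu (ur)']
    lgs = ['zh', 'it', 'de', 'cs', 'nl', 'pt', 'id', 'bg', 'bs', 'el', 'fa', 'hr', 'hu', 'et', 'he',
           'sl', 'pl', 'tr', 'ur']
    # `direction` must have the form "src-tgt" with both codes in `lgs`.
    parts = direction.split("-") if isinstance(direction, str) else []
    if len(parts) != 2 or parts[0] not in lgs or parts[1] not in lgs:
        print(f"Please give <direction> in the form xx-xx, using languages from this set:\n{', '.join(c_lgs)}")
        return None
    payload = {'data': [direction, sentences]}
    # Submit the request; the JSON reply carries 'success' and 'predictions'.
    r = requests.post(PyTorch_REST_API_URL, data=payload).json()
    if r.get('success'):
        # Pair each translation with the index of its source sentence.
        return list(enumerate(r['predictions']))
    return None


if __name__ == '__main__':
    # The direction says the source is Chinese (zh), but these sample sentences are English,
    # which is not in `lgs`; use Chinese source text for a meaningful zh->pt translation.
    sentences = [
        "I want to eat an apple ",
        "Today is a fine day! ",
        "Hello, I am THE senior engineer OF PCL XXX, please give me your advice!"
    ]
    direction = "zh-pt"
    res = Tongyan_Translate(sentences=sentences, direction=direction)
    print(res)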
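The server behind `/predict` is not documented. Because the client passes `data=payload`, `requests` form-encodes the body, sending the `data` key once per item and flattening the nested `sentences` list, so the body looks like `data=zh-pt&data=<sentence 1>&data=<sentence 2>...`. Assuming the server reads all `data` fields in order (for instance with Flask's `request.form.getlist('data')`, if it is a Flask app; port 5000 hints at this but it is a guess), the first field is the direction and the rest are the sentences. The stdlib-only sketch below reproduces that encoding and a matching parse; `encode_like_requests` and `parse_predict_form` are hypothetical helpers.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode


def encode_like_requests(direction: str, sentences: List[str]) -> str:
    """Form-encode {'data': [direction, sentences]} the way requests does:
    one 'data' field per item, with the nested sentence list flattened."""
    fields = [("data", direction)] + [("data", s) for s in sentences]
    return urlencode(fields)


def parse_predict_form(body: str) -> Tuple[str, List[str]]:
    """Hypothetical server-side parse: first 'data' field = direction, rest = sentences."""
    # keep_blank_values keeps empty sentences so result indices stay aligned.
    values = parse_qs(body, keep_blank_values=True).get("data", [])
    if not values:
        raise ValueError("missing 'data' fields")
    return values[0], values[1:]
```

Sending JSON (`requests.post(url, json=...)`) would avoid relying on repeated form fields, but that would only work if the server is changed to read JSON.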
Requirements:
fairseq 1.0.0a0+2fd9d8a
fastmoe 0.2.0
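PCL-Tongyan (鹏程-通言): Tongyan is a multilingual machine translation model that improves on the M2M-100 architecture. Parameter reuse and incremental training raise its parameter count from 1.2B to 13.2B, substantially improving translation for several low-resource Belt and Road languages.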