# Chinese-Language AlphaFold2 Open-Source Model Reproduction: Based on DeepMind, ColabFold & MindSpore
AlphaFold2 introduces a new architecture that jointly embeds the MSA and residue-residue pair features (pairwise features), a new output representation that enables accurate end-to-end training, and new auxiliary losses. In addition, before fine-tuning, AlphaFold2 is pre-trained with a BERT-style objective on the MSA: some amino acids are masked and the model learns to reconstruct them. It also uses self-distillation with a self-estimated loss for self-supervised learning: a trained model first generates predictions on sequence-only data, only the high-confidence predictions are kept, and this data is then used for pre-training. During that training, stronger dropout and masking are applied to the inputs to increase the difficulty of the task, so the model learns to reproduce the high-confidence results it would obtain from full information.
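The BERT-style masked-MSA objective described above can be sketched as follows. This is an illustrative toy, not the real implementation: the mask token id, mask rate, and alphabet size are assumptions.

```python
import random

MASK_ID = 21     # id reserved for the [MASK] token (assumption)
MASK_RATE = 0.15 # fraction of positions to corrupt (assumption)

def mask_msa(msa, rng):
    """msa: list of aligned rows, each a list of integer residue ids.
    Returns the corrupted MSA plus a boolean mask of the hidden positions;
    the model is trained to recover the original tokens at masked positions."""
    corrupted, mask = [], []
    for row in msa:
        hits = [rng.random() < MASK_RATE for _ in row]
        corrupted.append([MASK_ID if h else tok for tok, h in zip(row, hits)])
        mask.append(hits)
    return corrupted, mask

rng = random.Random(0)
msa = [[rng.randrange(20) for _ in range(32)] for _ in range(8)]
corrupted, mask = mask_msa(msa, rng)
```

Only the corrupted MSA is fed to the network; the original tokens at the masked positions serve as the reconstruction targets.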
The network consists of two parts: the Evoformer and the Structure Module. The Evoformer takes the MSA, templates, and the target amino-acid sequence as input, and outputs the MSA representation and a model of residue-residue pair relationships (the pairwise features mentioned above). In the Structure Module, all sequences in the MSA other than the target's own row are discarded; that row, together with the pairwise features, is used to compute and update the backbone frames, predicting the position and orientation of every residue, peptide-bond lengths and angles, intra-residue torsion angles, and so on. The Evoformer, an evolution-aware Transformer, is what processes the MSA and pairwise features: it takes both as input, passes them through many attention layers, and outputs updated MSA and pairwise features.
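The two-stage control flow described above can be sketched as follows. The layer internals are stubbed out with identity placeholders so the flow runs; all function names and the block count are illustrative, not the real MindSpore implementation.

```python
# Stub layers: real row/column attention and triangle updates are replaced
# by identity-like placeholders so only the control flow is shown.
def msa_row_and_column_attention(msa, pair):
    return [list(row) for row in msa]

def triangle_updates(pair, msa):
    return [list(row) for row in pair]

def evoformer(msa_repr, pair_repr, num_blocks=4):
    # Evoformer: alternately refine the MSA and pairwise representations
    for _ in range(num_blocks):
        msa_repr = msa_row_and_column_attention(msa_repr, pair_repr)
        pair_repr = triangle_updates(pair_repr, msa_repr)
    return msa_repr, pair_repr

def structure_module(msa_repr, pair_repr):
    # keep only the target sequence's row of the MSA representation;
    # a real implementation would update backbone frames from
    # target_row + pair_repr, predicting residue positions and angles
    target_row = msa_repr[0]
    return target_row, pair_repr

msa_repr = [[0.1] * 4 for _ in range(3)]   # 3 aligned sequences, length 4
pair_repr = [[0.0] * 4 for _ in range(4)]  # 4x4 residue-pair features
msa_repr, pair_repr = evoformer(msa_repr, pair_repr)
target, pair = structure_module(msa_repr, pair_repr)
```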
This implementation is based on the 2021 AlphaFold2 from Google DeepMind. In the multiple-sequence-alignment stage it uses MMseqs2 for sequence search, which gives a 2-3x end-to-end speedup over the original algorithm.
This code runs on Nvidia GPU / Tesla V100 / Ascend processors (any of the three; Huawei's earlier work mainly targeted the Ascend series) with the MindSpore AI framework. The current version must be compiled against the latest master branch (code from after 2021-11-08). For MindSpore setup, see the MindSpore tutorials. After installation, configure the environment with:

```shell
export MS_DEV_ENABLE_FALLBACK=0
```
For the remaining Python dependencies, see requirements.txt.
MMseqs2 is used to generate multiple sequence alignments (MSAs). For installation and usage, see the MMseqs2 User Guide. After installation, add it to your PATH:

```shell
export PATH=$(pwd)/mmseqs/bin/:$PATH
```
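Once `mmseqs` is on the PATH, an alignment search can be launched programmatically. The sketch below builds an `mmseqs easy-search` command line; the file paths (`query.fasta`, `uniref_db`, `result.m8`, `tmp`) are placeholders, and database preparation is described in the MMseqs2 User Guide.

```python
import shutil
import subprocess

def build_msa_command(query_fasta, target_db, out_m8, tmp_dir):
    """Assemble an MMseqs2 easy-search invocation (placeholder paths)."""
    return ["mmseqs", "easy-search", query_fasta, target_db, out_m8, tmp_dir]

cmd = build_msa_command("query.fasta", "uniref_db", "result.m8", "tmp")
if shutil.which("mmseqs"):        # run only if mmseqs is actually installed
    subprocess.run(cmd, check=True)
```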
Data processing follows ColabFold.

Inference results are saved in ./result, which contains two files: the pdb file is the predicted protein structure, and the timings file records run-time and confidence information, e.g.:
```json
{"pre_process_time": 418.57, "model_time": 122.86, "pos_process_time": 0.14, "all_time": 541.56, "confidence": 94.61789646019058}
```
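The timings file is plain JSON, so it can be inspected with the standard library. The snippet below parses the sample output above; the key names follow that sample, and reading it from a file on disk (e.g. under ./result) is left as an assumption.

```python
import json

# Sample timings/confidence record as emitted into ./result (keys as in
# the example output above).
sample = ('{"pre_process_time": 418.57, "model_time": 122.86, '
          '"pos_process_time": 0.14, "all_time": 541.56, '
          '"confidence": 94.61789646019058}')
timings = json.loads(sample)

# Sum of the stages should roughly match the reported wall time.
stage_total = (timings["pre_process_time"]
               + timings["model_time"]
               + timings["pos_process_time"])
print(f"confidence={timings['confidence']:.2f}, "
      f"stages={stage_total:.2f}s, wall={timings['all_time']}s")
```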
| Parameter | Fold (Ascend) |
|---|---|
| Model version | AlphaFold |
| Resource | Ascend 910 |
| Upload date | 2021-11-05 |
| MindSpore version | master |
| Dataset | CASP14 T1079 |
| seq_length | 505 |
| confidence | 94.62 |
| TM-score | 98.01% |
| Runtime | 541.56 s |
@misc{unpublished2021alphafold2,
title = {Alphafold2},
author = {John Jumper},
year = {2020},
archivePrefix = {arXiv},
primaryClass = {q-bio.BM}
}
@article{Rao2021.02.12.430858,
author = {Rao, Roshan and Liu, Jason and Verkuil, Robert and Meier, Joshua and Canny, John F. and Abbeel, Pieter and Sercu, Tom and Rives, Alexander},
title = {MSA Transformer},
year = {2021},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2021/02/13/2021.02.12.430858},
journal = {bioRxiv}
}
@article{Rives622803,
author = {Rives, Alexander and Goyal, Siddharth and Meier, Joshua and Guo, Demi and Ott, Myle and Zitnick, C. Lawrence and Ma, Jerry and Fergus, Rob},
title = {Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences},
year = {2019},
doi = {10.1101/622803},
publisher = {Cold Spring Harbor Laboratory},
journal = {bioRxiv}
}
@article{Elnaggar2020.07.12.199554,
author = {Elnaggar, Ahmed and Heinzinger, Michael and Dallago, Christian and Rehawi, Ghalia and Wang, Yu and Jones, Llion and Gibbs, Tom and Feher, Tamas and Angerer, Christoph and Steinegger, Martin and BHOWMIK, DEBSINDHU and Rost, Burkhard},
title = {ProtTrans: Towards Cracking the Language of Life{\textquoteright}s Code Through Self-Supervised Deep Learning and High Performance Computing},
elocation-id = {2020.07.12.199554},
year = {2021},
doi = {10.1101/2020.07.12.199554},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2021/05/04/2020.07.12.199554},
eprint = {https://www.biorxiv.org/content/early/2021/05/04/2020.07.12.199554.full.pdf},
journal = {bioRxiv}
}
@misc{king2020sidechainnet,
title = {SidechainNet: An All-Atom Protein Structure Dataset for Machine Learning},
author = {Jonathan E. King and David Ryan Koes},
year = {2020},
eprint = {2010.08162},
archivePrefix = {arXiv},
primaryClass = {q-bio.BM}
}
@misc{alquraishi2019proteinnet,
title = {ProteinNet: a standardized data set for machine learning of protein structure},
author = {Mohammed AlQuraishi},
year = {2019},
eprint = {1902.00249},
archivePrefix = {arXiv},
primaryClass = {q-bio.BM}
}
@misc{gomez2017reversible,
title = {The Reversible Residual Network: Backpropagation Without Storing Activations},
author = {Aidan N. Gomez and Mengye Ren and Raquel Urtasun and Roger B. Grosse},
year = {2017},
eprint = {1707.04585},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}
@misc{fuchs2021iterative,
title = {Iterative SE(3)-Transformers},
author = {Fabian B. Fuchs and Edward Wagstaff and Justas Dauparas and Ingmar Posner},
year = {2021},
eprint = {2102.13419},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}
@misc{satorras2021en,
title = {E(n) Equivariant Graph Neural Networks},
author = {Victor Garcia Satorras and Emiel Hoogeboom and Max Welling},
year = {2021},
eprint = {2102.09844},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}
@misc{su2021roformer,
title = {RoFormer: Enhanced Transformer with Rotary Position Embedding},
author = {Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},
year = {2021},
eprint = {2104.09864},
archivePrefix = {arXiv},
primaryClass = {cs.CL}
}
@article{Gao_2020,
title = {Kronecker Attention Networks},
ISBN = {9781450379984},
url = {http://dx.doi.org/10.1145/3394486.3403065},
DOI = {10.1145/3394486.3403065},
journal = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining},
publisher = {ACM},
author = {Gao, Hongyang and Wang, Zhengyang and Ji, Shuiwang},
year = {2020},
month = {Jul}
}
@article{Si2021.05.10.443415,
author = {Si, Yunda and Yan, Chengfei},
title = {Improved protein contact prediction using dimensional hybrid residual networks and singularity enhanced loss function},
elocation-id = {2021.05.10.443415},
year = {2021},
doi = {10.1101/2021.05.10.443415},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2021/05/11/2021.05.10.443415},
eprint = {https://www.biorxiv.org/content/early/2021/05/11/2021.05.10.443415.full.pdf},
journal = {bioRxiv}
}
@article{Costa2021.06.02.446809,
author = {Costa, Allan and Ponnapati, Manvitha and Jacobson, Joseph M. and Chatterjee, Pranam},
title = {Distillation of MSA Embeddings to Folded Protein Structures with Graph Transformers},
year = {2021},
doi = {10.1101/2021.06.02.446809},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2021/06/02/2021.06.02.446809},
eprint = {https://www.biorxiv.org/content/early/2021/06/02/2021.06.02.446809.full.pdf},
journal = {bioRxiv}
}
@article{Baek2021.06.14.448402,
author = {Baek, Minkyung and DiMaio, Frank and Anishchenko, Ivan and Dauparas, Justas and Ovchinnikov, Sergey and Lee, Gyu Rie and Wang, Jue and Cong, Qian and Kinch, Lisa N. and Schaeffer, R. Dustin and Mill{\'a}n, Claudia and Park, Hahnbeom and Adams, Carson and Glassman, Caleb R. and DeGiovanni, Andy and Pereira, Jose H. and Rodrigues, Andria V. and van Dijk, Alberdina A. and Ebrecht, Ana C. and Opperman, Diederik J. and Sagmeister, Theo and Buhlheller, Christoph and Pavkov-Keller, Tea and Rathinaswamy, Manoj K and Dalwadi, Udit and Yip, Calvin K and Burke, John E and Garcia, K. Christopher and Grishin, Nick V. and Adams, Paul D. and Read, Randy J. and Baker, David},
title = {Accurate prediction of protein structures and interactions using a 3-track network},
year = {2021},
doi = {10.1101/2021.06.14.448402},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2021/06/15/2021.06.14.448402},
eprint = {https://www.biorxiv.org/content/early/2021/06/15/2021.06.14.448402.full.pdf},
journal = {bioRxiv}
}