One year after the AlphaFold Protein Structure Database was first made freely available to the public, DeepMind raised expectations again last week: the database now contains 214 million predicted protein structures from more than 1 million species, covering almost every known protein on Earth.
The structures in this update span a wide range of organisms, including plants, animals, bacteria and other microorganisms, and can be downloaded in bulk through Google Cloud Public Datasets.
Of the 214 million predicted structures, about 35% reach an accuracy comparable to that of experimentally determined structures, and about 80% are reliable enough for many downstream analyses.
The data will remain freely available to the public. “It’s our gift to humanity,” said DeepMind CEO Dr. Demis Hassabis.
As when AlphaFold 2 first appeared, the news sparked lively discussion on social media in China and abroad.
How do the “insiders”, researchers in the life sciences, view what AlphaFold has achieved this time?
Xu Dong, Shumaker Chair Professor at the University of Missouri-Columbia, told Leifeng.com’s “Medical Health AI Nuggets” that this release still relies on the existing AlphaFold tool and involves no major technical innovation.
Even so, he said, the 214 million predicted structures will play a very important role: with them, many questions in biology can be answered from a new perspective.
Professor Xu is a Fellow of both AAAS and AIMBE, and received a 2001 R&D 100 Award for his work on protein structure prediction.
He has been working on protein structure prediction since 1997.
“We used to predict protein structures only through sequence comparison. At the time, most protein structures had not yet been determined, and prediction accuracy was low. AlphaFold has taken research on protein structure prediction to a new level.”
By mining these more than 200 million structures and examining the overall distribution of protein folds, researchers can gain a clearer understanding of the evolution, function and distribution of proteins.
But can all of these more than 200 million structures actually be used in research?
Zhou Yaoqi, deputy director of the Institute of Systems and Physical Biology at Shenzhen Bay Laboratory, has also worked on protein structure prediction for many years.
Before AlphaFold appeared, he and his team developed a neural-network regression method for predicting the real-valued dihedral angles of protein backbones, laying groundwork for end-to-end protein structure prediction.
Zhou pointed out a hidden problem behind this massive release: although the AlphaFold database is huge, some proteins have few homologous sequences, so AlphaFold cannot predict them accurately; such cases need more evolutionary information.
In addition, some proteins are structurally unstable on their own and only become stable when bound to other molecules, so their structures are difficult to predict accurately.
“AlphaFold uses a confidence metric, pLDDT, to describe how confident it is about each amino acid in the structure. When pLDDT is too low, the predicted structure is unusable.”
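As an illustration of how such a cutoff might be applied in practice, the Python sketch below reads per-residue pLDDT from an AlphaFold DB model in PDB format, where the score is stored in the B-factor column, bins it into the confidence bands shown by the database viewer (above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low) and lists residues below 70. The function names, the synthetic demo records and the choice of 70 as a usability threshold are assumptions for illustration, not part of AlphaFold or any official tool.

```python
# Minimal sketch (hypothetical helper names): read per-residue pLDDT from an
# AlphaFold DB model in PDB format, assuming pLDDT sits in the B-factor column.


def plddt_per_residue(pdb_text: str) -> dict[tuple[str, int], float]:
    """Map (chain, residue number) -> pLDDT, taken from each residue's CA atom."""
    scores: dict[tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        # PDB is fixed-width: atom name in columns 13-16, B-factor in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def confidence_band(score: float) -> str:
    """Bin a pLDDT score into the bands used by the AlphaFold DB viewer."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def low_confidence_residues(
    scores: dict[tuple[str, int], float], threshold: float = 70.0
) -> list[tuple[str, int]]:
    """Residues whose pLDDT falls below an assumed usability threshold."""
    return sorted(key for key, value in scores.items() if value < threshold)


def ca_line(serial: int, resname: str, chain: str, resnum: int, plddt: float) -> str:
    """Build one fixed-width CA record for the demo (coordinates set to 0)."""
    return (f"ATOM  {serial:5d}  CA  {resname:>3s} {chain}{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.00:6.2f}{plddt:6.2f}")


if __name__ == "__main__":
    # Synthetic three-residue "model"; a real run would read a downloaded file.
    demo = "\n".join([
        ca_line(1, "MET", "A", 1, 95.2),
        ca_line(2, "LYS", "A", 2, 82.0),
        ca_line(3, "GLY", "A", 3, 44.5),
    ])
    scores = plddt_per_residue(demo)
    for key, value in scores.items():
        print(key, value, confidence_band(value))
    print("below 70:", low_confidence_residues(scores))
```

Slicing by fixed columns rather than splitting on whitespace follows the PDB format's column layout, which keeps the parser correct even when adjacent fields run together.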
Xu Dong also noted that some of the structures AlphaFold predicts are unstable and cannot be used in research.
Moreover, when two proteins differ only slightly in sequence, for example when one or two amino acids are mutated, AlphaFold cannot tell the difference.
Professor Pan Yi, dean of the School of Computer Science and Control Engineering at the Shenzhen University of Advanced Technology (in preparation), Chinese Academy of Sciences, has similar concerns.
Pan, who has a background in computer science, said: “Artificial intelligence learns, and its accuracy has to be improved through a lot of training. If a protein AlphaFold predicts has an uncommon structure, the AI cannot learn that structure from existing knowledge, so its prediction is prone to bias.”
Pan Yi told “Medical Health AI Nuggets” that AI is a tool that uses existing knowledge to predict the unknown; if even that existing knowledge is missing, it naturally cannot predict new structures.
“Until every protein structure in the world has been predicted and verified, 100 percent accuracy is impossible.”
Although some predictions are not fully accurate, the AlphaFold database publishes a confidence estimate alongside each predicted structure, giving users a reference.
The impact of so many protein structures on life-science research, especially structural biology, is nonetheless beyond question.
“The predicted protein structures can help researchers better understand the function of human proteins,” said Tang Jian, a professor at Mila at the University of Montreal, Canada, “but the impact on drug development is limited.”
Tang Jian currently focuses on applying graph representation learning to new drug development.
Pan Yi takes a more positive view of what AlphaFold will bring to the pharmaceutical industry.
He told “Medical Health AI Nuggets” that the structures predicted by AlphaFold will be of great help to biopharmaceutical research, especially small-molecule screening.
Since returning to China in 2020, Pan Yi has gradually shifted his research from theory toward applications, and drug research and development is one of his key directions.
He believes the predicted structures will save life-science researchers a great deal of effort and money, since they can look up the structures they need directly in the database instead of determining them themselves.
In short, although not every structure in the AlphaFold database is usable, the sheer number of structures is still significant for research across the life sciences.
Although it is only four years old, AlphaFold has had an almost earth-shaking impact on protein structure prediction.
In 2016, after DeepMind’s AlphaGo defeated the legendary Korean Go player Lee Sedol and demonstrated its capability and potential, DeepMind decided to set up a team to study the “protein folding problem”.
On December 2, 2018, AlphaFold made its debut at the 13th Critical Assessment of protein Structure Prediction (CASP13), producing the most accurate structures for 25 of the 43 proteins and beating the other entrants to first place (under the entry name A7D). The research team then expanded again and began working on a new system.
Two years later, on November 30, 2020, DeepMind competed again with AlphaFold 2 and took first place in CASP14. Its predictions reached atomic accuracy, with a median error (RMSD_95) below 1 Å, roughly three times more accurate than the second-best system and comparable to experimental methods.
The CASP organizers said that AlphaFold 2 had solved the 50-year-old “protein folding problem”.
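RMSD is the root-mean-square deviation between matching atoms of a predicted and an experimental structure, √(Σd_i²/N) over N residues; RMSD_95 roughly restricts this to the best-fitting 95% of residues so that a few outliers do not dominate. The Python sketch below is a simplified, assumed version: it takes two sets of Cα coordinates that are already superposed and simply drops the worst 5% of per-residue deviations, whereas the official CASP evaluation also optimizes the superposition itself. Function names and the toy coordinates are hypothetical.

```python
import math

# A 3D coordinate for one C-alpha atom.
Coord = tuple[float, float, float]


def per_residue_deviation(pred: list[Coord], ref: list[Coord]) -> list[float]:
    """Euclidean distance between matching C-alpha atoms (structures assumed superposed)."""
    if len(pred) != len(ref):
        raise ValueError("structures must have the same number of residues")
    return [math.dist(p, r) for p, r in zip(pred, ref)]


def rmsd(deviations: list[float]) -> float:
    """Root-mean-square of a list of distances."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def rmsd_at_coverage(pred: list[Coord], ref: list[Coord], coverage_percent: int = 95) -> float:
    """Simplified RMSD_95-style score: keep the best-fitting residues, drop the rest."""
    deviations = sorted(per_residue_deviation(pred, ref))
    # Integer arithmetic avoids floating-point surprises when counting residues.
    keep = max(1, len(deviations) * coverage_percent // 100)
    return rmsd(deviations[:keep])


if __name__ == "__main__":
    # Toy example: 20 residues, 19 off by 1 Å and one outlier off by 10 Å.
    ref = [(3.8 * i, 0.0, 0.0) for i in range(20)]
    pred = [(x, 1.0 if i < 19 else 10.0, z) for i, (x, _, z) in enumerate(ref)]
    print("full RMSD:", round(rmsd(per_residue_deviation(pred, ref)), 3))
    print("RMSD at 95% coverage:", round(rmsd_at_coverage(pred, ref), 3))
```

In the toy example the single 10 Å outlier more than doubles the full RMSD (to about 2.44 Å), while the 95%-coverage score stays at 1.0 Å, which is why a coverage-limited metric is often preferred for judging the bulk of a prediction.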
On July 15, 2021, DeepMind open-sourced AlphaFold 2, its deep-learning-based model, alongside a paper in Nature.
A week later, on July 22, DeepMind published another Nature paper and launched the AlphaFold Protein Structure Database, releasing more than 350,000 predicted structures covering the human proteome and 20 other model organisms; structures were predicted for 98.5% of human proteins.
Before this, experimentally solved structures covered only 17% of the amino acids in human protein sequences.
A year on, AlphaFold has caused a sensation again. How much will it change the course of research in bioinformatics?
Leifeng.com
This article is reproduced from: https://www.leiphone.com/category/shengwuyiyao/xJ4iS0jJRDpAuBgX.html
This site is for inclusion only, and the copyright belongs to the original author.