研究人員利用人工智能技術取得了巨大突破,可能為新藥研發帶來革命。
科學家開發的一款人工智能軟件,利用蛋白質的DNA序列預測其三維結構,準確度誤差不超過一個原子的寬度。
這項成就解決了困擾分子生物學領域50年的挑戰。它來自于倫敦人工智能公司DeepMind的研究團隊。目前,DeepMind隸屬于谷歌(Google)母公司Alphabet旗下。到目前為止,DeepMind最為人所知的是其創造的人工智能在圍棋比賽中打敗了人類選手,創下了計算機科學領域的一個重要里程碑。
DeepMind在兩年一次預測蛋白質結構的算法競賽中取得了該項突破。該競賽要求參賽者根據一個蛋白質的DNA序列,確定該蛋白質的三維形狀。
馬里蘭大學(University of Maryland)的分子生物學家約翰·莫爾特是“結構預測關鍵評估”(Critical Assessment of Structure Prediction)競賽的負責人。他表示,在100多種蛋白質中,DeepMind的人工智能軟件AlphaFold 2預測蛋白質結構的準確度,有三分之二的偏差在一個原子寬度以內,剩余三分之一大部分的預測結果也非常準確。他表示,AlphaFold 2的準確度遠高于參加競賽的任何其他方法。
DeepMind的聯合創始人及首席執行官德米斯·哈薩比斯表示,公司希望“利用這些技術最大程度造福社會。”但他表示,DeepMind尚未確定通過哪種方式將該蛋白質結構預測軟件提供給學術研究人員使用,或者是否向制藥公司和生物科技公司尋求商業合作。他說公司將在明年某個時間“詳細說明我們如何以能夠規模化的方式提供該系統。”
結構生物學家、諾貝爾獎得主文卡特拉曼·拉馬克里希南評價AlphaFold 2稱:“這款軟件的計算結果代表蛋白質折疊問題取得了驚人的進步。”拉馬克里希南是英國最負盛名的科研機構皇家學會(Royal Society)的會長,即將卸任。
蛋白質結構專家、歐洲分子生物學實驗室(European Molecular Biology Laboratory)歐洲生物信息研究所(European Bioinformatics Institute)的前負責人珍妮特·桑頓表示,DeepMind的突破為繪制完整的“人類蛋白質組圖譜”開辟了道路。人類蛋白質組圖譜中將包含人體內的所有蛋白質。她表示,目前只有約四分之一的人類蛋白質被用作藥物靶點。現在可以將更多蛋白質作為藥物靶點,為發明新藥創造了巨大的機會。
桑頓還表示,DeepMind的人工智能系統對于研究合成蛋白質的科學家同樣有著深遠的意義,也可能產生巨大的影響:例如培養更有營養的新型轉基因作物品種,開發能夠通過消化塑料來清潔環境的新型酶等。
蛋白質是生物學過程的基本機制。蛋白質由氨基酸長鏈組成,氨基酸的排列順序由DNA編碼。但細胞生成蛋白質之后,蛋白質會自發折疊成復雜的形狀,類似于一團繩子纏繞在一起,有條狀結構和類似于花飾的附著結構。蛋白質的具體結構決定了它的功能。蛋白質結構對于小分子設計也至關重要。小分子可以與蛋白質結合,并修改蛋白質的功能,這就是新藥研發的過程。
到目前為止,為獲取一種蛋白質結構的高分辨率模型,使用的主要方法是X射線晶體學。這種技術能夠將一種蛋白質溶液變成晶體,這個過程極其復雜并且要耗費大量時間。然后用X射線連續照射晶體,通常會使用一種名為同步加速器的環形粒子加速器。研究人員可以通過X射線的繞射圖繪制出蛋白質的內部結構圖。據多倫多大學(University of Toronto)估計,通過X射線晶體學這種方法獲取一個蛋白質的結構,需要耗時一年,成本約為12萬美元。
最近,還有兩種實驗方法也被用于測定蛋白質結構,它們分別是核磁共振和低溫電子顯微技術。這兩種方法的速度更快,成本更低,但其生成的模型精確度往往不及X射線晶體學。
而按照DeepMind蛋白質折疊團隊的首席研究員約翰·江珀的說法,AlphaFold 2使用“適度的”計算資源,只需要“幾天時間”就能夠計算出每個蛋白質的結構。江珀表示,訓練該系統需要在16個芯片上使用由谷歌開發的128個專用人工智能計算單元,連續運行“大約幾周”。這種人工智能計算單元被稱為張量處理單元。他表示,該系統需要的計算能力,比近期許多其他人工智能突破所需的要少得多,包括DeepMind此前在圍棋方面的成果。
1972年,諾貝爾獎得主、化學家克里斯蒂安·安芬森曾經假設,蛋白質的氨基酸序列(其本身由DNA編碼)應該完全能夠決定蛋白質的最終結構。為了證明安芬森提出的設想,科學界數十年來一直在尋找數學模型。但問題是,即使物理定律可以決定蛋白質的折疊方式,蛋白質折疊可能存在的排列方式數量極其龐大,因此正如生物學家賽勒斯·利文索爾提出的一種著名的說法,通過隨機試錯法確定一個蛋白質的結構所需要的時間,可能比已知宇宙的年齡更長。
但DeepMind的AlphaFold 2現在已經基本實現了安芬森的設想。莫爾特表示,在“結構預測關鍵評估”競賽中,對于超過三分之二的蛋白質,AlphaFold 2和X射線晶體學的準確度不相上下。現在希望研究人員能夠利用AlphaFold 2,或者至少用相同的方法,直接根據蛋白質的DNA序列得出其3D形狀,不需要使用X射線晶體學或其他物理實驗。獲取蛋白質的DNA序列相對容易,并且成本低廉。
位于德國蒂賓根的馬克斯·普朗克發育生物學研究所(Max Planck Institute for Developmental Biology)的蛋白質進化系主任安德烈·盧帕斯是今年“結構預測關鍵評估”競賽的評審之一。他說DeepMind的結果“令人震驚。”
在“結構預測關鍵評估”競賽過程中,為了驗證DeepMind系統的能力,盧帕斯利用AlphaFold 2的預測結果,以確認它能否預測出一種蛋白質結構的最后一部分。10多年來,他利用X射線晶體學一直無法完成這部分結構的繪制。盧帕斯說利用AlphaFold 2生成的預測,他可以在短短半個小時內確定最后一個蛋白質區段的形狀。
AlphaFold 2已經被用于準確預測一種名為ORF3a的蛋白質的結構,這種蛋白質存在于導致新冠肺炎的SARS-CoV-2病毒當中。未來,科學家能夠根據其預測的結果,將這種蛋白質作為靶點,開發治療藥物。
盧帕斯表示,他認為對于從事蛋白質研究的科學家而言,這款人工智能軟件將“帶來顛覆性的變化”。目前已知約2億種蛋白質的DNA序列,并且每年可以發現數千萬個新的蛋白質。但已經繪制出3D結構的蛋白質不足20萬種。
AlphaFold 2只接受了預測單個蛋白質結構的訓練。但在自然界中,一種蛋白質通常會與其他蛋白質組成復雜的結構。江珀表示,下一步的目標是開發一種能夠預測蛋白質之間的復雜動態的人工智能系統,例如蛋白質之間如何結合,或者相鄰的蛋白質如何改變彼此的形狀等。
DeepMind兩年前參加了“結構預測關鍵評估”競賽并獲獎。但當時所使用的人工智能系統AlphaFold配置不同,在最難預測的一類蛋白質中,其平均“全局距離完全測試得分”(global distance test total score)只有58分。全局距離完全測試得分大致相當于每個蛋白質中被準確繪制部分所占的百分比。
雖然這個分數比第二名的團隊高了約6分,但無法與X射線晶體學等實證研究方法相媲美。今年,即使是最難預測的蛋白質,DeepMind的全局距離完全測試得分中位數也達到了87分,接近于X射線晶體學的分數,比緊隨其后的團隊高出約26分。(財富中文網)
翻譯:劉進龍
審校:汪皓
Researchers have made a major breakthrough using artificial intelligence that could revolutionize the hunt for new medicines.
The scientists have created A.I. software that uses a protein’s DNA sequence to predict its three-dimensional structure to within an atom’s width of accuracy.
The achievement, which solves a 50-year-old challenge in molecular biology, was accomplished by a team from DeepMind, the London-based artificial intelligence company that is part of Google parent Alphabet. Until now, DeepMind was best known for creating A.I. that could beat the best human players at the strategy game Go, a major milestone in computer science.
DeepMind achieved the protein shape breakthrough in a biennial competition for algorithms that can be used to predict protein structures. The competition asks participants to take a protein’s DNA sequence and then use it to determine the protein’s three-dimensional shape.
Across more than 100 proteins, DeepMind’s A.I. software, called AlphaFold 2, predicted the structure to within about an atom’s width of accuracy in two-thirds of cases and was highly accurate in most of the remaining third, according to John Moult, a molecular biologist at the University of Maryland who directs the competition, known as the Critical Assessment of Structure Prediction, or CASP. It was far better than any other method in the competition, he said.
Demis Hassabis, DeepMind’s cofounder and chief executive officer, said the company wants “to make the maximal positive societal impact with these technologies.” But he said DeepMind had not yet determined how it would provide academic researchers with access to the protein structure prediction software or whether it would seek commercial collaborations with pharmaceutical and biotechnology firms. He said the company would announce “further details on how we’re going to be able to give access to the system in a scalable way” sometime next year.
“This computational work represents a stunning advance on the protein-folding problem,” Venki Ramakrishnan, a Nobel Prize–winning structural biologist who is also the outgoing president of the Royal Society, Britain’s most prestigious scientific body, said of AlphaFold 2.
Janet Thornton, an expert in protein structure and former director of the European Molecular Biology Laboratory’s European Bioinformatics Institute, said that DeepMind’s breakthrough opened up the way to mapping the entire “human proteome”—the set of all proteins found within the human body. Currently, only about a quarter of human proteins have been used as targets for medicines, she said. Now, many more proteins could be targeted, creating a huge opportunity to invent new medicines.
Thornton also said that DeepMind’s A.I. system would have profound implications for scientists who create synthetic proteins and that these could have big impacts too: everything from creating new genetically modified crop strains that will be far more nutritious to new enzymes that could help clean up the environment by digesting plastics.
Proteins are the basic mechanisms of biological processes. They are formed from long chains of amino acids, coded for in DNA, but once manufactured by a cell, they fold themselves spontaneously into complex shapes that often resemble a tangle of cord, with ribbons and curlicue-like appendages. The exact structure of a protein is essential to its function. It is also critical for designing small molecules that might be able to bind with the protein and alter this function, which is how new medicines are created.
Until now, the primary way to obtain a high-resolution model of a protein’s structure was through a method called X-ray crystallography. In this technique, a solution of proteins is turned into a crystal, itself a difficult and time-consuming process, and then this crystal is bombarded with X-rays, often from a large circular particle accelerator called a synchrotron. The diffraction pattern of the X-rays allows researchers to build up a picture of the internal structure of the protein. It takes about a year and costs about $120,000 to obtain the structure of a single protein through X-ray crystallography, according to an estimate from the University of Toronto.
More recently, two other experimental methods—nuclear magnetic resonance and cryogenic electron microscopy—have also been used. They can be faster and less expensive but tend to produce models that are less precise than X-ray crystallography.
It takes AlphaFold 2 “a matter of days” to calculate each protein structure using what John Jumper, the researcher who leads the protein-folding team at DeepMind, characterized as “modest” computing resources. Training the system required 128 specialized A.I. computing units on 16 chips created by Google, called tensor processing units, running continuously for “roughly a few weeks,” Jumper said. He noted that this is much less computing power than has been required for many other recent A.I. breakthroughs, including DeepMind’s previous work on Go.
In 1972, Nobel Prize–winning chemist Christian Anfinsen postulated that a protein’s amino acid sequence, which is itself encoded in DNA, should fully determine the final structure the protein takes. That supposition set off a decades-long quest to find a mathematical model that could do what Anfinsen was proposing. The problem, however, was that even though the laws of physics control how a protein folds, there are so many possible permutations that biologist Cyrus Levinthal famously estimated it would take longer than the age of the known universe to puzzle out a single protein’s structure through random trial and error.
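The scale of Levinthal's estimate can be illustrated with a rough back-of-the-envelope calculation. The sketch below is only an illustration: the chain length, the number of shapes each amino acid can adopt, and the sampling rate are assumed round numbers chosen for the example, not figures from Levinthal or from DeepMind.

```python
# Back-of-the-envelope version of Levinthal's argument.
# Every constant here is an illustrative assumption, not a measured value.
RESIDUES = 100                   # assumed length of a small protein chain
STATES_PER_RESIDUE = 3           # assumed number of shapes each residue can adopt
SAMPLES_PER_SECOND = 1e13        # assumed rate of trying one shape after another
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # roughly 13.8 billion years


def exhaustive_search_years(residues: int = RESIDUES,
                            states: int = STATES_PER_RESIDUE,
                            rate: float = SAMPLES_PER_SECOND) -> float:
    """Years needed to try every conformation once by brute force."""
    conformations = states ** residues  # grows exponentially with chain length
    seconds = conformations / rate
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = exhaustive_search_years()
    print(f"Brute-force search: about {years:.1e} years")
    print(f"Multiple of the universe's age: {years / AGE_OF_UNIVERSE_YEARS:.1e}")
```

Under these assumptions the search time exceeds the age of the universe by many orders of magnitude, which is why researchers sought a method that does not rely on random trial and error.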
But DeepMind’s AlphaFold 2 has now essentially done what Anfinsen suggested. AlphaFold 2 is “on par” with X-ray crystallography across more than two-thirds of the proteins in the CASP competition, Moult said. Now the hope is that researchers will be able to use AlphaFold 2, or at least the same method, to go directly from a protein’s DNA sequence, which has become relatively easy and inexpensive to obtain, to knowing its 3D shape, without having to use X-ray crystallography or other physical experiments at all.
Andrei Lupas, director of the department of protein evolution at the Max Planck Institute for Developmental Biology in Tübingen, Germany, who served as one of the assessors for this year’s CASP competition, called DeepMind’s results “astonishing.”
As part of CASP’s efforts to verify the capabilities of DeepMind’s system, Lupas used the predictions from AlphaFold 2 to see if it could solve the final portion of a protein’s structure that he had been unable to complete using X-ray crystallography for more than a decade. With the predictions generated by AlphaFold 2, Lupas said he was able to determine the shape of the final protein segment in just half an hour.
AlphaFold 2 has also already been used to accurately predict the structure of a protein called ORF3a that is found in SARS-CoV-2, the virus that causes COVID-19, which scientists might be able to use as a target for future treatments.
Lupas said he thought the A.I. software would “change the game entirely” for those who work on proteins. Currently, DNA sequences are known for about 200 million proteins, and tens of millions more are being discovered every year. But 3D structures have been mapped for fewer than 200,000 of them.
AlphaFold 2 was trained only to predict the structure of single proteins. But in nature, proteins are often present in complex arrangements with other proteins. Jumper said the next step was to develop an A.I. system that could predict complicated dynamics between proteins, such as how two proteins will bind to one another or the way that proteins in close proximity morph one another’s shapes.
DeepMind had entered and won the CASP competition two years ago. But at the time, using a differently configured A.I. system called AlphaFold, it achieved an average “global distance test total score” (GDT) of only 58 on the hardest class of proteins. GDT is a measure approximately equivalent to the percentage of each protein that the system accurately maps.
Although this was about six points better than the next best team, it was not a result that was competitive with empirical methods like X-ray crystallography. This year, even on these hardest proteins, DeepMind achieved a median GDT of 87, which is close to being as good as crystallography and was about 26 points better than its nearest competitor.
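For readers curious how a score like GDT comes about, the following is a simplified sketch. It assumes the commonly described form of the total score: the average, over distance cutoffs of 1, 2, 4 and 8 angstroms, of the share of amino acid positions that the prediction places within that distance of the experimental structure. The function name `gdt_ts` and the input format are hypothetical, and the official CASP evaluation also searches for the best alignment of the two structures at each cutoff, which this sketch skips.

```python
from typing import Sequence

# Distance cutoffs in angstroms, as commonly described for the total score.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(distances: Sequence[float]) -> float:
    """Simplified GDT-style score on a 0-100 scale.

    `distances` holds, for each amino acid position, the distance in
    angstroms between the predicted and experimental location after the
    two structures have already been aligned (an assumption of this sketch).
    """
    if not distances:
        raise ValueError("at least one distance is required")
    n = len(distances)
    # Fraction of positions falling within each cutoff.
    fractions = [sum(d <= cutoff for d in distances) / n
                 for cutoff in GDT_TS_CUTOFFS]
    # The score is the mean of those fractions, expressed as a percentage.
    return 100.0 * sum(fractions) / len(fractions)


if __name__ == "__main__":
    # Made-up distances for a five-residue toy example.
    print(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0]))
```

On this scale a score in the high 80s means that most positions fall within the tighter cutoffs, which is consistent with the article's description of the result as close to crystallography.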