In Silico Drug Discovery
Understanding the biology of a virus is impossible without uncovering the structure of its proteins. Unfortunately, experimental methods for resolving protein structure are complex and time-consuming. Predicting the properties of proteins from their sequences, which are much easier to determine experimentally, therefore plays an important role in structural biology. In recent years, deep learning models have made great progress in many fields, even providing solutions to problems previously deemed intractable. Structural biology has also benefited greatly from these advances, including their application to protein structure prediction.
Simplistically, a protein can be thought of as a long string: a linear chain of amino acid residues whose particular sequence is determined by the organism’s DNA (or, for some viruses, including SARS-CoV-2, by its RNA). While the sequence, in the vast majority of cases, fully determines the protein’s structure, it is next to impossible to study the protein’s behavior without knowing how this chain arranges itself in 3-D space. Methods for predicting local features of the protein (“secondary structure”) have existed for many years, but it remains challenging to this day to reliably infer how these local elements are positioned relative to each other. Because the local structure is rigid and relatively easy to predict, determining even a few long-range contacts (pairs of residues that are far from each other in the sequence but close together in the 3-D structure) can help tremendously in determining the overall structure of the protein, or even the relative orientation of two proteins when they form a stable complex.
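To make the notion of a long-range contact concrete, the minimal Python sketch below lists residue pairs that are close in space but distant in sequence, given one 3-D coordinate per residue. The 8 Å distance cutoff and the minimum sequence separation of 24 residues follow a convention commonly used in CASP contact assessment; they are assumptions here rather than parameters from this article, and the function name `long_range_contacts` and the toy coordinates are hypothetical.

```python
import math
from typing import List, Set, Tuple

Coord = Tuple[float, float, float]


def long_range_contacts(
    coords: List[Coord],
    dist_cutoff: float = 8.0,
    min_separation: int = 24,
) -> Set[Tuple[int, int]]:
    """Return residue pairs (i, j), i < j, that are close in 3-D space
    but far apart in the sequence.

    coords: one representative atom per residue (e.g. C-beta), in angstroms.
    dist_cutoff: pairs closer than this (assumed 8 A) count as contacts.
    min_separation: minimum j - i (assumed 24) for a contact to be long-range.
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        # Skip partners that are close in sequence; those contacts are local.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < dist_cutoff:
                contacts.add((i, j))
    return contacts


# Toy 30-residue "hairpin" (not a real protein): residues 0-14 run along x
# at y = 0, residues 15-29 run back along x at y = 5.
toy_coords = [
    (3.8 * i, 0.0, 0.0) if i < 15 else (3.8 * (29 - i), 5.0, 0.0)
    for i in range(30)
]
contacts = long_range_contacts(toy_coords)
print(sorted(contacts))
# [(0, 28), (0, 29), (1, 27), (1, 28), (1, 29), (2, 26), (2, 27), (2, 28), (3, 27)]
```

Only the two ends of the toy hairpin qualify: residues near the turn are also close in space, but they are fewer than 24 positions apart in the sequence, so they count as local rather than long-range contacts.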
One way of finding such contacts is to look at similar (homologous) proteins in other organisms and examine their mutations, that is, changes in the types of residues at each position. When two residues are close in 3-D space, they interact, so if one of them changes, its neighbor often must change too if the protein’s stability is to be preserved. Such co-evolving residue pairs have been shown to be strong predictors of both intra-protein and inter-protein contacts.
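Co-evolution can be illustrated with a deliberately simple statistic. The sketch below scores every pair of columns in a multiple sequence alignment by mutual information (MI) and then applies the average product correction (APC), a common way to suppress background signal shared by all pairs involving a column. This is a stand-in chosen for brevity: dedicated coevolution methods (for example, direct coupling analysis) and the deep learning models discussed here are considerably more sophisticated, and nothing below should be read as the group’s pipeline. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def column_mi(msa: List[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def coevolution_scores(msa: List[str]) -> Dict[Pair, float]:
    """MI for every column pair, with the average product correction (APC).

    Assumes equal-length aligned sequences with at least two columns;
    gap characters are treated as just another symbol.
    """
    length = len(msa[0])
    mi = {(i, j): column_mi(msa, i, j) for i, j in combinations(range(length), 2)}
    # Average MI of each column with all other columns.
    col_mean = [0.0] * length
    for (i, j), value in mi.items():
        col_mean[i] += value
        col_mean[j] += value
    col_mean = [total / (length - 1) for total in col_mean]
    overall = sum(mi.values()) / len(mi)
    if overall == 0.0:
        return mi  # No signal at all, so there is nothing to correct.
    # APC subtracts the background a column shares with every partner.
    return {
        (i, j): value - col_mean[i] * col_mean[j] / overall
        for (i, j), value in mi.items()
    }


# Toy alignment: columns 0 and 3 change together (A<->E, G<->R);
# columns 1 and 2 vary independently of everything else.
toy_msa = ["AKLE", "GKMR", "ASME", "GSLR"]
scores = coevolution_scores(toy_msa)
best = max(scores, key=scores.get)
print(best)  # (0, 3)
```

In a real alignment the highest-scoring pairs would be candidate contacts, much like the pairs marked in Figure 1.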
Deep learning techniques are extremely helpful for predicting such long-range contacts, as evidenced by the recent rounds of CASP, a community-wide challenge that compares the performance of different folding methods on proteins whose structures have not yet been published.
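Contact predictors, including deep learning methods of the kind assessed in CASP, are commonly scored by the precision of their top-ranked long-range contacts, for instance the top L/5 predictions for a protein of length L. A minimal sketch of that metric follows. The function name `top_l_precision`, the default separation threshold of 24 and the toy scores are assumptions for illustration; official assessments involve further details (atom choice, distance cutoffs, range categories) that are not modeled here.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def top_l_precision(
    predicted: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    fraction: int = 5,
    min_separation: int = 24,
) -> float:
    """Fraction of the top (length // fraction) long-range predictions
    that are true contacts.

    predicted: contact scores (e.g. probabilities) for pairs (i, j), i < j.
    true_contacts: pairs (i, j), i < j, in contact in the solved structure.
    length: number of residues in the protein (L).
    """
    # Keep only long-range pairs, ranked from most to least confident.
    ranked = sorted(
        (pair for pair in predicted if pair[1] - pair[0] >= min_separation),
        key=lambda pair: predicted[pair],
        reverse=True,
    )
    top = ranked[: max(1, length // fraction)]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top)


# Toy example for a 30-residue protein, so L/5 = 6 predictions are scored.
true_contacts = {(0, 29), (1, 28), (2, 27)}
predicted = {
    (0, 29): 0.9, (1, 28): 0.8, (5, 10): 0.95,  # (5, 10) is short-range: ignored
    (3, 29): 0.7, (0, 25): 0.6, (2, 27): 0.5, (4, 28): 0.4, (1, 26): 0.3,
}
print(top_l_precision(predicted, true_contacts, length=30))  # 0.5
```

Restricting the metric to long-range pairs matters because short-range contacts are largely implied by secondary structure and would otherwise inflate the score.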
Building on these results, and generously supported by hardware resources offered by Zeblok, our group is working on applying coevolutionary data and deep learning methodology to predict the structures of individual proteins and their complexes.
Figure 1. Part of a multiple sequence alignment of the orf3a protein from SARS-CoV-2 and homologous proteins from other viruses. Analyzing which residues tend to change together can shed light on their interactions. Black lines at the bottom indicate two such pairs (for illustrative purposes only).
Figure 2. The strength of the coevolutionary signal (left) and predicted probability of contact (right) for the membrane (M) protein of SARS-CoV-2.
Zeblok AI Platform resources used:
Zeblok Computational platform
HPC Notebook with MPI enabled
Multiple containers to support multi-GPU, multi-CPU compute engines
8 RTX 6000 GPUs
50GB Block Store
500GB of Parallel File System
About The Laufer Center
The Louis and Beatrice Laufer Center, established in 2008 to advance biology and medicine through discoveries in physics, mathematics and computational science, is a hub for Physical and Quantitative Biology research at Stony Brook University. Laufer Center researchers come from several Stony Brook departments and Cold Spring Harbor Laboratory. For more information:
https://laufer-covid.org/ and http://laufercenter.stonybrook.edu/