Research by a team at Argonne National Laboratory and the University of California San Diego leads to a novel understanding of SARS-CoV-2 and a new method for studying disease

Since a team at the University of Texas at Austin and the National Institutes of Health first mapped the SARS-CoV-2 spike protein—the main infection machinery of the virus that causes the COVID-19 disease—scientists around the world have been eager to understand more about this structure and the others that make up the virus to better predict which drugs might successfully be used against it.

Some of the methods used to study the virus include imaging techniques such as X-ray imaging and cryogenic electron microscopy, which uses beams of electrons to image frozen samples, but these fall short of capturing the dynamic movements of the viral proteins. However, computer simulations, such as those performed on systems like the Oak Ridge Leadership Computing Facility’s (OLCF’s) 200-petaflop IBM AC922 Summit supercomputer, can help scientists capture the movements of these structures virtually.

A snapshot of a visualization of the SARS-CoV-2 viral envelope comprising 305 million atoms. Image Credit: Rommie Amaro, University of California San Diego; Arvind Ramanathan, Argonne National Laboratory

A team led by Rommie Amaro, professor and endowed chair of chemistry and biochemistry at the University of California San Diego, and Arvind Ramanathan, computational biologist at Argonne National Laboratory, has been exploring the movement of the virus’s spike protein to understand how it behaves and gains access to the human cell. Now, the team has built a first-of-its-kind workflow based on artificial intelligence (AI) and has run it on the Summit supercomputer to simulate the spike in numerous environments, including within the SARS-CoV-2 viral envelope comprising 305 million atoms—the most comprehensive simulation of the virus performed to date.

The accomplishment has earned the team a finalist nomination for the Association for Computing Machinery (ACM) Gordon Bell Special Prize for High Performance Computing–Based COVID-19 Research, a special version of the ACM Gordon Bell Prize, one of the most coveted awards in supercomputing, to be presented at this year’s SC20 virtual conference. Both awards acknowledge outstanding achievements in high-performance computing, with the new prize focused specifically on COVID-19 research.

“Experiments give us a picture of what these things look like, but they can’t tell us the whole story,” Amaro said. “The only way we can do this is through simulations, and right now we are pushing the capabilities of molecular simulations to the limits of the computer architectures that we have on this earth. This is at the edge of possibilities of what people are capable of doing.”

The team first optimized the Nanoscale Molecular Dynamics (NAMD) and the Visual Molecular Dynamics codes, which model the movements of atoms in time and space, on multiple smaller cluster systems: the Frontera supercomputer at the Texas Advanced Computing Center, the Comet system at the San Diego Supercomputer Center, and ThetaGPU at the Argonne Leadership Computing Facility (ALCF). The optimizations prepared the team to run full-scale simulations on the OLCF’s Summit. The OLCF and the ALCF are US Department of Energy (DOE) Office of Science User Facilities located at DOE’s Oak Ridge and Argonne National Laboratories, respectively.

After code optimizations, the team was able to successfully scale NAMD to 24,576 of Summit’s NVIDIA V100 GPUs. The results of the team’s initial runs on Summit have led to the discovery of one of the mechanisms that the virus uses to evade detection, as well as a characterization of interactions between the spike protein and the ACE2 receptor, the protein in human cells that the virus exploits to gain entry.

“This is one of the first biological systems of the virus that we can learn from to drive scientific discovery,” Amaro said. “Our methods of computing allow us to get down to actually see detailed intricacies of this virus that are useful for understanding not only how it behaves but also its vulnerabilities, from a vaccine development standpoint, and a drug targeting perspective.”

Because one set of the calculations generated a whopping 200 terabytes of data, the team used AI to identify the intrinsic features from the simulations and break down the information to help them interpret what was happening. By layering the experimental data and the simulation data and combining them with their AI-based approach, the researchers were able to capture the virus and its mechanisms in unprecedented detail.

The team will share their findings at the SC20 conference tomorrow, November 19. The team is also integrating the NAMD code into their workflow pipeline to fully automate the transition from simulation to AI for data processing without gaps.

“We never thought we could use our machine-learning tools at this scale,” Ramanathan said. “Using these AI-based approaches on Summit has helped accelerate the process of truly understanding the motion of these complex systems.”

This research was supported by the Exascale Computing Project; the DOE National Virtual Biotechnology Laboratory, with funding provided by the Coronavirus CARES Act; and the COVID-19 HPC Consortium.

Related Publication: Lorenzo Casalino, Abigail Dommer, Zied Gaieb, Emilia P. Barros, Terra Sztain, Surl-Hee Ahn, Anda Trifan, Alexander Brace, Anthony Bogetti, Heng Ma, Hyungro Lee, Matteo Turilli, Syma Khalid, Lillian Chong, Carlos Simmerling, David J. Hardy, Julio D. C. Maia, James C. Phillips, Thorsten Kurth, Abraham Stern, Lei Huang, John McCalpin, Mahidhar Tatineni, Tom Gibbs, John E. Stone, Shantenu Jha, Arvind Ramanathan, and Rommie E. Amaro. “AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics.” To appear in International Journal of High Performance Computing Applications, 2020.