An international team simulated the process used by the coronavirus to reproduce itself and overwhelm cells

To find weapons to fight the coronavirus, scientists used the nation’s fastest supercomputer to peer inside the intricacies of how the virus reproduces itself.

“Think of it as like a Swiss watch, with precisely organized enzymes and nanomachines that come together like tiny gears to perform this function,” said Arvind Ramanathan, a computational scientist at the U.S. Department of Energy’s Argonne National Laboratory and the study’s lead author. “If we could find ways to block or gum up the copying process, we could discover new drugs to attack the virus. But first we had to better understand it.”

The study earned the multi-institutional team a spot as a finalist for the Association for Computing Machinery Gordon Bell Special Prize for High Performance Computing–Based COVID-19 Research. The prize will be presented at this year’s International Conference for High Performance Computing, Networking, Storage and Analysis (SC21) in St. Louis, MO. The team will present their results at the conference on Wednesday, Nov. 17.

The coronavirus uses a precisely coordinated process known as the replication-transcription complex to reproduce at high speed when it invades a host’s cells. The process essentially transcribes the ribonucleic acid, or RNA, that contains the genetic code for the virus, packages the RNA and pumps out copies of the virus to overwhelm the host’s cells.

“It’s a system of roughly 2 million atoms, and there’s no single way to get a really good look inside,” Ramanathan said. “A number of scientists have done tremendous work to understand some of these individual parts, but nobody had looked at this complex from a broader view to try to understand how they all work together.”

The team used data from cryo-electron microscopy, a technique that flash-freezes molecules and pounds them with electrons to create 3D images, to take a closer look at the molecular machinery. But static images alone wouldn’t be enough to capture the workings of the copying process.

To study the SARS-CoV-2 replication-transcription complex, the team built an integrated workflow that models the experimental cryo-EM volumetric data as a finite element mesh. This mesh is then simulated using fluctuating finite element analysis (FFEA). To annotate the mesh with the strengths of the protein-protein interactions, the researchers used all-atom molecular dynamics (AAMD) simulations and employed AI techniques to automatically bridge the FFEA and AAMD simulations.
Credit: Defne Gorgun, Anda Trifan and Arvind Ramanathan

“It’s critical to be able to see the interactions between the nonstructural proteins (which are mostly enzymes) as they process the viral RNA one base at a time,” Ramanathan said. “The molecules move in a complicated pattern like a waltz. We needed to see that waltz and the machinery in motion in order to understand how to gum this Swiss watch up.”

The team used a hierarchical artificial-intelligence workflow running in the Balsam framework, a distributed workflow engine capable of tying together four of the nation’s top supercomputing systems — Summit, the Oak Ridge Leadership Computing Facility’s (OLCF’s) 200-petaflop flagship computer; Theta, the Argonne Leadership Computing Facility’s (ALCF’s) 15.6-petaflop system; Perlmutter, the National Energy Research Scientific Computing Center’s (NERSC’s) 64.6-petaflop system; and Longhorn, a subsystem of the Texas Advanced Computing Center’s 23.5-petaflop Frontera system — to simulate the process. The team used a Cerebras wafer-scale engine via the ALCF AI Testbed to train deep-learning models that were coupled with the four supercomputing systems. The workflow builds on a strategy employed by Ramanathan and Rommie Amaro, a professor and endowed chair of chemistry and biochemistry at the University of California San Diego, to simulate the behavior of the virus’s spike protein, a study that won last year’s Gordon Bell Special Prize for COVID-19 research.

“By coordinating this work across these sites, we could use all the strengths of the best state-of-the-art computing to perform these simulations,” Ramanathan said. “Everything had to come together in just the right place in just the right way, like an assembly line. These simulations helped fill in the blanks the cryo-electron microscopy couldn’t capture and reconstruct the motion that we couldn’t otherwise understand to reach a biophysically meaningful interpretation.”

In a second study also nominated for this year’s Gordon Bell Special Prize, Ramanathan and Amaro teamed up with other researchers to study an aerosolized coronavirus particle by simulating a system of more than 1 billion atoms.

Besides Ramanathan, the team studying the replication-transcription complex included Anda Trifan, Defne Gorgun, Alexander Brace, Maxim Zvyagin, Heng Ma, Austin Clyde, Michael Salim, Murali Emani, Hyenseung Yoo, James Phillips, Ian Foster, Rick Stevens and Venkatram Vishwanath of Argonne; Zongyi Li and Anima Anandkumar of the California Institute of Technology; David Clark, Venkatesh Mysore and Thomas Gibbs of Nvidia Corp.; David J. Hardy, Noah Trebesch, John E. Stone and Emad Tajkhorshid of the University of Illinois Urbana-Champaign; Tom Burnley of the Science and Technology Facilities Council, United Kingdom; Lei Huang and John McCalpin of the Texas Advanced Computing Center; Junqi Yin and Aristeidis Tsaris of DOE’s Oak Ridge National Laboratory; Vishal Subbiah, Tanveer Raza and Jessica Liu of Cerebras Systems; Geoffrey Wells of University College London; S. Chakra Chennubhotla of the University of Pittsburgh; and Sarah A. Harris of the University of Leeds.

This research was supported by the Exascale Computing Project, a collaborative effort of the DOE Office of Science and the National Nuclear Security Administration; the COVID-19 HPC Consortium; the DOE National Virtual Biotechnology Laboratory, with funding provided by the Coronavirus CARES Act; and the DOE Office of Science’s Advanced Scientific Computing Research program. Support was organized under the Co-Design for Artificial Intelligence and Computing at Scale for Extremely Large, Complex Datasets projects.

The OLCF, ALCF and NERSC are DOE Office of Science user facilities.

Related Publication: Anda Trifan, Defne Gorgun, Zongyi Li, Alexander Brace, Maxim Zvyagin, Heng Ma, Austin Clyde, David Clark, Michael Salim, David J. Hardy, Tom Burnley, Lei Huang, John McCalpin, Murali Emani, Hyenseung Yoo, Junqi Yin, Aristeidis Tsaris, Vishal Subbiah, Tanveer Raza, Jessica Liu, Noah Trebesch, Geoffrey Wells, Venkatesh Mysore, Thomas Gibbs, James Phillips, S. Chakra Chennubhotla, Ian Foster, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, John E. Stone, Emad Tajkhorshid, Sarah A. Harris and Arvind Ramanathan. “Intelligent Resolution: Integrating Cryo-EM with AI-driven Multi-resolution Simulations to Observe the SARS-CoV-2 Replication-Transcription Machinery in Action.” To appear in International Journal of High Performance Computing Applications, 2021.

UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.