Project Description

This project focuses on a multi-stage workflow that aims to screen orders of magnitude more compounds than is currently possible with traditional drug-screening workflows. We posit that artificial intelligence (AI) methods can serve as an effective surrogate for traditional, computationally expensive docking simulations. We leverage generative AI models to ‘suggest’ new molecular designs that can significantly expand existing drug libraries, using existing FDA-approved compounds as ‘templates’. Our multi-stage workflow uses a building-blocks approach and integrates traditional physics-based molecular docking and molecular dynamics (MD) simulations with AI techniques to improve the throughput and accuracy of virtual screening. The strategy consists of a fast, interactive loop that leverages our AI approaches, interfaced with a variable-time loop that (a) gathers and integrates available ultra-high-throughput screening (uHTS) experimental data and (b) performs slow but accurate MD binding free-energy simulations. The virtual screening and MD simulations enable refinement of the AI approaches, so that successive rounds of evaluation in the AI loop can enrich the list of viable small-molecule leads.
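The interplay between the fast AI loop and the slower physics-based loop can be illustrated with a minimal sketch. This is a toy illustration under stated assumptions, not the project's actual code: compounds are reduced to a single number, `expensive_score` is a hypothetical stand-in for a docking or MD free-energy calculation, and `NearestNeighborSurrogate` is a hypothetical placeholder for a trained AI model. The generative step that proposes new molecules is not shown.

```python
import random


def expensive_score(x):
    """Hypothetical stand-in for a slow physics-based evaluation
    (docking or MD binding free energy). Toy: best compound is near 0.7."""
    return -(x - 0.7) ** 2


class NearestNeighborSurrogate:
    """Hypothetical toy surrogate: predicts the score of the closest
    compound that has already been evaluated."""

    def __init__(self):
        self.known = {}  # descriptor -> measured score

    def fit(self, scored):
        self.known.update(scored)

    def predict(self, x):
        if not self.known:
            return 0.0
        nearest = min(self.known, key=lambda k: abs(k - x))
        return self.known[nearest]


def screen(library, rounds=3, batch=5, seed=0):
    """Alternate a cheap surrogate ranking with a few expensive evaluations."""
    rng = random.Random(seed)
    surrogate = NearestNeighborSurrogate()

    # Bootstrap: evaluate a small random sample with the expensive method.
    evaluated = {x: expensive_score(x) for x in rng.sample(library, batch)}
    surrogate.fit(evaluated)

    for _ in range(rounds):
        # Fast loop: rank every unevaluated compound with the surrogate.
        remaining = [x for x in library if x not in evaluated]
        ranked = sorted(remaining, key=surrogate.predict, reverse=True)

        # Slow loop: spend expensive evaluations only on the top candidates,
        # then feed the results back to refine the surrogate.
        new_scores = {x: expensive_score(x) for x in ranked[:batch]}
        evaluated.update(new_scores)
        surrogate.fit(new_scores)

    return evaluated


if __name__ == "__main__":
    library = [i / 100 for i in range(101)]
    results = screen(library)
    best = max(results, key=results.get)
    print(f"evaluated {len(results)} of {len(library)} compounds; best = {best}")
```

The point of the design is that the expensive evaluation is called on only a small fraction of the library, while the surrogate is consulted for every remaining compound in each round.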

Our strategies have been deployed across multiple supercomputing facilities while standardizing data collection and organization. They also integrate standard experimental validation techniques such as ultra-high-throughput screening (uHTS) of small-molecule libraries. These strategies for COVID-19 drug screening can also be applied to antibiotic development in future work. We use this platform at scale on systems at the DOE Leadership Computing Facilities, namely the Argonne Leadership Computing Facility (Theta) and the Oak Ridge Leadership Computing Facility (Summit), as well as at the Texas Advanced Computing Center (Frontera and Stampede2) and the San Diego Supercomputer Center (Comet). Primary collaborators include Brookhaven National Laboratory, University of Chicago, Oak Ridge National Laboratory, Lawrence Livermore National Laboratory, Stony Brook University (Carlos Simmerling), University of California at San Diego (Rommie Amaro), and University College London (Peter Coveney).

This project is part of the COVID-19 HPC Consortium.

Project Utilization

Number of Jobs Run To-Date: 1,534
Node Hours Used To-Date: 37,537
Number of Jobs Running Now: 0
Nodes in Use Now: 0