PI: Rick Stevens
Associate Laboratory Director, Computing, Environment and Life Sciences, Argonne National Laboratory
In 2016, the Department of Energy’s Exascale Computing Project (ECP) set out to develop advanced software for the arrival of exascale-class supercomputers capable of a quintillion (10^18) or more calculations per second. That leap meant rethinking, reinventing, and optimizing dozens of scientific applications and software tools to leverage exascale’s thousandfold increase in computing power. That time has arrived as the first DOE exascale computer, the Oak Ridge Leadership Computing Facility’s Frontier, opened to users around the world. “Exascale’s New Frontier” explores the applications and software technology for driving scientific discoveries in the exascale era.
The Science Challenge
The Joint Design of Advanced Computing Solutions for Cancer was launched in 2016 as a collaboration between the Department of Energy and the National Cancer Institute to accelerate cancer research through deep learning algorithms and exascale supercomputers. As part of the National Cancer Moonshot initiative, the partnership identified three key science challenges that their combined resources would be well suited to tackle in the fight against cancer.
First, modeling the RAS protein complex to map the molecular interactions that drive cancer mutations, along with the resulting physiological changes in cancer cells, could help identify new therapeutics that target RAS, a protein involved in 30% of all cancers.
Second, developing predictive models for how a tumor cell will respond to different drug treatments may help identify new therapeutics or even help with designing personalized treatments that target the genetic makeup of a patient’s tumor cells.
Third, using AI to automate the analysis and extraction of information from millions of cancer patient records could expedite the discovery of optimal cancer treatment strategies based on population trends and outcomes.
“Our focus was on building the core models behind this idea of predictive oncology. When we started this, deep learning was not well internalized by the clinical community or even the cancer research community as a resource. We realized early on, through a lot of experiments, that you couldn’t build accurate models using deep neural networks on a small amount of data. So, our idea was, how can we aggregate enough data that the power of neural networks might actually be tapped?” said Rick Stevens, principal investigator for the CANcer Distributed Learning Environment, or CANDLE, and associate laboratory director at Argonne National Laboratory.
Why Exascale?
Each of these science challenges requires exascale computing power due to the size and complexity of the problems they aim to solve through deep learning networks, which must be trained on very large datasets. The CANDLE team aggregated data across multiple historical drug response studies and produced models for single drugs and for drug combinations — the first effort to do so at such a large scale.
The resulting CANDLE software platform was developed by the ECP to provide scalable deep learning capabilities — optimized for exascale platforms — for each NCI-DOE project tackling these science challenges.
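As a rough illustration of what aggregating data across multiple drug response studies can involve, the sketch below pools measurements for the same cell line and drug pair from different sources. It is a minimal example under stated assumptions, not the CANDLE team's pipeline: the function name `aggregate_studies`, the record layout, and the toy values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_studies(studies):
    """Pool (cell_line, drug, response) records from several studies.

    Returns a dict mapping (cell_line, drug) to a tuple of
    (mean response, number of pooled measurements).
    """
    pooled = defaultdict(list)
    for records in studies.values():
        for cell_line, drug, response in records:
            # Normalize identifiers so the same pair matches across studies.
            key = (cell_line.strip().upper(), drug.strip().lower())
            pooled[key].append(response)
    return {key: (mean(values), len(values)) for key, values in pooled.items()}


# Hypothetical toy data; not taken from any real study.
studies = {
    "study_a": [("HCT116", "DrugX", 0.40), ("A549", "DrugX", 0.80)],
    "study_b": [("hct116 ", "drugx", 0.60), ("A549", "DrugY", 0.30)],
}
combined = aggregate_studies(studies)
```

In practice, reconciling studies also means harmonizing assay protocols and response metrics, which this sketch does not attempt.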
The AI-Driven Multi-Scale Investigation of the RAS/RAF Activation Lifecycle (ADMIRRAL) project built a novel computing infrastructure using machine learning on experimental data to guide a massive ensemble of simulations to create the most accurate picture of key RAS-RAF protein interactions.
The Innovative Methodologies and New Data for Predictive Oncology Model Evaluation (IMPROVE) project used supervised machine learning methods to capture the complex, nonlinear relationships between the properties of drugs and the properties of the tumors. These learned relationships are then used to predict the tumor’s responses to different treatments, and the resulting model can provide specific treatment recommendations for the target tumor.
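The general idea of predicting a response from paired drug and tumor properties can be sketched with a very small stand-in model. The example below swaps in a k-nearest-neighbor regressor for the deep neural networks used in the actual project; the function `predict_response`, the feature vectors, and the response values are hypothetical and chosen only for illustration.

```python
import math


def predict_response(train, drug_features, tumor_features, k=3):
    """Predict a response as the mean over the k nearest training pairs.

    `train` is a list of (drug_features, tumor_features, response) tuples.
    Drug and tumor features are concatenated into one vector per pair.
    """
    query = list(drug_features) + list(tumor_features)
    scored = []
    for d, t, response in train:
        point = list(d) + list(t)
        scored.append((math.dist(query, point), response))
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    return sum(r for _, r in nearest) / len(nearest)


# Hypothetical toy data: two drug descriptors, one tumor descriptor.
train = [
    ((0.0, 1.0), (0.1,), 0.9),
    ((0.0, 1.0), (0.2,), 0.8),
    ((1.0, 0.0), (0.9,), 0.2),
    ((1.0, 0.0), (0.8,), 0.1),
]
prediction = predict_response(train, (0.0, 1.0), (0.15,), k=2)
```

Here the query sits between the first two training pairs, so the prediction is the average of their responses. A real model would use far richer descriptors and far more data.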
The Modeling Outcomes Using Surveillance Data and Scalable Artificial Intelligence for Cancer (MOSSAIC) project used semi-supervised machine learning to automatically read and encode millions of disparate clinical reports into a consistent form to enable data-driven predictive modeling of patient-specific health trajectories. CANDLE built thousands of phenotype classification models by using combinations of descriptive terms extracted from 10,000 curated text training sets.
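To give a sense of what encoding disparate free-text reports into a consistent form means, the sketch below maps each report to a fixed-length vector of term counts. It is a deliberately simple bag-of-words encoding, not the semi-supervised models described above; the vocabulary, the function `encode_report`, and the sample reports are hypothetical.

```python
import re
from collections import Counter

# Hypothetical descriptive terms; a real vocabulary would be far larger.
VOCABULARY = ["carcinoma", "adenocarcinoma", "lung", "breast", "metastatic"]


def encode_report(text, vocabulary=VOCABULARY):
    """Return term counts in vocabulary order, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


# Hypothetical report snippets with inconsistent formatting.
reports = [
    "Invasive ductal CARCINOMA, left breast.",
    "Lung: adenocarcinoma; metastatic disease noted. Lung biopsy.",
]
encoded = [encode_report(r) for r in reports]
# encoded == [[1, 0, 0, 1, 0], [0, 1, 2, 0, 1]]
```

Once every report has the same shape, vectors like these can feed a downstream classifier regardless of how the original text was written.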
“The main driver for the CANDLE project is the synergy we had as agencies facing the challenges of the Cancer Moonshot. We knew that we needed to create a specific software stack to address the needs of the cancer research community and to drive the innovations that were taking place at the time while we were trying to build our exascale computing systems,” said Georgia Tourassi, associate laboratory director for computing and computational sciences at ORNL. “CANDLE was a fantastic collaboration between the two agencies because we envisioned the project as a way to drive our respective missions. And I believe that we have a lot of great successes to show.”
Frontier Success
The CANDLE team significantly exceeded its ECP-set performance goal of a 50× speedup of its codes running on Frontier versus 2016’s state-of-the-art supercomputer, Titan. The combined performance of CANDLE’s Uno and P3B1 benchmark codes showed a 264× improvement, more than five times the goal.
“We got smarter in terms of how to implement these AI algorithms, and the hardware got a lot better. Frontier is a great system, and over the years the team as a whole has grown — from deep learning at its infancy to being experts in deep learning. So, DOE came out ahead on both fronts, and NCI came out ahead because of the various scientific publications and discoveries that occurred throughout the project,” said Thomas Brettin, a manager for the CANDLE project and a strategic program manager at Argonne.
During the global coronavirus pandemic, the CANDLE team temporarily shifted its focus from cancer to COVID-19. Members of the CANDLE team contributed to a project that won a 2022 Association for Computing Machinery Gordon Bell Special Prize for High Performance Computing-Based COVID-19 Research, “GenSLMs: Genome-Scale Language Models Reveal SARS-CoV-2 Evolutionary Dynamics.”
Notably, CANDLE was recognized with a 2023 R&D 100 Award in the software/services category. CANDLE benchmark codes are also being used by several companies to test their AI hardware and software, including Cerebras, SambaNova, Groq, Intel and AMD.
What’s Next?
CANDLE and the science projects it supported have made a lasting impact in cancer research, not only with the results they produced but also by making the open-source CANDLE codes available to other researchers for their own investigations, even on less powerful computers.
“Now there’s over a hundred models that have been published by many different groups, more or less trying to do the same thing we did: build a core that can predict which tumor is going to respond to which drug,” Stevens said. “That made us realize that the challenge going forward right now isn’t so much about building a single model to do that; it’s about trying to understand all the different ideas the community has tried to improve those models.”
The team has continued the IMPROVE project, which now aims to systematically assess the models that have been built by the cancer research community and learn which techniques, types of data, types of transformations and ways of encoding patient outcomes are most effective.
“This ongoing collaboration between NCI and DOE exemplifies the power of interdisciplinary research that uses complementary expertise. We brought the computer scientists, the physicists and the engineers to work alongside the oncologists, the biologists and the clinical researchers to dream different ways of tackling well-known challenges that were part of the Cancer Moonshot Initiative. That cross-pollination of ideas is what has led to innovative approaches in cancer research that we are now seeing make it into practice,” Tourassi said.
Support for this research came from the ECP, a collaborative effort of the DOE Office of Science and the National Nuclear Security Administration, and from the DOE Office of Science’s Advanced Scientific Computing Research program. The OLCF is a DOE Office of Science user facility located at ORNL.
UT-Battelle LLC manages ORNL for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit https://energy.gov/science.