On May 7, the US Department of Energy (DOE) announced that the Frontier exascale supercomputer is slated for delivery in 2021 at DOE’s Oak Ridge National Laboratory (ORNL). While researchers who use high-performance computing to tackle the toughest problems in modern science can look forward to Frontier’s arrival, many will begin their journey to exascale with Summit, ORNL’s 200-petaflop system and the world’s fastest supercomputer.
“With Summit, we made a big step forward by introducing a converged architecture in which modeling and simulation, data analytics, and artificial intelligence (AI) can coexist in a much richer way than previous machines,” said Jack Wells, director of science at the Oak Ridge Leadership Computing Facility (OLCF), the DOE Office of Science User Facility that manages Summit at ORNL. “Frontier will continue this trend.”
As Summit’s successor, Frontier will be more than five times faster, performing more than a quintillion (a billion billion) calculations per second. Meanwhile, Summit’s GPU-accelerated architecture, which pairs two CPUs with six GPUs per node, marks the path to exascale, as Frontier and other exascale systems will also rely on accelerated processors.
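The relationship between the two figures can be checked with a short calculation. The sketch below assumes only Summit's 200-petaflop figure quoted in this article and the standard definitions of the peta and exa prefixes; the variable names are illustrative.

```python
PETA = 10**15  # 1 petaflop = 10^15 floating point operations per second
EXA = 10**18   # 1 exaflop = 10^18, i.e. a quintillion (a billion billion)

summit_flops = 200 * PETA  # Summit's 200-petaflop figure from the article

# Factor by which Summit's rate must grow to reach exactly one exaflop.
factor_to_exascale = EXA / summit_flops

print(factor_to_exascale)  # 5.0
```

A system that is more than five times faster than Summit therefore lands above the one-exaflop mark.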
“Summit is the largest GPU machine in the United States, so it is really a stepping stone to exascale,” said Bronson Messer, director of the Center for Accelerated Application Readiness (CAAR) program, which will work with code teams over the next 3 years on optimizing a range of scientific applications for exascale.
However, Summit does not wring performance from processing speed alone. It also depends on high-bandwidth technology and local memory on each node to improve data analytics and AI techniques such as machine learning.
“Summit is a bandwidth-rich computer,” Wells said. “As we moved into the petascale, the number one priority we were hearing from our facility users shifted from more flops [floating point operations per second, the leading performance metric for supercomputers] to increased communications bandwidth for data movement. Summit addresses this shift in a big way.”
While the majority of Summit users are solving immediate problems in science and energy fields, researchers who access Summit through the CAAR program and DOE’s nationwide Exascale Computing Project (ECP) will be testing codes for future challenges to be tackled on exascale computers, including Frontier.
“Summit is a crucial system for our exascale readiness,” said ECP Director Doug Kothe. “It’s crucial for many reasons, not the least of which is the accelerated hardware, which we expect to see in exascale systems.”
Kothe said ECP is also planning for the increased integration of machine learning and AI into traditional modeling and simulation workloads. An example is the integration of exascale machine learning software into ECP applications through the project’s ExaLearn codesign center.
“We’re specifically targeting how AI and machine learning can make our traditional science workloads more efficient and productive,” Kothe said. “And we’re going to see more and more of these types of integrated workloads, which are supported by Summit, at exascale.”
In particular, the tensor core technology in Summit’s NVIDIA Volta GPUs speeds mixed-precision calculations, a technique that benefits the training of machine learning networks, for which a high degree of numerical precision (i.e., many decimal places) is not always required.
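To illustrate the idea, the sketch below emulates one common mixed-precision pattern in plain Python: inputs are rounded to 16-bit half precision, while the running sum is kept in 32-bit single precision. This is an illustration only, not how tensor cores are actually programmed; the helper names (`to_fp16`, `to_fp32`, `dot_all_fp16`, `dot_mixed`) and the test data are assumptions made for this example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_all_fp16(a, b):
    """Dot product with inputs, products, and running sum all rounded to fp16."""
    total = 0.0
    for x, y in zip(a, b):
        product = to_fp16(to_fp16(x) * to_fp16(y))
        total = to_fp16(total + product)
    return total


def dot_mixed(a, b):
    """Mixed precision: fp16 inputs, running sum rounded to fp32."""
    total = 0.0
    for x, y in zip(a, b):
        total = to_fp32(total + to_fp16(x) * to_fp16(y))
    return total


# Hypothetical data: many small terms, a case where low-precision sums drift.
a = [0.1] * 4096
b = [0.1] * 4096

# Reference: same fp16 inputs, summed in Python's 64-bit floats.
reference = sum(to_fp16(x) * to_fp16(y) for x, y in zip(a, b))

err_fp16 = abs(dot_all_fp16(a, b) - reference)
err_mixed = abs(dot_mixed(a, b) - reference)
print(f"fp16 accumulation error:  {err_fp16:.4f}")
print(f"mixed accumulation error: {err_mixed:.4f}")
```

The inputs carry the same half-precision rounding in both versions; only the precision of the running sum differs, and that is where most of the accuracy is recovered.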
Both ECP and CAAR teams are tasked with identifying an exascale challenge problem, that is, a computational science problem too complex to be solved on petascale computers. ECP has over 20 teams developing exascale applications to address challenge problems across science domains, from R&D for future accelerators and reactor designs to fundamental research in microbiology and materials science. Likewise, about a half-dozen CAAR teams, once selected later this year, will span disciplines and computational techniques, including traditional modeling and simulation as well as AI.
“One of the first things CAAR users will do is jump on Summit to get a baseline for their figure of merit. This is the metric which will be used to measure the quantifiable performance increase of the code. At the end of the program, they’ll run the same code on Frontier to determine this increase in performance,” Messer said.
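The comparison Messer describes amounts to a ratio of two measurements. The sketch below assumes a figure of merit for which higher is better (for example, work completed per second); the function name `fom_speedup` and the sample values are hypothetical and are not taken from any CAAR project.

```python
def fom_speedup(baseline_fom: float, final_fom: float) -> float:
    """Return the performance increase as final / baseline.

    Assumes a higher figure of merit means better performance.
    """
    if baseline_fom <= 0:
        raise ValueError("baseline figure of merit must be positive")
    return final_fom / baseline_fom


# Hypothetical values: a baseline measured on Summit and a later
# measurement of the same code on Frontier.
summit_baseline = 2.0
frontier_result = 9.0

print(fom_speedup(summit_baseline, frontier_result))  # 4.5
```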
Research teams interested in participating in the CAAR program for exascale application readiness can respond to a call for proposals on olcf.ornl.gov.
To view the range of ECP applications under development, visit exascaleproject.org.
To learn more about OLCF’s future exascale system, Frontier, visit olcf.ornl.gov/frontier.
UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit https://science.energy.gov.