PI: Mark Gates,
Research Assistant Professor, Innovative Computing Laboratory at the University of Tennessee, Knoxville
In 2016, the Department of Energy’s Exascale Computing Project, or ECP, set out to develop advanced software for the arrival of exascale-class supercomputers capable of a quintillion (10¹⁸) or more calculations per second. That leap meant rethinking, reinventing and optimizing dozens of scientific applications and software tools to leverage exascale’s thousandfold increase in computing power. That time has arrived as the first DOE exascale computer — the Oak Ridge Leadership Computing Facility’s Frontier — opened to users around the world. “Exascale’s New Frontier” explores the applications and software technology for driving scientific discoveries in the exascale era.
Why Exascale Needs SLATE
For nearly 30 years, scores of science and engineering projects conducted on high-performance computing (HPC) systems have used either the Linear Algebra PACKage (LAPACK) library or the Scalable Linear Algebra PACKage (ScaLAPACK) library to solve their dense linear algebra problems. However, HPC technology has greatly advanced over those decades, especially with the introduction of GPU accelerators, which these CPU-only libraries cannot use.
Consequently, the ECP supported the development of Software for Linear Algebra Targeting Exascale, or SLATE, which is a performance-portable and GPU-accelerated dense linear algebra library written in C++ for exascale supercomputers. SLATE aims to extract the full performance potential and maximum scalability from many-node HPC machines with many cores and multiple GPU accelerators per node.
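For a sense of what that looks like in practice, below is a minimal sketch of a distributed, GPU-targeted matrix multiply through SLATE’s C++ interface. The constructor arguments (global dimensions, tile size nb and a p-by-q MPI process grid) and the routine and option names are assumptions drawn from SLATE’s documentation and should be checked against the installed release; this is an illustrative sketch, not production code.

```cpp
// Hedged sketch of a SLATE call: a tiled, MPI-distributed matrix multiply
// offloaded to each node's GPUs. Names assumed from SLATE's documented API.
#include <slate/slate.hh>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int64_t n = 10000, nb = 512;   // global matrix size and tile size
    int p = 2, q = 2;              // 2-by-2 MPI process grid

    // Matrices are distributed as nb-by-nb tiles over the p-by-q grid.
    slate::Matrix<double> A(n, n, nb, p, q, MPI_COMM_WORLD);
    slate::Matrix<double> B(n, n, nb, p, q, MPI_COMM_WORLD);
    slate::Matrix<double> C(n, n, nb, p, q, MPI_COMM_WORLD);
    A.insertLocalTiles();
    B.insertLocalTiles();
    C.insertLocalTiles();
    // ... fill the local tiles with application data ...

    // C = 1.0*A*B + 0.0*C, with the work targeted at GPU devices.
    slate::multiply(1.0, A, B, 0.0, C,
                    {{slate::Option::Target, slate::Target::Devices}});

    MPI_Finalize();
    return 0;
}
```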
“We wrote SLATE from scratch so that we could have a nicer, more modern interface that’s easier for applications to use and that targets GPUs from the ground up. We also built this to be portable, so it works on all the different DOE machines, like Frontier and El Capitan [Lawrence Livermore National Laboratory] with the AMD GPUs and Aurora [Argonne National Laboratory] with its Intel GPUs,” said Mark Gates, principal investigator for the SLATE project and a research assistant professor at the Innovative Computing Laboratory at the University of Tennessee, Knoxville.
Technical Challenges
The SLATE project was launched in 2016, the same year as the ECP itself, which meant that the hardware specs for each upcoming DOE exascale supercomputer were years away from being finalized. GPUs and their math libraries vary among different chip vendors (e.g., AMD, Intel, NVIDIA), so the team wasn’t initially sure which brand’s GPUs to target. This challenge made SLATE’s portability on different hardware platforms an early goal.
Thus, in addition to SLATE’s own library, the team also wrote new C++ wrappers of the libraries LAPACK and Basic Linear Algebra Subprograms, or BLAS, which together serve as a portability layer in the SLATE package.
“We wrap the vendor-provided libraries — like NVIDIA’s CUDA libraries, AMD’s ROCm libraries and Intel’s oneMKL libraries — so that SLATE doesn’t have to deal with the differences between them. It can just rely on its own portability layer,” Gates said.
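As a rough sketch of what that portability layer looks like to a caller, a single BLAS++ call stands in for the vendor-specific ones, and the library dispatches to whichever vendor backend it was built against. The host-side blas::gemm below follows BLAS++’s published interface; GPU execution adds a blas::Queue argument, omitted here for brevity.

```cpp
// Hedged sketch: one portable BLAS++ call instead of separate cuBLAS,
// rocBLAS or oneMKL calls.
#include <blas.hh>
#include <vector>

int main() {
    int64_t m = 4, n = 4, k = 4;
    std::vector<double> A(m * k, 1.0), B(k * n, 1.0), C(m * n, 0.0);

    // C = 1.0*A*B + 0.0*C in column-major layout.
    blas::gemm(blas::Layout::ColMajor,
               blas::Op::NoTrans, blas::Op::NoTrans,
               m, n, k,
               1.0, A.data(), m,
                    B.data(), k,
               0.0, C.data(), m);
    return 0;
}
```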
Later, after running SLATE on the petascale DOE systems Summit (at the OLCF) and Perlmutter (at Lawrence Berkeley National Laboratory), the team worked to make SLATE run smoothly on the newly operating Frontier.
“Some things that worked well on Summit didn’t work nearly as well on Frontier in terms of performance. Optimizations became very important when we moved to Frontier. For example, Frontier’s network is connected directly to the GPU accelerators, which is great, but it also required changing the way that we did communication to take advantage of that,” Gates said.
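For illustration only (this is not SLATE’s code), the kind of change involved is passing GPU-resident buffers straight to a GPU-aware MPI library instead of staging them through host memory first. The sketch below assumes a GPU-aware MPI build, such as the one available on Frontier, and AMD’s HIP runtime.

```cpp
// Hedged illustration of GPU-direct communication: the device buffer is
// handed to MPI directly, with no intermediate copy to host memory.
#include <mpi.h>
#include <hip/hip_runtime.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    double* d_buf = nullptr;
    hipMalloc(&d_buf, n * sizeof(double));   // tile lives in GPU memory

    if (rank == 0)
        MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    hipFree(d_buf);
    MPI_Finalize();
    return 0;
}
```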
ECP and Frontier Successes
A key challenge for the team was successfully integrating the SLATE package with ECP-supported science applications. This integration effort included WarpX, a particle-in-cell code for kinetic plasma simulations, and NWChemEx, a molecular modeling package for computational chemistry, both of which use SLATE’s BLAS++ and LAPACK++ portability layer. (WarpX would go on to win the Association for Computing Machinery’s 2022 Gordon Bell Prize.)
The team also integrated SLATE with the ECP-supported software technology STRUctured Matrix PACKage, or STRUMPACK, a linear algebra solver for sparse matrices that now calls SLATE when it must solve large, dense systems.
What’s Next?
The team is currently working to ensure SLATE runs smoothly on the Aurora supercomputer. Beyond transitioning the project into a maintenance phase to address issues as they arise on exascale systems, the team will also continue developing SLATE’s eigenvalue solvers, algorithms often used in chemistry applications such as NWChemEx and the Quantum Monte Carlo PACKage, or QMCPACK.
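As a point of reference, the standard symmetric eigenvalue problem those solvers address, Ax = λx, can be solved on a single node through the LAPACK++ layer as sketched below. SLATE’s own distributed, GPU-accelerated eigensolvers use separate routines not shown here, and the lapack::syev signature should be checked against the installed LAPACK++ release.

```cpp
// Hedged sketch: a single-node symmetric eigenvalue solve via LAPACK++,
// the CPU-side wrapper layer mentioned above.
#include <lapack.hh>
#include <vector>

int main() {
    int64_t n = 3;
    // Column-major 3x3 symmetric matrix.
    std::vector<double> A = { 2, 1, 0,
                              1, 2, 1,
                              0, 1, 2 };
    std::vector<double> W(n);   // eigenvalues, returned in ascending order

    // On return, A holds the eigenvectors and W the eigenvalues.
    int64_t info = lapack::syev(lapack::Job::Vec, lapack::Uplo::Lower,
                                n, A.data(), n, W.data());
    return info == 0 ? 0 : 1;
}
```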
According to Gates, the SLATE project wouldn’t have happened without the ECP’s support.
“Before the ECP, we really had no intention of doing such a massive project, so we were making little additions to ScaLAPACK and things like that. But just completely rewriting the entire thing, we just didn’t have the scope necessary for that,” Gates said. “The ECP provided not only the funding but also access to all these resources — the prerelease hardware and interaction with the vendors — so that we could really be able to target those machines and have all the software written just as the machines were coming out.”
Support for this research came from the ECP, a collaborative effort of the DOE Office of Science and the National Nuclear Security Administration, and from the DOE Office of Science’s Advanced Scientific Computing Research program. The OLCF is a DOE Office of Science user facility located at DOE’s Oak Ridge National Laboratory.
UT-Battelle LLC manages ORNL for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit https://energy.gov/science.