Easing applications and software technologies onto exascale systems with a performance-portable programming model

Co-PI: Christian Robert Trott,
Sandia National Laboratories
Co-PI: Damien Lebrun-Grandie,
Oak Ridge National Laboratory 

In 2016, the Department of Energy (DOE) Exascale Computing Project (ECP) set out to develop advanced software for the arrival of exascale-class supercomputers capable of a quintillion (10^18) or more calculations per second. That leap meant rethinking, reinventing, and optimizing dozens of scientific applications and software tools to leverage exascale’s thousandfold increase in computing power. That time has arrived as the first DOE exascale computer, Frontier at the Oak Ridge Leadership Computing Facility (OLCF), opened to users around the world. “Exascale’s New Frontier” explores the applications and software technology for driving scientific discoveries in the exascale era.

Why Exascale Needs Kokkos

When the DOE launched the ECP in 2016 to prepare science applications and software technology codes to run on its upcoming exascale supercomputers, there was still an open question about the hardware architectures that these computers would be using.

The CPUs and GPUs chosen by system engineers at leadership computing facilities have a direct impact on the work of software developers because the software itself must be tailored to effectively use the underlying hardware — including both the CPU and GPU. Further complicating matters, GPUs and their application programming interfaces vary significantly between vendors (e.g., AMD, Intel, NVIDIA), and this means even more work for the software developers when encountering new or different architectures.

So, how can developers work on porting their codes before they even know which GPUs to target? Enter Kokkos, a C++ parallel programming model that allows developers to write their codes once and then compile them to run on different architectures from different vendors.

“So, instead of writing CUDA code for NVIDIA GPUs, HIP code for AMD GPUs, SYCL code for Intel GPUs, and OpenMP code for ARM CPUs, you write Kokkos code,” said Christian Robert Trott, co-leader of the Kokkos project and a principal member of technical staff at Sandia. “Kokkos takes care of mapping your code to the underlying architecture of the vendor programming model. Our role is to soak up all the pain involved with new architectures, new compilers and whatnot and make it all much, much less painful for everybody who uses Kokkos.”
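
The write-once idea can be illustrated with a small sketch in standard C++. This is not Kokkos code: the names `SerialBackend`, `OpenMPBackend`, `parallel_for_sketch`, and `axpy` are hypothetical and exist only for this example. The sketch assumes just two CPU backends and shows how one kernel, written once, can be dispatched to whichever backend is selected at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backend tags for illustration only; these are NOT Kokkos types.
struct SerialBackend {};
struct OpenMPBackend {};

// Serial backend: a plain loop over the index range [0, n).
template <class Kernel>
void parallel_for_sketch(SerialBackend, std::size_t n, const Kernel& kernel) {
  for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// OpenMP backend: the same loop, annotated for OpenMP. If the compiler is not
// run with OpenMP enabled, the pragma is ignored and the loop runs serially.
template <class Kernel>
void parallel_for_sketch(OpenMPBackend, std::size_t n, const Kernel& kernel) {
  const long long count = static_cast<long long>(n);
#pragma omp parallel for
  for (long long i = 0; i < count; ++i) kernel(static_cast<std::size_t>(i));
}

// Application code, written once against the abstract interface.
// Computes out[i] = a * x[i] + y[i]; each index is independent of the others.
template <class Backend>
std::vector<double> axpy(Backend backend, double a,
                         const std::vector<double>& x,
                         const std::vector<double>& y) {
  assert(x.size() == y.size());
  std::vector<double> out(x.size());
  parallel_for_sketch(backend, x.size(),
                      [&](std::size_t i) { out[i] = a * x[i] + y[i]; });
  return out;
}
```

Calling `axpy(SerialBackend{}, a, x, y)` or `axpy(OpenMPBackend{}, a, x, y)` produces the same values; only the dispatch target changes. Kokkos itself goes much further than this sketch: its real interface (for example `Kokkos::parallel_for` and its execution spaces) also covers GPU backends and manages where data lives in memory.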

Figure: Kokkos architecture. Kokkos isolates applications, libraries, and frameworks from the details of underlying hardware. The C++ parallel programming model allows developers to write their codes once and then compile them to run on different architectures from different vendors. Image: Christian Robert Trott, Sandia National Laboratories.

Technical Challenges

About three years after the ECP’s formation, the DOE announced that ORNL’s Frontier and Lawrence Livermore National Laboratory’s El Capitan supercomputers would be using AMD CPUs and GPUs, whereas Argonne National Laboratory’s Aurora would use Intel CPUs and GPUs. Until then, NVIDIA’s GPUs and its CUDA platform had predominated among the 10 fastest supercomputers in the world.

Consequently, to support these new GPUs, the Kokkos team would have to implement backends for the associated vendor programming models — and that required extensive collaboration with the Intel and AMD software teams as they developed their own programming tools.

“We have a lot of experience, from a computer science perspective, with how to design programming models and interfaces. And we give that kind of feedback to vendors to help them make their offerings more suitable for what the ECP or the DOE and the entire high-performance computing community need,” Trott said. “That means identifying gaps between what we are doing and what they are proposing to give our applications, to tell them, ‘We can’t do our applications with what you are proposing to give us. But here are all the things you need to make this work.’”

ECP and Frontier Successes

The key challenge set before the Kokkos team was relatively simple: ensure that applications using Kokkos will be able to run on the exascale architectures. This goal was achieved.

“The codes that were using Kokkos from the beginning because they wanted to run the same code on GPUs and CPUs had to make minimal, if any, changes at all to run on Frontier. That’s because all the work of porting to Frontier happened within the Kokkos project instead of in their own projects, so they could focus just on optimizing their code instead of rewriting it for a new programming model,” Trott said.

Some of the key science applications and software tools that use Kokkos and have successfully run on Frontier include LAMMPS, ExaSGD, Trilinos, ArborX, Cabana, and VTK-m.

In the end, half of the ECP-supported applications and software technology tools written in C++ use Kokkos to run on Frontier.

What’s Next?

On the technical side, the Kokkos team will continue to support new architectures as they are developed and deployed.

“No matter what hardware DOE buys, Kokkos-based software is going to be ready to run on that platform on the first day that platform is available. That is what we offer our users, that is what we offer the larger HPC community, and that’s what we are striving to achieve,” Trott said.

On the non-technical side, the team is also building up the Kokkos user community: the 1,400 developers registered to the Kokkos support channel on Slack represent over 150 different institutions from around the world. Most recently, the French Alternative Energies and Atomic Energy Commission (CEA) committed to using Kokkos for programming on its future exascale platforms.

Meanwhile, the Kokkos project’s institutional collaborators have grown beyond ORNL and Sandia to include Argonne, the National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory, the University of Texas, and the Swiss National Supercomputing Centre.

“The ECP really fueled the expansion of Kokkos into a community project, reaching a larger audience beyond just Sandia and a couple of our collaborators and truly making the team multi-institutional. That has been, I think, one of the big successes of the ECP — we collaborated on one really good solution instead of everybody coming up with their own little way of doing things,” Trott said.

Support for this research came from the ECP, a collaborative effort of the DOE Office of Science and the National Nuclear Security Administration, and from the DOE Office of Science’s Advanced Scientific Computing Research program. The OLCF is a DOE Office of Science user facility.

UT-Battelle LLC manages ORNL for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit https://energy.gov/science.