Two virtual workshops prepared OLCF users to port their codes and test them for Frontier’s new exascale architecture
As the world of high-performance computing (HPC) marches ever closer to the exascale era of supercomputers exceeding a billion billion, or 10^18, floating point operations per second, anticipation among computational scientists is palpable, even during virtual workshops. Two recent seminars copresented by AMD, Hewlett Packard Enterprise (HPE), which now incorporates Cray, and the US Department of Energy’s (DOE’s) Oak Ridge Leadership Computing Facility (OLCF) gathered users who are preparing for the Frontier supercomputer, which is currently being installed at Oak Ridge National Laboratory (ORNL).
Over 400 developers from across the country—representing academic, industrial, and federal institutions—logged into Zoom this past month to learn how to formulate codes and test them for Frontier, which is slated for full user operation in 2022.
“It’s very exciting for me to be here,” said Gina Sitaraman, an AMD engineer introducing AMD’s Heterogeneous Interface for Portability (HIP) during the May 24–26 online workshop. “This is one of the most exciting things I have done in my life, and I hope it is for you!”
HIP is an open-source C++ runtime application programming interface (API) and programming language that allows developers to create applications that can run on both AMD and NVIDIA platforms, which are the predominant supercomputer chip technologies today. It will also be a key tool for porting existing codes that have already been optimized for NVIDIA GPU accelerators to run on AMD GPU accelerators.
HIP is compatible with Compute Unified Device Architecture (CUDA), NVIDIA’s proprietary programming model used to optimize codes to run on its GPUs. For current OLCF users who want to take advantage of Frontier’s new levels of compute power, HIP will provide a simple way to transition their CUDA codes from Summit’s NVIDIA Volta V100 GPUs to Frontier’s AMD Instinct™ GPUs.
“HIP is going to be essential for a lot of codes as we move to Frontier. Basically, if a code is coming from Summit and it already makes good use of CUDA, the HIP porting tools make it easy and straightforward to get your code running and performant on Frontier,” said Bronson Messer, the OLCF’s director of science.
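To give a feel for why such a port can be mostly mechanical, here is a minimal Python sketch of the kind of source-to-source renaming involved. It is a toy illustration and not AMD's actual porting tooling: the function name `toy_hipify`, the sample snippet, and the small rename table are assumptions made for this example, although the CUDA and HIP identifiers in the table are real API names. Many CUDA runtime calls have a HIP counterpart with the same arguments, so a large share of a port comes down to swapping a prefix and a header.

```python
import re

# Illustrative subset of CUDA-to-HIP renames (assumption: a real port
# covers many more identifiers and handles library calls as well).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Match whole identifiers only, trying longer names first.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source: str) -> str:
    """Return `source` with the known CUDA names replaced by HIP names."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(1)], source)


# Hypothetical CUDA host code used only as sample input.
SAMPLE_CUDA = """\
#include <cuda_runtime.h>

void run(float *h_x, int n) {
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<blocks, threads>>>(d_x, n);
    cudaDeviceSynchronize();
    cudaMemcpy(h_x, d_x, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
}
"""

if __name__ == "__main__":
    print(toy_hipify(SAMPLE_CUDA))
```

In this sketch the kernel launch line passes through unchanged, since HIP's compiler accepts the same triple-chevron launch syntax. What a renaming pass cannot do is tune performance, which is why testing on real AMD hardware still matters.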
Matt Norman, leader of ORNL’s Advanced Computing for Life Sciences and Engineering group, has been working to prepare the Energy Exascale Earth System Model code for Frontier. He has been using HIP indirectly through a C++ performance portability library called Yet Another Kernel Launcher (YAKL). YAKL has Array functionality and a limited intrinsics library that allow user-level code to look very similar to Fortran code, which makes porting Fortran code to portable C++ much easier, he said.
“When we looked at the look and feel of CUDA and HIP code, it became clear that a simple C++ abstraction layer was the best fit for our code to have a single source and perform well on multiple GPU hardware back ends,” Norman said. “In our case, HIP was an easy add to the back end of our C++ portability framework due to its similarity to CUDA. This aspect made it fairly easy for us to get onto AMD GPU hardware quickly without having to change our source code.”
But once users have their codes optimized and ready to run, how can they test them if Frontier is not actually operational yet?
As an interim solution, HPE has assembled an early access system at ORNL called Spock. Based on the HPE Apollo 6500 Gen10 Plus system, Spock serves as a “mini-Frontier” that features similar, albeit less powerful, components: AMD EPYC™ CPUs and AMD Instinct™ GPUs. Spock and Frontier will also run the same HPE Cray EX software stack, including critical components such as the operating system and the HPE Cray Programming Environment with its compilers and MPI. But there are some differences, too.
“Most notably, there’s a few orders of magnitude less total-system computational power for Spock,” said Noah Reddell, HPE’s Centers of Excellence manager, who led the May 20 Spock Training workshop. “The AMD CPUs and GPUs will be more advanced in Frontier than what we have today. And there will be a few additional months of software development and updates that will happen between now with Spock’s availability and Frontier’s delivery to Oak Ridge.”
Spock is open to all application development and software technology teams inside the Exascale Computing Project, as well as all Frontier Center for Accelerated Application Readiness program teams. Since the workshop, Messer said he’s seen a big uptick in users logging onto Spock and running codes, which is vital for their eventual success on Frontier.
“Without a platform where you can actually test things, you’re sort of programming in the dark,” Messer said. “Having a platform like Spock is absolutely essential to make progress because in any implementation of something like a programming model or runtime system like HIP, you must have a platform that actually makes use of it and will lead to a result.”
Visit the OLCF training calendar to stay aware of upcoming workshops.
UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.