Arm-powered test bed connects researchers to budding HPC ecosystem
Concocting the next big thing in high-performance computing (HPC) often starts with giving the latest thing a try. In that spirit, the Oak Ridge Leadership Computing Facility (OLCF) recently deployed Wombat, the center’s newest Arm-based test bed.
The 16-node, single-rack computing cluster is the OLCF’s latest foray into Arm technology, which Arm Holdings develops and openly licenses to hardware vendors. Last year, the OLCF, a US Department of Energy (DOE) Office of Science User Facility located at DOE’s Oak Ridge National Laboratory (ORNL), deployed Arm1, an early development system featuring Arm processors.
Wombat, a prototype platform built by Hewlett Packard Enterprise (HPE), features Cavium ThunderX2 processors designed for memory-intensive workloads. Additionally, four of the system’s nodes are equipped with AMD GPU accelerators. The test system arrived in January and began accepting users in May. Because the Cavium processors perform on par with other HPC processors such as Intel’s x86 and IBM’s POWER, researchers are keen to port their scientific codes to the new architecture and see the results.
“We want to essentially evaluate the software stack and see where we are at,” said Graham Lopez, an HPC engineer in the OLCF’s user assistance and outreach group. Lopez is interested in porting and benchmarking a materials science code called QMCPACK to Wombat. “Arm is very new to us in HPC, so we want to see what the cost of adoption is.”
Another draw to the system is an Arm-based technology called Scalable Vector Extension (SVE), which allows for greater flexibility in the size of the vectors (strings of numbers treated as a coherent unit in memory) that the hardware can process. This flexibility could create more opportunities for users to exploit parallelism in their codes.
“What’s unique and interesting about SVE isn’t so much that it can process vectors as short as 128 bits and as long as 2,048 bits but that it supports multiple vector lengths without forcing users to recompile their code,” said Ross Miller, systems integration programmer in the OLCF’s Technology Integration Group.
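Miller’s point about vector lengths can be illustrated with a small sketch. The Python below is not SVE code and does not run on vector hardware; it only mimics the idea of a loop written once that works for any vector length, with the final partial vector handled by switching off unused lanes. The function name `vla_add`, the 64-bit element width, and the step counter are assumptions made for this illustration.

```python
def vla_add(a, b, vector_bits):
    """Add two equal-length lists the way a vector-length-agnostic loop would.

    Conceptual sketch only: `vector_bits` stands in for the hardware vector
    length, which real SVE code would query at run time rather than receive
    as an argument.
    """
    # SVE allows vector lengths from 128 to 2,048 bits in 128-bit steps.
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector length must be a multiple of 128 bits, from 128 to 2048")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")

    lanes = vector_bits // 64  # assumed 64-bit elements per vector
    n = len(a)
    out = [0.0] * n
    steps = 0  # number of simulated vector operations
    i = 0
    while i < n:
        # Predicate: a lane is active only if it maps to a valid element.
        # This is what lets the same loop handle the tail without a
        # separate scalar cleanup loop.
        active = [i + lane < n for lane in range(lanes)]
        for lane, on in enumerate(active):
            if on:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes
        steps += 1
    return out, steps


if __name__ == "__main__":
    a = [float(i) for i in range(10)]
    b = [1.0] * 10
    for bits in (128, 512, 2048):
        result, steps = vla_add(a, b, bits)
        print(bits, "bits:", steps, "vector steps")
```

The loop body never mentions a specific vector length, so the same code produces the same result whether the simulated hardware offers 128-bit or 2,048-bit vectors; only the number of steps changes.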
Though Cavium processors do not yet directly support SVE, Wombat’s users can still gain insight into the technology by emulating, or imitating, SVE instructions on the system. “If you want to evaluate how well your code can parallelize with SVE, the emulator will output some useful metrics,” Miller said.
Wombat also serves as an ideal canvas to evaluate experimental extreme-scale architectures. Matt Baker, a member of the OLCF’s Computer Science Research Group, plans to do exactly that, running a simulator called gem5 to explore Arm architectures that do not yet exist. The work not only helps guide research into exascale and post-exascale hardware but also gives users the opportunity to develop best practices for solving scientific problems in new ways and to provide constructive feedback to vendors.
“With the huge Arm ecosystem that exists, Wombat gives us a chance to test out and contribute back into that ecosystem,” Baker said. “That may be in the form of learning what size of vector extension is most useful for certain applications or identifying additional capabilities that would benefit HPC users in future hardware releases.”
Though a relative newcomer to the HPC landscape, Arm is already making inroads at major HPC centers worldwide. Arm processors soon will power the United Kingdom’s Isambard supercomputer and Japan’s Post-K exascale machine. ORNL is one of several DOE national laboratories exploring the Arm architecture. That group includes Sandia National Laboratories, which recently announced plans for a new Arm-powered supercomputer called Astra based on HPE’s Apollo 70 system.
The flurry of government spending hasn’t escaped the notice of ORNL-based research teams who want to ensure their codes are performant on multiple architectures.
“Many application teams want to be portable and be able to run at these different centers,” Lopez said. “If the opportunity to run on an Arm-based supercomputer comes up in the future, you want to be ready.”
ORNL is managed by UT-Battelle for the Department of Energy’s Office of Science. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit https://science.energy.gov.