Small cluster systems help staff evaluate Summit’s planned architecture
In preparation for Summit, the next supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), a US Department of Energy (DOE) Office of Science User Facility, computer scientists and support staff are leaning on two modest-but-mighty test systems to explore Summit’s cutting-edge architecture.
Pike and Crest, part of the early test bed for Summit built by OLCF staff late last year, are two simple, yet realistic, systems designed to evaluate different aspects of the hybrid CPU–GPU computing architecture. They are the first of multiple test and development systems that will be evaluated as new hardware and software are released in the run-up to Summit’s 2018 arrival at DOE’s Oak Ridge National Laboratory. By testing the performance of Pike and Crest, staff and vendors can learn the ins and outs of the new machine and have an opportunity to identify and fix problems so Summit is ready for science on day one.
Both systems are small clusters powered by IBM Power8 CPUs, a precursor to the CPUs that will be used on Summit. Peripheral components differ, however, based on the function of each system, according to Dustin Leverman, a member of the OLCF High-Performance Computing Operations (HPC Ops) Group that helped assemble the early test bed.
“Crest is a compute test bed. Each of its four compute nodes contains four GPUs,” Leverman said. “Pike, on the other hand, is a data storage test bed of 14 nodes. Instead of GPUs, it has a non-volatile memory disk to evaluate potential attributes of the high-speed data storage system planned for Summit. These two systems give us a head start on Summit’s next-generation compute and storage systems so we will be better prepared to support users.”
In a change from past OLCF systems, Summit’s data will be stored on a vendor-designed platform, IBM’s Elastic Storage System (ESS). ESS is based on IBM’s General Parallel File System technology, which brings a different approach to parallel storage compared with the OLCF’s current Lustre file systems.
“We need to understand these differences before we put the system in production,” Leverman said.
By using Pike to run benchmark jobs on an ESS unit, OLCF staff will be able to evaluate the storage system’s metadata performance, block I/O, random and sequential performance, and data management, among other attributes.
Sister system Crest is also being put through its paces as staff scale up scientific applications and test early versions of software, a key step in finding and fixing bugs before products are released.
“We’re checking out compilers and building and running codes; that’s a good outcome of this,” said HPC Ops system administrator Don Maxwell, the team lead for Crest. “We will also begin using Crest to test new software that IBM is developing for Summit to ensure it meets our requirements.”
Future test systems will incorporate NVLink, a new high-bandwidth interconnect from NVIDIA that will dramatically speed up data movement between CPUs and GPUs.
By Jonathan Hines
Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.