New York hackathon reveals bugs, opportunities for improvement

As the Oak Ridge Leadership Computing Facility (OLCF) transitions from its current flagship supercomputer, Titan, to its next-generation supercomputer, Summit, staff and users are preparing to get the most out of their codes on Summit, which will be built with an IBM POWER9 architecture and NVIDIA Volta GPUs.

One of the most important considerations when transitioning between architectures is the compatibility between a machine’s parallelization software and its compilers, the programs that translate source code written in a programming language into machine-executable code. To test an early version of Summit’s compilers (including IBM’s XL compiler suite), Oscar Hernandez, a tools developer in the OLCF Computer Science Research Group, and Tom Papatheodore, a former distinguished postdoctoral research associate for the OLCF Scientific Computing Group and current NVIDIA solutions architect, participated in an IBM hackathon for OpenMP 4.5, a directive-based application programming interface for parallel programming on both CPUs and GPUs.

Open to members of the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL), the hackathon took place at the IBM Thomas J. Watson Research Center in Yorktown Heights, New York, from September 13 to 16. More than 30 people attended, including representatives from the national laboratories and compiler, run-time, and application experts from IBM.

Hernandez and Papatheodore brought to the event Fortran- and C-based codes parallelized to run on Titan with the programming models OpenACC and CUDA. Because IBM compilers are optimized for POWER and GPU architectures, Summit will contain IBM compilers that support OpenMP 4.5 and compilers from the Portland Group, Inc. that support Fortran and CUDA.

Running current scientific application codes on the new compilers is important for gauging and ensuring the future success of Summit. At the hackathon, participants tested the Low Level Virtual Machine (LLVM) open-source compiler and IBM’s XL compiler suite, both of which support Fortran- and C-based languages, to identify bugs and opportunities for performance improvement in OpenMP 4.5.

Hernandez ported part of HACC, one of the CORAL benchmark codes, to OpenMP 4.5 and worked with the IBM team to improve the support for OpenMP 4.5 on the C-language LLVM compiler. HACC is one of 13 projects in the OLCF’s Center for Accelerated Application Readiness (CAAR) program that teams are currently preparing to run on Summit when it comes online in 2018. The OLCF is a US Department of Energy (DOE) Office of Science User Facility located at DOE’s Oak Ridge National Laboratory.

“People go to the hackathon with the expectation of breaking the compiler and providing valuable feedback to the developers,” Hernandez said. “Everyone is happy to provide feedback so that developers know what works and what doesn’t—whether that be issues with the application code, the OpenMP 4.5 specification itself, or its compiler implementations.”

Whereas some teams encountered bugs in the compilers, others successfully obtained performance boosts in their codes. For instance, using the XL Fortran compiler, Papatheodore observed a modest speedup in a physics module extracted from the FLASH code when he used OpenMP 4.5 in place of OpenACC to offload work to GPUs. FLASH, a publicly available high-performance application code used to simulate supernova explosions in astrophysics, is also one of the OLCF’s 13 CAAR projects.

“We want these codes to be ready to run on day one,” said Papatheodore. “We don’t want it to be a situation where the computer is here and no one has code that can run on it due to its architectural specifications. We don’t want any lag time.”

Hernandez said some performance issues observed at the hackathon will require an in-depth analysis of the compilers. IBM meets regularly with OLCF teams to discuss further improvements that may result from information gathered at the hackathon. The OLCF also plans to host a hackathon in January to test both the compilers and new IBM hardware.

“Testing the compilers on the new architecture is the next step in the ascent to Summit,” Hernandez said.

“We need to test the compilers in terms of both functionality and how they run on different architectures. Then, we will know what we need to work on before Summit comes online. Ultimately, we want to ensure that the transition from Titan to Summit is as smooth as possible.”

Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.