Technology - September 28, 2016

OLCF Team Resolves Performance Bottleneck in OpenACC Code

By improving its MAESTRO code, a team led by Michael Zingale of Stony Brook University is modeling astrophysical phenomena with improved fidelity. Pictured above, a three-dimensional simulation of Type I x-ray bursts, a recurring explosive event triggered by the buildup of hydrogen and helium on the surface of a neutron star.

Team achieves 6x performance improvement for portion of code, enhances speed and capability

For any high-performance computing code, the ideal is performance that is both effective and efficient: high-quality results produced with little power. Bottlenecks can nevertheless arise within these codes, stalling projects and sending researchers in search of the underlying problem.

A team at the Oak Ridge Leadership Computing Facility (OLCF), a US Department of Energy (DOE) Office of Science User Facility located at DOE’s Oak Ridge National Laboratory, recently addressed a performance bottleneck in one portion of an OLCF user’s application. Because of its efforts, the user’s team saw a sixfold performance improvement in the code. Team members for this project include Frank Winkler (OLCF), Oscar Hernandez (OLCF), Adam Jacobs (Stony Brook University), Jeff Larkin (NVIDIA), and Robert Dietrich (Dresden University of Technology).

“If the code runs faster, then you need less energy. Everything is better, more efficient,” said Winkler, performance tools specialist at the OLCF. “That’s why we have performance analysis tools.”

Known as MAESTRO, the astrophysics code in question models the burning of exploding stars and other stellar phenomena. Such modeling is possible on hybrid systems because the code uses OpenACC, a directive-based approach meant to simplify the programming of CPU and GPU systems. The OLCF team worked specifically with the piece of the algorithm that models the physics of nuclear burning.

Initially, that portion of MAESTRO did not perform as well as expected because the GPUs could not quickly access the data. To remedy the situation, the team used diagnostic analysis tools to discover the reason for the delay. Winkler explained that Score-P, a performance measurement tool, traces the application, whereas VAMPIR, a performance visualization tool, displays the resulting trace file, allowing users to see a timeline of activity within a code.

“When you trace the code, you record each significant event in sequence,” Winkler said.
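The idea Winkler describes can be illustrated with a toy tracer. This is only a sketch of the general concept of event tracing, not Score-P's interface or trace format: the `Tracer` class, the `inclusive_times` helper, and the region names are all hypothetical. It records an "enter" and an "exit" event, each with a timestamp, for every instrumented region, then replays the ordered event list to total the time spent in each region, which is the kind of information a timeline view is built from.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Tracer:
    """Toy event tracer (hypothetical, not Score-P): records events in order."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.events = []  # list of (timestamp, kind, region_name)

    @contextmanager
    def region(self, name):
        # Record when the region is entered and when it is left,
        # even if the body raises an exception.
        self.events.append((self.clock(), "enter", name))
        try:
            yield
        finally:
            self.events.append((self.clock(), "exit", name))


def inclusive_times(events):
    """Replay the event sequence and sum the time spent inside each region.

    Times are inclusive: a region's total includes any regions nested in it.
    """
    stack = []  # currently open regions as (name, start_timestamp)
    totals = defaultdict(float)
    for timestamp, kind, name in events:
        if kind == "enter":
            stack.append((name, timestamp))
        else:
            open_name, start = stack.pop()
            if open_name != name:
                raise ValueError("enter/exit events must be properly nested")
            totals[name] += timestamp - start
    return dict(totals)


if __name__ == "__main__":
    tracer = Tracer()
    with tracer.region("burn_step"):          # hypothetical region names
        with tracer.region("copy_results"):
            time.sleep(0.01)                  # stand-in for real work
    for region_name, seconds in inclusive_times(tracer.events).items():
        print(f"{region_name}: {seconds:.4f} s")
```

Because the events are kept in the order they occurred, the same list can be summarized as totals, as here, or drawn as a timeline.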

By analyzing the results, the team found that although data moving from CPUs to GPUs performed adequately, the code was significantly slower when sending data from GPUs to CPUs. Larkin, an NVIDIA software engineer, suggested using a compiler flag, an option that changes how the compiler builds the program, to store data in a location more convenient for the GPUs, which resulted in the code’s dramatic speedup.
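A comparison of this kind can be sketched as a simple aggregation over transfer records taken from a trace. The record layout `(direction, bytes, seconds)`, the direction labels, and the function name below are assumptions made for illustration. They are not the format produced by Score-P, and no measurements from MAESTRO are reproduced here.

```python
from collections import defaultdict


def bandwidth_by_direction(transfers):
    """Return effective bandwidth in bytes per second for each direction.

    `transfers` is an iterable of (direction, nbytes, seconds) tuples,
    a hypothetical record layout chosen for this sketch.
    """
    total_bytes = defaultdict(int)
    total_seconds = defaultdict(float)
    for direction, nbytes, seconds in transfers:
        total_bytes[direction] += nbytes
        total_seconds[direction] += seconds
    # Skip directions with no recorded time to avoid dividing by zero.
    return {
        direction: total_bytes[direction] / total_seconds[direction]
        for direction in total_bytes
        if total_seconds[direction] > 0
    }
```

If the value for one direction is far lower than for the other, as the team observed for GPU-to-CPU copies, that asymmetry shows where to look next.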

Jacobs, an astrophysicist working on a PhD at Stony Brook, brought the OpenACC code to the OLCF in June to get expert assistance. Jacobs is a member of a research group led by Michael Zingale, also of Stony Brook.

During the week Jacobs spent at the OLCF, the team ran MAESTRO on the Titan supercomputer, the OLCF’s flagship hybrid system. Using tools like Score-P and VAMPIR on this system, the team resolved the bottleneck within that single week of working with the code. Both Winkler and Jacobs stressed that their rapid success depended on collaboration; the individuals involved, as well as the OLCF, provided the necessary knowledge and resources to reach a mutually beneficial outcome.

“We are working with technology in a way that was not possible a year ago,” Jacobs said. “I am so grateful that the OLCF hosted me and gave me their time and experience.”

Because of these improvements, the MAESTRO code can run the latest nuclear burning models faster and perform higher-level physics than before—capabilities that are vital to computational astrophysicists’ investigation of astronomical events like supernovas and x-ray bursts.

“There are two main benefits to this performance improvement,” Jacobs said. “First, your code is now getting to a solution faster, and second, you can now spend a similar amount of time working on something much more complicated.”

Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.