Optimizing Miniapps for Better Portability

Minisweep performs a “sweep” computation across a grid (pictured)—representative of a 3D volume in space—to calculate the positions, energies, and flows of neutrons in a nuclear reactor. The yellow cube marks the beginning location of the sweep. The green cubes are dependent upon information from the yellow cube, the blue cubes are dependent upon information from the green cubes, and so forth. In practice, sweeps are performed from each of the eight corners of the cube simultaneously.

When scientists run their scientific applications on massive supercomputers, the last thing they want to worry about is optimizing their codes for new architectures. Computer scientist Sunita Chandrasekaran at the University of Delaware is taking steps to make sure they don’t have a reason to worry.

Chandrasekaran collaborates with a team at the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL) to optimize miniapps, smaller pieces of large applications that can be extracted and fine-tuned to run on GPU architectures. Chandrasekaran and her PhD student, Robert Searles, have taken on the task of porting (adapting) one such miniapp, Minisweep, to OpenACC—a directive-based programming model that allows users to run a code on multiple computing platforms without having to change or rewrite it.

Minisweep is particularly important because it represents approximately 80–99 percent of the computation time of Denovo, a 3D code for radiation transport in nuclear reactors being used in a current DOE Innovative and Novel Computational Impact on Theory and Experiment, or INCITE, project. Minisweep is also being used in benchmarking for the Oak Ridge Leadership Computing Facility’s (OLCF’s) new Summit supercomputer. Summit is scheduled to be in full production in 2019 and will be the next leadership-class system at the OLCF, a DOE Office of Science User Facility located at ORNL.

Created from Denovo by OLCF computational scientist Wayne Joubert, Minisweep works by “sweeping” diagonally across grid cells that represent points in space, allowing it to track the positions, flows, and energies of neutrons in a nuclear reactor. Cubes in the grid cell represent a number of these qualities and depend on information from previous cubes in the grid.

“Scientists need to know how neutrons are flowing in a reactor because it can help them figure out how to build the radiation shield around it,” Chandrasekaran said. “Using Denovo, physicists can simulate this flow of neutrons, and with a faster code, they can compute many different configurations quickly and get their work done faster.”

Visualization of a nuclear reactor simulation on Titan.

Minisweep has already been ported to multicore platforms using the OpenMP programming interface and to GPU accelerators using the lower-level programming language CUDA. ORNL computer scientists and ORNL Miniapps Port Collaboration organizers Tiffany Mintz and Oscar Hernandez knew that porting these kinds of codes to OpenACC would equip them for use on different high-performance computing architectures.

Chandrasekaran and Searles have been using the Summit early access system, Summitdev, and the Cray XK7 Titan supercomputer at the OLCF to test Minisweep since mid-2017. Now, they’ve successfully enabled Minisweep to run on parallel architectures using OpenACC for fast execution on the targeted computer. An option to port to these types of systems without compromising performance didn’t previously exist.

Whereas the code typically sweeps in eight directions from diagonal corners of a cube inward, the team saw that with only one sweep, the OpenACC directive performed on par with CUDA.

“We saw OpenACC performing as well as CUDA on an NVIDIA Volta GPU, which is a state-of-the-art GPU card,” Searles said. “That’s huge for us to take away, because we are normally lucky to get performance that’s even 85 percent of CUDA. That one sweep consistently showed us about 0.3 or 0.4 seconds faster, which is significant at the problem size we used for measuring performance.”

Chandrasekaran and the team at ORNL will continue optimizing Minisweep to get the application up and “sweeping” from all eight corners of a grid cell. Other radiation transport applications and one for DNA sequencing may be able to take advantage of Minisweep for multiple GPU architectures such as Summit—and even exascale systems—in the future.

“I’m constantly trying to look at how I can package these kinds of tools from a user’s perspective,” Chandrasekaran said. “I take applications that are essential for these scientists’ research and try to find out how to make them more accessible. I always say: write once, reuse multiple times.”

ORNL is managed by UT-Battelle for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit https://science.energy.gov/.