X-Stack Projects Use Titan’s GPUs for Demos

The latest X-Stack PI Meeting, held April 6 – 7 at LBNL, included two demos developed with Titan’s GPUs. Craig Rasmussen and Louis-Noel Pouchet, members of the D-TEC project, demoed their software prototypes employing DSL technology.

Project teams demonstrate their research in programming environments for future exascale systems

One of the unique aspects of the Oak Ridge Leadership Computing Facility’s (OLCF’s) Titan supercomputer is its hybrid architecture, consisting of both CPUs and GPUs. Recently, those GPUs played a big role in helping advance US Department of Energy (DOE)-funded scientific discovery.

Two X-Stack project teams used Titan’s GPUs to develop demonstrations of their research prototypes, which addressed domain-specific languages (DSLs), computer languages specialized to a particular application domain, in the exascale era.

X-Stack is a multi-institution program in DOE’s Office of Advanced Scientific Computing Research (ASCR) portfolio that supports research on significant advances in programming models, languages, compilers, runtime systems, and tools for extreme-scale computing.

“This initiative supports several projects researching different aspects of next-generation programming models and environments,” said David Bernholdt, group leader for the Computer Science Research Group at the OLCF, a DOE Office of Science User Facility located at DOE’s Oak Ridge National Laboratory. “They’re looking beyond the tools like OpenMP and OpenACC that we typically use today.”

Bernholdt is the lead for programming environments and tools for the OLCF, so when X-Stack project members requested access to Titan, it was natural for him to accommodate them under the umbrella of his group’s work.

“Demos for other projects in the X-Stack program requiring only CPU-based systems were hosted by NERSC and other facilities,” Bernholdt said. “But since some projects needed GPUs, they came to us. We were able to give them accounts quickly, allowing them to set up and run their demos in fairly short order.”

Craig Rasmussen of the University of Oregon and Louis-Noel Pouchet of the Ohio State University both presented demos at the X-Stack Principal Investigator (PI) Meeting last month at Lawrence Berkeley National Laboratory (LBNL). Rasmussen and Pouchet are members of the DSL Technology for Exascale Computing (D-TEC) project.

D-TEC, a 4-year project funded by the DOE ASCR program, is a multi-institution effort led by Dan Quinlan at Lawrence Livermore National Laboratory (LLNL) and Saman Amarasinghe at the Massachusetts Institute of Technology (MIT), with coinvestigators from LLNL, MIT, LBNL, IBM, Rice University, Ohio State, the University of Oregon, and the University of California, San Diego. It is one of four large-scale projects in the X-Stack portfolio that launched in 2012.

DSLs define high-level abstractions that make the development of application code more efficient; these abstractions improve productivity and enable domain-specific performance and energy optimizations. However, leveraging DSLs efficiently is difficult because the supporting software stack, including languages, compilers, runtime, and tools, takes a long time to design and develop.

The goal of D-TEC is to make DSLs effective for exascale, enabling the support of both embedded and general DSLs and addressing all layers of the exascale software stack—software infrastructure for DSL design and implementation, domain-specific optimizing compilers, and runtime support.

In his X-Stack PI Meeting presentation, titled “Automatic GPU Code Generation for 2-D/3-D Stencils on Regular Grids,” Pouchet demonstrated how to use a DSL and compiler for stencils, a common computing pattern, to generate GPU-specific code automatically from a simple, high-level description of a numerical operation. This significantly improves programmer productivity by avoiding the tedious and error-prone development of target-specific implementations of a computation. Starting from a high-level description of the stencil computation written in the NVIDIA Forma DSL, the team’s optimizing compiler applies specialized transformations for stencils on GPUs, achieving three times better performance than the base code generated by Forma’s compiler.
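For readers unfamiliar with the pattern, the following is a minimal Python sketch of what a stencil computation is; it is not Forma syntax and not the D-TEC compiler's output. The names `apply_stencil` and `JACOBI_5PT` are hypothetical, chosen only for illustration. The stencil is described as a small table of neighbor offsets and weights, and a generic routine interprets that description on the CPU; a stencil DSL compiler would instead take such a description and generate specialized GPU code.

```python
def apply_stencil(grid, weights):
    """Apply a stencil once to the interior of a 2-D grid (a list of lists).

    `weights` maps a (row offset, column offset) pair to a coefficient.
    Boundary cells are copied through unchanged (an assumption of this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Each output cell is a weighted sum of its neighbors in the input.
            out[i][j] = sum(
                w * grid[i + di][j + dj] for (di, dj), w in weights.items()
            )
    return out


# Hypothetical high-level description: a 5-point Jacobi averaging stencil.
JACOBI_5PT = {(-1, 0): 0.25, (1, 0): 0.25, (0, -1): 0.25, (0, 1): 0.25}

grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 4.0  # a single hot cell in the center
result = apply_stencil(grid, JACOBI_5PT)
```

After one sweep, the value in the center cell has been spread evenly over its four neighbors. Because every output cell depends only on a fixed neighborhood of the input, the loop iterations are independent, which is what makes the pattern a natural fit for GPUs.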

Rasmussen discussed “CAFe: A Unified PGAS Programming Model for Heterogeneous Computing,” in which he demonstrated a parallel implementation of Dijkstra’s shortest path algorithm solving 3-D ray paths in the Earth’s crust using GPU nodes on Titan. The code was written in an extended version of the Fortran programming language. The extensions matter because they allow a single parallel programming model to be applied across an entire parallel machine. Rasmussen demonstrated new language extensions that will greatly simplify programming parallel machines with multiple GPUs per node.
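As background, here is a minimal sequential Python sketch of Dijkstra's shortest path algorithm on a 3-D grid. It is not the CAFe Fortran code shown in the demo, and it is not parallel. The function name `dijkstra_3d`, the six-neighbor connectivity, and the edge cost (the average of the two cells' values) are all assumptions made for illustration.

```python
import heapq

# Six face-adjacent neighbors in a 3-D grid (an assumption of this sketch).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dijkstra_3d(cost, source):
    """Return the minimum travel cost from `source` to every reachable cell.

    `cost[x][y][z]` is a non-negative per-cell cost; moving between two
    adjacent cells costs the average of their values (hypothetical model).
    """
    nx, ny, nz = len(cost), len(cost[0]), len(cost[0][0])
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, (x, y, z) = heapq.heappop(heap)
        if d > dist[(x, y, z)]:
            continue  # stale queue entry; a shorter path was already found
        for dx, dy, dz in NEIGHBORS:
            px, py, pz = x + dx, y + dy, z + dz
            if 0 <= px < nx and 0 <= py < ny and 0 <= pz < nz:
                nd = d + 0.5 * (cost[x][y][z] + cost[px][py][pz])
                if nd < dist.get((px, py, pz), float("inf")):
                    dist[(px, py, pz)] = nd
                    heapq.heappush(heap, (nd, (px, py, pz)))
    return dist


# Uniform 3x3x3 medium: every step between adjacent cells costs 1.0.
uniform = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
dist = dijkstra_3d(uniform, (0, 0, 0))
```

In a uniform medium, the cost to reach the opposite corner of the grid is simply the number of axis-aligned steps. A parallel version such as the one demonstrated would distribute the grid across nodes and GPUs, which this sketch does not attempt.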

Both demos employed DSL technology developed under D-TEC to allow computer languages to be extended so that the resulting code can be transformed automatically to run efficiently and correctly on specific hardware architectures. The transformations are themselves high-level representations that can be changed if the application is to run on a different architecture, allowing a code to be portable across multiple machine architectures while maintaining good performance.

Sonia Sachs, program manager of the X-Stack program, wanted to reach out to various computing facilities to garner interest and engagement among the projects, facilities, and, ultimately, application developers, such as many OLCF users. Matt Norman and Gustav Jansen, OLCF computational scientists, and Bernholdt attended the meeting on behalf of the OLCF.

“These projects look to the future as far as what programming environments will best support future extreme-scale systems, including Summit and beyond,” Bernholdt said. “The impact on users is not immediate, but we will work with users and project teams to make these tools available on our systems so they can take a deeper look at their application codes. The OLCF, as a matter of practice, tries to support our users by providing research tools or connecting researchers with these kinds of tools and application experts. The X-Stack program provides a preview of programming environments to come, so we want to provide this information and these sorts of opportunities to our users.”

Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.