Technology - November 28, 2017

Teams Gear up for Summit at Fourth Annual GPU Hackathon

Programmers, vendors, developers, and domain experts attended the OLCF’s 2017 GPU Hackathon during the week of October 9 to adapt scientific applications for large parallel computing resources such as the supercomputers at the OLCF.

Ten teams work to scale their codes on Summitdev at yearly OLCF event

At the Oak Ridge Leadership Computing Facility’s (OLCF’s) fourth annual GPU Hackathon, programmers again successfully adapted their applications for GPU architectures. The 5-day event at the Hilton in Knoxville, Tennessee, took place the week of October 9 during the installation of the OLCF’s next flagship supercomputer, Summit.

The event drew 77 attendees—on 10 teams—from institutions such as the University of Georgia, Booz Allen Hamilton, Georgia Tech, Los Alamos National Laboratory (LANL), and the University of California–Santa Barbara. Teams from the University of Tennessee–Knoxville and the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL) also participated.

This hackathon was the first at which teams had the opportunity to port their applications to Summitdev, the OLCF’s test bed system for Summit. Summitdev contains IBM POWER8 CPUs and NVIDIA’s Tesla GPUs, whereas Summit will feature IBM’s POWER9 architecture and NVIDIA’s Volta GPUs.

“This hackathon deliberately targeted teams for Summit,” said Fernanda Foertter, hackathon organizer and high-performance computing (HPC) data scientist at the OLCF. “By the next hackathon, Summit should be available to users, so we are preparing them for that architecture.” The Summit supercomputer will have at least five times the application performance of the OLCF’s current Cray XK7 Titan supercomputer, so scaling important scientific codes for Summit’s architecture is crucial.

Hackathons at the OLCF, a DOE Office of Science User Facility at ORNL, place a strong focus on teamwork and collaboration. At the 2017 hackathon, teams experienced this sense of collaboration as they worked together to get their codes up and running on faster, more complex hardware. Fortunately, experts from NVIDIA, IBM, PGI, ORNL, and universities provide mentorship to participants at these events.

“Part of what’s so great about these hackathons is that the mentors are there to say, ‘It’s okay, we are going to struggle through this together.’ Everyone is working hard and helping one another,” Foertter said. “At the end of the week, the mentors and I want the teams to be able to return to their institutions and continue optimizing these codes independently.”

The teams were overwhelmingly successful in porting their codes to GPU architectures this year, Foertter said, noting that the LANL team, which worked on the threaded, vector particle-in-cell (VPIC) code, had a particularly favorable experience. VPIC is a plasma physics application optimized to run on Intel’s Knights Landing processors. To leverage the GPUs in hybrid architectures such as Titan and Summitdev, the team used Kokkos, a C++ library designed to target complex HPC node architectures for performance portability. With Kokkos, a code is written once and then built for the specific hardware it will run on; at this hackathon, that hardware was GPU architectures.
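The idea behind performance portability can be illustrated with a small sketch: a particle-position update is written once and then run through two interchangeable backends. This is an illustration in Python only, not Kokkos or VPIC code. The names `parallel_for`, `serial_backend`, `threaded_backend`, and `push_particles` are hypothetical, and a real Kokkos application would be written in C++ with a backend such as CUDA or OpenMP selected when the code is built.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_backend(n, kernel):
    """Run kernel(i) for every i in range(n) on a single thread."""
    for i in range(n):
        kernel(i)


def threaded_backend(n, kernel, workers=4):
    """Run kernel(i) for every i in range(n) across a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all work and re-raises any worker exception.
        list(pool.map(kernel, range(n)))


def parallel_for(n, kernel, backend=serial_backend):
    """Hypothetical dispatch: the kernel is unchanged, only the backend differs."""
    backend(n, kernel)


def push_particles(x, v, dt, backend=serial_backend):
    """Advance positions x by v * dt. Each index is independent of the others."""
    out = list(x)

    def kernel(i):
        # Each call writes only to its own slot, so no locking is needed.
        out[i] = x[i] + v[i] * dt

    parallel_for(len(x), kernel, backend)
    return out


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0]
    v = [1.0, 0.5, -1.0]
    print(push_particles(x, v, 0.5, serial_backend))
    print(push_particles(x, v, 0.5, threaded_backend))
```

Both calls run the same `kernel`; only the `backend` argument changes. That separation between what is computed and where it runs is the property the sketch is meant to show, under the assumptions stated above.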

Adam Simpson, an HPC user support specialist at the OLCF, and Daniel Ibanez from Sandia National Laboratories served as mentors for the LANL team. At the end of the hackathon, the team successfully ran one of the algorithms in VPIC on NVIDIA GPUs, a major step in furthering the portability of the code for different HPC architectures. The hackathon gave the team the tools it needed to begin working on increasing the performance of the code in the future, Foertter said.

“Having direct access to the experts allowed us to immediately tackle problems head-on, avoiding getting caught up in any pitfalls or stalling at roadblocks,” said Bob Bird, VPIC code developer at LANL. “The co-design process between domain experts, computer scientists, and tool developers allowed for cycles of rapid feedback and development, uniquely allowing us to identify areas for enhancement in both VPIC and Kokkos.”

The event was the last of five GPU hackathons in which the OLCF participated this year. Others took place at Jülich Supercomputing Centre, Brookhaven National Laboratory, the National Aeronautics and Space Administration’s Langley Research Center, and the Swiss National Supercomputing Centre, with teams experiencing similar success in porting and optimizing their codes for GPU architectures. The OLCF partners with multiple centers each year to bring GPU hackathons to sites in the United States and Europe.

“Sometimes, teams tell us that they have accomplished here what has taken months of trying,” Foertter said. “The collaborative atmosphere between mentors and other teams, where everyone is helping one another, is what ultimately makes these events so successful.”

Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.