Coders, tool developers, and HPC experts collaborate to improve applications for next-generation supercomputers
This hackathon, organized by the Oak Ridge Leadership Computing Facility (OLCF), a US Department of Energy (DOE) Office of Science User Facility located at DOE’s Oak Ridge National Laboratory (ORNL), took place May 2–6. It was the second hackathon this year involving the OLCF. At these events, programmers from national laboratories, universities, and vendors gather to share expertise in porting applications to GPUs and to learn how to program for a hybrid CPU–GPU machine like the OLCF’s Titan supercomputer. This year’s first event took place in February at the Dresden GPU Center of Excellence and Forschungszentrum Julich research center in Dresden, Germany.
“Our goal is to help application teams accelerate their codes to next-generation hybrid supercomputers like Titan and Summit,” said event coordinator and OLCF user support specialist Fernanda Foertter. “Some teams will try about everything during the 5-day event to make their program work, but they may only get a portion of their application working on a GPU. When the team goes home with a path forward, and a plan to port their application, we are happy. We want to give them the knowledge and confidence to continue their development process.”
The following six teams—three from UD and one each from the National Aeronautics and Space Administration (NASA), Brookhaven National Laboratory, and the National Cancer Institute—participated in the event:
- Cavazos-Lab, cybersecurity and malware analysis (led by John Cavazos in UD’s Department of Computer and Information Sciences)
- Kinetic Model Builders, chemical engineering/catalysis (led by Mike Klein of the UD Energy Institute)
- CAPSL, hybrid CPU–GPU implementation (led by Guang Gao in UD’s Department of Electrical and Computer Engineering)
- Fun3D, aerospace (NASA)
- NCI CBIIT, ribonucleic acid identification for visual cancer identification (National Cancer Institute)
- 5DSpeedsters, nuclear physics/lattice quantum chromodynamics (Brookhaven National Laboratory)
UD graduate students participated on the three university teams, and undergraduate students were invited to observe and ask questions. Mentors with extensive programming expertise included those who regularly work with OpenACC and staff members from Cray, NVIDIA, PGI, Cornell University, ORNL, UD, and the University of Tennessee, Knoxville. More than 35 attendees participated in the training event.
“This is the first time we hosted an event at an academic university in the United States,” Foertter said. “Not only is UD centrally located in the East Coast, but it gives us an opportunity to reach out to next-generation programmers like UD graduate students who are just beginning to use HPC [high-performance computing]. Two of the three UD teams, in fact, had never run on a supercomputer, and the team led by Klein went from not running on HPC to running HPC and getting speedup in 4 days.”
Users who run into problems with a program or tool typically resort to a work-around rather than reporting the problem, so fewer bugs are reported than actually exist. The bugs found at hackathons can be considered corner-case bugs because they come from novel scientific applications or from applications that use the tools in interesting ways. Such bugs are not captured by a validation suite, which clears out most typical bugs.
Fortunately, participants receive assistance from mentors, including tool developers, who sit with the teams using their tools and can offer direct support to help improve applications. Besides reporting bugs directly to the tool developers, participants can also discuss additional features that they would like included in the tools; these are called requests for engineering.
“At one of the hackathons, there was one scenario where a bug was preventing a couple of teams from moving forward,” Foertter said. “By collaborating with team members, mentors, and tool developers, the tool makers were able to recompile the tool, deploy it, and fix the bug on the fly. That’s really cool. Basically, it takes the facility out of the middle in the communication chain, allowing for more rapid development.”
One of the event co-organizers noted the importance of successful hackathons. “This GPU Hackathon established a unique blend of industry skills and academic research,” said Sunita Chandrasekaran, assistant professor of computer science at UD. “It was thrilling to see how the teams made such quick progress accelerating their scientific applications on Titan in just 5 days. Hats off to all the mentors! This wouldn’t have been possible if not for this training event. We need more of these hackathons.”
For more information, please visit: https://www.olcf.ornl.gov/training-event/2016-gpu-hackathons/.
Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.