HPC programmers gather for year’s final GPU acceleration workshop
It felt like déjà vu: the same sights and sounds of humming hard drives, keyboard-pounding programmers, and faces turning from frustration to elation after a breakthrough. This time, though, there were more people, more laptops, and a lot more coffee.
This year’s hackathon, hosted by the Oak Ridge Leadership Computing Facility (OLCF), a US Department of Energy (DOE) Office of Science User Facility located at DOE’s Oak Ridge National Laboratory, took place October 19–23 at the Marriott hotel in downtown Knoxville. It was the last of three major OLCF-involved hackathons held in 2015, events at which programmers from around the world learn how to port their applications to GPUs directly from experts. The earlier events were held at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign and at the Swiss National Supercomputing Centre (CSCS).
“Our primary goal is to continue to grow the ecosystem, or the number of accelerated applications, and we’ve come a long way since hosting the very first hackathon 1 year ago,” said event coordinator and OLCF high-performance computing (HPC) programmer Fernanda Foertter. “Since then we’ve seen a significant improvement with each one, from the year’s first event at NCSA, to a few months later at CSCS, and now this one here at the OLCF. We have almost doubled the attendance number from what we had a year ago, and the buzz throughout the community has been tremendous.”
This year’s event drew more than 70 attendees, and the number of teams grew from six to nine, representing a more diverse range of institutions and scientific disciplines. Five of the nine teams are current OLCF users, and four of those are enrolled in projects within the Center for Accelerated Application Readiness (CAAR) program.
The nine programming teams were:
- AstroGPU, astronomy and astrophysics (CAAR application)
- GFDL, multiphysics climate model
- HACME (ACME), GPU-accelerated climate model (CAAR application)
- NUMA, next-generation weather prediction model for the US Navy
- NUCCOR, nuclear physics (CAAR application)
- PETTT_SAMI3, two-dimensional physics-based ionosphere model
- STEPS, stochastic process, Markov chain simulator
- Urban Smart, urban transportation model
- XGC, multiphysics magnetic fusion reactor simulator (CAAR application)
Much of the continued success of these events, Foertter noted, comes from the dedication that vendor partners and mentors have shown from day one. Following that formula, each of the nine teams was matched with mentors from vendors IBM, Cray, The Portland Group, and NVIDIA. Mentors from the OLCF’s User Assistance group and Scientific Computing group (SciComp), CSCS, and Cornell University also participated.
“After several hackathons, I think we mentors know how to make things work for our teams. We know how to take care of many problems that the application teams might face, and we know how to identify new problems quickly,” said Markus Wetzstein, a mentor from CSCS who has been involved since the first hackathon. “We know all the workarounds, so we can help people much faster—solving the problems that they run into—and we know which kinds of approaches work very well together with OpenACC, and which ones don’t.”
Another “from day one” mentor was SciComp’s Matt Norman. “These full-immersion events simultaneously help code developers identify their greatest challenges in porting their codes to GPUs, give developers confidence with using OpenACC in real-world code, and improve the compiler implementations by teasing out bugs with test cases that would never have been used otherwise,” he said.
As usual, the first few days of the hackathon were spent learning new portability methods and experimenting with programming languages and models such as Fortran, MPI, OpenMP, CUDA, and OpenACC. At the midpoint of each day, participants took a break from poring over millions of lines of code to deliver progress reports and announce any new challenges or achievements.
Problems are to be expected at a hackathon, and doing something for the first time, such as learning a new programming model, can be frustrating. The results, however, are well worth it. Frank Giraldo from the Naval Postgraduate School found this to be especially true of his code, NUMA, after finding some success with OpenACC, which he was using for the first time.
“It was a little tough, and at first we were copying too much data around that we didn’t have to. Finally we were able to get to the point where we were actually getting faster computation times on the GPU versus the CPU,” he said. “Now I think we are probably at the point [using a single GPU] where we can be as fast as 16 CPUs. On these kinds of machines, the computations are fast. It’s the data transfer that kills you. Once you learn how to handle that, then you can really optimize your code, and that’s what I learned here.”
The lessons learned at this year’s hackathons will help programmers for years to come as they inevitably adapt to newer, more diverse architectures.
Last month, a few weeks after the hackathon, Foertter presented the event results to a larger audience at the SC15 supercomputing conference in Austin, Texas. The OLCF actually rounded out its 2015 schedule with a fourth event in late November that was not GPU-focused: the Big Neuron Hackathon.
“We’ve seen the benefits that hackathons bring, and we already have plans to expand the events even more in 2016,” Foertter said. “They have become an invaluable platform for so many, and we plan to keep them going for at least another year.”
Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.