The OLCF’s fifth year hosting hackathons brings new opportunities, growth
In late October, the Oak Ridge Leadership Computing Facility (OLCF) wrapped up its 2019 hackathon series with an event hosted at the Crowne Plaza Hotel in downtown Knoxville, Tennessee, that gave 76 participants access to the world’s most powerful and smartest supercomputer, the IBM AC922 Summit at the OLCF.
Attendees from national laboratories, universities, and industry participated in the hackathon, held October 21–25. It was the 11th such event cohosted by the OLCF and NVIDIA in 2019—the most ever held in a single year by the institutions.
Over the last 5 years, the OLCF GPU hackathon has evolved from a single, ambitious endeavor to a series hosted at institutions across the country—and even around the world—to bolster programmers’ applications for scientific study on some of the largest GPU supercomputers. The OLCF and NVIDIA co-organize the trainings along with the local host institutions at which the events are held. Teams receive help from expert mentors at the 5-day events, aiming to leave with a piece of GPU code in hand.
This year, 516 people from 105 different institutions participated in the events. In total, OLCF hackathons have drawn more than 1,600 participants over the years.
“These events continue to be among the most productive and valuable trainings in our organization,” said Tom Papatheodore, a high-performance computing (HPC) engineer at the OLCF who has organized the hackathons since 2017. The OLCF is a US Department of Energy (DOE) Office of Science User Facility located at DOE’s Oak Ridge National Laboratory (ORNL).
So successful are the events that NVIDIA is now hosting more hackathons than the OLCF is able to participate in, according to Papatheodore. Although the OLCF participates in most of the events in the United States, others, including some overseas events, are hosted solely by NVIDIA.
“We started with one hackathon, and now there are more of them than we are able to be involved in,” Papatheodore said.
Value of HPC for industry
One of the largest successes of 2019 came from a General Electric (GE) team that gained major speedups in a code used to simulate turbulence, the unsteady flow of air, around the rows of blades in large gas turbines and jet engines.
The team began adapting its GENESIS code for Summit’s GPUs at the Brookhaven National Laboratory hackathon in 2018 and continued the effort at the Massachusetts Institute of Technology hackathon in June 2019. The team has achieved 50- to 300-fold speedups in pieces of its code and far greater scalability since the first event.
“We’re used to simulations that are fast but very inaccurate,” Carlos Velez, lead research engineer in the Thermo-Sciences Organization at GE’s Global Research Center, said. “Now we can get better predictive accuracy and higher resolution, which are needed for the kind of physics we’re trying to resolve. And we do it within a run time that fits very nicely within our design cycle.”
The team’s success demonstrates the enormous value of GPU computing for industry.
Another team from industry that participated this year was Pratt & Whitney, led by Pete Bradley, a Pratt & Whitney fellow for HPC and modeling. Bradley’s team worked with expert mentors Dave Norton of NVIDIA and Matt Norman of ORNL to port one of the most computationally expensive pieces of the United Technologies Computational Fluid Dynamics (UTCFD) code to Summit. It was the team’s first time attending an OLCF GPU hackathon.
“Matt’s deep experience with CFD allowed us to set a path forward for our code,” Bradley said. “We haven’t encountered someone with such tailored experience for our problem in any previous events or opportunities.”
Because Bradley’s team earned a Director’s Discretionary (DD) allocation on Summit earlier this year, they were able to seamlessly transition to their ongoing work and begin immediately implementing what they learned at the hackathon. The team emphasized the value of using GPUs to accelerate UTCFD for more efficient jet engine design.
“Performance of our code is very important, both in terms of time to market as well as in terms of compute capacity consumption,” Bradley said. “At the same time, we are working with very challenging physics, and our customers are constantly demanding better fuel burn and emissions. Being able to focus both on engineering productivity as well as efficient engine design is critical for our competitiveness.”
Papatheodore expects increasing participation from industry next year as more teams begin to reap the benefits.
“We’ve had many calls with these teams, and I think their participation could really manifest in the next year or so,” Papatheodore said. “I think the full impact on industry is still yet to be seen.”
A pipeline to success
For many teams, the OLCF hackathons are a conduit to other opportunities at the leadership computing facility. One particularly successful project team is led by Vikram Gavini, associate professor of mechanical engineering at the University of Michigan. At an OLCF hackathon in 2018, the team first ported part of their CPU-only code to GPUs with the help of expert mentors.
After the hackathon, the team earned time on Summit through a DD allocation and spent 6 months porting 90 percent of their code, Density Functional Theory with Finite Elements (DFT-FE), to the machine. Eventually, the team used the GPU-accelerated version of DFT-FE to simulate a magnesium dislocation system consisting of 10,000 atoms at high fidelity, which earned them a Gordon Bell finalist nomination. Information gleaned from these simulations can guide experimentalists as they add new elements to magnesium alloys to reduce the possibility of cracking.
The team has since been awarded 360,000 Summit node-hours through an Innovative and Novel Computational Impact on Theory and Experiment (INCITE) allocation to continue their study of dislocation energetics in lightweight structural materials, which could lead to better lightweight alloys.
“Within a year, this team went from having no GPU code at all to being awarded an INCITE allocation,” Papatheodore said. “This significant progression really started from the hackathons themselves.”
The schedule for next year’s hackathons and training events will be available in early 2020.
UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.