A first of its kind, the hackathon targeted users aiming to run on the Oak Ridge Leadership Computing Facility’s Slate platform
The Oak Ridge Leadership Computing Facility (OLCF) hosted its first-ever Slate hackathon for users of the OLCF’s Slate platform. Slate provides container orchestration services and gives users a chance to run specialized tools and workflows that support computational campaigns. The hackathon was developed and hosted by the Platforms Group in partnership with the OLCF training board.
More than 30 people attended the event, most of whom were from various directorates at the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL). The event targeted users who intend to take advantage of Slate’s cluster compute systems.
“This is a complex system, and like all complex systems, we needed to help people understand how to use it,” said Jason Kincl, the Platforms Group leader at the National Center for Computational Sciences (NCCS), which hosts the OLCF. “This was born out of wanting to engage with more users, develop better use cases, and partner more closely with our current users.”
The NCCS Platforms Group developed training materials for the hackathon, offering self-paced, hands-on, “run-this-command” modules on GitHub. Users could jump in using existing allocations on Slate or request an allocation specifically for the hackathon. Most of the researchers who attended had access to Marble, one of the moderate-security systems within the Slate environment. A few had access to the open-environment Onyx system, which also operates within Slate.
On the first day, the attendees worked through the training modules with the help of the organizers. The second day featured three breakout sessions driven by questions and technologies that the attendees wanted to explore: (1) continuous integration, a method to automate code testing as changes are made to computational codes; (2) service accessibility, or how to integrate and access services at the NCCS at ORNL; and (3) workflow systems, including combining multiple workflow tools into a single application.
Although many beginners attended the event, the organizers were surprised by the number of experienced users who also showed up.
“We had users that were just getting started alongside what we call our ‘power users,’” Kincl said. “These users really have a need, and they have the capability and ability to go to town using this service.”
Gaurab KC and Sarat Sreepathi from the Computational Earth Sciences Group at ORNL attended the hackathon to learn how they might augment the capabilities of their monitoring framework, Performance Analytics for Computational Experiments (PACE), which summarizes performance data from computational runs. The two are members of the project that develops the Energy Exascale Earth System Model (E3SM), a high-resolution coupled Earth system model designed to address energy-related science challenges of national interest while making effective use of DOE supercomputers. Sreepathi and KC developed PACE to aggregate performance data from E3SM experiments run on various supercomputers, derive insights from that data, and identify bottlenecks and targets for performance engineering and optimization.
“We have now implemented an automatic upload capability using the Jenkins server on Slate that tracks and exports the computational climate experiments on Summit into the PACE database,” KC said. “With this capability, domain scientists can access the PACE web portal to view completed experiments, examine model configurations, and look at the performance data.”
The Platforms Group hopes to engage an even wider user base in future hackathons. The group also intends to refine the use cases for Slate so that it can provide the functionality users need, and to establish a channel for user feedback on the service. The ultimate goal is to make it easier for users to carry out their scientific research.
“From my perspective, the most exciting thing about this is actually working with researchers on solving a problem and understanding the problem they’re trying to solve,” said Jeffrey Miller, a high-performance computing systems engineer in the Platforms Group who helped organize the event. “As a research engineer and systems engineer supporting researchers, it’s a different mindset, but the direct interface with the researchers is very important for me to understand what they’re trying to do.”
The hackathon was organized by Jason Kincl, Jeffrey Miller, Subil Abraham, Leah Huk, Suzanne Parete-Koon, Sherry Ray, and Dustin Spears.
UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.