Staff to host multiple Summit-related training events in coming months
Now that the Oak Ridge Leadership Computing Facility (OLCF) has launched its IBM AC922 Summit supercomputer, staff members in the OLCF’s User Assistance and Outreach (UAO) Group are planning robust training events intended to enhance user experiences on the new system.
Unlike the OLCF’s current Cray XK7 Titan supercomputer, which has one CPU and one GPU on each node, Summit has two IBM Power9 CPUs and six NVIDIA Tesla V100 GPUs on each node to distribute work more efficiently and decrease the time required to run scientific codes. Summit is expected to have a peak capability of 200 petaflops when it is fully accepted this fall.
The OLCF, a US Department of Energy (DOE) Office of Science User Facility located at DOE’s Oak Ridge National Laboratory (ORNL), will host a variety of Summit training events this year—including webinars, hands-on workshops, and screencasts—organized by Tom Papatheodore, a high-performance computing (HPC) programmer and user support specialist at the OLCF. Earlier this month, Papatheodore and several other OLCF staff members led an “Introduction to Summit” webinar that gave the 46 attendees a rundown of the new architecture.
“This webinar was intended to give new users the basic information they need to get up and running on Summit,” Papatheodore said. “We showed attendees how to use the batch scheduler, launch jobs, and utilize the on-node solid-state drives.” Solid-state drives are storage devices that use flash memory to store data.
Papatheodore is also organizing a workshop later this summer called “Targeting Multi-GPU Nodes” to help users of Summit understand how to take advantage of its multi-GPU architecture. NVIDIA staff members Jeff Larkin and Steve Abbott will contribute expertise in GPU computing and offer tips and tricks for navigating the new hardware. The workshop’s date is to be determined.
In the fall, the OLCF will host a Summit workshop with the IBM Center of Excellence from October 1 to 5 that will provide a more comprehensive view of the system. Attendees will learn about the system’s architecture, schedulers, job launchers, and programming environments. Presentations will offer information about programming models for Summit, early science accomplishments, and lessons learned on the machine.
The OLCF has incorporated additional workshops into its training schedule to leverage the expertise of multiple groups at ORNL. Summit users range from beginners to GPU experts, and the schedule includes events for every level of experience.
The DOE-led Exascale Computing Project, which seeks to accelerate the delivery of an exascale computing ecosystem, is heading a Kokkos workshop July 24–27 at ORNL. Organized by ORNL computer scientist Graham Lopez, “Performance Portability with Kokkos” will allow users to take advantage of the Kokkos programming model, which enables C++ applications to run on multiple compute platforms. Participants will work closely with Kokkos experts to apply the model to their own applications.
“Kokkos can be attractive for users working with C++ because it lets them express parallel execution and data requirements in a general way that works across various HPC platforms,” Lopez said. “Because a single implementation of the algorithm performs well on different types of compute resources, it allows more potential development resources to be spent on new science rather than on maintaining specialized code for each different architecture.”
Ronny Brendel, a research associate in the OLCF’s Computer Science Research (CSR) Group, will lead a full-day workshop on Score-P and Vampir on August 17. These profiling tools provide information to help developers identify which parts of a code require their attention.
“The importance of using profilers cannot be underestimated,” Papatheodore said. “It’s not always obvious which parts of the code take the longest amount of time to run. Let’s say I decide to speed up a part of my code that’s only 3 percent of the total runtime. That’s not beneficial to me. But if I can take a part that’s using 60 percent of my runtime and cut that in half, I significantly decrease the time-to-solution.”
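The arithmetic behind Papatheodore’s example can be sketched with Amdahl’s law. The short Python snippet below is an illustration only, not OLCF material: the function names are hypothetical, and the 2x speedup applied to the 3 percent region is an assumption, because the quote does not say how much that part would be sped up.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when `fraction` of the runtime
    runs `local_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def runtime_saved(fraction, local_speedup):
    """Share of the original total runtime removed by the optimization."""
    return 1.0 - 1.0 / overall_speedup(fraction, local_speedup)


if __name__ == "__main__":
    # Assumed case: the 3 percent region is made twice as fast.
    print(f"3% region, 2x faster:  {runtime_saved(0.03, 2.0):.1%} of runtime saved")
    # Case from the quote: the 60 percent region is cut in half.
    print(f"60% region, 2x faster: {runtime_saved(0.60, 2.0):.1%} of runtime saved")
```

Under these assumptions, halving the 3 percent region saves 1.5 percent of the total runtime, while halving the 60 percent region saves 30 percent. Even eliminating the 3 percent region entirely could save no more than 3 percent, which is why a profile is worth collecting before deciding where to optimize.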
Nick Forrington, a field applications engineer for Arm in the CSR Group, is leading a full-day Arm-related workshop on September 14. The workshop will include presentations on Arm’s debuggers and profilers to provide users with tools to optimize their own codes.
Between these workshops, Papatheodore will organize mini-hackathons in the fall. These 1- to 2-day events will give users hands-on practice applying what they have learned, and attendees will receive codes that they may optimize to the extent of their choosing.
Papatheodore said some workshops may be more useful than others, depending on the person’s level of experience.
“Maybe you have no experience,” Papatheodore said. “Maybe you’re an OLCF staff member but you currently have no experience with GPU programming in particular. Maybe you’ve been running on Titan, and now you need to transition to Summit. These workshops cater to a wide range of skill sets because we want to meet people at their individual levels of knowledge.”
Conference calls and extras
The OLCF continues to hold monthly user conference calls that focus on diverse topics, such as new tools, system updates, and upcoming events. Led by OLCF HPC user support specialist Bill Renaud, the calls bridge the user community and OLCF staff. This year is packed with Summit-related topics, and each call is recorded and posted on the OLCF website.
Papatheodore has also started incorporating screencasts, short how-to videos covering relevant topics, into the OLCF’s current user documentation.
“These are 5-minute videos that users can access if they prefer video over written documentation,” Papatheodore said. “For example, I did one on how to set up a new token. A lot of users have trouble doing this, so I made a quick video that actually walks them through the steps.”
For a list of the OLCF’s online tutorials—including online guides, recordings of past training events, and screencasts—please visit https://www.olcf.ornl.gov/for-users/training/tutorials/.
A comprehensive list of the OLCF’s training events is available at https://www.olcf.ornl.gov/for-users/training/training-calendar/.
ORNL is managed by UT-Battelle for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit https://science.energy.gov.