OLCF representatives attend each year to share experiences with the Cray user community

Reuben Budiardja was named runner-up for best paper at the 2017 Cray User Group meeting.

While researchers are busy answering big science questions on the Cray XK7 Titan supercomputer, Oak Ridge Leadership Computing Facility (OLCF) staff members are keeping the nation’s most powerful supercomputer, now in its fifth year of operation, running smoothly.

Staff work closely with the 27-petaflop Cray system day in and day out. With 2 years of Titan operation remaining and the recent deployment of new test beds built by Cray, Inc., including ARM1, the OLCF is staying active in the Cray community.

This May, five staff members from the National Center for Computational Sciences (NCCS), home of the OLCF at the US Department of Energy’s Oak Ridge National Laboratory (ORNL), shared their experiences and technical accomplishments with other international users of Cray supercomputers at the Cray User Group (CUG) meeting in Redmond, Washington. As part of the proceedings, OLCF computational scientist Reuben Budiardja was named runner-up for the CUG Conference Best Paper Award for “Application-Level Regression Testing Framework Using Jenkins.”

Budiardja and coauthors Timothy Bouvet and Galen Arnold from the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign developed a solution for regression testing on large, complex systems like the Blue Waters Cray XE6 at NCSA and Titan at the OLCF, a DOE Office of Science User Facility. They ran their testing framework on Blue Waters.

The team configured Jenkins, an automation server most commonly used as a continuous integration tool for software development, to perform application-level regression testing.

“These high-performance computing systems have many components—the software stack, the file systems, the applications,” Budiardja said. “We can easily test the individual components, but we want to know from a user experience perspective: how is it all working together?”

Budiardja said system administrators do not always know when something is wrong because they may have privileges that keep them from encountering the issues seen by a user, who works on the system through a scientific application.

To replicate a user experience and evaluate system usability and performance, the team ran several common scientific applications and community codes with well-known performance characteristics through their regression testing framework. Examples include molecular dynamics applications such as NAMD and LAMMPS. The paper provides use cases and best practices for deploying Jenkins as a regression testing framework for supercomputing based on the team’s experience on Blue Waters.
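The idea of checking an application run against well-known performance characteristics can be illustrated with a minimal Python sketch. This is not the team's framework or anything taken from the paper: the `Baseline` class, the `check_run` function, the benchmark name, and all timing values are hypothetical and chosen only for illustration. The sketch assumes a test passes when the application exits cleanly and finishes within a set tolerance of its expected runtime.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Baseline:
    """Expected behavior of one benchmark run (all values hypothetical)."""
    name: str
    expected_seconds: float
    tolerance: float  # allowed fractional slowdown, e.g. 0.10 for 10%


def check_run(baseline: Baseline, exit_code: int, measured_seconds: float) -> list[str]:
    """Return a list of failure messages; an empty list means the run passed."""
    failures = []
    # A non-zero exit code means the application itself failed.
    if exit_code != 0:
        failures.append(f"{baseline.name}: exited with code {exit_code}")
    # A run slower than the baseline plus tolerance counts as a regression.
    limit = baseline.expected_seconds * (1.0 + baseline.tolerance)
    if measured_seconds > limit:
        failures.append(
            f"{baseline.name}: took {measured_seconds:.1f}s, limit is {limit:.1f}s"
        )
    return failures


# Hypothetical usage: a run that is slightly slower than the baseline
# but still inside the 10% tolerance.
baseline = Baseline(name="md_benchmark", expected_seconds=120.0, tolerance=0.10)
problems = check_run(baseline, exit_code=0, measured_seconds=125.0)
status = 1 if problems else 0  # a script could exit with this status
```

A script like this could end by exiting with `status`, since a Jenkins shell build step is marked as failed when the command returns a non-zero exit status.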

As treasurer of the CUG Board of Directors, NCCS’s Jim Rogers helped organize the 2017 meeting. “We participate in multiple technical tracks, submitting papers on system administration, user assistance and outreach, applications, and machine learning,” said Rogers, NCCS director for computing and facilities.

For one such technical track, OLCF High-Performance Computing System Administrator Matt Ezell attended the CUG special interest group XTreme, which maintains a list of issues and features important to staff working on Cray’s high-performance systems. Input from XTreme and other CUG groups can help guide product development for Cray, which also sends team members to the user group meeting. OLCF staff members Chris Fuson and Veronica Vergara Larrea, who coauthored the winning best paper at CUG in 2015 with staffer Wayne Joubert, also attended the conference this year.

Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.