The National Center for Computational Sciences (NCCS), which houses the Oak Ridge Leadership Computing Facility (OLCF), has created the High-Performance Computing Core Operations (HPC Core Ops) Group to oversee the center’s networking, cybersecurity, and infrastructure.
The group was formed in response to workload changes following the acquisition of the 200-petaflop IBM AC922 Summit supercomputer, located at the OLCF. This month, OLCF users began running allocation projects on Summit under the Innovative and Novel Computational Impact on Theory and Experiment, or INCITE, program. The OLCF is a US Department of Energy (DOE) Office of Science User Facility at DOE’s Oak Ridge National Laboratory.
“As a supercomputing center grows, there comes a point where there are too many services and too many things going on at the same time in one group,” said Ryan Adamson, HPC cybersecurity engineer and interim HPC Core Ops group leader. “This was a strategic change that allows us to scale successfully and work more efficiently.”
HPC Core Ops staff members were reassigned from the OLCF’s High Performance Compute and Data Operations (HPC and Data Ops) Group, formerly known as HPC Ops. Splitting HPC and Data Ops into two distinct groups marks a noteworthy change in the NCCS organizational structure and allows each group to focus on a specific subset of the center’s operations.
HPC Core Ops houses three teams: a networking team, which handles the Ethernet network for all of NCCS’ systems; a cybersecurity team, which monitors and secures the supercomputing center; and a core infrastructure team, which provides necessary external services to the center’s HPC resources. HPC and Data Ops, on the other hand, focuses on the user-facing aspects of operating the center, which include administering the HPC and cluster resources and monitoring the storage and file systems.
“The capabilities provided by HPC Core Ops impact every NCCS customer, whereas HPC and Data Ops capabilities are specifically tailored to various customer needs,” Adamson said.
The new structure will allow Kevin Thach, the HPC and Data Ops group leader, to focus his group’s attention on the OLCF’s supercomputing and storage resources and on individual projects such as the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL), a joint HPC procurement activity among these three national laboratories.
Both groups will continue to work with the OLCF’s User Assistance and Outreach Group to resolve user issues and tickets. They will also continue to share many of the same procedures, such as change management, code review, and disaster recovery.
“We must continue to work closely with HPC and Data Ops,” Adamson said. “We will not be successful in this evolving supercomputing environment unless we do.”
ORNL is managed by UT-Battelle LLC for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit https://science.energy.gov.