
100-gigabit-per-second redundancy adds resiliency to high-speed network

ESnet provides services to more than 40 DOE research sites, including the entire national laboratory system, its major scientific instruments, and its supercomputing facilities such as the OLCF. The network permits DOE-funded scientists to productively collaborate with partners around the world.

If data is the lifeblood of a scientific computing center, a resilient network can be the difference between new discoveries and costly service disruptions.

In November 2018, staff at the US Department of Energy’s (DOE’s) Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility located at Oak Ridge National Laboratory, upgraded a key component of the OLCF’s data pipeline, creating a high-performing secondary link between the OLCF and DOE’s Energy Sciences Network (ESnet), the world’s fastest network for science.

The installation of a second 100-gigabit-per-second (Gbps) core router introduces high-performance redundancy for the OLCF’s global user community and helps maintain year-round network availability. Previously, the OLCF network consisted of one router with a single 100 Gbps connection and a backup router with multiple 10 Gbps links.

“This network enhancement provides redundancy for maintenance, outages, and upgrades and will allow us to quickly bring in new technology with minimal to no disruption,” said OLCF high-performance computing (HPC) Linux systems engineer and task lead Daniel Pelfrey. “It’s a very good situation for moving forward.”

Pelfrey teamed with OLCF HPC network administrators Paul Newman and Benton Sparks to complete the core router installation.

Among the primary beneficiaries of 100 Gbps connectivity are OLCF users, who regularly move large datasets to and from the center. These datasets typically range from a few gigabytes to a few terabytes, but the OLCF’s network has the capacity to handle even larger datasets. This capability is especially critical for researchers in data-intensive domains such as cosmology and climate science, who possess multipetabyte datasets and shuttle data between multiple HPC centers.

“At full saturation, a 100 Gbps link can move a petabyte of data in about 27 hours,” Pelfrey said.
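As a rough check on that figure, the transfer time follows from dividing the number of bits by the link rate. The short Python sketch below is illustrative only: the function names and the `efficiency` parameter are assumptions made for this example, not anything OLCF uses. At the raw line rate, a decimal petabyte (10^15 bytes) works out to about 22 hours and a binary petabyte (2^50 bytes) to about 25 hours, so the quoted 27 hours would be consistent with an effective throughput somewhat below 100 Gbps, for instance because of protocol overhead. That interpretation is a guess; the article does not say how the number was derived.

```python
PETABYTE = 10**15   # decimal petabyte, in bytes
PEBIBYTE = 2**50    # binary petabyte (pebibyte), in bytes


def transfer_hours(data_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move data_bytes over a link of link_gbps gigabits per second.

    efficiency is a hypothetical fraction (0-1] of the line rate that is
    actually available for payload data.
    """
    bits = data_bytes * 8
    effective_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / effective_bits_per_second / 3600


def implied_efficiency(data_bytes: float, link_gbps: float, hours: float) -> float:
    """Fraction of the line rate implied by a quoted transfer time."""
    return transfer_hours(data_bytes, link_gbps) / hours


if __name__ == "__main__":
    print(round(transfer_hours(PETABYTE, 100), 1))          # 22.2
    print(round(transfer_hours(PEBIBYTE, 100), 1))          # 25.0
    print(round(transfer_hours(PETABYTE, 10), 1))           # 222.2
    print(round(implied_efficiency(PETABYTE, 100, 27), 2))  # 0.82
```

The same calculation shows why the move from 10 Gbps to 100 Gbps matters: over a single saturated 10 Gbps link, the same petabyte would take more than nine days.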

The core router installation marks the completion of a multiyear process that began in 2015, when the OLCF established its first 100 Gbps link, a significant speedup from the multiple 10 Gbps links that existed previously. The establishment of 100 Gbps connectivity from both core routers ensures that no single point of failure exists between the OLCF and the ESnet scientific community, which spans national laboratories, HPC centers, and world-class experimental facilities in the United States and Europe.

The arrangement also positions the laboratory to respond to researchers’ future data needs, which might soon require aggregating multiple 100 Gbps links to work in parallel, Pelfrey said.

“If there is a use case for it, I think that’s where we’re headed,” he said.

ORNL is managed by UT-Battelle LLC for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit https://science.energy.gov/.

Jonathan Hines

Jonathan Hines is a science writer for the Oak Ridge Leadership Computing Facility.