High-Performance Storage System hardware upgrades yield a four- to fivefold speedup in data migration.

Because of recent upgrades, the OLCF’s High-Performance Storage System showed a four- to fivefold speedup in transferring data from its disk cache to tape.

As supercomputers have increased in size and performance, the data they generate has become more difficult to manage.

Aware of these data demands, staff at the Oak Ridge Leadership Computing Facility (OLCF), a US Department of Energy (DOE) Office of Science User Facility, have invested heavily in making sure users can access data quickly while also taking steps to keep that data safe.

When users receive time on the OLCF’s Titan supercomputer (a Cray XK7 capable of 27 petaflops, or 27 quadrillion calculations per second), they typically generate large data sets that take time to analyze. The OLCF, located at DOE’s Oak Ridge National Laboratory, offers users safe data storage on its High-Performance Storage System (HPSS). Rather than keeping their simulation data resident on the Spider II file system, users can archive it to HPSS for later analysis, reduction, or visualization.

Tapes offer another layer of data reliability and can be held by HPSS compactly and securely. Retrieving data from tapes can take a long time, though, and staff wanted to help users expedite that process.

Jason Hill, of the OLCF staff, explained that the upgrades were user-centric. “Our users let us know that it can take a while to get data out of HPSS,” Hill said. “It’s a resounding theme for us, and we took that to heart when we targeted building out infrastructure both in networking and with our disk cache.”

The storage team invested in 40-gigabit Ethernet connectivity for all the HPSS movers, which are linked across two separate switches. The HPSS movers are responsible for reading and writing the data in the disk cache as well as migrating data to the tape media. The Ethernet switches provide a high-bandwidth path between the disk and tape movers so that data can move quickly to the archival media. The two switches are now connected by twelve 100-gigabit-per-second links that are aggregated for performance. The investment has already paid off: the team routinely sees a four- to fivefold improvement in the time spent migrating data from the disk cache to long-term tape storage.
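As a rough, back-of-the-envelope illustration of the link figures above, the sketch below computes the nominal aggregate bandwidth of the inter-switch links and an idealized time to move a data set across them. The function names and the 1-petabyte example size are hypothetical, chosen only for illustration. The calculation ignores protocol overhead, tape drive speeds, and contention, so real transfers would take longer.

```python
# Back-of-the-envelope arithmetic only. The function names and the example
# data size are illustrative assumptions, not OLCF measurements.

GIGABIT = 10**9      # bits
PETABYTE = 10**15    # bytes (decimal petabyte)


def aggregate_gbps(link_count, gbps_per_link):
    """Nominal bandwidth of several aggregated links, in gigabits per second."""
    return link_count * gbps_per_link


def ideal_transfer_seconds(size_bytes, gbps):
    """Idealized transfer time in seconds, ignoring overhead and device limits."""
    bits = size_bytes * 8
    return bits / (gbps * GIGABIT)


if __name__ == "__main__":
    # Twelve 100-gigabit-per-second links, as described in the article.
    inter_switch = aggregate_gbps(12, 100)
    # Hypothetical 1 PB data set.
    seconds = ideal_transfer_seconds(1 * PETABYTE, inter_switch)
    print(f"Nominal inter-switch bandwidth: {inter_switch} Gb/s")
    print(f"Ideal time to move 1 PB: {seconds / 3600:.2f} hours")
```

With these inputs the nominal inter-switch figure is 1,200 gigabits per second. That is an upper bound on paper rather than an observed throughput.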

In addition to speeding up communication between the different storage tiers, the storage team wanted to help users avoid retrieving data from tapes for as long as possible. The team purchased 15 petabytes of raw disk cache to expand the amount of data that could be recalled and accessed quickly.

“Our hope is that users won’t need to get their data from the tape,” Hill said. “We will of course migrate data to tape for users, but with this upgrade, users should be able to recall data from the last 6 months directly from the disk cache.”
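To put the cache size and the six-month window side by side, the sketch below estimates the average daily volume of new data that would fit in the cache for that long. It is a simplification under stated assumptions: it treats all 15 petabytes of raw capacity as usable (real usable capacity is lower), approximates six months as 180 days, and uses a hypothetical function name.

```python
# Illustrative estimate only. Treating the full raw capacity as usable
# overstates what a real disk cache can hold.

PETABYTE = 10**15    # bytes (decimal petabyte)
TERABYTE = 10**12    # bytes (decimal terabyte)


def sustainable_daily_ingest(cache_bytes, retention_days):
    """Average bytes per day that fit in the cache for the retention window."""
    return cache_bytes / retention_days


if __name__ == "__main__":
    # 15 PB raw cache from the article; 180 days approximates 6 months.
    per_day = sustainable_daily_ingest(15 * PETABYTE, 180)
    print(f"Average ingest that fits: {per_day / TERABYTE:.1f} TB per day")
```

If users archive data faster than this average, older files would age out of the cache sooner and have to be recalled from tape.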

In addition, the storage team has helped beef up the reliability of long-term data storage through the Redundant Array of Inexpensive Tapes (RAIT) system.

Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.