Protecting Big Data

Relocating and updating the HPSS archive

The Oak Ridge Leadership Computing Facility’s (OLCF’s) HPC Operations storage team recently relocated the center’s High-Performance Storage System (HPSS) archive tape library to a centralized location with a more controlled environment, resulting in better overall availability and uptime for OLCF system users and better resiliency of the media.

The OLCF's High-Performance Storage System (HPSS).

Developed in 1997 and a winner of an R&D 100 Award that same year, the HPSS archive uses tape and disk storage components, servers, and HPSS software to provide long-term storage for the massive amounts of data created by users on OLCF systems.

To guard against data loss, the team updates the archive as often as possible with the latest software and storage technologies. This, however, can be a daunting task. Not only do storage needs increase every year, but the rate of increase is accelerating.

For instance, the amount of data stored in HPSS surpassed 1 petabyte for the first time in 2006, a milestone that took 8 1/2 years to reach. The second petabyte, however, took under 2 years, and the third took only 6 months.
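The acceleration can be made concrete with a little arithmetic. The sketch below is only an illustration built from the three intervals quoted in this article. It treats "under 2 years" as exactly 2 years, an assumption that makes the second rate a lower bound, and the names used are hypothetical.

```python
# Time taken to add each successive petabyte, from the figures quoted above.
# "Under 2 years" is treated as exactly 2 years (an assumption), so the
# second rate is a lower bound rather than an exact value.
YEARS_PER_PETABYTE = [8.5, 2.0, 0.5]


def average_rates(years_per_pb):
    """Return the average growth rate in petabytes per year for each interval."""
    return [1.0 / years for years in years_per_pb]


rates = average_rates(YEARS_PER_PETABYTE)

for n, (years, rate) in enumerate(zip(YEARS_PER_PETABYTE, rates), start=1):
    print(f"Petabyte {n}: {years} years, about {rate:.2f} PB/year")

# How much faster the third petabyte arrived compared with the first.
speedup = rates[-1] / rates[0]
print(f"Third petabyte arrived about {speedup:.0f} times faster than the first")
```

Under these assumptions, the average rate rises from roughly 0.12 petabytes per year for the first petabyte to 2 petabytes per year for the third.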

This year the team streamlined the day-to-day operations of the HPSS archive system by colocating six Oracle StorageTek SL8500 tape libraries and more than 40,000 media cartridges in a single centralized location.

Each tape library can hold 10,000 individual media cartridges, with each cartridge capable of storing from 1 to 8 terabytes of data. The sheer volume of information made the move from two locations to a central one extremely challenging: more than 30 petabytes are stored within the HPSS archive, roughly three times the size of the entire printed collection at the Library of Congress. Adding to this challenge, the team had to deal with the previous, very complex cabling plant and a large array of Fibre Channel and Ethernet switch gear.
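The figures above imply a rough capacity envelope for the relocated archive. The following back-of-the-envelope sketch uses only numbers quoted in this article. It assumes decimal units (1 petabyte = 1,000 terabytes) and treats "more than 40,000" cartridges as exactly 40,000, so the results are approximate bounds and not official OLCF figures.

```python
# Figures quoted in the article.
LIBRARIES = 6                # Oracle StorageTek SL8500 tape libraries
SLOTS_PER_LIBRARY = 10_000   # cartridges each library can hold
CARTRIDGES = 40_000          # "more than 40,000", used here as a lower bound
TB_PER_CARTRIDGE_MIN = 1     # smallest cartridge capacity, in terabytes
TB_PER_CARTRIDGE_MAX = 8     # largest cartridge capacity, in terabytes
STORED_PB = 30               # "more than 30 petabytes" currently stored

# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1_000

total_slots = LIBRARIES * SLOTS_PER_LIBRARY
slot_utilization = CARTRIDGES / total_slots

# Capacity range if every cartridge were the smallest or the largest type.
min_capacity_pb = CARTRIDGES * TB_PER_CARTRIDGE_MIN / TB_PER_PB
max_capacity_pb = CARTRIDGES * TB_PER_CARTRIDGE_MAX / TB_PER_PB

print(f"Total cartridge slots: {total_slots}")
print(f"Slots occupied: about {slot_utilization:.0%}")
print(f"Cartridge capacity range: {min_capacity_pb:.0f} to {max_capacity_pb:.0f} PB")
print(f"Data stored: more than {STORED_PB} PB")
```

With these inputs, the six libraries offer 60,000 slots, about two-thirds of which are occupied, and the cartridges on hand could hold somewhere between 40 and 320 petabytes depending on the mix of media generations.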

After several months of preparation, though, the team was able to not only move the library itself, but also upgrade facilities and systems such as power, space, and cooling, as well as complete a new cabling plant and fiber/Ethernet network. The team also worked with Oak Ridge National Laboratory fire engineers and vendor representatives to design a fire-suppression system to meet fire code requirements and further protect the archive media and data from damage.

As a result of this work, the tape library infrastructure is better able to share the load of the HPSS archive’s requests for tape resources. The libraries are also in a more controlled environment that regulates temperature, humidity, and air quality, leading to better resiliency of the tape media.

Lastly, the upgraded cabling plant and infrastructure moved the HPSS archive off older hardware that was more expensive to maintain, which will save tens of thousands of dollars each year.

“We are always trying to implement the newest storage technologies so that our researchers know their data is safe,” says HPC Operations’ Kevin Thach. “But sometimes the technology is just not there yet or is too expensive at its current stage, so we have to come up with our own ways to make the HPSS archive more reliable.”

By Austin Koenig