ORNL Researchers Collaborate to Study Application I/O Behavior
Team makes progress toward a supercomputing traffic controller
Supercomputers, like busy intersections, need traffic cops.
A system like Oak Ridge National Laboratory’s (ORNL’s) Titan, with more than 16,000 nodes, can have dozens of applications running at the same time, each trying to move data between the supercomputer and its storage system. Without help, these competing read/write (I/O) demands inevitably lead to a traffic jam.
The Oak Ridge Leadership Computing Facility’s (OLCF’s) Sudharshan Vazhkudai, working with colleagues at ORNL, North Carolina State University (NCSU), and Qatar Computing Research Institute (QCRI), has taken a step toward solving this storage bottleneck.
“This bottleneck creates a resource contention, which leads to decreased productivity and variations in application runtimes and utilization,” he explained.
In response, the team created IOSI, a suite of statistical tools that can separate out individual application I/O “signatures” or patterns from the noise created by other applications and maintenance operations on the storage server. IOSI filters out the noise, enabling the application-specific signal to be extracted and studied.
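The article does not say which statistical techniques IOSI uses, so the following is only a toy illustration of the general idea of pulling a repeatable application pattern out of noisy server-side measurements, not IOSI's method. It assumes several already-aligned throughput traces from repeated runs of the same application, with noise from other jobs that is non-negative and rarely overlaps across runs; under those assumptions a pointwise median recovers the shared pattern. The function name `extract_signature` and all of the numbers are hypothetical.

```python
from statistics import median


def extract_signature(traces):
    """Return the pointwise median of several aligned throughput traces.

    Assumption (not from the article): each trace covers one run of the
    same application, sampled at the same time offsets.
    """
    if not traces:
        raise ValueError("need at least one trace")
    length = len(traces[0])
    if any(len(t) != length for t in traces):
        raise ValueError("traces must be aligned to the same length")
    # zip(*traces) groups the samples taken at the same time offset.
    return [median(samples) for samples in zip(*traces)]


# Hypothetical application pattern: two write bursts separated by idle time.
signature = [0, 0, 50, 50, 0, 0, 0, 50, 50, 0]

# Hypothetical noise from other jobs, different in each of three runs.
noise = [
    [5, 0, 0, 30, 0, 0, 10, 0, 0, 0],
    [0, 20, 0, 0, 0, 15, 0, 0, 25, 0],
    [0, 0, 40, 0, 10, 0, 0, 35, 0, 5],
]

# What the storage server would log: application traffic plus noise.
traces = [[s + n for s, n in zip(signature, run)] for run in noise]

recovered = extract_signature(traces)
print(recovered)
```

In this contrived case the noise never hits the same time offset in more than one run, so the median returns the original burst pattern. Real server-side logs would also need the runs to be located and aligned first, which this sketch leaves out.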
Vazhkudai and teammates Raghul Gunasekaran of the OLCF, Yang Liu of NCSU, and Xiaosong Ma of NCSU and QCRI published their findings in a paper titled “Automatic Identification of Application I/O Signatures from Noisy Server-Side Traces” in the Proceedings of the 12th USENIX Conference on File and Storage Technologies. The conference, referred to as FAST, is a top venue for sharing research results in file systems and attracts leading researchers from industry, national laboratories, and academia.
The team was the first to study storage-server-side logs for signature extraction in the high-performance computing (HPC) domain. Using IOSI, the team demonstrated that it is possible to identify individual application I/O signals despite the noise created by other activity on the storage server. As supercomputers grow in both size and speed, distinguishing individual application I/O signatures is an important step toward one day creating a “traffic controller” to smooth the traffic snarls within HPC systems. —Dixie Daniels