I/O and Data Seminar Series: Quantifying sources of I/O performance variation through holistic analysis
Title of Presentation: “Quantifying sources of I/O performance variation through holistic analysis”
Presented by: Glenn Lockwood, Lawrence Berkeley National Laboratory
At present, I/O performance analysis relies on a multitude of tools, each characterizing one part of the I/O subsystem, and institutional I/O expertise is required to connect these data into an integrated view and explain why a job performed the way it did. This process is labor-intensive and not sustainable as the storage hierarchy gets deeper and more complex. To address this growing challenge, we have developed the Total Knowledge of I/O (TOKIO) framework to combine insights from each level of the storage stack and provide a holistic view of performance.
This talk will introduce TOKIO’s modular architecture and describe how it can identify several common system-level performance anomalies. We will then present the results of applying this classification process to performance measurements collected over the course of an entire year at both NERSC and ALCF. Several forms of resource contention emerge as the most common sources of transient I/O performance variability, but other causes, both policy- and technology-specific, are also implicated in performance variation with statistically significant frequency.
We will conclude with an overview of how TOKIO is implemented at NERSC, several practical operational tools included with the framework, and avenues for future work.
Speaker’s Bio: Glenn K. Lockwood is a member of NERSC’s Advanced Technologies Group specializing in storage and I/O. In addition to developing TOKIO, he is the I/O performance lead for the NERSC-9 system and is a maintainer of the IOR and mdtest community benchmarks. Glenn holds degrees in ceramic engineering and materials science from Rutgers University and was a systems engineer in the genomics industry prior to joining Lawrence Berkeley National Laboratory.