Using Darshan to Profile I/O on Frontier
Overview
Optimizing I/O performance is an essential, yet often overlooked, aspect of High-Performance Computing (HPC). As AI datasets grow and GPU-accelerated training demands higher throughput, I/O can become a silent bottleneck. Understanding the interaction between application code and the Lustre parallel filesystem is critical for scaling applications effectively on Frontier.
This training provides a practical guide to profiling application I/O with Darshan. We will demonstrate how to use command-line utilities to analyze reports, leverage Python to export and visualize I/O statistics, and identify common “bad” I/O behavior that limits performance. The darshan-runtime module is loaded by default on Frontier, but this session assumes no prior experience with Darshan and walks through the entire workflow of turning raw data into actionable performance insights.
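As a rough illustration of what identifying “bad” I/O behavior can look like, the sketch below flags files whose average write size is very small. It is not taken from the training materials. The counter names POSIX_WRITES and POSIX_BYTES_WRITTEN are real Darshan POSIX counters, but the file names, the values, the 1 MiB threshold, and the helper flag_small_writes are all assumptions made for this example. In practice the counters would be exported from a Darshan log (for example with darshan-parser or PyDarshan) rather than typed in by hand.

```python
# Hypothetical per-file records shaped like rows from Darshan's POSIX module.
# Counter names are real Darshan counters; file names and values are invented.
records = [
    {"file": "checkpoint.h5", "POSIX_WRITES": 64,
     "POSIX_BYTES_WRITTEN": 64 * 4 * 1024 * 1024},
    {"file": "log.txt", "POSIX_WRITES": 50000,
     "POSIX_BYTES_WRITTEN": 50000 * 80},
]

# Assumed cutoff for "small" writes; choose a value suited to your filesystem.
SMALL_WRITE_BYTES = 1024 * 1024


def flag_small_writes(records, threshold=SMALL_WRITE_BYTES):
    """Return (file, average write size) for files averaging below threshold."""
    flagged = []
    for rec in records:
        writes = rec["POSIX_WRITES"]
        if writes == 0:
            continue  # file was never written; nothing to average
        avg = rec["POSIX_BYTES_WRITTEN"] / writes
        if avg < threshold:
            flagged.append((rec["file"], avg))
    return flagged


flagged = flag_small_writes(records)
for name, avg in flagged:
    print(f"{name}: average write size {avg:.0f} bytes")
```

Many tiny writes to a parallel filesystem such as Lustre tend to cost far more per byte than a few large ones, which is why average access size is a common first thing to check.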
Registration
This event has already happened. Please review the training materials below.
