I/O and Data Seminar Series: Efficient Parallel I/O with HDF5 and Proactive Data Containers (PDC)
Title of Presentation: “Efficient Parallel I/O with HDF5 and Proactive Data Containers (PDC)”
Presented by: Suren Byna, Lawrence Berkeley National Laboratory
Speaker’s Abstract:
Science is driven by data, and we observe three trends that affect scientific data management: (1) massive concurrency from millions of CPU cores, (2) large volumes of data produced by scientific simulations, observations, and experiments, and (3) a deepening memory and storage hierarchy with heterogeneous devices. These trends significantly impact the performance of parallel I/O, which is a critical mechanism for storing and retrieving data on supercomputing systems. To address this impact, we are researching various optimization strategies and novel storage technologies that are being productized in HDF5. This talk will review our R&D in data management techniques used on large-scale supercomputing systems. The topics include the I/O software stack for storing and retrieving data to and from parallel file systems, I/O optimizations applied in several ECP applications, and ongoing research in object storage for supercomputing systems.
Slide Deck: Efficient Parallel I/O with HDF5 and Proactive Data Containers (PDC)
Speaker’s Bio: Suren Byna is a Staff Scientist in the Scientific Data Management (SDM) Group in CRD @ LBNL. His research interests are in scalable scientific data management. More specifically, he works on optimizing parallel I/O and on developing systems for managing scientific data. He is the PI of the ECP-funded ExaHDF5 project and of the ASCR-funded object-centric data management systems (Proactive Data Containers, PDC) and experimental and observational data management (EOD-HDF5) projects.