Seminar: Parallel Programming for the Next Decade
Speaker: Michael Wolfe, The Portland Group and NVIDIA
Date and time: 5/13/2015, 1:30 PM – 3:00 PM Eastern Time
Slides from this seminar can be found here
Abstract:
Large-scale, high-performance supercomputers from different vendors and integrators over the next decade will explore a number of different design points. All will have thousands of nodes connected by a network, where each node has multiple cores, local memory, and local storage, but many differences will be apparent. Some will have a relatively small number of fat nodes with multiple processor sockets and even heterogeneous processors, such as Summit and Sierra. Others will have a very large number of thinner nodes with a single processor socket, such as Aurora. Other design points will explore heterogeneity on a single socket, such as the AMD APU. Still others may have a single instruction set but heterogeneous performance implementations, with a few fat cores and many thin cores on a node. The memory hierarchy will become more interesting, with a high-capacity memory and a smaller high-bandwidth memory, requiring the system, the runtime, or the application to move data to the right memory at the right time.
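As a rough illustration of application-directed data placement of the kind described above (this sketch is not from the seminar; the array name, size, and computation are purely illustrative), OpenACC unstructured data directives let a program stage an array into the smaller high-bandwidth (device) memory, compute on it there, and copy it back to high-capacity memory when finished:

    /* Hedged sketch: application-managed placement of an illustrative
       array "field" in a two-level memory using OpenACC data directives. */
    #include <stdlib.h>

    float *alloc_field(int n)
    {
        float *field = malloc(n * sizeof *field);
        /* Create a copy of field in the high-bandwidth (device) memory. */
        #pragma acc enter data create(field[0:n])
        return field;
    }

    void step(float *restrict field, int n)
    {
        /* Compute on the copy already resident in device memory. */
        #pragma acc parallel loop present(field[0:n])
        for (int i = 0; i < n; ++i)
            field[i] += 1.0f;   /* placeholder computation */
    }

    void free_field(float *field, int n)
    {
        /* Move the result back to high-capacity host memory, then release. */
        #pragma acc exit data copyout(field[0:n])
        free(field);
    }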
The goal of OpenACC is to allow a single program to target the broad range of these systems with high performance. We will start by exploring the similarities and differences among the architectures that a programming model like OpenACC needs to expose and exploit. We will look at existing OpenACC features and features in development to see how they can be used for the different architectures. We will also look at where other programming models, such as OpenMP, C++ parallel iterators, and Fortran do concurrent, fall short, and how they should evolve as well.
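To make the single-source goal concrete, the following minimal sketch (illustrative only, not taken from the seminar; the saxpy routine and its parameters are assumed for the example) shows the style of directive-annotated loop that OpenACC targets. The same source can be compiled for a multicore CPU or an attached accelerator, with the compiler and runtime deciding how to map the loop onto fat cores, thin cores, or GPU threads:

    /* Hedged sketch of a performance-portable OpenACC loop. */
    void saxpy(int n, float a, const float *restrict x, float *restrict y)
    {
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }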