In This Message

  • Upcoming Downtimes
    • Center-wide (April 16)
  • Meetings & Workshops
    • Performance Portability Training: Kokkos Training 2024 (April 25-26)
    • AI Training Series: Enhancing PyTorch Performance on Frontier with the RCCL/OFI-plugin (April 17)
  • OLCF Highlights
    • New Frontiers

Upcoming Downtimes

– Frontier, Orion, Andes, DTNs, FrontierSPI, Jupyter, Marble, and HPSS will be unavailable from 8:00 AM until 8:00 PM on Tuesday, April 16.

Meetings & Workshops

AI Training Series: Enhancing PyTorch Performance on Frontier with the RCCL/OFI-plugin
April 17, 1:00 pm – 2:00 pm Eastern Time

Machine learning frameworks running on AMD GPUs use the RCCL library, which provides standard collective communication routines for an arbitrary number of GPUs installed across single or multiple nodes. The RCCL/OFI plugin maps RCCL’s connection-oriented transport APIs to libfabric’s connection-less reliable interface. This allows RCCL applications to take advantage of libfabric’s transport layer services, such as reliable message support and operating system bypass. Using this plugin with PyTorch can lead to better performance.

This seminar will give an overview of using PyTorch on Frontier with the aws-ofi-rccl plugin, along with specific profiling examples run on Frontier. It is intended for OLCF users who have an allocation on Frontier, but all are welcome to join and view the presentation. For more information or to register, see https://www.olcf.ornl.gov/calendar/pytorch-on-frontier/

Performance Portability Training: Kokkos Training 2024
April 25-26, 12:00 pm Eastern Time

This will be a virtual event. Users are encouraged to use the resources they currently have access to for this workshop. An AWS allocation will be provided for users who do not have access to resources.

This Kokkos workshop is open to developers of ECP applications and software-technology projects who are already using Kokkos and want to port their code to next-generation architectures and/or further optimize their code. Developers working on other applications are welcome as space allows.

This is not a tutorial on the basics of Kokkos programming. Attendees are expected to already have some experience with Kokkos. The training will focus on performance and will teach attendees how to use Kokkos Tools to profile, tune, and debug code. The first session will cover recently added features. The second session will be hands-on, with members of the Kokkos team helping application developers with their own codes.

Register at: https://www.olcf.ornl.gov/calendar/kokkos-training-2024/

OLCF Highlights

New Frontiers
Diversity in Action magazine profiles the OLCF’s new Section Head for Operations, Verónica Melesse Vergara, focusing on her work with the Frontier supercomputer.

Read: https://mydigitalpublication.com/publication/m=46265&i=816008&p=36&ver=html5