In This Message

Meetings & Workshops
– AI for Science at Scale – Part 2 (Oct 12)
– Introduction to OpenMP Offload Part 1: Basics of Offload (Sep 29)
– Introduction to OpenMP Offload Part 2: Optimization and Data Management (Oct 6)
– HIP Training Series
Upcoming Downtimes
– Frontier and Orion (Oct 3)
Center Announcements
– Call for posters at the upcoming 2023 OLCF User Meeting
– OLCF Alpine Decommission (Begin moving data now)
– Nominations for OLCF User Group Executive Board
– Call for Proposals for Summit in 2024

Meetings & Workshops

AI for Science at Scale – Part 2 (Oct 12)
Thursday, October 12, 2023

Training large deep learning models, including large language models, is resource-intensive and requires innovative parallelization and distribution strategies. In part 1 of this workshop, we demonstrated how to train a deep learning model in a distributed fashion across multiple GPUs of the Summit supercomputer using data parallelism. Building on this, part 2 will show how to train a model on multiple GPUs across nodes of the Frontier supercomputer, with a focus on model parallelism techniques and frameworks such as DeepSpeed, FSDP, and Megatron. Registration is now open at:
https://www.olcf.ornl.gov/calendar/ai-training-series-ai-for-science-at-scale-part-2

Introduction to OpenMP Offload Part 1: Basics of Offload
September 29, 2023
12:00 – 1:45 PM (EDT)
Virtual via Zoom
The OpenMP API is a scalable model that gives parallel programmers a simple and flexible interface for developing portable parallel applications in C/C++ and Fortran. Join us for part 1 of a four-part OpenMP Offload training that will enable application teams and developers to accelerate their code with GPUs and to exploit the latest OpenMP functionality on multi-core platforms such as Frontier and Perlmutter. This session, offered by OLCF and NERSC, is part of the Performance Portability training series. Hands-on exercises will be run on OLCF Frontier and NERSC Perlmutter. Participants without existing accounts will be provided training accounts on Perlmutter.

Register here: https://www.olcf.ornl.gov/calendar/introduction-to-openmp-offload-part-1-basics-of-offload-2/

Introduction to OpenMP Offload Part 2: Optimization and Data Management
October 6, 2023
12:00 – 2:30 PM (EDT)
Virtual via Zoom

The OpenMP API is a scalable model that gives parallel programmers a simple and flexible interface for developing portable parallel applications in C/C++ and Fortran. In part 2 of our OpenMP Offload series, OLCF and NERSC staff will cover optimization strategies and show how efficient data movement and a better understanding of the available hierarchy of parallelism can lead to improved performance. They will also cover best practices for OpenMP Offload.

For details and to register see:

https://www.olcf.ornl.gov/https/wwwolcfornlgov/calendar/introduction-to-openmp-offload-part-2-optimization-and-data-management-2/

HIP Training Series
HIP is a parallel computing platform and programming model from AMD that extends C++ to program GPUs. Its API is very similar to CUDA, but it supports both AMD and NVIDIA GPU targets. The HIP training series covers a range of GPU programming topics, including basic GPU programming, porting existing GPU applications to HIP, and profiling and tracing HIP code. Each two-hour session includes a lecture and a hands-on component.

  • Oct 02, GPU Profiling (Performance Timelines)
  • Oct 16, GPU Profiling (Performance Profile)

Registration is now open for each event. More information on the series, including registration, can be found at: https://www.olcf.ornl.gov/hip-training-series/

Upcoming Downtimes

  • Frontier and Orion will be unavailable from 8:00 AM until 5:00 PM on Tuesday, October 3.

Center Announcements

Call for posters at the upcoming 2023 OLCF User Meeting
We are pleased to announce that the Oak Ridge Leadership Computing Facility (OLCF) will host the OLCF User Meeting on October 17–18, 2023.

This year’s event will be hybrid, with a poster session on Tuesday, October 17. The poster session aims to share methods and case studies that demonstrate success stories and lessons learned using OLCF systems. This is a great opportunity for selected projects to showcase the achievements of their cutting-edge research. The call for posters is open to all onsite attendees and ORNL staff, including academics, researchers, and students.

Registration for poster submissions is available on the main event page under “Poster Submission Information.”

Presenters who want ORNL to print their poster and have it ready for the exhibit should submit it in PDF form by Wednesday, October 11.
For more information, contact the poster session organizers: Antigoni Georgiadou (georgiadoua@ornl.gov) and Peter Groszkowski (groszkowskip@ornl.gov)

OLCF Alpine Decommission (Begin moving data now)

The Alpine filesystem has reached the end of its life, and data cannot remain on it after the end of December. Alpine will become read-only on December 19, 2023, in preparation for its disposition on January 1, 2024. To help you move your data off Alpine, the data transfer nodes (DTNs) now mount the new Orion filesystem, and all projects with access to Alpine have been granted access to Orion. We strongly encourage all teams to start migrating and/or deleting data on Alpine now. Teams that wait until late in the year to begin the transition risk running out of time to move their data before the system is decommissioned.

Please note that any data remaining on Alpine after December 31, 2023 will be permanently unavailable and unrecoverable: the system will be dismantled and its drives shredded.

More details on the Alpine decommission timeline can be found at https://docs.olcf.ornl.gov/systems/2023_olcf_system_changes.html

Nominations for OLCF User Group Executive Board
Nominate yourself to run for the OLCF User Group Executive Board. If elected, you will serve on a 10-person board that provides advice and feedback to the OLCF on the current and future state of OLCF operations and services. Successful candidates will serve a three-year term. Elections will be held electronically during the user meeting. Nominations are open through September 30. You may nominate yourself at https://www.olcf.ornl.gov/2023-ougeb-nomination.

Call for Proposals for Summit in 2024

The Department of Energy is extending Summit operations through October 2024, enabling researchers to pursue projects on one of the world’s leading AI-enabled open science supercomputing platforms. OLCF will allocate Summit through new programs for calendar year 2024. SummitPLUS, one of these new allocation programs, will be used to allocate a significant portion of the system in 2024. The program is open to researchers from academia, government laboratories, federal agencies, and industry. We welcome proposals for computationally ready projects from investigators who are new to Summit, as well as from previous INCITE, ALCC, DD, and ECP awardees and projects. We encourage proposals on emerging paradigms for computational campaigns, including data-intensive science and AI/ML.

More information on the SummitPLUS allocation program and Alpine decommission can be found at https://docs.olcf.ornl.gov/systems/2023_olcf_system_changes.html