Weekly Update: September 13, 2023
In This Message
Meetings & Workshops
– October Frontier Hackathon
– AI for Science at Scale – Part 2 (Oct 12)
– Introduction to OpenMP Offload Part 1: Basics of Offload (Sep 29)
Upcoming Downtimes
– Frontier, Orion, HPSS (Sep 19)
– Marble (Sep 26 – 28)
– MyOLCF (Sep 26)
Center Announcements
– 2023 OLCF User Meeting (Registration Deadline, Friday Sep 15)
– Call for posters at the upcoming 2023 OLCF User Meeting
– OLCF Alpine Decommission (Begin moving data now)
– Nominations for OLCF User Group Executive Board
Meetings & Workshops
October Frontier Hackathon
Applications Due September 19, 2023.
The OLCF is pleased to announce that we will be working with HPE, AMD, and ECP to hold a virtual Frontier Hackathon on October 30 – November 3, 2023. The deadline to apply is September 19. Application/software teams with existing Frontier allocations are invited to submit proposals to participate in the hackathon. After the proposal deadline, we will review all proposals and select 10 teams to attend. During the event, the teams will work toward their development goals (porting, debugging, optimization, etc.) with the help of OLCF, HPE, AMD, and ECP staff.
For more details and to apply see: https://www.olcf.ornl.gov/calendar/frontier-hackathon-october-2023/
AI for Science at Scale – Part 2 (Oct 12)
Thursday, October 12, 2023
Training large deep learning models, including large language models, is resource-intensive and requires innovative parallelization and distribution strategies. In part 1 of this workshop, we demonstrated how to train a deep learning model in a distributed fashion across multiple GPUs of the Summit supercomputer using data parallelism. Building on this, part 2 will show how to train a model on multiple GPUs across nodes of the Frontier supercomputer. We will focus on and demonstrate model parallelism techniques and frameworks, such as DeepSpeed, FSDP, and Megatron. Registration is now open at:
https://www.olcf.ornl.gov/calendar/ai-training-series-ai-for-science-at-scale-part-2
Introduction to OpenMP Offload Part 1: Basics of Offload
September 29, 2023
12:00 – 1:45 PM (EDT)
Virtual via Zoom
The OpenMP API is a scalable model that gives parallel programmers a simple and flexible interface for developing portable parallel applications in C/C++ and Fortran. Join us for part 1 of a four-part OpenMP Offload training that will enable application teams and developers to accelerate their code with GPUs and to exploit the latest OpenMP functionality to program multi-core platforms like Frontier and Perlmutter. This session, offered by OLCF and NERSC, is part of the Performance Portability training series. The hands-on sessions will be performed on OLCF Frontier and NERSC Perlmutter. Participants without existing accounts will be provided training accounts on Perlmutter.
Register here: https://www.olcf.ornl.gov/calendar/introduction-to-openmp-offload-part-1-basics-of-offload-2/
Upcoming Downtimes
- Frontier, Orion, and HPSS will be unavailable from 8:00 AM until 8:00 PM on Tuesday, September 19.
- Marble: standard patching to OpenShift 4.10.66 will run from 8:00 AM on September 26 through 8:00 AM on September 28. The Marble cluster should remain available during patching. Jupyter notebooks are stateful, single-instance workloads and will be deleted and recreated as part of the node upgrade process.
- MyOLCF will be unavailable from 10:00 AM until 2:00 PM on Tuesday, September 26.
Center Announcements
2023 OLCF User Meeting (Oct 17-18)
Please note the last day to register for on-site and virtual User Meeting attendance is this Friday, September 15.
The OLCF invites you to participate in the 2023 OLCF User Meeting at Oak Ridge National Laboratory in Oak Ridge, TN. The purpose of the annual user meeting is to share selected computational science and engineering achievements emerging from OLCF’s user programs, enable direct interactions among users, advance OLCF’s relationships with our user community, and provide Facility updates. The meeting will be available to virtual participants via Zoom, but some elements, such as the poster session and Facility tours, will only be available to onsite attendees. All attendees (whether onsite or virtual) will need to register. For more information or to register, please visit https://www.olcf.ornl.gov/calendar/2023-olcf-user-meeting/.
Call for posters at the upcoming 2023 OLCF User Meeting
We are pleased to announce that the Oak Ridge Leadership Computing Facility (OLCF) will host the OLCF User Meeting on October 17-18, 2023.
This year’s event will be hybrid with a poster session on Tuesday, October 17. The scope is to share methods and case studies demonstrating success stories and lessons learned using OLCF systems. This is a great opportunity for select applications to showcase the achievements of their cutting-edge research. The call for posters is open to all onsite visitors and ORNL staff including academics, researchers, and students.
Registration for poster submissions is on the main event page under “Poster Submission Information”.
Posters should be received in PDF form for printing by Wednesday, October 11th, if presenters want ORNL to print their poster and have it ready for the exhibit.
For more information, contact the poster session organizers: Antigoni Georgiadou (georgiadoua@ornl.gov) & Peter Groszkowski (groszkowskip@ornl.gov)
OLCF Alpine Decommission
(Begin moving data now)
The Alpine filesystem has reached the end of its life and data cannot remain on it after the end of December. Alpine will become read-only on December 19, 2023 to prepare for the disposition of Alpine on January 1, 2024. To assist you with moving your data off Alpine, the DTNs mount the new Orion filesystem, and all projects with access to Alpine have now been granted access to the Orion filesystem. We highly encourage all teams to start migrating and/or deleting data from the Alpine filesystem now. If you wait until late in the year to begin the transition, you risk running out of time to move your data before the system is decommissioned. It is important to note that any data remaining on the Alpine filesystem after December 31, 2023 will be permanently unavailable and unrecoverable, as the system will be dismantled and the drives will be shredded.
More details on the Alpine decommission timeline can be found at https://docs.olcf.ornl.gov/systems/2023_olcf_system_changes.html
Nominations for OLCF User Group Executive Board
Nominate yourself to run for the OLCF User Group Executive Board. If elected, you will serve on a 10-person board that provides advice and feedback to the OLCF on the current and future state of OLCF operations and services. The term of service for the successful candidates is three years. Elections will be held electronically during the user meeting. Nominations will be accepted through September 30. You may nominate yourself at https://www.olcf.ornl.gov/2023-ougeb-nomination.