
AI Training Series: AI for Science at Scale – Part 3
Thursday, July 11, 2024

Training large deep learning models, including large language models (LLMs), is resource-intensive and requires innovative parallelization and distribution strategies. In earlier workshops in this series, we demonstrated how to train a deep learning model in a distributed fashion across multiple GPUs on Frontier at "small" and "intermediate" scales. For the final part of the series, we scale up further and demonstrate how to fine-tune pre-trained networks at a larger scale on Frontier. Registered Frontier users can use a system reservation to participate in the hands-on portion of the event.
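For readers who want a concrete picture of what "distributed fashion" means here, below is a minimal PyTorch DistributedDataParallel (DDP) sketch. It is illustrative only and not taken from the workshop materials: the linear model, random tensors, and hyperparameters are placeholders standing in for a real pre-trained network and dataset. (On ROCm builds of PyTorch, as used on Frontier's AMD GPUs, the backend name is typically still "nccl", backed by RCCL.)

```python
# Minimal sketch of multi-GPU fine-tuning with PyTorch DDP.
# Illustrative only; model, data, and hyperparameters are placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # A launcher such as torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE
    # for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder "pre-trained" model; in practice this would be loaded
    # from a checkpoint.
    model = torch.nn.Linear(512, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # Toy dataset; DistributedSampler gives each rank a distinct shard.
    data = TensorDataset(torch.randn(1024, 512), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(data)
    loader = DataLoader(data, batch_size=32, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # gradients are all-reduced across ranks here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=8 finetune_sketch.py` (a hypothetical filename), each process drives one GPU and DDP averages gradients across ranks during `backward()`.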

Although this training is intended for current Frontier users, all are welcome to register and view the presentation. No prior knowledge of Part 1 or 2 is necessary, so you are encouraged to register even if you did not attend the earlier parts of the series.

Presenter: Dr. Sajal Dash, OLCF – Analytics & AI Methods at Scale

Series GitHub page: https://github.com/olcf/ai-training-series

Slides | Recording


Time                 | Topic                                                  | Speaker
1:00 pm – 1:20 pm ET | Introduction to distributed training of LLMs           | Sajal Dash
1:20 pm – 1:50 pm ET | Finding the best training strategies for large models  | Sajal Dash
1:50 pm – 2:20 pm ET | Fine-tuning a pre-trained model                        | Sajal Dash
2:20 pm – 3:00 pm ET | Hands-on demo using Frontier                           | Sajal Dash
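As a taste of the second agenda item, the sketch below gives a rough back-of-the-envelope estimate of per-GPU memory for model states under a plain data-parallel (DDP) strategy versus a fully sharded ZeRO-3-style strategy. The 16-bytes-per-parameter accounting (fp16 weights and gradients plus fp32 master weights and Adam states) follows the widely cited ZeRO paper; this is a planning aid for thinking about strategy choices, not a tool from the workshop materials, and it deliberately ignores activations and framework overhead.

```python
# Rough per-GPU memory estimate for mixed-precision Adam training.
# Illustrative only; real usage also includes activations, buffers,
# and framework overhead, which this sketch ignores.

BYTES_PER_PARAM = {
    "weights_fp16": 2,
    "grads_fp16": 2,
    "master_weights_fp32": 4,
    "adam_momentum_fp32": 4,
    "adam_variance_fp32": 4,
}  # 16 bytes/parameter total, as in the ZeRO paper's accounting


def per_gpu_memory_gb(n_params: float, n_gpus: int, strategy: str = "ddp") -> float:
    """Estimate per-GPU memory (GB) for model states only.

    strategy:
      "ddp"   - every GPU holds a full replica of all model states
      "zero3" - weights, gradients, and optimizer states are all sharded
    """
    total_bytes = n_params * sum(BYTES_PER_PARAM.values())
    if strategy == "zero3":
        total_bytes /= n_gpus  # full sharding divides states across ranks
    return total_bytes / 1024**3


if __name__ == "__main__":
    params = 13e9  # a hypothetical 13B-parameter model
    for gpus in (8, 64, 512):
        print(f"{gpus:4d} GPUs: DDP {per_gpu_memory_gb(params, gpus, 'ddp'):8.1f} GB, "
              f"ZeRO-3 {per_gpu_memory_gb(params, gpus, 'zero3'):8.1f} GB")
```

For the hypothetical 13B-parameter model, model states alone are roughly 194 GB per GPU under plain replication, which exceeds a single device's memory, while full sharding across 64 GPUs brings this to about 3 GB per GPU; this is the kind of trade-off the session on training strategies explores.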


Date: July 11, 2024
Time: 1:00 pm – 3:00 pm ET
Location: Webcast
Organizer: Michael Sandoval
Email: [email protected]