AI Training Series: AI for Science at Scale – Part 2

Thursday, October 12, 2023

Training large deep learning models, including large language models (LLMs), is resource-intensive and requires effective parallelization and distribution strategies. In the previous workshop in this series, we demonstrated how to train a deep learning model across multiple GPUs of the Summit supercomputer using data parallelism. Building on that, this session shows how to train a model on GPUs spread across multiple nodes of the Frontier supercomputer. The focus is on model-parallelism techniques and the frameworks that support them, such as DeepSpeed, PyTorch FSDP (Fully Sharded Data Parallel), and Megatron.

Series GitHub page: https://github.com/olcf/ai-training-series

Slides:

Recording:

Agenda:

| Time | Topic | Speaker |
|---|---|---|
| 1:00 pm – 1:45 pm EDT | Scaling, LLMs | Sajal Dash (OLCF, Analytics & AI Methods at Scale) |
| 1:45 pm – 2:00 pm EDT | Scientific Applications | Sajal Dash |
| 2:00 pm – 3:00 pm EDT | Hands-on Examples | Sajal Dash |

Registration

The limit of 100 registrations has been reached; registration is now closed.

Joining Information

Joining information will be sent to registrants as a calendar invitation before the event.

Date

Oct 12, 2023

Time

1:00 pm – 3:00 pm (Eastern Time)

Location

Webcast