AI Training Series: AI for Science at Scale – Part 3
Thursday, July 11, 2024
Training large deep learning models, including large language models, is resource-intensive and requires innovative parallelization and distribution strategies. In earlier workshops in this series, we demonstrated how to train a deep learning model in a distributed fashion across multiple GPUs on Frontier at “small” and “intermediate” scales. For the final part of the series, we scale up further and demonstrate how to fine-tune pre-trained networks at a larger scale on Frontier. Registered Frontier users will be able to use a system reservation to participate in the hands-on portion of the event.
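For a flavor of what the hands-on portion builds toward, below is a minimal sketch of multi-GPU data-parallel training using PyTorch's DistributedDataParallel, one common strategy on systems like Frontier. The toy model, synthetic data, and hyperparameters are illustrative assumptions, not the workshop's actual code (see the series GitHub page below for the real materials):

```python
# Minimal PyTorch DDP sketch. Launch with torchrun (or srun on Frontier),
# which sets RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT.
# Model, data, and hyperparameters are hypothetical placeholders.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # NCCL backend (RCCL on AMD GPUs) for GPU-to-GPU communication.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model stands in for a real network; DDP replicates it per GPU
    # and synchronizes gradients across ranks during backward().
    model = nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # Synthetic dataset; DistributedSampler shards it across ranks.
    dataset = TensorDataset(torch.randn(1024, 128),
                            torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()   # gradients are all-reduced across GPUs here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```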
Although this training is intended for current Frontier users, all are welcome to register and view the presentation. Additionally, no prior knowledge of Part 1 or 2 is necessary — you are encouraged to register even if you did not attend previous iterations of this series.
Presenter: Dr. Sajal Dash, OLCF – Analytics & AI Methods at Scale
Series GitHub page: https://github.com/olcf/ai-training-series
| Time | Topic | Speaker |
|---|---|---|
| 1:00 pm – 1:20 pm ET | Introduction to distributed training of LLMs | Sajal Dash |
| 1:20 pm – 1:50 pm ET | Finding the best training strategies for large models | Sajal Dash |
| 1:50 pm – 2:20 pm ET | Fine-tuning a pre-trained model | Sajal Dash |
| 2:20 pm – 3:00 pm ET | Hands-on demo using Frontier | Sajal Dash |
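As a preview of the fine-tuning session in the agenda above, here is a minimal sketch of one common approach: freezing a pre-trained backbone and training only a new task-specific head. The torchvision ResNet-50, the 10-class head, and the synthetic batch are illustrative assumptions, not the workshop's materials:

```python
# Minimal fine-tuning sketch: freeze a pre-trained backbone and train a
# new classification head. Model choice and task size are hypothetical.
import torch
import torch.nn as nn
from torchvision import models

# Load a pre-trained network (hypothetical choice of ResNet-50).
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze all backbone parameters so only the new head is updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for a hypothetical 10-class downstream task;
# the new layer's parameters are trainable by default.
model.fc = nn.Linear(model.fc.in_features, 10)

optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a synthetic batch.
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```

Freezing the backbone keeps memory and compute costs low; at the scales discussed in the workshop, fuller strategies (updating all weights, sharding optimizer state across GPUs) come into play.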
