Spring 2025 AI Focused Frontier Hackathon
Unleash the Power of AI and HPC for your Science!
Frontier AI Hackathon
April 28, May 1, May 5, and May 8, 2025
Virtual via Zoom and Slack
Proposal submission deadline: March 14, 2025.
Event Overview
OLCF, in conjunction with AMD, HPE, and others, is seeking teams of researchers who want to make the most of Frontier using AI to aid their simulation campaigns. This hackathon is your chance to:
- Advance Traditional AI Applications for Research: Focus on improving the performance, scalability, or methods of AI applications. Examples include training and optimizing machine learning models, refining data preprocessing and postprocessing pipelines, and applying AI to analyze diverse research data—whether from simulations, experiments, or observational studies.
- Explore AI Surrogates for HPC Applications: Develop and implement AI surrogate models designed to approximate, accelerate, or replace computationally intensive processes in HPC workflows. These approaches offer transformative opportunities to enhance the performance and efficiency of large-scale scientific computing. Examples of how AI and HPC can be effectively combined—including AI-in-HPC, AI-about-HPC, AI-out-HPC, and HPC-for-AI—are provided in the Examples Tab. Visit this section for detailed descriptions and inspiration for your project ideas.
Join us to push the boundaries of AI + HPC and collaborate with world-class experts! Participation in this event is competitive, with up to 10 teams selected. Teams must submit a proposal form by the March 14, 2025 deadline to apply; see the Proposal Submission Tab.
See the Hackathon Timeline Tab for specific event dates and related training opportunities and the Examples Tab for inspirational examples of how AI and HPC are used together.
What is a Frontier AI Hackathon?
The Frontier AI Hackathon is a multi-day collaborative event designed to advance artificial intelligence in scientific research and high-performance computing (HPC). See the Hackathon Timeline Tab for specific event dates and related training opportunities. Teams will work on porting, optimizing, or enhancing their AI applications to run on Frontier, focusing on traditional AI workflows (e.g., model training, data processing) or exploring innovative AI surrogates for computationally intensive HPC components. Each team (3+ developers) will collaborate with expert mentors from OLCF, AMD, HPE, and other partners, who will provide technical guidance and share best practices. This hackathon offers a unique opportunity to:
- Experiment with cutting-edge AI techniques.
- Optimize AI applications for Frontier.
- Collaborate with leading experts to accelerate your research.
- Set time aside to advance your application’s development and strengthen collaboration within your research team.
Proposal Submission
Participation in this event is competitive, with up to 10 teams selected. Teams must submit a proposal form by March 14, 2025, to apply. The proposal form will ask you to:
- Describe your code and its current or potential AI components.
- Outline your hackathon goals.
For example goals and to submit your proposal, please visit the Proposal Submission Tab below.
Who May Apply
- Existing Frontier Users: Researchers already using Frontier who want to explore AI + HPC or improve their AI codes. You will use your existing allocations on Frontier.
- Future Users: Researchers who aspire to develop or contribute to a future INCITE or ALCC Project on Frontier have two resource access options:
- Open-Source Applications: Future User Teams with open-source codes can request access to OLCF Open Training resources, including Odo (a Frontier-like cluster) and a large parallel filesystem. These accounts are typically processed within a few days, and selected teams will receive instructions on how to apply. Note: An open-source code is one that can be publicly accessed, viewed, or shared without requiring additional permissions.
- Non-Open-Source Applications: Future User Teams with non-open codes can apply for a Director’s Discretionary account on Frontier. These accounts may take up to a month for approval, so early application is encouraged. Approval is contingent on the Director of Science’s authorization and the successful acceptance of your Hackathon proposal.
Supporting Events and Information
To support your hackathon participation, we are offering AI training sessions early in the year to help you understand how AI can enhance or replace algorithms in HPC codes. Please see the Hackathon Timeline Tab for a list of the events.
Bonus Points Opportunity: Teams that attend at least one of the AI training events in January or February—or schedule and attend an OLCF office hours session with Hackathon mentors—will earn one bonus point in their Hackathon proposal review score. Ensure at least one team member participates in these opportunities before the Call for Proposals closes on March 14 to maximize your chances of success. To sign up for OLCF Office hours, follow the instructions given at this link: OLCF office hours.
Proposal Submission
Admission to this hackathon is competitive and requires a brief application. The application form will ask you to:
- Describe your HPC/AI application code(s).
- Set specific, achievable goals for your team during the hackathon.
When submitting your goals, please include:
- Clear examples from your application or workflow.
- Links to publicly accessible codes or datasets to support your goals (if available).
Selection Criteria:
Our aim is to select teams that can best utilize the hackathon experience and the capabilities of Frontier. Teams will be selected based on:
- How clear, specific, and reasonable their goals are for a 4-day hackathon.
- How effectively they describe their code and its relevance to Frontier.
Important Note:
Avoid proposing goals that require large-scale use of Frontier, as hackathon reservations will be limited to ~40 nodes across all teams. This limitation ensures ongoing production science campaigns are not impacted.
Example Goals
Below are a few examples to inspire your proposal. Feel free to propose other goals related to AI or HPC that align with your research priorities on Frontier.
Example Goals for Teams with AI Applications:
- Optimize an Existing AI Code for Frontier Nodes: Adapt and fine-tune your PyTorch application for efficient performance on Frontier’s architecture.
- Enable an AI Application to Run on Multiple Nodes: Scale your application to leverage multiple nodes on Frontier (see the sketch after this list).
- Implement Helpful Tools or Libraries in Your AI Code: Integrate or customize libraries to enhance the functionality or usability of your AI code.
- Tackle a Convergence Issue: Resolve challenges in your application’s training or computational workflow.
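To make the multi-node goal above more concrete, here is a minimal sketch of wrapping an existing PyTorch model in DistributedDataParallel so that one process drives each GPU across several nodes. The model, dataset, and hyperparameters are placeholders, and the snippet assumes a Slurm-style launch that exports MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE, and LOCAL_RANK for each process; adapt it to your own application and batch script.

```python
# Minimal multi-node sketch: one process per GPU, gradients averaged by DDP.
# Assumes the launcher (e.g., a Slurm batch script) exports MASTER_ADDR,
# MASTER_PORT, RANK, WORLD_SIZE, and LOCAL_RANK; model and data are placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")          # maps to RCCL on AMD GPUs
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)                # ROCm builds expose GPUs via the cuda API

    model = torch.nn.Linear(64, 1).cuda(local_rank)  # stand-in for your real model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Synthetic dataset; DistributedSampler gives each rank a disjoint shard.
    data = TensorDataset(torch.randn(4096, 64), torch.randn(4096, 1))
    loader = DataLoader(data, batch_size=256, sampler=DistributedSampler(data))

    for epoch in range(3):
        loader.sampler.set_epoch(epoch)              # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = torch.nn.functional.mse_loss(model(x), y)
            optimizer.zero_grad()
            loss.backward()                          # DDP all-reduces gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

A common pattern on Slurm-managed systems is to launch one such process per GPU with srun and let DDP handle the gradient averaging; mentors can help tune the launch configuration and I/O for Frontier during the event.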
Example Goals for Teams Exploring AI Surrogates in HPC Codes:
- Explore Relationships in Data: Investigate relationships within your data that could support modeling with an AI surrogate.
- Explore Forward or Inverse Problems: Determine whether forward or inverse problems in your code are candidates for AI surrogate replacement.
- Ongoing Implementation of AI Surrogates: Receive support for ongoing efforts to integrate AI surrogates into your HPC code.
- Parameterization Replacement with AI Surrogates: Replace parameterizations in your code with AI surrogates trained on relevant data (see the sketch after this list).
- Data Processing for AI Training: Prepare and process data from your code to make it suitable for training an AI surrogate.
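As a concrete (and intentionally simplified) illustration of the parameterization-replacement and data-processing goals above, the sketch below fits a small PyTorch MLP surrogate to input/output pairs from a stand-in "expensive" routine. The routine, data, and network sizes are synthetic placeholders; in a real workflow the training pairs would be harvested from your HPC code.

```python
# Minimal surrogate-training sketch: fit an MLP to (input, output) pairs
# produced by an expensive routine, then use the MLP in its place.
import torch
import torch.nn as nn

def expensive_routine(x: torch.Tensor) -> torch.Tensor:
    # Placeholder for the costly calculation the surrogate will approximate.
    return torch.sin(3.0 * x[:, :1]) * torch.exp(-x[:, 1:2] ** 2)

# Harvest training pairs (here: synthetic inputs in [-1, 1]^2).
inputs = torch.rand(10_000, 2) * 2.0 - 1.0
targets = expensive_routine(inputs)

surrogate = nn.Sequential(
    nn.Linear(2, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

for step in range(2000):
    idx = torch.randint(0, inputs.shape[0], (256,))   # simple random mini-batches
    loss = nn.functional.mse_loss(surrogate(inputs[idx]), targets[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained surrogate can now stand in for the routine at inference time.
with torch.no_grad():
    approx = surrogate(torch.tensor([[0.1, -0.3]]))
```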
Traditional HPC Applications
If your application does not use AI but you have a timely need for support with porting or improving its performance on Frontier (e.g., scaling or optimization), you may still apply. However, preference will be given to codes that align with existing mentor capacity in AI + HPC.
Resources and Preparation
AI Training Series:
Our AI Series in the first part of the year will help you prepare a strong proposal for this hackathon. Be sure to check the timeline for dates of training events.
For more information, visit the Frontier Hackathons & AI Series Page (coming soon).
Propose Here
Hackathon Timeline
Event | Date | Description |
---|---|---|
Call for Proposals Open | Now | Submit your proposals above under the "Proposal Submission" Form. |
AI Surrogates as Adjuncts to Traditional HPC Simulations | Thursday, January 30th, 2025; TBD | Come learn how AI surrogates can work alongside your traditional HPC algorithms. |
AI Training Series 2: AI and HPC Call To Action: AI Surrogates for HPC | Thursday, February 27th, 2025; TBD | Join us for an interactive discussion of where AI surrogates could replace or augment traditional HPC algorithms in your specific codes. |
CFP Closes | March 14th at midnight. | Make sure your proposal is submitted on time. |
Notification of Selected Teams | March 22nd | |
AI Training Series 3: AI and HPC Call To Action: AI on Frontier | Thursday, March 27th, 2025; TBD | Learn best practices and tips for doing AI on Frontier and other OLCF resources. |
What to expect & Team Intros Session | Friday, April 11th 2025, 1:00 - 3:00 pm EDT | We tell you what to expect and each team presents 3-5 slides about their goals and needs. |
Hackathon Day 1 | Monday, April 28th, 2025 | 11:00 a.m. - 5:00 p.m.: Hacking |
Hackathon Day 2 | Thursday, May 1st, 2025 | 11:00 a.m. - 5:00 p.m.: Hacking |
Hackathon Day 3 | Monday, May 5th, 2025 | 11:00 a.m. - 5:00 p.m.: Hacking |
Hackathon Day 4 | Thursday, May 8th, 2025 | 11:00 a.m. - 3:15 p.m.: Hacking; 3:15 - 3:30 p.m.: Survey; 3:30 - 5:00 p.m.: Team "Outros" |
Team Registration
If you are a member of a selected Hackathon Team, please complete this form by 5 p.m. on April 25th.
Examples
Types of AI + HPC
AI and HPC can be combined in several ways, each offering unique opportunities for innovation and efficiency. Below are four primary paradigms:
- AI-in-HPC:
- AI replaces a component of the HPC simulation or the entire simulation itself.
- Example: Using an AI model to replace a subgrid model in a simulation, or employing autoencoders and principal component analysis (PCA) to reduce the number of variables in a chemical kinetic simulation (see the sketch after this list).
- AI-About-HPC:
- AI systems run concurrently with HPC tasks, analyzing output in real-time to provide insights or augment computational processes.
- Example: Deep learning models for protein structure prediction on HPC systems analyze simulation outputs to accelerate bioengineering research.
- AI-Out-HPC:
- AI systems operate externally to the traditional HPC simulation loop but dynamically manage workflows.
- Example: Reinforcement learning to optimize computational campaigns and improve the progression of HPC tasks.
- HPC-for-AI:
- HPC resources accelerate AI processes, such as training machine learning models or processing large datasets.
- Example: Scaling deep neural networks across Frontier’s architecture to improve training performance.
These paradigms are derived from Brewer et al., “AI-coupled HPC Workflow Applications, Middleware, and Performance.” (Link to paper)
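As a generic illustration of the AI-in-HPC idea of reducing the number of variables (not a reproduction of any specific model from the references on this page), the sketch below compresses a synthetic snapshot matrix to a few latent variables with PCA and reconstructs it; an autoencoder would play the same role nonlinearly.

```python
# Minimal reduced-variable sketch: project a high-dimensional simulation state
# (e.g., many species mass fractions) onto a few principal components.
# The snapshot matrix is random here; in practice it comes from simulation output.
import numpy as np

rng = np.random.default_rng(0)
snapshots = rng.standard_normal((5000, 50))        # 5000 samples x 50 variables (synthetic)

# Center the data and compute principal components via SVD.
mean = snapshots.mean(axis=0)
centered = snapshots - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)

k = 5                                              # keep 5 latent variables instead of 50
components = vt[:k]                                # (k, 50) projection basis

latent = centered @ components.T                   # reduced representation, shape (5000, k)
reconstructed = latent @ components + mean         # map back to the full state

rel_error = np.linalg.norm(reconstructed - snapshots) / np.linalg.norm(snapshots)
print(f"relative reconstruction error with {k} components: {rel_error:.3f}")
```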
Examples with Abstracts from Scientific Literature
- Using AI to Reduce Variables in a Chemical Kinetic Network and Quantify Uncertainty
- Reference: An Out-of-Distribution-Aware Autoencoder Model for Reduced Chemical Kinetics, Pei Zhang et al., ORNL
- Abstract: While detailed chemical kinetic models have been successful in representing rates of chemical reactions in continuum scale computational fluid dynamics (CFD) simulations, applying the models in simulations for engineering device conditions is computationally prohibitive. To reduce the cost, data-driven methods, e.g., autoencoders, have been used to construct reduced chemical kinetic models for CFD simulations. Despite their success, data-driven methods rely heavily on training data sets and can be unreliable when used in out-of-distribution (OOD) regions (i.e., when extrapolating outside of the training set). In this paper, we present an enhanced autoencoder model for combustion chemical kinetics with uncertainty quantification to enable the detection of model usage in OOD regions, and thereby creating an OOD-aware autoencoder model that contributes to more robust CFD simulations of reacting flows. We first demonstrate the effectiveness of the method in OOD detection in two well-known datasets, MNIST and Fashion-MNIST, in comparison with the deep ensemble method, and then present the OOD-aware autoencoder for reduced chemistry model in syngas combustion.
- CFDNet: A Deep Learning-Based Accelerator for Fluid Simulations
- Reference: CFDNet: A Deep Learning-Based Accelerator for Fluid Simulations, Octavi Obiols-Sales et al.
- Abstract: CFD is widely used in physical system design and optimization, where it is used to predict engineering quantities of interest, such as the lift on a plane wing or the drag on a motor vehicle. However, many systems of interest are prohibitively expensive for design optimization, due to the expense of evaluating CFD simulations. To render the computation tractable, reduced-order or surrogate models are used to accelerate simulations while respecting the convergence constraints provided by the higher-fidelity solution. This paper introduces CFDNet, a physical simulation and deep learning coupled framework, for accelerating the convergence of Reynolds Averaged Navier-Stokes simulations. CFDNet is designed to predict the primary physical properties of the fluid including velocity, pressure, and eddy viscosity using a single convolutional neural network at its core. We evaluate CFDNet on a variety of use-cases, both extrapolative and interpolative, where test geometries are observed/not-observed during training. Our results show that CFDNet meets the convergence constraints of the domain-specific physics solver while outperforming it by 1.9 – 7.4x on both steady laminar and turbulent flows. Moreover, we demonstrate the generalization capacity of CFDNet by testing its prediction on new geometries unseen during training. In this case, the approach meets the CFD convergence criterion while still providing significant speedups over traditional domain-only models.
- Machine learning models replace the computationally intensive ab initio calculations in molecular dynamics by training on data generated from highly accurate ab initio calculations, achieving similar accuracy with much greater efficiency.
- Reference: Pushing the Limit of Molecular Dynamics with Ab Initio Accuracy, Weile Jia et al., SC20
- Abstract: For 35 years, ab initio molecular dynamics (AIMD) has been the method of choice for modeling complex atomistic phenomena from first principles. However, most AIMD applications are limited by computational cost to systems with thousands of atoms at most. We report that a machine learning-based simulation protocol (Deep Potential Molecular Dynamics), while retaining ab initio accuracy, can simulate more than 1 nanosecond-long trajectory of over 100 million atoms per day, using a highly optimized code (GPU DeePMD-kit) on the Summit supercomputer. Our code can efficiently scale up to the entire Summit supercomputer, attaining 91 PFLOPS in double precision (45.5% of the peak) and 162/275 PFLOPS in mixed-single/half precision. The great accomplishment of this work is that it opens the door to simulating unprecedented size and time scales with ab initio accuracy. It also poses new challenges to the next-generation supercomputer for a better integration of machine learning and physical modeling.
- RAIN: Reinforcement Learning for Climate Modeling Tasks
- Reference: RAIN Project GitHub Repository
- Summary: This study tested multiple AI algorithms on climate modeling tasks, demonstrating AI’s ability to improve accuracy and reduce biases.
- The study tested multiple AI algorithms in two types of climate modeling tasks.
- Off-policy algorithms, meaning those that explore new strategies to make decisions (e.g., DDPG, TD3, TQC), worked better in Simple Climate Bias Correction because exploration is crucial for finding ways to correct biases.
- On-policy algorithms, meaning those that exploit knowledge of what is already known to work well (e.g., PPO, TRPO), performed better in Radiative-Convective Equilibrium because exploitation (refining known good strategies) was more important for this stable and complex task.
- The results showed that AI can significantly reduce biases (up to 90%), proving its potential for improving climate model accuracy.
- Physics-Informed Neural Networks (PINNs) were used to solve classes of partial differential equations (a minimal sketch of the physics-informed loss appears after this list).
- Reference: Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations, Maziar Raissi et al., arXiv:1711.10561
- Abstract: We introduce physics informed neural networks — neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. In this two part treatise, we present our developments in the context of solving two main classes of problems: data-driven solution and data-driven discovery of partial differential equations. Depending on the nature and arrangement of the available data, we devise two distinct classes of algorithms, namely continuous time and discrete time models. The resulting neural networks form a new class of data-efficient universal function approximators that naturally encode any underlying physical laws as prior information. In this first part, we demonstrate how these networks can be used to infer solutions to partial differential equations, and obtain physics-informed surrogate models that are fully differentiable with respect to all input coordinates and free parameters.
- This work describes Merlin, a simulation workflow framework whose purpose is to facilitate the creation of large-scale ensembles of HPC simulation data suitable for analysis by machine learning tools.
- Reference: Merlin: Enabling Machine Learning-Ready HPC Ensembles, J Luc Peterson et al.
- Abstract: With the growing complexity of computational and experimental facilities, many scientific researchers are turning to machine learning (ML) techniques to analyze large scale ensemble data. With complexities such as multi-component workflows, heterogeneous machine architectures, parallel file systems, and batch scheduling, care must be taken to facilitate this analysis in a high performance computing (HPC) environment. In this paper, we present Merlin, a workflow framework to enable large ML-friendly ensembles of scientific HPC simulations. By augmenting traditional HPC with distributed compute technologies, Merlin aims to lower the barrier for scientific subject matter experts to incorporate ML into their analysis. In addition to its design, we describe some example applications that Merlin has enabled on leadership-class HPC resources, such as the ML-augmented optimization of nuclear fusion experiments and the calibration of infectious disease models to study the progression of and possible mitigation strategies for COVID-19.
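For readers new to PINNs, here is a minimal sketch of the physics-informed loss for the 1D viscous Burgers equation, u_t + u u_x = nu u_xx. Only the PDE-residual term at random collocation points is shown; the initial/boundary-condition and data terms used in the paper are omitted, and the network size, domain, and training loop are placeholders.

```python
# Minimal PINN sketch: penalize the PDE residual of u_t + u*u_x - nu*u_xx
# at random collocation points, using automatic differentiation for derivatives.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))
nu = 0.01

def pde_residual(x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    u = net(torch.cat([x, t], dim=1))                         # u(x, t) from the network
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_t = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t + u * u_x - nu * u_xx                          # ~0 where the PDE is satisfied

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(1000):
    x = (torch.rand(512, 1) * 2.0 - 1.0).requires_grad_(True)  # collocation points in x
    t = torch.rand(512, 1).requires_grad_(True)                 # collocation points in t
    loss = pde_residual(x, t).pow(2).mean()   # physics loss; add IC/BC and data terms in practice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```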
General AI + HPC References:
- Brewer et al., AI-coupled HPC Workflow Applications, Middleware, and Performance. (arXiv link)
- Jha, S. et al., AI-coupled HPC Workflows. Artificial Intelligence for Science, 2023. (arXiv link)
- Jha, S. & Fox, G., Understanding ML-Driven HPC: Applications and Infrastructure. eScience, 2019. (arXiv link)
- Wang, H. et al., Scientific Discovery in the Age of Artificial Intelligence. Nature, 2023.
- Raissi, M., Perdikaris, P., & Karniadakis, G.E., Physics-Informed Neural Networks for Solving PDEs. Journal of Computational Physics, 2019.