IFS NR Data Hackathon
Exploring the ECMWF global 1-km IFS experimental nature run
A baseline for a digital twin of earth
Proposal Submission Form
The European Centre for Medium-Range Weather Forecasts (ECMWF) and the Oak Ridge National Laboratory (ORNL) are pleased to announce access to the data collection from global 1-km experimental nature run simulations using the Integrated Forecasting System (IFS) with explicit convection. We invite you to join us in exploring this precursor to a digital twin of the earth!
This open science event will facilitate direct access to the data, currently hosted at OLCF. Participating teams will also have access to a large Linux cluster with 700 nodes. In addition, teams exploring machine learning applications will have limited access to Ascent, a stand-alone 15-node IBM AC922 system with 90 NVIDIA V100 GPUs and an architecture identical to that of Summit.
The Experimental Nature Run
A nature run can be considered a “long forecast” of the atmosphere (typically two weeks beyond initialization), in which the simulation develops its own course of reality. The experimental nature run at global 1-km resolution (XNR1K) is based on IFS Science Version 45r1. Note that hereafter, the term “nature run” (NR) in this announcement refers to XNR1K.
Two seasonal simulations have been completed, one corresponding to the northern hemispheric winter months (NDJF) and the other to the North Atlantic tropical cyclone season (ASO). For the first seasonal run (NDJF), the hydrostatic IFS model was initialized at 00Z on 1 November 2018; for the second (ASO), it was initialized at 00Z on 1 August 2019. The XNR1K simulations were forced only by 1/12 degree OSTIA sea surface temperatures (SST) at the lower boundary. The output was saved every 3 hours at all model levels. In addition, the XNR1K simulations were rerun for four specific extreme events, with output every 15 minutes. These NR special cases include a tropical cyclone and three severe storm events over the continental USA.
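As a rough planning aid, the number of 3-hourly output times in each seasonal run can be estimated as sketched below. The initialization times are taken from this announcement; the end times (00Z on the first day after each season) are assumptions, since the run lengths are not stated here, and the helper name count_outputs is hypothetical.

```python
from datetime import datetime, timedelta


def count_outputs(start, end, step):
    """Count output times from start to end (both inclusive), spaced by step."""
    if end < start:
        raise ValueError("end must not precede start")
    return (end - start) // step + 1


# Initialization times come from the announcement.
ndjf_start = datetime(2018, 11, 1, 0)
aso_start = datetime(2019, 8, 1, 0)

# End times are ASSUMPTIONS: the announcement does not state the run lengths.
ndjf_end = datetime(2019, 3, 1, 0)
aso_end = datetime(2019, 11, 1, 0)

three_hourly = timedelta(hours=3)
ndjf_steps = count_outputs(ndjf_start, ndjf_end, three_hourly)
aso_steps = count_outputs(aso_start, aso_end, three_hourly)

# The same helper applies to the 15-minute special cases, here for one
# illustrative day (the actual event dates are not given in the announcement).
one_day_15min = count_outputs(
    aso_start, aso_start + timedelta(days=1), timedelta(minutes=15)
)

print(ndjf_steps, aso_steps, one_day_15min)
```

Under these assumed end dates, each season spans several hundred output times, which is worth keeping in mind when sizing any analysis that loops over the full record.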
Both model level data at all 137 levels and pressure level fields at 31 levels are being made available, along with 94 single level fields. The 3D variables include the prognostic variables (temperature, pressure, moisture and the winds u, v and w) as well as hydrometeor content, namely cloud liquid, cloud ice, rain and snow.
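To give a feel for the data volume involved, the following sketch estimates the uncompressed size of one field at a single output time. It is an order-of-magnitude illustration only: the number of grid columns is approximated as Earth's surface area divided by a 1 km x 1 km cell, and values are assumed to be stored as 4-byte floats. Neither the actual grid point count nor the storage format is stated in this announcement, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def approx_columns(cell_km):
    """Approximate number of grid columns for square cells of side cell_km.

    ASSUMPTION: a uniform grid covering the sphere; the real model grid differs.
    """
    area_km2 = 4.0 * math.pi * EARTH_RADIUS_KM ** 2
    return int(area_km2 / cell_km ** 2)


def field_bytes(n_columns, n_levels, bytes_per_value=4):
    """Uncompressed size of one field; bytes_per_value=4 is an ASSUMPTION."""
    return n_columns * n_levels * bytes_per_value


columns = approx_columns(1.0)

# Level counts (137 model levels, 31 pressure levels) are from the announcement.
model_level_gb = field_bytes(columns, 137) / 1e9
pressure_level_gb = field_bytes(columns, 31) / 1e9
single_level_gb = field_bytes(columns, 1) / 1e9

print(f"columns ~ {columns:.3e}")
print(f"one 3D field on model levels    ~ {model_level_gb:.0f} GB")
print(f"one 3D field on pressure levels ~ {pressure_level_gb:.0f} GB")
print(f"one single level field          ~ {single_level_gb:.1f} GB")
```

Even with these crude assumptions, a single 3D variable at one output time is far too large for a typical workstation, which is one reason the analysis is expected to run on the OLCF systems next to the data.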
Topics for Exploration
At 1-km resolution, many of the smaller scale processes are resolved in the NR, such as most of the gravity wave spectrum. Hence, this XNR1K dataset provides a unique opportunity not only to investigate smaller scale processes but also to inform the development of their parameterization in order to improve numerical weather and climate prediction. We invite science teams to propose research and application areas of interest that could be explored at scale. Investigation of extreme weather events, including hurricanes and severe storms, is of special interest. Additional emphasis will be placed on the development of AI and ML applications for surrogate models, emulators for data assimilation systems, satellite retrievals of earth observations, etc. Visualization of the global 1-km simulations is also encouraged for science investigations as well as for outreach activities to inform and educate. The high temporal frequency (15 minutes) simulations were designed around case studies to develop future observing system technologies using Observing System Simulation Experiments (OSSEs).
The selected science teams will have access to the data and main computational resources for a period of 6 months with an anticipated end date of 31 December 2022. This project has been given a total allocation of 25,000 node-hours on OLCF’s Andes analysis and visualization cluster, 4 PB of storage on a high-performance file system (GPFS), and access to a Jupyter Hub for interactive analysis with Jupyter Notebooks. For AI and machine learning production needs, teams will also have periodic access to Ascent, a training system with an architecture identical to that of Summit. A team of mentors, with expertise in systems, data science, computing and machine learning, will be available to provide guidance. A data curator will also offer help with the publication of any derived results and data sets.
We expect to support 5 – 10 teams of at least two members each with access to the OLCF resources for a period of six months. Applications will be accepted through 31 August, and periodically reviewed. We will select and approve suitable projects on a rolling basis through 30 September. Note that the anticipated date of completion for all the projects is still 31 December 2022. Meritorious projects needing extensions will be evaluated on an individual basis.
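For planning purposes, the sketch below shows what the 25,000 node-hour allocation would amount to per team if it were divided equally among 5 to 10 teams. The equal split is purely an assumption for illustration; the announcement does not say how the allocation will be shared, and the function name is hypothetical.

```python
TOTAL_NODE_HOURS = 25_000  # total Andes allocation, from the announcement


def node_hours_per_team(total_node_hours, n_teams):
    """Equal share of the allocation per team (ASSUMED split, not a policy)."""
    if n_teams <= 0:
        raise ValueError("n_teams must be positive")
    return total_node_hours / n_teams


# The announcement expects 5 to 10 teams.
share_5_teams = node_hours_per_team(TOTAL_NODE_HOURS, 5)
share_10_teams = node_hours_per_team(TOTAL_NODE_HOURS, 10)

print(f"5 teams:  {share_5_teams:.0f} node-hours each")
print(f"10 teams: {share_10_teams:.0f} node-hours each")
```

A proposal can use an estimate of this kind to check that its planned analysis fits within a plausible share of the project allocation.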
All applications should provide the necessary information and be submitted online. Initially, only two users per team may be granted access to OLCF resources. Additional users may be approved to meet the science objectives. But all science teams are encouraged to involve as many members as necessary for the success of the projects. International participants are welcome to apply, but approval and access are subject to the laws of the government of the United States.
We will facilitate a Slack channel and a series of monthly meetings to exchange information, resolve issues and discuss progress. In the spirit of open science, participants are strongly encouraged to foster new collaborations and share new discoveries and lessons learned during the course of the event.
Data are available for open science research and applications only. Redistribution of the data is currently not allowed. List all project team members who will be using the data.
Teams will be asked to use the following statement to acknowledge the data and compute resources made available as part of this hackathon.
Statement of Acknowledgement
This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
ECMWF also benefited from collaborations through the ESCAPE-2 (No. 800897), MAESTRO (No. 801101), EuroEXA (No. 754337), and ESiWACE-2 (No. 823988) projects, funded by the European Union’s Horizon 2020 future and emerging technologies and research and innovation programmes.
|Valentine Anantharaj (ORNL)|Lead (Data)|
|Inna Polichtchouk (ECMWF)|PI & Lead (Science)|
|Tom Papatheodore (ORNL)|Lead (Event)|
|Samuel Hatfield (ECMWF)|Lead (Computing)|
|Suzanne Parete-Koon (ORNL)|Event Manager|