To analyze and process scientific data, researchers often employ Jupyter notebooks, interactive web documents that host snippets of code written in programming languages such as Python, R, or Julia. Domain scientists, mathematicians, data scientists, and educators use the Jupyter tool suite to analyze and visualize data, iteratively prototype codes, document and share procedures, and even display presentations.
Now the Oak Ridge Leadership Computing Facility (OLCF) has deployed JupyterLab environments for its users, giving them a push-button way to use Jupyter software on Slate. The OLCF launched Slate in February to provide container orchestration services that let users run specialized tools in support of computational campaigns. With Jupyter on Slate, users can develop their codes incrementally and more smoothly execute them on the IBM AC922 Summit, the nation’s fastest supercomputer, housed at the OLCF.
“With a standard terminal session on a high-performance system like Summit, you don’t have a convenient graphical user interface—all you see is text,” said Spencer Ward, high-performance computing (HPC) DevOps engineer in the Platforms Group under the National Center for Computational Sciences’ (NCCS’) Operations Section. “Jupyter is a visual experience. It allows you to have these beautiful, interactive visualizations and understand the output of your simulation. Jupyter tools allow scientists to fine-tooth comb information to understand their data and determine what needs to be further run on HPC resources.”
JupyterLab on Slate allows a user to have multiple notebooks, file browsers, and spreadsheets in the same environment with a highly configurable, modern interface.
Previously, OLCF users had to navigate a complex and tedious process to take advantage of Jupyter tools. This undertaking required setting up individual servers for each session and building secure connections from Summit’s nodes to the user’s personal computer to successfully run the notebooks. In early 2020, a team led by Ward and Suhas Somnath, computer scientist in the OLCF’s Data Lifecycle and Workflows Group, began implementing a JupyterHub, a multi-user environment for Jupyter notebooks, to solve this problem. The result is an offering of notebooks to OLCF users for lightweight data analysis and postprocessing capabilities on Slate.
“You don’t want to learn how to drive on the interstate,” Ward said. “Jupyter notebooks allow people to drive and get where they need to go, and as soon as they’re comfortable, they can scale that up to Summit.”
Jupyter can also interface with multiple OLCF resources, including the center’s data analysis cluster, Andes, and storage systems such as Alpine, the center’s 250-petabyte IBM Spectrum Scale file system. Now, even users and collaborators who don’t need to access Summit can seamlessly interact with data sets on Alpine via Jupyter notebooks. The team also added the ability to run notebooks that use environments created with the Conda package management system on the OLCF’s resources via a JupyterLab environment, giving users the ability to develop and run their codes in a fail-safe sandbox. The OLCF is a US Department of Energy (DOE) Office of Science User Facility located at DOE’s Oak Ridge National Laboratory (ORNL).
Some users are also taking full advantage of a new capability in Slate—its newly implemented GPUs, installed over the summer by senior staff member Oscar Hernandez of ORNL’s Computer Science and Mathematics Division and Ryan Prout, HPC data analytics engineer in NCCS’ Operations. ORNL scientists Jens Glaser and Swen Boehm have been using tools such as NVIDIA’s RAPIDS GPU data science ecosystem and the Dask library for parallel computing in Python to take advantage of the new GPUs.
“We wanted GPUs in Slate so that users could work with the same tools on Slate and Summit,” Somnath said. The installation was necessary for Glaser and Boehm, who are running a COVID-19 drug discovery project on Summit and require both RAPIDS and the Dask library for their codes.
“The overwhelming majority of Summit’s computational power comes from its GPUs,” Somnath said. “Our deployment of Jupyter lowers the barrier for teams who might want to use Summit for the complete scientific investigation, starting from preprocessing and then followed by traditional simulations, data analytics on the simulation results, and even making data and algorithms available to external collaborators via Jupyter notebooks.”
Users can take advantage of the computational resources within Slate to perform exploratory analysis or code prototyping without using up their allocation time on Summit. Like Glaser and Boehm, users can also use their allocation on Summit to orchestrate computations on its compute nodes via Dask and RAPIDS. Access to a JupyterHub also lets users collaborate and share resources within a project.
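As a rough illustration of the prototype-small, scale-later pattern described here, the sketch below splits a data set into chunks, summarizes each chunk in parallel, and combines the partial results. It is not the OLCF team's code. It uses only Python's standard library (`concurrent.futures`) as a stand-in for Dask, and the function names and sample data are hypothetical. In a real notebook, the same map-and-combine structure could be handed to a Dask client instead of a local thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def summarize_chunk(chunk):
    """Reduce one chunk of values to a small summary (count, mean, max)."""
    return {"count": len(chunk), "mean": mean(chunk), "max": max(chunk)}


def split(values, n_chunks):
    """Split a list into up to n_chunks nearly equal contiguous pieces."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    # Drop empty pieces, which occur when there are fewer values than chunks.
    return [c for c in chunks if c]


def analyze(values, n_workers=4):
    """Summarize chunks in parallel, then combine the partial summaries."""
    chunks = split(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(summarize_chunk, chunks))
    total = sum(p["count"] for p in partials)
    # Weight each chunk mean by its size to recover the overall mean.
    overall_mean = sum(p["mean"] * p["count"] for p in partials) / total
    overall_max = max(p["max"] for p in partials)
    return {"count": total, "mean": overall_mean, "max": overall_max}


# Hypothetical stand-in for a small sample of simulation output.
sample = [float(i % 17) for i in range(1000)]
result = analyze(sample)
print(result)
```

Because the per-chunk work and the combining step are kept separate, a small sample can be checked interactively before the same functions are pointed at a larger data set and a larger pool of workers.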
Additionally, the Platforms Group—led by Group Leader Jason Kincl—is working with Somnath and Benjamín Hernandez of the Artificial Intelligence Analytics Scalable Methods Group in NCCS’ Advanced Technologies Section to identify which tools and capabilities might be added to the current environment.
“We already have the tool sets integrated that need to be part of the main package, but we are looking at incorporating more tools as we get input from the community,” Somnath said.
To access the OLCF user documentation for Jupyter, visit https://docs.olcf.ornl.gov/services_and_applications/jupyter/overview.html.
UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.