OLCF Testing New Platform for Scientific Workflows
OpenShift gives OLCF users ability to deploy and manage infrastructure services
Scientific progress is increasingly driven by data—along with the instruments that produce it, the networks that move it, the systems that analyze it, and the servers that host and disseminate it.
To meet the demands of data-intensive science, researchers and computing experts have created new infrastructures that link scientific instruments with web and data services and high-performance computing (HPC). These workflows automate on-demand, long-running HPC processes and accelerate discovery in fields as varied as high-energy physics, biology, and materials science.
In the past, integrating these data workflows with supercomputing resources at the Oak Ridge Leadership Computing Facility (OLCF), a US Department of Energy (DOE) Office of Science User Facility at DOE’s Oak Ridge National Laboratory, has been a time- and labor-intensive process, requiring a thorough understanding of both the workflow and the underlying HPC infrastructure. A new pilot project led by OLCF staff, however, holds the promise of simplifying this task, allowing users to deploy and manage nontraditional workloads on their own.
OLCF HPC Operations staff is testing OpenShift, an open-source platform created by the software company Red Hat and built on top of the container management system Kubernetes. Within the OLCF environment, OpenShift provides users with a way to execute scientific workflows via containers—customized software bundles that run in isolation from a computer's operating system. Containers give users the freedom to create their own application environments, while OpenShift handles the system administration details so that users don't have to.
“It’s kind of like a FedEx model in that you just give FedEx a box and they’ll ship it wherever it needs to go,” said Jason Kincl, a member of the OLCF’s HPC Operations team. “In a similar way, the user can send OpenShift a container, and the software knows how to schedule and run that package.”
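In practice, a user hands OpenShift a container by describing it in a short deployment manifest. The sketch below is a minimal, hypothetical example of what such a manifest might look like; the workflow name, image location, and resource requests are illustrative assumptions, not an actual OLCF configuration:

```yaml
# Hypothetical Kubernetes/OpenShift Deployment for a containerized workflow.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-workflow                # hypothetical workflow name
spec:
  replicas: 1                      # run one copy of the container
  selector:
    matchLabels:
      app: my-workflow
  template:
    metadata:
      labels:
        app: my-workflow
    spec:
      containers:
      - name: worker
        # User-built container image holding the application environment
        # (registry path is a placeholder)
        image: registry.example.com/my-workflow:latest
        resources:
          requests:
            cpu: "1"               # ask the scheduler for one CPU core
            memory: 2Gi            # and 2 GiB of memory
```

Once a manifest like this is applied (for example, with the OpenShift client's `oc apply -f deployment.yaml`), the platform takes over scheduling and running the package, much as the FedEx analogy suggests.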
This setup could reduce the time users need to stand up workflows from a few weeks to a few days. Additionally, the service could expedite the setup and maintenance of data portals for users who want to host their data at the OLCF and make it accessible to outside collaborators.
Currently, OLCF staff is working with a high-energy physics team to pilot the first scientific workflow using OpenShift. The workflow, known as Big PanDA, has been scheduling jobs opportunistically on the OLCF’s Titan supercomputer since September 2015 in support of the ATLAS collaboration, an ambitious project designed to detect particles created by proton collisions at the Large Hadron Collider in Switzerland. Running Big PanDA on OpenShift is helping OLCF staff fine-tune the new operational model and explore ways for users to access central resources, such as the OLCF’s parallel file system and job scheduler, with containers.
“It’s working, and it’s actually been really cool to see users compress that innovation loop,” Kincl said. “They have the power to get their application up and running on their own.”
Kincl said he would like to see the OLCF eventually run its operational infrastructure, including web and database services, on OpenShift to help reduce the complexity of standing up and managing dozens of OLCF services.
“It’s a heavy lift to run an application today because when our team stands up a machine, its identity and configuration exist in a bunch of different places,” he said. “Having a common platform to build on will provide operational speedup.”
Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.