OLCF Explores Deep Learning with DGX-1

The Oak Ridge Leadership Computing Facility (OLCF) recently deployed a new NVIDIA DGX-1 artificial intelligence supercomputer that offers scientists and researchers opportunities to delve into deep learning technologies with more vigor than ever before. Deep learning uses neural networks to classify data or predict outcomes by training models on large data sets and by abstracting high-level features or patterns from lower-level data. The OLCF is a DOE Office of Science User Facility located at ORNL.

Scientists and researchers at the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL) are using deep learning because of its potential to leverage big data analytics to automate and accelerate the scientific discovery process. By running their work on NVIDIA’s DGX-1 deep learning appliance, researchers have a chance to develop and apply deep learning methods to important science questions.

Since the DGX-1 arrived at the OLCF late last fall, staff members in the Advanced Data and Workflow (ADW) Group—a team that helps establish data-driven computing environments for the OLCF and its collaborators—have been working to set up big data analytics projects on the system. Deployed alongside ORNL’s Compute and Data Environment for Science (CADES), an infrastructure that offers compute and data resources for data-intensive science, the DGX-1 can be accessed by users to explore scientific workflows.

Mallikarjun (Arjun) Shankar, group leader for ADW and the CADES director, said that his team is working closely with researchers to assess how much impact the DGX-1 can have on their deep learning workflows. This work is helping the team determine the best system approaches for data analytics projects moving forward.

“Right now, we have early users,” Shankar said. “We want to see what projects ideally could run on the DGX-1 before we open it up to a broader user base, and these big data projects can show us when and how it is most effective.”

Currently, several ORNL teams are using the DGX-1, including one focused on applying deep learning to improve cancer treatment strategies. The project, a collaboration that includes ORNL’s Health Data Science Institute and the National Cancer Institute, aims to train a family of neural networks to automatically extract information from anonymized cancer surveillance reports. Another project within the collaboration, called the CANcer Distributed Learning Environment (CANDLE), is using the DGX-1 to analyze large-scale molecular dynamics simulations as part of its goal to understand certain biological interactions and build predictive models for patient drug responses based on cancer-related data.

“We need to find out exactly how we must input our data and if there are certain formats that would be more effective than others,” said ORNL researcher Arvind Ramanathan, a technical lead of the CANDLE project. “We need to find the best ways to make use of the maximum throughput of the system, and we are working with NVIDIA to find the best ways to do that.”

Deemed a small but well-targeted supercomputer, the DGX-1 features eight of the latest NVIDIA Tesla GPUs. Built on NVLink technology, the DGX-1 passes data from one GPU to another an order of magnitude faster than traditional data interconnects. The DGX-1 also employs container technology, which encapsulates applications, allowing users to access their choice of deep learning applications and supporting wider experimentation across multiple models in parallel. The DGX-1 software also includes access to deep learning frameworks that NVIDIA has optimized for maximum GPU-accelerated performance. By simplifying workflows through its software tools, the DGX-1 minimizes the operational overhead required to run it, saving time and money.

“The DGX-1 provides a cost-effective opportunity to explore advanced deep learning workflows in a standardized environment,” said Jamison Daniel, visualization data scientist at the OLCF. “This approach emphasizes research efficiency. The obstacle of configuring operating systems and building software libraries is removed so that our researchers may remain focused on the scientific challenges at hand.”

Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.