OLCF Staff Develops Digital Object Identifier Framework to Facilitate Open Access to Datasets

OLCF users can log in to the DOI portal, provide relevant metadata, and upload their datasets. OLCF staff then work with the US Department of Energy’s Office of Scientific and Technical Information to provide a DOI for the dataset.

Scientists using supercomputing resources at the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL) often generate large datasets. Although researchers can publish their findings in scientific journals and conference proceedings, the original datasets behind those publications are typically kept in archival storage systems or community repositories, with little or no means of sharing them with the larger community.

To address this gap, staff members at the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility located at ORNL, created the Constellation DOI framework, which lets a researcher obtain a digital object identifier (DOI) to catalog and publish scientific data artifacts for open access.

OLCF users can now log in to the DOI portal to complete a DOI request by providing relevant metadata and uploading their datasets. The portal also allows users to view other published DOIs from OLCF users and review DOI requests. The DOI workflow includes submission, review, approval, publication, and dissemination of DOI data and metadata.
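The five workflow stages named above can be pictured as a simple linear state machine. The following is only an illustrative sketch, not the Constellation portal's actual implementation: the names `Stage`, `next_stage`, and `run_workflow` are hypothetical, and the strictly linear ordering is an assumption taken from the order in which the stages are listed.

```python
from enum import Enum


class Stage(Enum):
    """Workflow stages, in the order the article lists them."""
    SUBMISSION = 1
    REVIEW = 2
    APPROVAL = 3
    PUBLICATION = 4
    DISSEMINATION = 5


# Assumption: requests move through the stages strictly in this order.
ORDER = list(Stage)


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the last one."""
    i = ORDER.index(stage)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None


def run_workflow(start=Stage.SUBMISSION):
    """Walk a request from `start` to the end and return the stages visited."""
    visited = []
    stage = start
    while stage is not None:
        visited.append(stage)
        stage = next_stage(stage)
    return visited


if __name__ == "__main__":
    print(" -> ".join(s.name.lower() for s in run_workflow()))
```

A real portal would likely also allow a request to be sent back from review to the submitter; that branch is omitted here because the article does not describe it.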

DOE’s Office of Scientific and Technical Information (OSTI) serves as the registration authority for OLCF DOIs and further disseminates OLCF DOI metadata through DataCite, an international nonprofit organization that provides DOIs for datasets. Once a DOI is issued, the dataset is archived in and served from OLCF’s High-Performance Storage System (HPSS), the facility’s archival storage system. The Constellation DOI portal provides interested data consumers with open access for data discovery and download. The project supports the emerging requirement that federally funded research programs provide open access to their research data.

The service became available in December 2016. Thus far, two OLCF scientists have used the DOI portal to request and acquire DOIs to publish their datasets. Dr. Arvind Ramanathan published a dataset that contains almost 8,000 images generated by bench testing reconstruction for the BigNeuron project (a community effort to simulate a single neuron, an unsolved challenge in brain science). Joseph Kennedy published a benchmark dataset for the Community Ice Sheet Model (CISM), supporting comparison with standard modeling tools such as the Land Ice Verification and Validation toolkit. The DOI service demonstrates a new OLCF capability for supporting data scientists in publicizing their results to foster and support collaboration and scientific discovery.

Raghul Gunasekaran, Tom Barron, Mitch Griffith, and Sudharshan Vazhkudai of OLCF’s Technology Integration Group and Dale Stansberry of the Advanced Data & Workflow Group developed the Constellation DOI portal. Jason Kincl and Ryan Adamson from the HPC Operations Group helped evaluate and deploy the service to OLCF users. Moving forward, the team plans to improve the metadata descriptors for better search and discovery by the wider scientific community. In addition, the team plans to incorporate Globus and GridFTP to enable large file transfers. The Constellation project team is also working on several other data services, such as workflow orchestration and a data portal to facilitate viewing, sharing, and downloading of data.

