Project Description

Data Jockey is a workflow-aware data management service that helps users automate the orchestration of data movement and placement across multiple storage tiers in a batch-oriented HPC environment such as the Oak Ridge Leadership Computing Facility (OLCF).

The goal is to offload the complexity of data preparation and lifetime management from our users as the number of storage tiers grows (tape archives, object stores, parallel file systems, node-local NVRAM, and more to come).

Data Jockey achieves this goal by acting as an overarching layer on top of the existing storage infrastructure. It exposes an abstract environment for policy-driven data movement, placement, and access, which unifies data management while hiding the complexity of an increasing number of storage tiers.

The service is architected as an outsourced, centrally managed control-plane service for user-specific scientific workflows. It is capable of reaching out to and orchestrating heterogeneous user data management resources, such as data stores, movers, and interfaces, on behalf of the users.
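
To make the idea of policy-driven placement more concrete, here is a minimal sketch of how a control plane might turn a placement policy into an ordered list of data movements. It is not Data Jockey's actual interface: the tier names, the PlacementPolicy and MoveStep types, the plan_moves function, and the assumption that data moves one tier at a time are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical tier names, ordered from coldest storage to closest-to-compute.
TIERS = ["tape_archive", "object_store", "parallel_fs", "node_local_nvram"]


@dataclass(frozen=True)
class PlacementPolicy:
    """Hypothetical policy: a dataset must reside on target_tier."""
    dataset: str
    target_tier: str


@dataclass(frozen=True)
class MoveStep:
    """One hop that a data mover would be asked to carry out."""
    dataset: str
    source: str
    destination: str


def plan_moves(policy: PlacementPolicy, current_tier: str) -> list[MoveStep]:
    """Plan the hops needed to satisfy the policy.

    Assumption for this sketch: data moves between adjacent tiers only,
    in either direction (staging in or archiving out).
    """
    start = TIERS.index(current_tier)
    end = TIERS.index(policy.target_tier)
    direction = 1 if end >= start else -1
    return [
        MoveStep(policy.dataset, TIERS[i], TIERS[i + direction])
        for i in range(start, end, direction)
    ]


if __name__ == "__main__":
    policy = PlacementPolicy(dataset="run42/input", target_tier="parallel_fs")
    for step in plan_moves(policy, current_tier="tape_archive"):
        print(f"{step.dataset}: {step.source} -> {step.destination}")
```

In this sketch the user states only where the data should end up. The planner works out the intermediate hops, which is the kind of complexity the service is meant to hide.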