HPSS users must soon bid farewell to their old personal directories in favor of a project-based hierarchy
The High-Performance Storage System (HPSS) at the Oak Ridge Leadership Computing Facility (OLCF) will soon transition to a project-based directory structure. This may sound like a small change, but for research teams storing their project data on HPSS, the update promises to simplify file management and streamline searching for (or sharing) the right data.
Under the current system, users can save data to both personal and project directories, a habit that has often left team members unsure where specific project data can be found. For example, if a user assigned to multiple projects saves all of that data into a single personal directory, other team members may have difficulty identifying which data belongs to which project. So, starting in January 2020, all new data will have to be written to project areas.
“By only allowing writing to project directories, we eliminate confusion as to what file belongs to which project,” said Mitchell Griffith, an archival storage software developer for HPSS at the OLCF, a US Department of Energy (DOE) Office of Science User Facility at DOE’s Oak Ridge National Laboratory. “If a user writes a file to project C’s directory, then that project owns the file.”
The new hierarchy is based on the project directory layout used by the OLCF’s Spider storage system, which makes the two systems more compatible and will eventually allow project teams with sensitive, proprietary, or export-controlled data to archive their data on HPSS. Today, those project teams do not have access to HPSS, partly because of the current layout structure.
Each project will have three writable directories:
/hpss/prod/PROJECTID/proj-shared: for sharing files between project members
/hpss/prod/PROJECTID/users/$USER: for data needed mainly by that one user
/hpss/prod/PROJECTID/world-shared: for sharing open data between projects
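For teams that script their archiving, the three locations above can be derived from a project ID and a username. The following Python sketch is illustrative only: the function name `project_directories`, the example project ID `ABC123`, and the username `jdoe` are hypothetical and do not come from the OLCF. It builds path strings and does not contact HPSS.

```python
from pathlib import PurePosixPath

# Root of the new project-based hierarchy, as given in the list above.
HPSS_PROJECT_ROOT = PurePosixPath("/hpss/prod")


def project_directories(project_id: str, user: str) -> dict[str, PurePosixPath]:
    """Return the three writable directories for one project and one user."""
    base = HPSS_PROJECT_ROOT / project_id
    return {
        "proj-shared": base / "proj-shared",    # files shared between project members
        "user": base / "users" / user,          # data needed mainly by that one user
        "world-shared": base / "world-shared",  # open data shared between projects
    }


if __name__ == "__main__":
    # "ABC123" and "jdoe" are placeholder values for illustration.
    for name, path in project_directories("ABC123", "jdoe").items():
        print(f"{name}: {path}")
```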
This project-based structure also helps avoid potential disorganization when users leave a project as they move to other institutions, graduate, or start their own research projects.
“This is to prevent ambiguity in file ownership. Projects last longer than people, and we want to ensure some basic metadata about what is written to HPSS,” Griffith said. “By doing this, we can associate a project’s description with the file and have a better understanding of what is written to HPSS.”
Users with data currently stored in their personal directories (at /home/$USER) will be encouraged to start transferring it to the correct project areas; Griffith said the HPSS team expects the data migration to be fully completed in 12 to 18 months.
To easily transfer their data, Griffith advises users to employ the “mv” (move) command instead of the “cp” (copy) command. The mv command is a quick metadata operation, whereas cp will copy data in HPSS, which can be a lengthy process.
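Users with many files may want to plan their moves before running them. The Python sketch below is one possible way to do that, under stated assumptions: the helper names `plan_moves` and `as_mv_commands` are hypothetical, the example values are placeholders, and the rendered `mv SOURCE DESTINATION` form is assumed rather than taken from the HPSS client documentation. The sketch only prints the planned commands and never touches HPSS.

```python
from pathlib import PurePosixPath


def plan_moves(user, project_id, relative_paths):
    """Pair each file under the old personal directory with a project destination.

    Assumes the old location is /home/<user> and the destination is the
    per-user area of the project, /hpss/prod/<project_id>/users/<user>.
    """
    home = PurePosixPath("/home") / user
    target = PurePosixPath("/hpss/prod") / project_id / "users" / user
    return [(home / rel, target / rel) for rel in relative_paths]


def as_mv_commands(moves):
    """Render each (source, destination) pair as an 'mv' line for review.

    The exact syntax accepted by the HPSS client is an assumption here, and
    paths containing spaces would need quoting that this sketch does not add.
    """
    return [f"mv {src} {dst}" for src, dst in moves]


if __name__ == "__main__":
    # Placeholder user, project, and file names for illustration.
    moves = plan_moves("jdoe", "ABC123", ["run1/output.tar", "notes.txt"])
    for line in as_mv_commands(moves):
        print(line)
```

A plan like this is worth reviewing by hand before it is run, since destination subdirectories may need to exist first and files from different projects should go to different project areas.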
Once a user’s home directory is empty, it will become a managed directory that can no longer be written to, and links to the projects the user can access will be created in it automatically.
Meanwhile, the HPSS team will rename the current /proj/PROJECTID directories to /hpss/prod/PROJECTID/proj-shared, then add a link to that area.
“There will be an HPSS downtime when this is done, but this means technically the old project data is in a different location,” Griffith said. “However, the link should allow the users to access the data like they have always accessed the data. We are doing metadata operations for the restructure instead of copying data.”
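Because the link keeps old paths working, existing scripts should not need changes. Teams that prefer to reference the new location directly could translate their paths with something like the Python sketch below. The function name `translate_legacy_project_path` and the example project ID are hypothetical. The mapping simply follows the rename described above, from /proj/PROJECTID to /hpss/prod/PROJECTID/proj-shared.

```python
from pathlib import PurePosixPath


def translate_legacy_project_path(old_path: str) -> str:
    """Map /proj/<PROJECTID>/<rest> to /hpss/prod/<PROJECTID>/proj-shared/<rest>."""
    parts = PurePosixPath(old_path).parts
    # An absolute path such as "/proj/ABC123/x" splits into ("/", "proj", "ABC123", "x").
    if len(parts) < 3 or parts[0] != "/" or parts[1] != "proj":
        raise ValueError(f"not a legacy project path: {old_path!r}")
    project_id, rest = parts[2], parts[3:]
    return str(PurePosixPath("/hpss/prod", project_id, "proj-shared", *rest))


if __name__ == "__main__":
    # "ABC123" is a placeholder project ID for illustration.
    print(translate_legacy_project_path("/proj/ABC123/data/run.tar"))
```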
Users who experience issues or need assistance should contact firstname.lastname@example.org.
UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.