
Data Management User Guide

This guide is your one-stop shop for all things data management. It contains a table of contents to help you find specific reference information with one click. However, the extended context for an issue may span two or more articles, so the entire text of the guide is given below the table of contents so the reader can see each topic in its full context.

Four Things to Know About Data Management at OLCF

1. Data Management Policy

All OLCF users must read the Data Management Policy to get an account, to understand best practices for protecting their data, and to understand what will happen to their data at the end of their project. To read the Data Management Policy, click here.

2. A Storage Area for Every Activity

Your project will have several areas available for data storage and processing. Some are purged, some are not. To begin exploring your options click here.

3. Fast Data Transfer

OLCF offers several methods for data transfer and provides dedicated data transfer nodes. The best method for your data depends on where it is going and what resources are on the receiving end of the data transfer. Each transfer method has an entry in this guide with examples. To begin exploring your options click here.

Please note that the Open Science Grid Certificates required to use GridFTP may take a week to process before you can begin transferring data with them.

4. Better Workflows with Remote Job Submission

Enhance your project’s data workflow by using the remote job submission feature, which allows you to submit batch scripts from one OLCF machine to another. For example, you could submit a script from Titan that activates a transfer to HPSS and/or launches a data analysis application on Rhea when the job on Titan has finished running.
For details and examples please click here.

Contents

1. Data Management Policy

(Back to Top)

Note: This details an official policy of the OLCF, and must be agreed to by the following persons as a condition of access to or use of OLCF computational resources:
  • Principal Investigators (Non-Profit)
  • Principal Investigators (Industry)
  • All Users
Title: Data Management Policy Version: 14.01
Introduction
The OLCF provides a comprehensive suite of hardware and software resources for the creation, manipulation, and retention of scientific data. This document comprises guidelines for acceptable use of those resources. It is an official policy of the OLCF, and as such, must be agreed to by relevant parties as a condition of access to and use of OLCF computational resources.
Data Storage Resources
The OLCF provides an array of data storage platforms, each designed with a particular purpose in mind. Storage areas are broadly divided into two categories: those intended for user data and those intended for project data. Within each of the two categories, we provide different sub-areas, each with an intended purpose:
Purpose Storage Area Path
Long-term data for routine access that is unrelated to a project User Home $HOME
Long-term data for archival access that is unrelated to a project User Archive /home/$USER
Long-term project data for routine access that's shared with other project members Project Home /ccs/proj/[projid]
Short-term project data for fast, batch-job access that you don't want to share Member Work $MEMBERWORK/[projid]
Short-term project data for fast, batch-job access that's shared with other project members Project Work $PROJWORK/[projid]
Short-term project data for fast, batch-job access that's shared with those outside your project World Work $WORLDWORK/[projid]
Long-term project data for archival access that's shared with other project members Project Archive /proj/[projid]
User Home
Home directories for each user are NFS-mounted on all OLCF systems and are intended to store long-term, frequently-accessed user data. User Home areas are backed up on a daily basis. This file system does not generally provide the input/output (I/O) performance required by most compute jobs, and is not available to compute jobs on most systems. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
User Archive
The High Performance Storage System (HPSS) is the tape-archive storage system at the OLCF and is the storage technology that supports the User Archive areas. HPSS is intended for data that do not require day-to-day access. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
Project Home
Project Home directories are NFS-mounted on selected OLCF systems and are intended to store long-term, frequently-accessed data that is needed by all collaborating members of a project. Project Home areas are backed up on a daily basis. This file system does not generally provide the input/output (I/O) performance required by most compute jobs, and is not available to compute jobs on most systems. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
Member Work
Project members get an individual Member Work directory for each associated project; these reside in the center-wide, high-capacity Lustre® file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Member Work directories are provided commonly across all systems. Because of the scratch nature of the file system, it is not backed up and files are automatically purged on a regular basis. Files should not be retained in this file system for long, but rather should be migrated to Project Home or Project Archive space as soon as the files are not actively being used. If a file system associated with your Member Work directory is nearing capacity, the OLCF may contact you to request that you reduce the size of your Member Work directory. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
Project Work
Individual Project Work directories reside in the center-wide, high-capacity Lustre file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Project Work directories are provided commonly across most systems. Because of the scratch nature of the file system, it is not backed up. If a file system associated with Project Work storage is nearing capacity, the OLCF may contact the PI of the project to request that he or she reduce the size of the Project Work directory. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
World Work
Each project has a World Work directory that resides in the center-wide, high-capacity Lustre file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. World Work directories are provided commonly across most systems. Because of the scratch nature of the file system, it is not backed up. If a file system associated with World Work storage is nearing capacity, the OLCF may contact the PI of the project to request that he or she reduce the size of the World Work directory. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
Project Archive
The High Performance Storage System (HPSS) is the tape-archive storage system at the OLCF and is the storage technology that supports the User Archive areas. HPSS is intended for data that do not require day-to-day access. Project Archive areas are shared between all users of the project. Users should not store data unrelated to OLCF projects on HPSS. Project members should also periodically review files and remove unneeded ones. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
Local Scratch Storage
A large, fast disk area intended for parallel access to temporary storage in the form of scratch directories may be provided on a limited number of systems. This area is local to a specific system. This directory is, for example, intended to hold output generated by a user's job. Because of the scratch nature of the file system, it is not backed up and files are automatically purged on a regular basis. Files should not be retained in this file system and should be migrated to archival storage as soon as the files are not actively being used. Quotas may be instituted on a machine-by-machine basis if deemed necessary.
Data Retention, Purge, & Quotas
Summary
The following table details quota, backup, purge, and retention information for each user-centric and project-centric storage area available at the OLCF.
User-Centric Storage Areas
Area Path Type Permissions Quota Backups Purged Retention
User Home $HOME NFS User-controlled 10 GB Yes No 90 days
User Archive /home/$USER HPSS User-controlled 2 TB [1] No No 90 days
Project-Centric Storage Areas
Area Path Type Permissions Quota Backups Purged Retention
Project Home /ccs/proj/[projid] NFS 770 50 GB Yes No 90 days
Member Work $MEMBERWORK/[projid] Lustre® 700 [2] 10 TB No 14 days [4]
Project Work $PROJWORK/[projid] Lustre® 770 100 TB No 90 days [4]
World Work $WORLDWORK/[projid] Lustre® 775 10 TB No 90 days [4]
Project Archive /proj/[projid] HPSS 770 100 TB [3] No No 90 days
Area The general name of storage area.
Path The path (symlink) to the storage area's directory.
Type The underlying software technology supporting the storage area.
Permissions UNIX Permissions enforced on the storage area's top-level directory.
Quota The limits placed on total number of bytes and/or files in the storage area.
Backups States if the data is automatically duplicated for disaster recovery purposes.
Purged Period of time, post-file-access, after which a file will be marked as eligible for permanent deletion.
Retention Period of time, post-account-deactivation or post-project-end, after which data will be marked as eligible for permanent deletion.
Important! Files within "Work" directories (i.e., Member Work, Project Work, World Work) are not backed up and are purged on a regular basis according to the timeframes listed above.

[1] In addition, there is a quota/limit of 2,000 files on this directory.

[2] Permissions on Member Work directories can be controlled to an extent by project members. By default, only the project member has access, but access can be granted to other project members by setting group permissions accordingly on the Member Work directory. The parent directory of the Member Work directory prevents access by "UNIX others" and cannot be changed (a security measure).

[3] In addition, there is a quota/limit of 100,000 files on this directory.

[4] Retention is not applicable as files will follow purge cycle.

Data Retention Overview
By default, there is no lifetime retention for any data on OLCF resources. The OLCF specifies a limited post-deactivation timeframe during which user and project data will be retained. When the retention timeframe expires, the OLCF retains the right to delete data. If you have data retention needs outside of the default policy, please notify the OLCF.
User Data Retention
The user data retention policy exists to reclaim storage space after a user account is deactivated, e.g., after the user’s involvement on all OLCF projects concludes. By default, the OLCF will retain data in user-centric storage areas only for a designated amount of time after the user’s account is deactivated. During this time, a user can request a temporary user account extension for data access. See the section “Data Retention, Purge, & Quota Summary” for details on retention timeframes for each user-centric storage area.
Project Data Retention
The project data retention policy exists to reclaim storage space after a project ends. By default, the OLCF will retain data in project-centric storage areas only for a designated amount of time after the project end date. During this time, a project member can request a temporary user account extension for data access. See the section “Data Retention, Purge, & Quota Summary” for details on purge and retention timeframes for each project-centric storage area.
Sensitive Project Data Retention
For sensitive projects only, all data related to the project must be purged from all OLCF computing resources within 30 days of the project’s end or termination date.
Data Purges
Data purge mechanisms are enabled on some OLCF file system directories in order to maintain sufficient disk space availability for job execution. Files in these scratch areas are automatically purged on a regular purge timeframe. If a file system with an active purge policy is nearing capacity, the OLCF may contact you to request that you reduce the size of a directory within that file system, even if the purge timeframe has not been exceeded. See the section “Data Retention, Purge, & Quota Summary” for details on purge timeframes for each storage area, if applicable.
Storage Space Quotas
Each user-centric and project-centric storage area has an associated quota, which could be a hard (systematically-enforceable) quota or a soft (policy-enforceable) quota. Storage usage will be monitored continually. When a user or project exceeds a soft quota for a storage area, the user or project PI will be contacted and will be asked if at all possible to purge data from the offending area. See the section “Data Retention, Purge, & Quota Summary” for details on quotas for each storage area.
Data Prohibitions & Safeguards
Prohibited Data
The OLCF computer systems are operated as research systems and only contain data related to scientific research; they do not contain personally identifiable information (data that falls under the Privacy Act of 1974, 5 U.S.C. 552a). Use of OLCF resources to store, manipulate, or remotely access any national security information is strictly prohibited. This includes, but is not limited to: classified information, unclassified controlled nuclear information (UCNI), naval nuclear propulsion information (NNPI), and the design or development of nuclear, biological, or chemical weapons or any weapons of mass destruction. Authors/generators/owners of information are responsible for its correct categorization as sensitive or non-sensitive. Owners of sensitive information are responsible for its secure handling, transmission, processing, storage, and disposal on OLCF systems. Principal investigators, users, or project delegates that use OLCF resources, or are responsible for overseeing projects that use OLCF resources, are strictly responsible for knowing whether their project generates any of these prohibited data types or information that falls under Export Control. For questions, contact help@olcf.ornl.gov.
Unauthorized Data Modification
Users are prohibited from taking unauthorized actions to intentionally modify or delete information or programs.
Data Confidentiality, Integrity, & Availability
The OLCF systems provide protections to maintain the confidentiality, integrity, and availability of user data. Measures include: the availability of file permissions, archival systems with access control lists, and parity/CRC checks on data paths/files. It is the user’s responsibility to set access controls appropriately for data. In the event of system failure or malicious actions, the OLCF makes no guarantee against loss of data nor guarantees that a user’s data could not be accessed, changed, or deleted by another individual. It is the user’s responsibility to ensure the appropriate level of backup and integrity checks on critical data and programs.
Administrator Access to Data
OLCF resources are federal computer systems, and as such, users should have no explicit or implicit expectation of privacy. OLCF employees and authorized vendor personnel with “root” privileges have access to all data on OLCF systems. Such employees can also login to OLCF systems as other users. As a general rule, OLCF employees will not discuss your data with any unauthorized entities nor grant access to data files to any person other than the UNIX “owner” of the data file, except in the following situations:
  • When the owner of the data requests a change of ownership for any reason, e.g., the owner is leaving the project and grants the PI ownership of the data.
  • In situations of suspected abuse/misuse of computational resources, criminal activity, or cyber-security violations.
Note that the above applies even to project PIs. In general, the OLCF will not overwrite existing UNIX permissions on data files owned by project members for the purpose of granting access to the project PI. Project PIs should work closely with project members throughout the duration of the project to ensure UNIX permissions are set appropriately.
Software
Software Licensing
All software used on OLCF computers must be appropriately acquired and used according to the appropriate software license agreement. Possession, use, or transmission of illegally obtained software is prohibited. Likewise, users shall not copy, store, or transfer copyrighted software, except as permitted by the owner of the copyright. Only export-controlled codes approved by the Export Control Office may be run by parties with sensitive data agreements.
Malicious Software
Users must not intentionally introduce or use malicious software, including but not limited to, computer viruses, Trojan horses, or computer worms.
Reconstruction of Information or Software
Users are not permitted to reconstruct information or software for which they are not authorized. This includes but is not limited to any reverse engineering of copyrighted software or firmware present on OLCF computing resources.


2. Data Management

(Back to Top)

OLCF users have many options for data storage. Each user has a series of user-affiliated storage spaces, and each project has a series of project-affiliated storage spaces where data can be shared for collaboration. The storage areas are mounted across all OLCF systems, making your data available to you from multiple locations.

A Storage Area for Every Activity
The storage area to use in any given situation depends upon the activity you wish to carry out. Each User has a User Home area on a Network File System (NFS) and a User Archive area on the archival High Performance Storage System (HPSS). User storage areas are intended to house user-specific files. Individual Projects have a Project Home area on NFS, multiple Project Work areas on Lustre, and a Project Archive area on HPSS. Project storage areas are intended to house project-centric files.
Simple Guidelines
The following sections contain a description of all available storage areas and relevant details for each. If you're the impatient type, you can probably get right to work by adhering to the following simple guidelines:
If you need to store... then use... at path...
Long-term data for routine access that is unrelated to a project User Home $HOME
Long-term data for archival access that is unrelated to a project User Archive /home/$USER
Long-term project data for routine access that's shared with other project members Project Home /ccs/proj/[projid]
Short-term project data for fast, batch-job access that you don't want to share Member Work $MEMBERWORK/[projid]
Short-term project data for fast, batch-job access that's shared with other project members Project Work $PROJWORK/[projid]
Short-term project data for fast, batch-job access that's shared with those outside your project World Work $WORLDWORK/[projid]
Long-term project data for archival access that's shared with other project members Project Archive /proj/[projid]


2.1. User-Centric Data Storage

(Back to Top)

Users are provided with several storage areas, each of which serves a different purpose. These areas are intended for storage of data for a particular user and not for storage of project data. The following table summarizes user-centric storage areas available on OLCF resources and lists relevant policies.

User-Centric Storage Areas
Area Path Type Permissions Quota Backups Purged Retention
User Home $HOME NFS User-controlled 10 GB Yes No 90 days
User Archive /home/$USER HPSS User-controlled 2 TB [1] No No 90 days
[1] In addition, there is a quota/limit of 2,000 files on this directory.


2.1.1. User Home Directories (NFS)

(Back to Top)

Each user is provided a home directory to store frequently used items such as source code, binaries, and scripts.

User Home Path
Home directories are located in a Network File System (NFS) that is accessible from all OLCF resources as /ccs/home/$USER. The environment variable $HOME will always point to your current home directory. It is recommended, where possible, that you use this variable to reference your home directory. In cases in which using $HOME is not feasible, it is recommended that you use /ccs/home/$USER. Users should note that since this is an NFS-mounted filesystem, its performance will not be as high as that of other filesystems.
User Home Quotas
Quotas are enforced on user home directories. To request an increased quota, contact the OLCF User Assistance Center. To view your current quota and usage, use the quota command:
$ quota -Qs
Disk quotas for user usrid (uid 12345):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
nccsfiler1a.ccs.ornl.gov:/vol/home
                  4858M   5000M   5000M           29379   4295m   4295m
User Home Backups
If you accidentally delete files from your home directory, you may be able to retrieve them. Online backups are performed at regular intervals. Hourly backups for the past 24 hours, daily backups for the last 7 days, and 1 weekly backup are available. It is possible that the deleted files are available in one of those backups. The backup directories are named hourly.*, daily.* , and weekly.* where * is the date/time stamp of the backup. For example, hourly.2016-12-01-0905 is an hourly backup made on December 1, 2016 at 9:05 AM. The backups are accessed via the .snapshot subdirectory. Note that if you do an ls (even with the -a option) of any directory you won’t see a .snapshot subdirectory, but you’ll be able to do “ls .snapshot” nonetheless. This will show you the hourly/daily/weekly backups available. The .snapshot feature is available in any subdirectory of your home directory and will show the online backup of that subdirectory. In other words, you don’t have to start at /ccs/home/$USER and navigate the full directory structure; if you’re in a /ccs/home subdirectory several “levels” deep, an “ls .snapshot” will access the available backups of that subdirectory.
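For example, restoring an accidentally deleted file might look like the following (the snapshot names and file name here are illustrative):
  $ cd /ccs/home/$USER/some_subdirectory
  $ ls .snapshot
  daily.2016-11-30-0010  hourly.2016-12-01-0905  weekly.2016-11-27-0015
  $ cp .snapshot/hourly.2016-12-01-0905/lostfile.c ./lostfile.c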
User Home Permissions
The default permissions for user home directories are 0750 (full access to the user, read and execute for the group). Users have the ability to change permissions on their home directories, although it is recommended that permissions be set to as restrictive as possible (without interfering with your work).
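For example, to make your home directory accessible only to you:
  $ chmod 700 /ccs/home/$USER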
Special User Website Directory
User Home spaces may contain a directory named /www. If this directory exists, and if appropriate permissions exist, files in that directory will be accessible via the World Wide Web at http://users.nccs.gov/~user (where user is your userid).


2.1.2. User Archive Directories (HPSS)

(Back to Top)

Users are also provided with user-centric archival space on the High Performance Storage System (HPSS). User archive areas on HPSS are intended for storage of data not immediately needed in either User Home directories (NFS) or User Work directories (Lustre®). User Archive areas also serve as a location for users to store backup copies of user files. User Archive directories should not be used to store project-related data. Rather, Project Archive directories should be used for project data.

User Archive Path
User archive directories are located at /home/$USER.
User Archive Access
User archive directories may be accessed only via specialized tools called HSI and HTAR. For more information on using HSI or HTAR, see the HSI and HTAR page.
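As a quick illustration (the file and directory names are hypothetical), hsi and htar can move data to and from your User Archive area as follows:
  $ hsi put bigfile.dat                    # store bigfile.dat as /home/$USER/bigfile.dat
  $ hsi get bigfile.dat                    # retrieve it into the current directory
  $ htar -cvf /home/$USER/run1.tar run1/   # bundle a local directory into an archive on HPSS
  $ htar -xvf /home/$USER/run1.tar         # extract that archive into the current directory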
User Archive Accounting
Each file and directory on HPSS is associated with an HPSS storage allocation. For information on storage allocation, please visit the Understanding HPSS Storage Allocations page.


2.2. Project-Centric Data Storage

(Back to Top)

Projects are provided with several storage areas for the data they need. Project directories provide members of a project with a common place to store code, data files, documentation, and other files related to their project. While this information could be stored in one or more user directories, storing in a project directory provides a common location to gather all files. The following table summarizes project-centric storage areas available on OLCF resources and lists relevant policies.

Project-Centric Storage Areas
Area Path Type Permissions Quota Backups Purged Retention
Project Home /ccs/proj/[projid] NFS 770 50 GB Yes No 90 days
Member Work $MEMBERWORK/[projid] Lustre® 700 [1] 10 TB No 14 days 14 days
Project Work $PROJWORK/[projid] Lustre® 770 100 TB No 90 days 90 days
World Work $WORLDWORK/[projid] Lustre® 775 10 TB No 90 days 90 days
Project Archive /proj/[projid] HPSS 770 100 TB [2] No No 90 days
Important! Files within "Work" directories (i.e., Member Work, Project Work, World Work) are not backed up and are purged on a regular basis according to the timeframes listed above.

[1] Permissions on Member Work directories can be controlled to an extent by project members. By default, only the project member has access, but access can be granted to other project members by setting group permissions accordingly on the Member Work directory. The parent directory of the Member Work directory prevents access by "UNIX others" and cannot be changed (a security measure).

[2] In addition, there is a quota/limit of 100,000 files on this directory.


2.2.1. Project Home Directories (NFS)

(Back to Top)

Projects are provided with a Project Home storage area in the Network File System (NFS) mounted filesystem. This area is intended for storage of data, code, and other files that are of interest to all members of a project. Since Project Home is an NFS-mounted filesystem, its performance will not be as high as that of other filesystems.

Project Home Path
Project Home area is accessible at /ccs/proj/abc123 (where abc123 is your project ID).
Project Home Quotas
To check your project's current usage, run df -h /ccs/proj/abc123 (where abc123 is your project ID). Quotas are enforced on project home directories. The current limit is shown on the Storage Policy page. To request an increased quota, contact the User Assistance Center.
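The output will look something like the following (the values and server path shown are illustrative):
  $ df -h /ccs/proj/abc123
  Filesystem                          Size  Used Avail Use% Mounted on
  nccsfiler1a.ccs.ornl.gov:/vol/proj   50G   12G   39G  24% /ccs/proj/abc123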
Project Home Backups
If you accidentally delete files from your project home directory, you may be able to retrieve them. Online backups are performed at regular intervals. Hourly backups for the past 24 hours, daily backups for the last 7 days, and 1 weekly backup are available. It is possible that the deleted files are available in one of those backups. The backup directories are named hourly.*, daily.* , and weekly.* where * is the date/time stamp of the backup. For example, hourly.2016-12-01-0905 is an hourly backup made on December 1, 2016 at 9:05 AM. The backups are accessed via the .snapshot subdirectory. Note that if you do an ls (even with the -a option) of any directory you won’t see a .snapshot subdirectory, but you’ll be able to do “ls .snapshot” nonetheless. This will show you the hourly/daily/weekly backups available. The .snapshot feature is available in any subdirectory of your project home directory and will show the online backup of that subdirectory. In other words, you don’t have to start at /ccs/proj/abc123 and navigate the full directory structure; if you’re in a /ccs/proj subdirectory several “levels” deep, an “ls .snapshot” will access the available backups of that subdirectory.
Project Home Permissions
The default permissions for project home directories are 0770 (full access to the user and group). The directory is owned by root and the group is the project's group. All members of a project should also be members of that group-specific project. For example, all members of project "ABC123" should be members of the "abc123" UNIX group.
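You can confirm the directory's group and your membership in it as follows (the project ID and output are illustrative):
  $ ls -ld /ccs/proj/abc123
  drwxrwx--- 17 root abc123 4096 Nov 15 09:12 /ccs/proj/abc123
  $ groups
  users abc123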


2.2.2. Project-Centric Work Directories

(Back to Top)

To provide projects and project members with high-performance storage areas that are accessible to batch jobs, projects are given (3) distinct project-centric work (i.e., scratch) storage areas within Spider, the OLCF's center-wide Lustre® filesystem.

Three Project Work Areas to Facilitate Collaboration
To facilitate collaboration among researchers, the OLCF provides (3) distinct types of project-centric work storage areas: Member Work directories, Project Work directories, and World Work directories. Each directory should be used for storing files generated by computationally-intensive HPC jobs related to a project. The difference between the three lies in the accessibility of the data to project members and to researchers outside of the project. Member Work directories are accessible only by an individual project member by default. Project Work directories are accessible by all project members. World Work directories are readable by any user on the system.
Paths
Paths to the various project-centric work storage areas are simplified by the use of environment variables that point to the proper directory on a per-user basis:
  • Member Work Directory: $MEMBERWORK/[projid]
  • Project Work Directory: $PROJWORK/[projid]
  • World Work Directory: $WORLDWORK/[projid]
Environment variables provide operational staff (aka "us") flexibility in the exact implementation of underlying directory paths, and provide researchers (aka "you") with consistency over the long-term. For these reasons, we highly recommend the use of these environment variables for all scripted commands involving work directories.
Permissions
UNIX Permissions on each project-centric work storage area differ according to the area's intended collaborative use. Under this setup, the process of sharing data with other researchers amounts to simply ensuring that the data resides in the proper work directory.
  • Member Work Directory: 700
  • Project Work Directory: 770
  • World Work Directory: 775
For example, if you have data that must be restricted only to yourself, keep them in your Member Work directory for that project (and leave the default permissions unchanged). If you have data that you intend to share with researchers within your project, keep them in the project's Project Work directory. If you have data that you intend to share with researchers outside of a project, keep them in the project's World Work directory.
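For instance, sharing a result file with the rest of your project might look like this (the project ID and file name are hypothetical):
  $ cp $MEMBERWORK/abc123/results.dat $PROJWORK/abc123/
  $ chmod 640 $PROJWORK/abc123/results.dat   # ensure the project group can read the file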
Quotas
Soft quotas are enforced on project-centric work directories. The current limit is shown on the Storage Policy page. To request an increased quota, contact the User Assistance Center.
Backups
Member Work, Project Work, and World Work directories are not backed up. Project members are responsible for backing up these files, either to Project Archive areas (HPSS) or to an off-site location.


2.2.3. Project Archive Directories (HPSS)

(Back to Top)

Projects are also allocated project-specific archival space on the High Performance Storage System (HPSS). The default quota is shown on the Storage Policy page. If a higher quota is needed, contact the User Assistance Center. The Project Archive space on HPSS is intended for storage of data not immediately needed in either Project Home (NFS) areas or Project Work (Lustre®) areas, and to serve as a location to store backup copies of project-related files.

Project Archive Path

The project archive directories are located at /proj/pjt000 (where pjt000 is your Project ID).

Project Archive Access

Project Archive directories may only be accessed via utilities called HSI and HTAR. For more information on using HSI or HTAR, see the HSI and HTAR page.

Project Archive Accounting

Each file and directory on HPSS is associated with an HPSS storage allocation. For information on HPSS storage allocations, please visit the Understanding HPSS Storage Allocations page.


3. Transferring Data

(Back to Top)

OLCF users are provided with several options for transferring data among systems at the OLCF as well as between the OLCF and other sites.

Data Transfer Nodes

Dedicated data transfer nodes are provided to OLCF users and are accessible via the load-balancing hostname dtn.ccs.ornl.gov. The nodes have been tuned specifically for wide area data transfers, and also perform well on the local area. They are recommended for data transfers as they will, in most cases, improve transfer speed and help decrease load on computational systems’ login nodes. More information on these nodes can be found on the Data Transfer Nodes page.

Local Transfers

The OLCF provides a shared-storage environment, so transferring data between our machines is largely unnecessary. However, we provide tools both to move large amounts of data between scratch and archival storage and from one scratch area to another. More information can be found on the local transfers page.

Remote Transfers

The OLCF provides several tools for moving data between computing centers or between OLCF machines and local user workstations. The following tools are primarily designed for transfers over the internet and are not recommended for transferring data between OLCF machines. The following summarizes the options for remote data transfers:
  • GridFTP + GridCert: insecure by default, secure with configuration; Passcode authentication; fast transfer speed; requires a GridFTP server at the remote site
  • GridFTP + SSH: insecure by default, secure with configuration; Passcode authentication; fast transfer speed; requires a GridFTP server at the remote site
  • SFTP/SCP: secure; Passcode authentication; slow transfer speed; comes standard with an SSH install
  • BBCP: insecure (unsuited for sensitive projects); Passcode authentication; fast transfer speed; requires BBCP installed at the remote site

GridFTP

GridFTP is a high-performance data transfer protocol based on FTP and optimized for high-bandwidth wide-area networks. It is typically used to move large amounts of data between the OLCF and other major centers. Globus is a GridFTP-based service that provides a web user interface for initiating, managing, and monitoring GridFTP transfers between endpoints. An endpoint is the logical address of a resource or filesystem attached to a Globus Connect GridFTP server. Many institutions host their own shared Globus Connect Servers and endpoints. However, it is possible to turn any non-commercial private resource into an endpoint using the Globus Connect Personal client. Globus can also be scripted. More information on GridFTP can be found on the GridFTP page. More information on Globus can be found on the Globus page.
SFTP and SCP
The SSH-based SFTP and SCP utilities can be used to transfer files to and from OLCF systems. Because these utilities can be slow, we recommend using them only to transfer limited numbers of small files. More information on these utilities can be found on the SFTP and SCP page.

BBCP

For larger files, the multi-streaming transfer utility BBCP is recommended. The BBCP utility is capable of breaking up your transfer into multiple simultaneously transferring streams, thereby transferring data much faster than single-streaming utilities such as SCP and SFTP. Note: BBCP is not secure, but is much faster than SFTP. More information can be found on the BBCP page.


3.1. Employing Data Transfer Nodes

(Back to Top)

The OLCF provides nodes dedicated to data transfer that are available via dtn.ccs.ornl.gov. These nodes have been tuned specifically for wide-area data transfers, and also perform well on local-area transfers. The OLCF recommends that users employ these nodes for data transfers, since in most cases transfer speed improves and load on computational systems' login and service nodes is reduced.

Filesystems Accessible from DTNs

All OLCF filesystems -- the NFS-backed User Home and Project Home areas, the Lustre®-backed User Work and Project Work areas, and the HPSS-backed User Archive and Project Archive areas -- are accessible to users via the DTNs. For more information on available filesystems at the OLCF see the Data Management Overview page.

Interactive DTN Access

Members of allocated projects are automatically given access to the data transfer nodes. The interactive nodes are accessible for direct login through the dtn.ccs.ornl.gov alias.
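For example, to log in to a DTN directly (userid is a placeholder):
  $ ssh userid@dtn.ccs.ornl.gov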

Batch DTN Access

Batch data transfer nodes can be accessed through the Torque/MOAB queuing system on the dtn.ccs.ornl.gov interactive node. The DTN batch nodes are also accessible from the Titan, Eos, and Rhea batch systems through remote job submission. This is accomplished by the command qsub -q host script.pbs which will submit the file script.pbs to the batch queue on the specified host. This command can be inserted at the end of an existing batch script in order to automatically trigger work on another OLCF resource.
Note: DTNs can help you manage your allocation hours efficiently by preventing billable compute resources from sitting idle.
The following scripts show how this technique could be employed. Note that only the first script, retrieve.pbs, would need to be manually submitted; the others will trigger automatically from within the respective batch scripts.
Example Workflow Using DTNs in Batch Mode
The first batch script, retrieve.pbs, retrieves data needed by a compute job. Once the data has been retrieved, the script submits a different batch script, compute.pbs, to run computations on the retrieved data. To start the workflow, submit this script from Titan or Rhea:
$ qsub -q dtn retrieve.pbs
$ cat retrieve.pbs

  # Batch script to retrieve data from HPSS via DTNs

  # PBS directives
  #PBS -A PROJ123
  #PBS -l walltime=8:00:00

  # Retrieve required data
  cd $MEMBERWORK/proj123 
  hsi get largedatafileA
  hsi get largedatafileB

  # Verification code could go here

  # Submit other batch script to execute calculations on retrieved data
  qsub -q titan compute.pbs

$
The second batch script is submitted from the first to carry out computational work on the data. When the computational work is finished, the batch script backup.pbs is submitted to archive the resulting data.
$ cat compute.pbs

  # Batch script to carry out computation on retrieved data

  # PBS directives
  #PBS -l walltime=24:00:00 
  #PBS -l nodes=10000
  #PBS -A PROJ123
  #PBS -l gres=atlas1 # or atlas2

  # Launch executable
  cd $MEMBERWORK/proj123 
  aprun -n 160000 ./a.out

  # Submit other batch script to transfer resulting data to HPSS
  qsub -q dtn backup.pbs

$
The final batch script is submitted from the second to archive the resulting data soon after creation to HPSS via the hsi utility.
$ cat backup.pbs

  # Batch script to back-up resulting data

  # PBS directives
  #PBS -A PROJ123
  #PBS -l walltime=8:00:00

  # Store resulting data 
  cd $MEMBERWORK/proj123 
  hsi put largedatafileC
  hsi put largedatafileD

$
Some items to note:
  • Batch jobs submitted to the dtn partition will be executed on a DTN that is accessible exclusively via batch submissions. These batch-accessible DTNs have identical configurations to those DTNs that are accessible interactively; the only difference between the two is accessibility.
  • The DTNs are not currently a billable resource, i.e., the project specified in a batch job targeting the dtn partition will not be charged for time spent executing in the dtn partition.

Scheduled DTN Queue

  • The walltime limit for jobs submitted to the dtn partition is 24 hours.
  • Users may request a maximum of 4 nodes per batch job.
  • There is a limit of (2) eligible-to-run jobs per user.
  • Jobs in excess of the per user limit above will be placed into a held state, but will change to eligible-to-run when appropriate.
  • The queue allows each user a maximum of 6 running jobs.


3.1.1. Transferring Data with BBCP

(Back to Top)

For moving larger files, the multistreaming transfer utility BBCP is recommended. The BBCP utility is capable of breaking up your transfer into multiple simultaneously transferring streams, thereby transferring data much faster than single-streaming utilities such as scp and sftp. Before you can use the BBCP utility, it must be installed on both the local and remote systems. It is currently available on each OLCF system and should be a part of each user’s default environment. Several pre-compiled binaries as well as the source can be downloaded from the Stanford Linear Accelerator Center (SLAC) BBCP page.

Installation from Source Tips

  • Refer to your operating system's documentation to satisfy missing dependencies or problems in your build environment.
  • Clone the BBCP source code git repository from the Stanford Linear Accelerator Center (SLAC) BBCP page.
  • $ git clone http://www.slac.stanford.edu/~abh/bbcp/bbcp.git
    
  • From within the decompressed BBCP directory, run make. This should build the BBCP executable into the created bin directory. The build has been tested on Linux-based systems and should build with few or no modifications. If your system’s uname command does not return Linux or Darwin, you may need to modify the Makefile.
  • $ cd bbcp/src
    $ uname
    Linux
    $ make
    

Common variable modifications

  • In MakeSname, the test command is hard coded to /usr/bin/test. If this is not the location of test on your system, you can change the following line to the correct path (which test should return the path to test on your system):
  • if /usr/bin/test -${1} $2; then
    
  • If the uname command is not in /bin on your system, change the uname variable in the MakeSname file. You will also need to change the following line in the file Makefile:
  • @cd src;$(MAKE) make`/bin/uname` OSVER=`../MakeSname`
    
  • If the libz.a library is not located at /usr/local/lib/libz.a on your system, change the libz location in the Makefile.
  • The file Makefile contains compiler and compiler flag options for the BBCP build. You can change the compilers and flags by modifying variables in this file. For example, to change the compilers used on a Linux system, modify the variables LNXCC and LNXcc.

Usage

To transfer the local file /local/path/largefile.tar to the remote system remotesystem as /remote/path/largefile.tar, use the following:
bbcp -P 2 -V -w 8m -s 16 /local/path/largefile.tar remotesystem:/remote/path/largefile.tar
where:
  • -P 2 produces progress messages every 2 seconds.
  • -V produces verbose output, including detailed transfer-speed statistics.
  • -w 8m sets the size of the disk input/output (I/O) buffers.
  • -s 16 sets the number of parallel network streams to 16.
BBCP assumes the remote system’s non-interactive environment contains the path to the BBCP utility. This can be determined with the following command:
ssh remotesystem which bbcp
If this is not the case, the -T BBCP option can be used to specify how to start BBCP on the remote system. For example, you could use the following:
bbcp -P 2 -V -w 8m -s 16 -T 'ssh -x -a -oFallBackToRsh=no %I -l %U %H /remote/path/to/bbcp' /local/path/largefile.tar remotesystem:/remote/path/largefile.tar
Often, during large transfers the connection between the transferring systems is lost. The -a option gives BBCP the ability to pick up where it left off. For example, you could use the following:
bbcp -k -a /remotesystem/homedir/.bbcp/ -P 2 -V -w 8m -s 16 /local/path/largefile.tar remotesystem:/remote/path/largefile.tar
To transfer an entire directory tree, use the following:
bbcp -r -P 2 -V -w 8m -s 16 /local/path/* remotesystem:/remote/path
We strongly recommend that you use the Data Transfer Nodes when transferring files to and from the OLCF. If you are, however, connecting directly to systems such as the Cray XK, it is necessary to specify a particular node as the destination host because the host name (i.e. titan.ccs.ornl.gov) actually points to a server load-balancing device that returns node addresses in a round-robin fashion. For example, you could use the following:
bbcp -r -P 2 -V -w 8m -s 16 /local/path/* titan-login3.ccs.ornl.gov:/remote/path
You may encounter an error similar to the following:
bbcp: Accept timed out on port 5031
bbcp: Unable to allocate more than 0 of 8 data streams.
Killed by signal 15.
If this happens, add the -z option to your bbcp command. This tells bbcp to use the "reverse connection protocol" and can be helpful when a transfer is being blocked by a firewall.
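For example:
bbcp -z -P 2 -V -w 8m -s 16 /local/path/largefile.tar remotesystem:/remote/path/largefile.tar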

Further Reading

More information on BBCP can be found by typing "bbcp -h" on OLCF systems as well as on the Stanford Linear Accelerator Center (SLAC) BBCP page.


3.1.2. GridFTP at OLCF

(Back to Top)

GridFTP is a high-performance data transfer protocol based on FTP and optimized for high-bandwidth wide-area networks. The OLCF provides the Globus (https://www.globus.org/) implementation of GridFTP, which enables scalable, fast, secure, and robust network data transfer. The Globus GridFTP suite provides the following interfaces for managing data transfers:

  • Globus Online Web-UI
  • Globus Online cli.globusonline.org command line interface
  • globus_url_copy shell command

Installing GridFTP

    Prior to using GridFTP, it must be installed on both the client and server. Installation is independent of the authentication method chosen. GridFTP is currently available on each OLCF system and can be added to your environment using the globus module:
  $ module load globus
If your site does not already have GridFTP available, it can be downloaded from Globus. Download and installation information can be found on the Globus Toolkit Documentation site.

Usage

GridFTP may use either SecurID/OAuth or SSH publickeys for authentication to OLCF resources. The GridFTP interface and usage details differ based on which authentication method is employed.

SecurID/OAuth Authentication

Globus transfers can be authenticated using your normal OLCF username and SecurID token. Your ordinary OLCF credentials are used to generate temporary X.509 grid certificates on an OLCF credential management server. These temporary certificates are delegated to Globus and the GridFTP command line tools for use by the underlying GridFTP transfer protocol. Please see GridFTP with Globus Online for details about using your existing OLCF credentials to authenticate GridFTP transfers.

SSH Authentication

Please see the GridFTP with SSH page for examples and instructions on how to use SSH authentication with the globus_url_copy interface.


3.1.2.1. GridFTP with Globus
(Back to Top)

Using Globus

Globus provides a web user-interface for initiating, managing, and monitoring GridFTP transfers between endpoints. An endpoint is the logical address of a resource or filesystem attached to a Globus Connect GridFTP server. Many institutions host their own shared Globus Connect Servers and endpoints. However, it is possible to turn any non-commercial private resource into an endpoint using the Globus Connect Personal client.

Transferring Data with Globus Online

In the browser of your choice, visit https://www.globus.org.
  1. Log in to Globus with your Globus identity. Most new Globus users should create a traditional stand-alone Globus ID by following the “Sign up” link. Returning OLCF users should generally follow the "Globus ID" link to login.
    Only users who have an ORNL UCAMS institutional ID may choose "Oak Ridge National Lab" from the dropdown menu.
    See the Globus accounts FAQ for more information.
    Login to the Globus webapp using an existing Globus ID, linked institutional ID, or new Globus account.

  2. Once logged in to the Globus webapp, select "Manage Data" and then "Transfer Files" from the blue menu bar.
    Choose Transfer Files from the navigation menu.

  3. Enter an OLCF endpoint into one of the two Endpoint fields. Using the endpoint selection dialog that appears, enter OLCF Atlas (or the alternate name olcf#dtn_atlas) to choose the OLCF as an endpoint.
    Select an OLCF endpoint in one of the two available fields.

    Workflows established prior to February 2016 may have used the now-discontinued olcf#dtn endpoint. This endpoint should no longer be used. Questions about migrating legacy workflows can be directed to help@olcf.ornl.gov.
  4. Globus must request permission from you via the OLCF to access your files. Press the Continue button. You will be redirected to our authentication page at "myproxy*.ccs.ornl.gov". Enter your OLCF username in the "username" box and your OLCF SecurID passcode in the "Passcode" box. Upon success, you will be returned to the Globus web interface.
    Activating an endpoint using only your OLCF credentials requires a browser to authenticate the OAuth request from Globus.

    The endpoint lifetime is 72 hours. If the endpoint authentication expires before the transfer is complete, the transfer will automatically resume the next time you reactivate the endpoint.
  5. Enter the path to your data (for example, /tmp/work/username) in the "Path" window. Soon you should see a list of the files in that directory appear in the window below.
  6. Repeat this process in the other endpoint window with the endpoint at the other end of your transfer.
  7. Select the files you want to transfer by clicking on them. Use the arrows in the middle of the page to start the transfer.
  8. Globus will give you a message at the top of the page about the status of your transfer and send you an email when your transfer is complete.

Reactivating an Expired Endpoint

If the endpoint or proxy expires before the transfer is complete, the transfer will automatically resume the next time you activate the endpoint. To reactivate an expired endpoint, choose "Manage Endpoints" from the Globus web interface. Select the OLCF endpoint you wish to reactivate and choose the "Activate" tab. Press the "Reactivate Now" button and enter your OLCF credentials to approve the request by Globus to access your account.
Reactivate an endpoint under the Manage Endpoints section.

Globus Online Command Line Interface

Globus Online also provides a scriptable command-line interface available via SSH at cli.globusonline.org using your Globus account credentials. Complete information about cli.globusonline.org can be found in the official Globus documentation. To use the CLI interface, you must first generate an SSH public-private key pair on the host from which you will use the interface. From a terminal, call
$ ssh-keygen -t rsa -b 4096 -C "Globus key on $HOST" -f $HOME/.ssh/id_rsa.globus
It is highly recommended that all your SSH keys are protected by passphrases. Passphrase-protected keys can be used in conjunction with an SSH-agent for convenience.
Compromised passphrase-less keys linked to Globus will allow read-write access to all of your activated endpoints.
Add the public key to your Globus ID's list of authorized keys. From the web UI, click on the Account menu link, then choose "manage SSH and X.509 keys", then "Add a New Key". Give the key any alias, choose the SSH Public Key type, paste the full contents of $HOME/.ssh/id_rsa.globus.pub into the body field, and click "Add Key". To use the interface, start an SSH session as your Globus ID username with
$ ssh -i $HOME/.ssh/id_rsa.globus ${GLOBUS_UNAME}@cli.globusonline.org
This command will place you into an interactive console from which globus transfer management commands can be issued. Calling help will list all of the available commands. Full documentation for each command is available through man $COMMAND. By encapsulating the SSH invocation into a shell function or alias and using an SSH-agent or passphrase-less key, it is possible to write convenient shell scripts for managing Globus transfers. The script below uses the Globus Online Tutorial Endpoints go#ep1 and go#ep2, which are available to all Globus users for practice, to demonstrate basic operations.
#!/bin/bash
#
# This script demos the simplest way to automate Globus Online transfers
# using the ssh://cli.globusonline.org interface.
#
#==============================================================================

# Edit these as needed for individual use
PUBKEY="$HOME/.ssh/id_rsa.globus"
GLOBUS_UNAME="FIXME"

# Function to simplify remote Globus command invocations.
gocli() {
  ssh -i ${PUBKEY} ${GLOBUS_UNAME}@cli.globusonline.org "$@"
}

# Print available commands. Man pages can be read by starting an interactive
# session over ssh using `ssh ${GLOBUS_UNAME}@cli.globusonline.org`
gocli help

# Activate the endpoints.
# endpoint-activate returns 0 if active or successfully activated.
# Some endpoints may be interactively activated, but not the OLCF's.
# It is a good practice to exit the script on activation problems when the
# script is run non-interactively.
# TODO - Add a trap or timeout if this script is run non-interactively
#        against endpoints that can be interactively activated.
gocli endpoint-activate "go#ep1"
[ "$?" = 1 ] && exit 1

gocli endpoint-activate "go#ep2"
[ "$?" = 1 ] && exit 1

# Make destination dirs - this is not strictly necessary, just showing off
# `mkdir`.
if ! gocli ls "go#ep2/~/simple_demo" > /dev/null 2>&1; then
  gocli mkdir "go#ep2/~/simple_demo"
fi

# List the SRC and DST folder contents.
gocli ls -l "go#ep1/share/godata"
gocli ls -l "go#ep2/~/simple_demo"

# Bulk file transfer:
# Construct array of INPUTLINE(s) from ls output on src dir:
DATA_FILES=( $(gocli ls "go#ep1/share/godata/*.txt") )
{
for i in "${!DATA_FILES[@]}"; do
  f="${DATA_FILES[$i]}"
  echo "go#ep1/share/godata/$f go#ep2/~/simple_demo/bulk/$f"
done
# Pipe array into transfer command.
} | gocli transfer -s 3 --label="scripted_bulk_xfer_demo"

# Recursive transfer:
gocli transfer -s 3 --label="scripted_recursive_xfer_demo" -- \
  "go#ep1/share/godata/" \
  "go#ep2/~/simple_demo/recursive/" -r

# Print the status of the last two transfers:
# See `gocli man status` for format options to make parsing easier.
gocli status -a -l 2

3.1.2.2. GridFTP with SSH Authentication
(Back to Top)

Configuring GridFTP with SSH Authentication

    No setup is required if you will be using SSH for authentication. However, to use this for remote transfers, the remote facility must also accept SSH for authentication.

Transferring Data

    Files are transferred using the globus-url-copy command. The arguments to that command will differ based on your authentication method. To use globus-url-copy with SSH authentication load the globus module:
      $ module load globus
    Then run the globus-url-copy command:
    For example, while on an OLCF resource, you can transfer file1 in your OLCF User Work area to file2 on a remote system:
     $ globus-url-copy -tcp-bs 12M -bs 12M -p 4 -v -vb file:/lustre/atlas/scratch/$USER/file1 sshftp://user@remote.system/remote/dir/file2
    From the OLCF, transfer file1 on a remote system to file2 in your User Work area:
     $ globus-url-copy -tcp-bs 12M -bs 12M -p 4 -v -vb sshftp://remote.system/remote/dir/file1 file:/lustre/atlas/scratch/$USER/file2
    From a remote system, transfer file1 on that system to file2 in your User Work area:
     $ globus-url-copy -tcp-bs 12M -bs 12M -p 4 -v -vb file:/remote/dir/file1 sshftp://userid@dtn.ccs.ornl.gov/lustre/atlas/scratch/$USER/file2

SSH Keys

    SSH keys cannot be used to grant passwordless access to OLCF resources. However, SSH keys can be created on OLCF systems for access to remote resources that support SSH keys.
    To create an SSH key pair for dtn.ccs.ornl.gov:
    Log in to dtn.ccs.ornl.gov and go to your .ssh directory or create a .ssh directory if you do not have one.
    $ ssh-keygen -t dsa
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /ccs/home/$USER/.ssh/id_dsa.
    Your public key has been saved in /ccs/home/$USER/.ssh/id_dsa.pub.
    
    This will generate a ssh key pair consisting of id_dsa.pub and id_dsa. If you choose not to enter a passphrase, anyone who gains access to your private key file will then be able to assume your identity.
    To cash the private key and passphrase so that you do not need to enter it for every ssh operation in your session:
    $ ssh-agent
    $ ssh-add
    Identity added: /ccs/home/$USER/.ssh/id_rsa (/ccs/home/$USER/.ssh/id_rsa)
    Enter passphrase for /ccs/home/$USER/.ssh/id_dsa: 
    Identity added: /ccs/home/$USER/.ssh/id_dsa (/ccs/home/$USER/.ssh/id_dsa)
    Copy your id_dsa.pub to the remote resource's .ssh directory and copy the contents into a file called "authorized_keys".
    To test if this was successful, ssh to the remote resource from dtn.ccs.ornl.gov. If your ssh key works you will not need to enter your password for the remote resource.
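    Where the ssh-copy-id utility is available on dtn.ccs.ornl.gov, it can perform the copy-and-append step in one command; a minimal sketch (the remote hostname is a placeholder):
    $ ssh-copy-id -i ~/.ssh/id_dsa.pub userid@remote.system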


3.1.3.2. Transferring Data with SFTP and SCP

(Back to Top)

The SSH-based SCP and SFTP utilities can be used to transfer files to and from NCCS systems. Because these utilities can be slow, we recommend using them to transfer limited numbers of smaller files.

Usage

Both SCP and SFTP are available on all NCCS systems and should be a part of each user’s environment. For example, on a UNIX-based system, to transfer the file oldfile from your local system to your $HOME directory on OLCF systems as newfile, you would use one of the following commands:
SFTP
  sftp userid@dtn.ccs.ornl.gov
  sftp> put oldfile newfile
  sftp> bye
SCP
  scp ./oldfile userid@dtn.ccs.ornl.gov:~/newfile
...where userid is your given NCCS username. Standard file transfer protocol (FTP) and remote copy (RCP) should not be used to transfer files to the NCCS high-performance computing (HPC) systems due to security concerns.
SCP works with NCCS systems only if your per-process initialization files produce no output. This means that files such as .cshrc, .kshrc, .profile, etc. must not issue any commands that write to standard output. If you would like such a file to write to standard output for interactive sessions, you must edit it so that it does so only when the session is interactive. For sh-type shells such as bash and ksh, use the following template:
  TTY=$( /usr/bin/tty )
  if [ $? = 0 ]; then
    /usr/bin/echo "interactive stuff goes here"
  fi
For c-shell-type shells such as csh and tcsh use:
  ( /usr/bin/tty ) > /dev/null
  if ( $status == 0 ) then
    /usr/bin/echo "interactive stuff goes here"
  endif
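For a small directory tree, SCP's recursive mode works the same way; a minimal sketch (the results directory and destination path are placeholders):
  scp -r ./results userid@dtn.ccs.ornl.gov:/lustre/atlas/scratch/userid/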


3.2. Data Transfer Between OLCF Systems

(Back to Top)

The OLCF provides a shared-storage environment, so transferring data between our machines is largely unnecessary. However, we provide tools both to move large amounts of data between scratch and archival storage and from one scratch area to another.

Intra-Filesystem Transfers

The following utilities can be used to transfer data between partitions of the filesystem.
Utility Amount of Data to Transfer Where to run utility Notes
cp < 500 GB command line / batch script Useful when transferring small amounts of data, available from Titan and the DTNs
bbcp < 500 GB DTN command line or batch script Multi-streaming ability can make bbcp a faster option than cp, should be executed from DTNs only
fcp > 500 GB batch script Faster than cp and bbcp for directory trees, can preserve Lustre striping, only available from the batch-scheduled DTNs
To help reduce load on the compute systems' login resources, please use the DTNs for transfers larger than ~500 GB, for transfers to the High Performance Storage System (discussed below), and whenever running fcp or bbcp.
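A minimal sketch of a batch-scheduled DTN transfer job follows, saved as a hypothetical transfer.pbs (the project ID, directory names, and the $PROJWORK destination are placeholder assumptions; fcp or bbcp could be substituted for cp on larger directory trees):
#PBS -l walltime=1:00:00
#PBS -l nodes=1
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Copy a results directory from the member work area to project-shared space
cd $MEMBERWORK/prj123
cp -r run01 $PROJWORK/prj123/
Submit the script from any OLCF resource with qsub -q dtn transfer.pbs.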

Transfers to the High Performance Storage System

Since all areas of the shared Lustre filesystem are periodically swept, moving data from Lustre to archival storage on the High Performance Storage System (HPSS) is a necessary part of most users' workflows. The commands hsi and htar provide users with easy-to-use interfaces to their User Archive and Project Archive spaces on the OLCF’s HPSS-based archival storage system. The Transferring Data with HSI and HTAR page contains examples of hsi and htar and sample workflows.

3.2.1. The High-Performance Storage System (HPSS)

(Back to Top)

The High Performance Storage System (HPSS) at the OLCF provides longer-term storage for the large amounts of data created on the OLCF compute systems. The mass storage facility consists of tape and disk storage components, servers, and the HPSS software. Incoming data persists on disk for a period determined by how full the disk caches become; it is then migrated to tape in a first-in, first-out fashion.

Accessing HPSS
Each OLCF user receives an HPSS account automatically. Users can transfer data to HPSS from any OLCF system using the HSI or HTAR utilities. Initially, data transferred to HPSS is written to disk; the system then migrates the data to tape for longer-term archival. Click here for our HPSS Best Practice Guide and examples and instructions for using HTAR and HSI.
HPSS Hardware
HPSS uses SL8500 tape libraries, each holding up to 10,000 cartridges. The libraries house a total of 24 T10K-A tape drives (500 GB cartridges, uncompressed), 60 T10K-B tape drives (1 TB cartridges, uncompressed), 36 T10K-C drives, and 72 T10K-D drives. Each drive has a bandwidth of 250 MB/s.
HPSS History
ORNL’s work in mass storage began in the early 1990s to support the Atmospheric Radiation Measurement project and to provide storage for simulation results generated on the NCCS’s Paragon supercomputers. To support those projects, ORNL acquired and ran the NSL UniTree storage management product. In 1993 a follow-on to NSL UniTree, known as HPSS, was being designed by IBM and a collaboration of Department of Energy (DOE) national laboratories (Sandia, Livermore, and Los Alamos). ORNL joined that collaboration and took on responsibility for the storage system management (SSM) portion of the product, for which the ORNL HPSS development team continues to be responsible. ORNL continued with NSL UniTree production use until 1997, at which time the conversion to HPSS was completed. In 1997, HPSS won an R&D 100 Award based on an entry initiated and prepared at ORNL. As storage, network, and computing technologies continue to change, the OLCF's storage system evolves to take advantage of new equipment that is both more capable and more cost-effective.


3.2.2. HPSS Data Transfer

(Back to Top)

HPSS Best Practices
Currently HSI and HTAR are offered for archiving data into HPSS or retrieving data from the HPSS archive. For optimal transfer performance we recommend sending files of 768 GB or larger to HPSS. The minimum file size that we recommend sending is 512 MB. HPSS will handle files smaller than 512 MB, but write and read performance will be negatively affected. For files smaller than 512 MB we recommend bundling them with HTAR to achieve an archive file of at least 512 MB. When retrieving data from a tar archive larger than 1 TB, we recommend that you pull only the files that you need rather than the full archive. Examples of this are given in the HTAR section below. If you are using HSI to retrieve a single file larger than 1 TB, please make sure that the stripe pattern you choose is appropriate for the file's size. See the "Choosing a Stripe Pattern" section of the Lustre® Basics page to learn how and why choosing the right striping pattern is important. We also recommend using our data transfer nodes (DTNs) to achieve the fastest possible transfer rates. This can be done by logging on to dtn.ccs.ornl.gov and initiating transfers interactively, or by submitting a batch job from any OLCF resource to the DTNs as described in the HSI and HTAR Workflow section.
Using HSI
Issuing the command hsi will start HSI in interactive mode. Alternatively, you can use:
  hsi [options] command(s)
...to execute a set of HSI commands and then return. To list your files on HPSS, you might use:
  hsi ls
hsi commands are similar to ftp commands. For example, hsi get and hsi put are used to retrieve and store individual files, and hsi mget and hsi mput can be used to retrieve and store multiple files. To send a file to HPSS, you might use:
  hsi put a.out
To put a file in a pre-existing directory on HPSS:
  hsi “cd MyHpssDir; put a.out”
To retrieve one, you might use:
  hsi get /proj/projectid/a.out
Warning: If you are using HSI to retrieve a single file larger than 1 TB, please make sure that the stripe pattern you choose is appropriate for the file's size. See the "Choosing a Stripe Pattern" section of the Lustre Basics page to learn how and why.
Here is a list of commonly used hsi commands.
Command Function
cd Change current directory
get, mget Copy one or more HPSS-resident files to local files
cget Conditional get - get the file only if it doesn't already exist
cp Copy a file within HPSS
rm, mdelete Remove one or more files from HPSS
ls List a directory
put, mput Copy one or more local files to HPSS
cput Conditional put - copy the file into HPSS unless it is already there
pwd Print current directory
mv Rename an HPSS file
mkdir Create an HPSS directory
rmdir Delete an HPSS directory
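Several of the commands above can be combined in a single hsi invocation, as in the put example earlier; a minimal sketch using a hypothetical directory and file name:
  hsi "mkdir run01; cd run01; put results.tar"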
 
Additional HSI Documentation
There is interactive documentation on the hsi command available by running:
  hsi help
Additionally, documentation can be found at the Gleicher Enterprises website, including an HSI Reference Manual and man pages for HSI.
Using HTAR
The htar command provides a command-line interface very similar to that of the traditional tar command found on UNIX systems. The basic syntax of htar is:
htar -{c|K|t|x|X} -f tarfile [directories] [files]
As with the standard Unix tar utility, the -c, -x, and -t options, respectively, function to create, extract, and list tar archive files. The -K option verifies an existing tarfile in HPSS, and the -X option can be used to re-create the index file for an existing archive. For example, to store all files in the directory dir1 to a file named allfiles.tar on HPSS, use the command:
  htar -cvf allfiles.tar dir1/*
To retrieve these files:
  htar -xvf allfiles.tar 
htar will overwrite files of the same name in the target directory. When possible, extract only the files you need from large archives. To display the names of the files in the project1.tar archive file within the HPSS home directory:
  htar -vtf project1.tar
To extract only one file, executable.out, from the project1 directory in the archive file called project1.tar:
  htar -xm -f project1.tar project1/executable.out
To extract all files from the project1/src directory in the archive file called project1.tar, and use the time of extraction as the modification time, use the following command:
  htar -xm -f project1.tar project1/src
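The -K option described above can be used to verify an archive after it has been written; a minimal sketch using the allfiles.tar archive created earlier:
  htar -K -v -f allfiles.tar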
HTAR Limitations
The htar utility has several limitations.
Appending data
You cannot add or append files to an existing archive.
File Path Length
File path names within an htar archive of the form prefix/name are limited to 154 characters for the prefix and 99 characters for the file name. Link names cannot exceed 99 characters.
Size
There are limits to the size and number of files that can be placed in an HTAR archive.
Individual File Size Maximum 68GB, due to POSIX limit
Maximum Number of Files per Archive 1 million
  For example, when attempting to HTAR a directory with one member file larger than the limit, the following error message will appear:

[titan-ext1]$ htar -cvf hpss_test.tar hpss_test/

INFO: File too large for htar to handle: hpss_test/75GB.dat (75161927680 bytes)
ERROR: 1 oversize member files found - please correct and retry
ERROR: [FATAL] error(s) generating filename list 
HTAR: HTAR FAILED
Additional HTAR Documentation
The HTAR user's guide and the HTAR man page can be found at the Gleicher Enterprises website.
HSI and HTAR Workflow
Transfers with HPSS should be launched from the external Titan login nodes, the interactive data transfer nodes (DTNs), or the batch-scheduled DTNs. If the file size is above 512 MB and HSI is initiated from the titan-ext or titan-batch nodes, the HSI-DTN will transfer files in a further optimized and striped manner. The batch-scheduled DTNs should be used for large, long-running transfers or for transfers that are part of a scripted workflow. To submit a data archival job from any OLCF resource, use the -q dtn option with qsub.
qsub -q dtn Batch-script.pbs
Your allocation will not be charged time for this job. Below is an example batch script using HTAR. Batch-script.pbs
#PBS -l walltime=0:30:00
#PBS -l nodes=1
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Archive data to HPSS
cd $MEMBERWORK/prj123
htar -cf /proj/prj123/viz_output.htar viz_output/
htar -cf /proj/prj123/compute_data.htar compute_data/
See the workflow documentation for more workflow examples.
Storage Locations
Users are provided with a User Archive directory on HPSS that is located at /home/userid (where userid is your User ID). Additionally, each project is given a Project Archive directory located at /proj/projectid (where projectid is the six-character project ID).
A Note on Bundling Data
HPSS is optimized for larger files, so if you have multiple files that are smaller than 2 GB, you should combine them and store a single, larger file. In most cases this will provide a faster transfer, and it will allow HPSS to store the data more efficiently. The HTAR command is very useful for bundling smaller files, and is often faster than using the conventional tar command and then transferring via HSI. HTAR has an individual file size limit of 64 GB, due to the POSIX tar specification. However, HTAR can be used to store and retrieve directories that are in total larger than 64 GB, provided that they do not contain any individual files larger than 64 GB. When retrieving a large number of files, HSI can bundle the retrievals if it is given the full list of needed files up front. This allows HPSS to gather the needed files from a single tape and perform fewer mount/seek/rewind/unmount operations. For example, the following will create a list of files and pass the list to HSI to retrieve. Note that this method does not preserve directory structure and is better used when directory structure is not needed:
echo "get << EOFMARKER" > dir0.lst
hsi -q find testdir -type f >> dir0.lst 2>&1
echo "EOFMARKER" >> dir0.lst
hsi "out dir0.out ; in dir0.lst"
Classes of  Service and Data Redundancy
The HPSS has several Classes of Service (COS) to ensure that files are efficiently stored based on their size. The COS is set automatically based on the size of the file that is being stored.
COS ID Name (based on file size) # Tapes
11 NCCS 0MB<16MB 3 copies
12 NCCS 16MB<8GB RAIT 2+1
13 NCCS 8GB<1TB RAIT 4+1
14 NCCS >1TB RAIT 4+1
  For files less than 16 MB in size, three copies are written to tape. For files 16 MB or greater in size, HPSS supports a Redundant Array of Independent Tapes (RAIT), so there is no need to use multiple copies to ensure file safety in the event of tape failure. Neither multiple copies nor RAIT will protect your data if you accidentally delete it, so avoid commands such as hsi rm */*.


3.2.3. Using File Globbing Wildcard Characters with HSI

(Back to Top)

HSI has the option to turn file globbing on and off. If you get this message:

  O:[/home/user]: ls -la file*
  *** hpss_Lstat: No such file or directory [-2: HPSS_ENOENT]
    /home/user/file*
...then you'll need to turn on HSI file globbing with the glob command:
  O:[/home/user]: glob
  filename globbing turned on

  O:[/home/user]: ls -la file*
  -rw-r--r--   1 user  users     6164480 Jan 14 10:36 file.tar
  -rw-r--r--   1 user  users     6164480 Jan  6 11:08 file1.tar.gz
  -rw-r--r--   1 user  users     6164480 Jan  6 11:08 file2.tar
  -rw-r--r--   1 user  users     6164480 Jan  6 11:09 file3.tar
  -rw-r--r--   1 user  users     6164480 Jan  6 11:09 file4.tar
  -rw-r--r--   1 user  users     6164480 Jan  6 11:09 file5.tar


4. Enabling Workflows through Cross-System Batch Submission

(Back to Top)

The OLCF now supports submitting jobs between OLCF systems via batch scripts. This can be useful for automatically triggering analysis and storage of large data sets after a successful simulation job has ended, or for launching a simulation job automatically once the input deck has been retrieved from HPSS and pre-processed.

Cross-Submission allows jobs on one OLCF resource to submit new jobs to other OLCF resources.

The key to remote job submission is the command qsub -q host script.pbs which will submit the file script.pbs to the batch queue on the specified host. This command can be inserted at the end of an existing batch script in order to automatically trigger work on another OLCF resource. This feature is supported on the following hosts:
Host Remote Submission Command
Rhea qsub -q rhea visualization.pbs
Eos qsub -q eos visualization.pbs
Titan qsub -q titan compute.pbs
Data Transfer Nodes (DTNs) qsub -q dtn retrieve_data.pbs
Example Workflow 1: Automatic Post-Processing
The simplest example of a remote submission workflow is automatically triggering an analysis task on Rhea at the completion of a compute job on Titan. This workflow requires two batch scripts: one submitted on Titan, and a second submitted automatically to Rhea when the Titan job completes. Visually, this workflow may look something like the following:
Post-processing Workflow
The batch scripts for such a workflow could be implemented as follows: Batch-script-1.pbs
#PBS -l walltime=2:00:00
#PBS -l nodes=4096
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Launch compute executable on Titan
cd $MEMBERWORK/prj123
aprun -n 65536 ./compute-task.exe

# Submit analysis job to Rhea
qsub -q rhea Batch-script-2.pbs
Batch-script-2.pbs
#PBS -l walltime=2:00:00
#PBS -l nodes=64
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Analyze the compute output on Rhea
cd $MEMBERWORK/prj123
mpirun -n 768 ./analysis-task.exe
The key to this workflow is the qsub -q rhea Batch-script-2.pbs command at the end of the first script, which tells qsub to submit the file Batch-script-2.pbs to the batch queue on Rhea.
Initializing the Workflow
We can initialize this workflow in one of two ways:
  • Log into Titan and run qsub Batch-script-1.pbs OR
  • From Rhea or the DTNs, run qsub -q titan Batch-script-1.pbs
Example Workflow 2: Data Staging, Compute, and Archival
Now we give another example of a linear workflow. This example uses the Data Transfer Nodes (DTNs) to retrieve data from HPSS and stage it in your project's scratch area before the computation begins. Once the computation is done, the output is automatically archived.
Data Staging, Compute, and Archival Workflow
Batch-script-1.pbs
#PBS -l walltime=0:30:00
#PBS -l nodes=1
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Retrieve Data from HPSS
cd $MEMBERWORK/prj123
htar -xf /proj/prj123/input_data.htar input_data/

# Launch compute job
qsub -q titan Batch-script-2.pbs
Batch-script-2.pbs
#PBS -l walltime=6:00:00
#PBS -l nodes=4096
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Launch executable
cd $MEMBERWORK/prj123
aprun -n 65536 ./analysis-task.exe

# Submit data archival job to DTNs
qsub -q dtn Batch-script-3.pbs
Batch-script-3.pbs
#PBS -l walltime=0:30:00
#PBS -l nodes=1
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Archive output to HPSS
cd $MEMBERWORK/prj123
htar -cf /proj/prj123/viz_output.htar viz_output/
htar -cf /proj/prj123/compute_data.htar compute_data/
Initializing the Workflow
We can initialize this workflow in one of two ways:
  • Log into dtn.ccs.ornl.gov and run qsub Batch-script-1.pbs OR
  • From Titan or Rhea, run qsub -q dtn Batch-script-1.pbs
Example Workflow 3: Data Staging, Compute, Visualization, and Archival
This is an example of a "branching" workflow. We first use Rhea to prepare a mesh for our simulation on Titan. We then launch the compute task on Titan; once it has completed, the workflow branches into two separate paths: one to archive the simulation output data, and one to visualize it. After the visualizations have finished, we transfer them to a remote institution.
Branching Workflow
Step-1.prepare-data.pbs
#PBS -l walltime=0:30:00
#PBS -l nodes=10
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Prepare Mesh for Simulation
mpirun -n 160 ./prepare-mesh.exe

# Launch compute job
qsub -q titan Step-2.compute.pbs
Step-2.compute.pbs
#PBS -l walltime=6:00:00
#PBS -l nodes=4096
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Launch executable
cd $MEMBERWORK/prj123
aprun -n 65536 ./analysis-task.exe

# Workflow branches at this stage, launching 2 separate jobs

# - Launch Archival task on DTNs
qsub -q dtn Step-3.archive-compute-data.pbs

# - Launch Visualization task on Rhea
qsub -q rhea Step-4.visualize-compute-data.pbs
Step-3.archive-compute-data.pbs
#PBS -l walltime=0:30:00
#PBS -l nodes=1
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Archive compute data in HPSS
cd $MEMBERWORK/prj123
htar -cf /proj/prj123/compute_data.htar compute_data/
Step-4.visualize-compute-data.pbs
#PBS -l walltime=2:00:00
#PBS -l nodes=64
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Visualize Compute data
cd $MEMBERWORK/prj123
mpirun -n 768 ./visualization-task.py

# Launch transfer task
qsub -q dtn Step-5.transfer-visualizations-to-campus.pbs
Step-5.transfer-visualizations-to-campus.pbs
#PBS -l walltime=2:00:00
#PBS -l nodes=1
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Transfer visualizations to storage area at home institution
cd $MEMBERWORK/prj123
SOURCE=gsiftp://dtn03.ccs.ornl.gov/$MEMBERWORK/visualization.mpg
DEST=gsiftp://dtn.university-name.edu/userid/visualization.mpg
globus-url-copy -tcp-bs 12M -bs 12M -p 4 $SOURCE $DEST
Initializing the Workflow
We can initialize this workflow in one of two ways:
  • Log into rhea.ccs.ornl.gov and run qsub Step-1.prepare-data.pbs OR
  • From Titan or the DTNs, run qsub -q rhea Step-1.prepare-data.pbs
Checking Job Status
Host Remote qstat Remote showq
Rhea qstat -a @rhea-batch showq --host=rhea-batch
Eos qstat -a @eos-batch showq --host=eos-batch
Titan qstat -a @titan-batch showq --host=titan-batch
Data Transfer Nodes (DTNs) qstat -a @dtn-batch showq --host=dtn-batch
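For example, to check on a single job submitted to Rhea from another OLCF system (the job number below is illustrative):
qstat -a 18688@rhea-batch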
Deleting Remote Jobs
To delete a job (say, job number 18688) from a remote queue, use the appropriate command below:
Host Remote qdel
Rhea qdel 18688@rhea-batch
Eos qdel 18688@eos-batch
Titan qdel 18688@titan-batch
Data Transfer Nodes (DTNs) qdel 18688@dtn-batch
Potential Pitfalls
The OLCF advises users to keep their remote submission workflows simple, short, and mostly linear. Workflows that contain many layers of branches, or that trigger many jobs at once, may prove difficult to maintain and debug. Workflows that contain loops or recursion (jobs that can submit themselves again) may inadvertently waste allocation hours if a suitable exit condition is not reached.
Recursive workflows which do not exit will drain your project's allocation. Refunds will not be granted. Please be extremely cautious when designing workflows that cause jobs to re-submit themselves.
Circular Workflow
As always, users on multiple projects are strongly advised to double check that the #PBS -A <PROJECTID> field is set to the correct project prior to submission. This will ensure that resource usage is associated with the intended project.