HPSS Data Transfer
Categories: Data Management, Data Transfer
HPSS Best Practices
Currently HSI and HTAR are offered for archiving data into HPSS or retrieving data from the HPSS archive.
For optimal transfer performance, we recommend sending files of 768 GB or larger to HPSS. The minimum file size that we recommend sending is 512 MB. HPSS will handle files between 0 KB and 512 MB, but write and read performance will be negatively affected. For files smaller than 512 MB, we recommend bundling them with HTAR to achieve an archive file of at least 512 MB.
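Before bundling, it can help to see which files fall under the 512 MB minimum. The sketch below defines two hypothetical bash helpers, `bundle_candidates` and `bundle_candidates_total`. It assumes that "512 MB" means 512 MiB (536870912 bytes) and that GNU find is available for `-printf`. The helpers only inspect local files and never contact HPSS.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not OLCF tools): find local files that are below the
# recommended minimum size and are therefore candidates for HTAR bundling.
# Assumption: "512 MB" is treated as 512 MiB = 536870912 bytes.

# Print regular files under DIR strictly smaller than THRESHOLD bytes.
bundle_candidates() {
  local dir=$1
  local threshold=${2:-536870912}
  # With the "c" (bytes) unit, -size -N matches sizes < N with no rounding.
  find "$dir" -type f -size -"${threshold}"c -print
}

# Print the total size in bytes of those candidates (GNU find -printf).
bundle_candidates_total() {
  local dir=$1
  local threshold=${2:-536870912}
  find "$dir" -type f -size -"${threshold}"c -printf '%s\n' |
    awk '{ s += $1 } END { printf "%.0f\n", s }'
}
```

For example, `bundle_candidates "$MEMBERWORK/prj123/run1"` lists the small files in that directory. If `bundle_candidates_total` reports less than 536870912, the resulting HTAR archive would still be under the recommended 512 MB, so consider bundling those files together with other data.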
When retrieving data from a tar archive larger than 1 TB, we recommend that you pull only the files that you need rather than the full archive. Examples of this are given in the HTAR section below.
If you are using HSI to retrieve a single file larger than 1 TB, please make sure that the stripe pattern you choose is appropriate for the file’s size. See the “Choosing a Stripe Pattern” section of the Lustre® Basics page to learn how and why choosing the right striping pattern is important.
We also recommend using our data transfer nodes (DTNs) to achieve the fastest possible transfer rates. You can do this by logging on to dtn.ccs.ornl.gov and initiating transfers interactively, or by submitting a batch job from any OLCF resource to the DTNs as described in the HSI and HTAR Workflow section.
Issuing the command hsi will start HSI in interactive mode. Alternatively, you can use:
hsi [options] command(s)
to execute a set of HSI commands and then return.
To list your files on the HPSS, you might use:
hsi ls
hsi commands are similar to ftp commands. For example, hsi get and hsi put are used to retrieve and store individual files, and hsi mget and hsi mput can be used to retrieve and store multiple files.
To send a file to HPSS, you might use:
hsi put a.out
To put a file in a pre-existing directory on HPSS:
hsi "cd MyHpssDir; put a.out"
To retrieve one, you might use:
hsi get /proj/projectid/a.out
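The semicolon-separated form shown above can also be built from a shell script. This avoids invoking hsi once per file when several files go into the same HPSS directory. The sketch below defines a hypothetical helper, `hsi_put_into`, that uses only the `cd` and `put` commands shown in this section. It assumes that the files are in the current working directory and that no name contains spaces or semicolons, because everything is joined into one HSI command string.

```bash
#!/usr/bin/env bash
# Hypothetical helper: store several local files in one pre-existing HPSS
# directory with a single hsi invocation, e.g.
#   hsi "cd MyHpssDir; put a.out; put b.out"
# Assumption: names contain no spaces or semicolons.
hsi_put_into() {
  local hpss_dir=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: hsi_put_into HPSS_DIR FILE..." >&2
    return 2
  fi
  local cmd="cd $hpss_dir" f
  for f in "$@"; do
    cmd+="; put $f"          # append one put per file
  done
  hsi "$cmd"                 # one HSI session for all files
}
```

For example, `hsi_put_into MyHpssDir a.out b.out` runs `hsi "cd MyHpssDir; put a.out; put b.out"`.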
Here is a list of commonly used hsi commands.
|Command||Description|
|cd||Change current directory|
|get, mget||Copy one or more HPSS-resident files to local files|
|cget||Conditional get – get the file only if it doesn’t already exist|
|cp||Copy a file within HPSS|
|rm, mdelete||Remove one or more files from HPSS|
|ls||List a directory|
|put, mput||Copy one or more local files to HPSS|
|cput||Conditional put – copy the file into HPSS unless it is already there|
|pwd||Print current directory|
|mv||Rename an HPSS file|
|mkdir||Create an HPSS directory|
|rmdir||Delete an HPSS directory|
Additional HSI Documentation
There is interactive documentation on the hsi command available by running:
The htar command provides an interface very similar to the traditional tar command found on UNIX systems. It is used as a command-line interface. The basic syntax of htar is:
htar -{c|K|t|x|X} -f tarfile [directories] [files]
As with the standard Unix tar utility, the -c, -x, and -t options function to create, extract, and list tar archive files, respectively. The -K option verifies an existing tarfile in HPSS, and the -X option can be used to re-create the index file for an existing archive.
For example, to store all files in the directory dir1 to a file named allfiles.tar on HPSS, use the command:
htar -cvf allfiles.tar dir1/*
To retrieve these files:
htar -xvf allfiles.tar
htar will overwrite files of the same name in the target directory.
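Because extraction overwrites existing files silently, a quick local check before running htar -x can prevent surprises. The sketch below defines a hypothetical helper, `would_overwrite`. It reads the archive member paths you intend to extract from standard input, one per line, and prints those that already exist under the target directory. It does not parse htar's own listing format. How you produce the member list (by hand, or from the htar -t output) is left to you.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given member paths on stdin (one per line), print the
# ones that already exist under TARGET_DIR (default: current directory),
# i.e. the files that htar -x would overwrite.
would_overwrite() {
  local target_dir=${1:-.} member
  while IFS= read -r member; do
    [ -n "$member" ] || continue                  # skip blank lines
    # -L also catches dangling symbolic links, which -e alone would miss.
    if [ -e "$target_dir/$member" ] || [ -L "$target_dir/$member" ]; then
      printf '%s\n' "$member"
    fi
  done
}
```

For example, `printf '%s\n' project1/executable.out | would_overwrite .` prints project1/executable.out only if that file is already present locally.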
When possible, extract only the files you need from large archives.
To display the names of the files in the project1.tar archive file within the HPSS home directory:
htar -vtf project1.tar
To extract only one file, executable.out, from the project1 directory in the archive file called project1.tar:
htar -xm -f project1.tar project1/executable.out
To extract all files from the project1/src directory in the archive file called project1.tar, and use the time of extraction as the modification time, use the following command:
htar -xm -f project1.tar project1/src
The htar utility has several limitations.
You cannot add or append files to an existing archive.
File Path Length
File path names within an htar archive of the form prefix/name are limited to 154 characters for the prefix and 99 characters for the file name. Link names cannot exceed 99 characters.
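The limits above can be checked locally before starting a long HTAR run. The sketch below defines a hypothetical helper, `check_htar_paths`, that walks a directory and flags entries whose basename exceeds 99 characters, whose directory part exceeds 154 characters, or whose symbolic-link target exceeds 99 characters. This is a simplified and conservative reading of the prefix/name rule. It splits each path only at its last slash, so it may flag some paths that tar could still split elsewhere. Lengths are counted in characters, which matches bytes only for ASCII names.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for HTAR path limits (simplified):
#   name (basename)       <= 99 characters
#   prefix (directory)    <= 154 characters
#   symlink target        <= 99 characters
# Paths are checked as they would be passed to htar (relative to where you
# run it), so call this with the same directory argument you give htar.
check_htar_paths() {
  local dir=$1 path name prefix target
  while IFS= read -r -d '' path; do
    name=${path##*/}                 # part after the last slash
    prefix=${path%/*}                # part before the last slash
    [ "$prefix" = "$path" ] && prefix=""
    if [ "${#name}" -gt 99 ] || [ "${#prefix}" -gt 154 ]; then
      printf 'path too long: %s\n' "$path"
    fi
    if [ -L "$path" ]; then
      target=$(readlink "$path")
      if [ "${#target}" -gt 99 ]; then
        printf 'link target too long: %s\n' "$path"
      fi
    fi
  done < <(find "$dir" -print0)      # NUL-separated to survive odd names
}
```

For example, running `check_htar_paths dir1` before `htar -cvf allfiles.tar dir1/*` prints nothing if every path fits.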
There are limits to the size and number of files that can be placed in an HTAR archive.
|Limit||Value|
|Individual File Size Maximum||64 GiB (about 68.7 GB), due to POSIX limit|
|Maximum Number of Files per Archive||1 million|
For example, when attempting to HTAR a directory with one member file larger than 64 GB, the following error message will appear:
[titan-ext1]$ htar -cvf hpss_test.tar hpss_test/
INFO: File too large for htar to handle: hpss_test/75GB.dat (75161927680 bytes)
ERROR: 1 oversize member files found - please correct and retry
ERROR: [FATAL] error(s) generating filename list
HTAR: HTAR FAILED
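To catch oversize members before HTAR fails, you can search for them locally. The sketch below defines a hypothetical helper, `find_oversize`. It assumes that the 64 GB limit means 64 GiB (68719476736 bytes) and flags files strictly larger than that. Any file it reports can be moved out of the directory and stored individually with HSI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list regular files under DIR larger than LIMIT bytes.
# Default LIMIT assumes HTAR's per-member limit is 64 GiB = 68719476736 bytes.
find_oversize() {
  local dir=$1
  local limit=${2:-68719476736}
  # With the "c" (bytes) unit, -size +N matches sizes > N exactly.
  find "$dir" -type f -size +"${limit}"c -print
}
```

For the example above, `find_oversize hpss_test/` would report hpss_test/75GB.dat.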
Additional HTAR Documentation
HSI and HTAR Workflow
Transfers with HPSS should be launched from the external Titan login nodes, the interactive data transfer nodes (DTNs), or the batch-accessible DTNs.
If the file size is above 512 MB and HSI is initiated from the titan-ext or titan-batch nodes, the HSI-DTN will transfer files using a further optimized, striped method.
Batch DTNs should be used for large, long-running transfers or for transfers that are part of a scripted workflow.
To submit a data archival job from any OLCF resource use the -q dtn option with qsub.
qsub -q dtn Batch-script.pbs
Your allocation will not be charged time for this job.
Below is an example batch script using HTAR.
#PBS -l walltime=0:30:00
#PBS -l nodes=1
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Change to the work directory and bundle each output directory into HPSS
cd $MEMBERWORK/prj123
htar -cf /proj/prj123/viz_output.htar viz_output/
htar -cf /proj/prj123/compute_data.htar compute_data/
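When a run produces many output directories, the per-directory htar lines in the script above can be generated in a loop. The sketch below defines a hypothetical helper, `archive_subdirs`, that creates one HTAR archive per subdirectory. It mirrors the example's `.htar` naming and its relative `dir/` arguments, and it stops at the first htar failure. Unlike the example, it archives every subdirectory, so only use it where that is what you want.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each subdirectory of SRC_DIR, run
#   htar -cf HPSS_DIR/<name>.htar <name>/
# from inside SRC_DIR, as the batch script example does by hand.
archive_subdirs() {
  local src_dir=$1 hpss_dir=$2
  (
    cd "$src_dir" || exit 1
    shopt -s nullglob                 # no subdirectories -> no htar calls
    for d in */; do
      # "$d" keeps its trailing slash (e.g. viz_output/), as in the example.
      htar -cf "$hpss_dir/${d%/}.htar" "$d" || exit 1
    done
  )
}
```

Inside a batch script, `archive_subdirs "$MEMBERWORK/prj123" /proj/prj123` would replace the cd and the two htar lines if viz_output and compute_data were the only subdirectories.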
Users are provided with a User Archive directory on HPSS whose path includes their User ID (userid). Additionally, each project is given a Project Archive directory located at /proj/projectid (where projectid is the six-character project ID).
A Note on Bundling Data
HPSS is optimized for larger files, so if you have multiple files that are smaller than 2GB, you should combine them and store a single, larger file. In most cases, this will provide a faster transfer and it will allow HPSS to store the data more efficiently.
The HTAR command is very useful for bundling smaller files, and is often faster than using the conventional tar command and then transferring via HSI. HTAR has an individual file size limit of 64 GB, due to the POSIX tar specification. However, HTAR can be used to store and retrieve directories that are in total larger than 64 GB, provided that they do not contain any individual files larger than 64 GB.
When retrieving a large number of files, HSI can bundle the retrieve requests if it knows in advance that many files are needed. This allows HPSS to gather the needed files from a single tape together and perform fewer mount, seek, rewind, and unmount operations.
The following example creates a list of files and passes the list to HPSS to retrieve. Note that this method does not preserve directory structure and is better used when directory structure is not needed:
echo "get << EOFMARKER" > dir0.lst
hsi -q find testdir -type f >> dir0.lst 2>&1
echo "EOFMARKER" >> dir0.lst
hsi "out dir0.out ; in dir0.lst"
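The same command-file format can also be built from a list of HPSS paths you already have, for example a filtered subset of an earlier listing. The sketch below defines a hypothetical helper, `make_hsi_get_list`. It reads paths from standard input and writes them wrapped in the `get << EOFMARKER` ... `EOFMARKER` block used above, so that the file can be passed to `hsi "out dir0.out ; in dir0.lst"` in the same way.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wrap HPSS paths read from stdin (one per line) in the
# "get << EOFMARKER ... EOFMARKER" block shown above and write it to OUTFILE.
make_hsi_get_list() {
  local out=$1
  {
    echo "get << EOFMARKER"
    cat                      # copy the path list through unchanged
    echo "EOFMARKER"
  } > "$out"
}
```

For example, `grep '\.dat$' all_paths.txt | make_hsi_get_list dir0.lst` builds a request for only the .dat files. Here all_paths.txt is a hypothetical file that holds one HPSS path per line.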
Classes of Service and Data Redundancy
The HPSS has several Classes of Service (COS) to ensure that files are efficiently stored based on their size. The COS is set automatically based on the size of the file that is being stored.
|COS ID||Name based on filesize||# Tapes|
|11||NCCS 0MB<16MB||3 copies|
|12||NCCS 16MB<8GB||RAIT 2+1|
|13||NCCS 8GB<1TB||RAIT 4+1|
|14||NCCS >1TB||RAIT 4+1|
For files less than 16 MB in size, three copies are written to tape. For files 16MB or greater in size, HPSS supports a Redundant Array of Independent Tapes (RAIT) so there is no need to use multiple copies to ensure file safety in the event of tape failure.
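HPSS selects the COS automatically, so you never need to set it yourself. The sketch below is only an illustration of how the table maps file sizes to COS IDs. It defines a hypothetical bash function, `expected_cos`, and assumes binary units (1 MB = 2^20 bytes, 1 GB = 2^30, 1 TB = 2^40). It also assumes that a file of exactly 1 TB falls into COS 14, because the table does not state which class includes that boundary.

```bash
#!/usr/bin/env bash
# Illustration only: map a file size in bytes to the COS ID from the table.
# Assumptions: binary units; a file of exactly 1 TB maps to COS 14.
expected_cos() {
  local bytes=$1
  local MiB=$((1 << 20)) GiB=$((1 << 30)) TiB=$((1 << 40))
  if (( bytes < 16 * MiB )); then
    echo 11        # 3 tape copies
  elif (( bytes < 8 * GiB )); then
    echo 12        # RAIT 2+1
  elif (( bytes < TiB )); then
    echo 13        # RAIT 4+1
  else
    echo 14        # RAIT 4+1
  fi
}
```

For example, `expected_cos 75161927680`, the size of the 75 GB file in the HTAR error example, falls in the 8 GB to 1 TB range of COS 13.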
Neither multiple copies nor RAIT will protect your data if you accidentally delete it. Therefore, avoid broad wildcard deletions such as hsi rm */*.
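One way to reduce the risk of accidental deletion is to list the targets and ask for confirmation before removing anything. The sketch below defines a hypothetical wrapper, `hsi_rm_confirm`. It runs hsi ls on the given HPSS paths, reads a y/N answer from standard input, and calls hsi rm only if the answer is yes. Redirecting hsi's standard input from /dev/null is a precaution so that hsi cannot consume the answer. It is not a documented HSI requirement.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: show what would be removed, then delete only after an
# explicit "y" on stdin. Nothing is deleted for any other answer.
hsi_rm_confirm() {
  if [ "$#" -eq 0 ]; then
    echo "usage: hsi_rm_confirm HPSS_PATH..." >&2
    return 2
  fi
  hsi ls "$@" < /dev/null              # list the targets first
  printf 'Delete the files listed above? [y/N] ' >&2
  local answer
  read -r answer
  case $answer in
    y|Y|yes)
      hsi rm "$@" < /dev/null
      ;;
    *)
      echo "Aborted; nothing deleted." >&2
      return 1
      ;;
  esac
}
```

For example, `hsi_rm_confirm /proj/prj123/old_run.htar` lists that file and removes it only after you answer y.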