
HPSS Data Transfer

HPSS Best Practices

Currently HSI and HTAR are offered for archiving data into HPSS or retrieving data from the HPSS archive.

For optimal transfer performance we recommend sending files of 768 GB or larger to HPSS. The minimum file size that we recommend sending is 512 MB. HPSS will handle files between 0 KB and 512 MB, but write and read performance will be negatively affected. For files smaller than 512 MB we recommend bundling them with HTAR to achieve an archive file of at least 512 MB.
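
As a rough pre-flight check, the size guidance above can be scripted. The sketch below is not an OLCF tool: the function name needs_bundling and the reading of 512 MB as 512 x 1024 x 1024 bytes are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of HSI or HTAR): succeed if a file is smaller
# than the recommended 512 MB minimum and is a candidate for HTAR bundling.
# Assumption: "512 MB" is read as 512 * 1024 * 1024 bytes.
MIN_BYTES=$((512 * 1024 * 1024))

needs_bundling() {
    local file=$1 size
    [ -f "$file" ] || return 2            # not a regular file
    size=$(wc -c < "$file")               # byte count of the file
    size=$((size))                        # normalise any padding from wc
    [ "$size" -lt "$MIN_BYTES" ]
}

# Example: report small files in the current directory.
#   for f in *; do needs_bundling "$f" && echo "$f: bundle with htar"; done
```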

When retrieving data from a tar archive larger than 1 TB, we recommend that you pull only the files that you need rather than the full archive. Examples of this are given in the Using HTAR section below.

If you are using HSI to retrieve a single file larger than 1 TB, please make sure that the stripe pattern you choose is appropriate for the file’s size. See the “Choosing a Stripe Pattern” section of the Lustre® Basics page to learn how and why choosing the right striping pattern is important.

We also recommend using our data transfer nodes (DTNs) for achieving the fastest possible transfer rates. This can be done by logging on to dtn.ccs.ornl.gov and initiating transfers interactively or by submitting a batch job from any OLCF resource to the DTNs as described in the HSI and HTAR Workflow section.

Using HSI

Issuing the command hsi will start HSI in interactive mode. Alternatively, you can use:

  hsi [options] command(s)

…to execute a set of HSI commands and then return.

To list your files on HPSS, you might use:

  hsi ls

hsi commands are similar to ftp commands. For example, hsi get and hsi put are used to retrieve and store individual files, and hsi mget and hsi mput can be used to retrieve and store multiple files.

To send a file to HPSS, you might use:

  hsi put a.out

To put a file in a pre-existing directory on HPSS:

  hsi "cd MyHpssDir; put a.out"

To retrieve one, you might use:

  hsi get /proj/projectid/a.out

Warning: If you are using HSI to retrieve a single file larger than 1 TB, please make sure that the stripe pattern you choose is appropriate for the file’s size. See the “Choosing a Stripe Pattern” section of the Lustre® Basics page to learn how and why.

Here is a list of commonly used hsi commands.

Command       Function
cd            Change current directory
get, mget     Copy one or more HPSS-resident files to local files
cget          Conditional get – get the file only if it doesn’t already exist
cp            Copy a file within HPSS
rm, mdelete   Remove one or more files from HPSS
ls            List a directory
put, mput     Copy one or more local files to HPSS
cput          Conditional put – copy the file into HPSS unless it is already there
pwd           Print current directory
mv            Rename an HPSS file
mkdir         Create an HPSS directory
rmdir         Delete an HPSS directory
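
Because hsi accepts several semicolon-separated commands in one quoted string, as in the cd and put example earlier, a small wrapper can assemble that string. This is only a sketch: the function name hsi_store_cmd is hypothetical, and it uses nothing beyond the cd and cput commands from the table.

```shell
#!/bin/bash
# Hypothetical helper: build the command string for one hsi invocation that
# changes into an existing HPSS directory and conditionally stores files.
# Assumption: file and directory names contain no spaces or semicolons.
hsi_store_cmd() {
    local dir=$1 cmd f
    shift
    cmd="cd $dir"
    for f in "$@"; do
        cmd="$cmd; cput $f"
    done
    printf '%s\n' "$cmd"
}

# Example (requires hsi, so it is left commented out):
#   hsi "$(hsi_store_cmd MyHpssDir a.out b.out)"
```

Building the string separately makes it easy to inspect what will be sent to HSI before running the transfer.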

 

Additional HSI Documentation

There is interactive documentation on the hsi command available by running:

  hsi help

Additionally, documentation can be found at the Gleicher Enterprises website, including an HSI Reference Manual and man pages for HSI.

Using HTAR

The htar command provides a command-line interface very similar to the traditional tar command found on UNIX systems. The basic syntax of htar is:

htar -{c|K|t|x|X} -f tarfile [directories] [files]

As with the standard Unix tar utility, the -c, -x, and -t options create, extract, and list tar archive files, respectively. The -K option verifies an existing tarfile in HPSS, and the -X option can be used to re-create the index file for an existing archive.

For example, to store all files in the directory dir1 to a file named allfiles.tar on HPSS, use the command:

  htar -cvf allfiles.tar dir1/*

To retrieve these files:

  htar -xvf allfiles.tar 

htar will overwrite files of the same name in the target directory.

When possible, extract only the files you need from large archives.

To display the names of the files in the project1.tar archive file within the HPSS home directory:

  htar -vtf project1.tar

To extract only one file, executable.out, from the project1 directory in the archive file called project1.tar:

  htar -xm -f project1.tar project1/executable.out

To extract all files from the project1/src directory in the archive file called project1.tar, and use the time of extraction as the modification time, use the following command:

  htar -xm -f project1.tar project1/src

HTAR Limitations

The htar utility has several limitations.

Appending Data

You cannot add or append files to an existing archive.

File Path Length

File path names within an htar archive of the form prefix/name are limited to 154 characters for the prefix and 99 characters for the file name. Link names cannot exceed 99 characters.
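
The path length limits above can be checked before an archive is created. The following is a minimal sketch, assuming the prefix is everything before the final "/" in the path; the function name htar_path_ok is hypothetical and not part of htar.

```shell
#!/bin/bash
# Hypothetical pre-check: succeed if a relative path fits the documented
# htar limits (prefix <= 154 characters, file name <= 99 characters).
# Assumption: the prefix is everything before the final "/".
htar_path_ok() {
    local path=$1 name prefix
    name=${path##*/}                      # text after the last "/"
    if [ "$name" = "$path" ]; then
        prefix=""                         # no directory component
    else
        prefix=${path%/*}                 # text before the last "/"
    fi
    [ "${#prefix}" -le 154 ] && [ "${#name}" -le 99 ]
}

# Example: report paths under dir1 that would not fit.
#   find dir1 -type f | while IFS= read -r p; do
#       htar_path_ok "$p" || echo "too long: $p"
#   done
```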

Size

There are limits to the size and number of files that can be placed in an HTAR archive.

Individual file size maximum          68 GB (64 GiB), due to POSIX tar limit
Maximum number of files per archive   1 million

 

For example, when attempting to HTAR a directory with one member file larger than 64 GB, the following error message will appear:

[titan-ext1]$ htar -cvf hpss_test.tar hpss_test/

INFO: File too large for htar to handle: hpss_test/75GB.dat (75161927680 bytes)
ERROR: 1 oversize member files found - please correct and retry
ERROR: [FATAL] error(s) generating filename list 
HTAR: HTAR FAILED
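
To avoid hitting this error partway through a workflow, oversize members can be located beforehand with standard tools. The sketch below assumes the limit is 64 GiB (68719476736 bytes); the function name oversize_members and its optional second argument are hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-check: list regular files under a directory that exceed a
# size limit, so they can be handled separately before running htar.
# Assumption: the limit is 64 GiB (68719476736 bytes); pass a second argument
# (in bytes) to override it.
oversize_members() {
    local dir=$1 limit=${2:-68719476736}
    # find's "c" suffix counts bytes; "+N" means strictly greater than N.
    find "$dir" -type f -size +"${limit}c"
}

# Example:
#   oversize_members hpss_test/
```

Any files it prints could be stored individually with hsi put, leaving the rest of the directory for htar.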

Additional HTAR Documentation

The HTAR user’s guide, including the HTAR man page, can be found at the Gleicher Enterprises website.

HSI and HTAR Workflow

Transfers with the HPSS should be launched from the external Titan login nodes, the interactive data transfer nodes (DTNs), or the batch-accessible DTNs.

If the file size is above 512 MB and HSI is initiated from the titan-ext or titan-batch nodes, the HSI-DTN will transfer the files in a further optimized, striped manner.

Batch DTNs should be used for large, long-running transfers or for transfers that are part of a scripted workflow.

To submit a data archival job from any OLCF resource, use the -q dtn option with qsub.

qsub -q dtn Batch-script.pbs

Your allocation will not be charged time for this job.

Below is an example batch script using HTAR.

Batch-script.pbs

#PBS -l walltime=0:30:00
#PBS -l nodes=1
#PBS -A PRJ123
#PBS -l gres=atlas1%atlas2

# Archive the output directories to HPSS
cd $MEMBERWORK/prj123
htar -cf /proj/prj123/viz_output.htar viz_output/
htar -cf /proj/prj123/compute_data.htar compute_data/

See the workflow documentation for more workflow examples.
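
When a batch job archives several directories, it can help to stop at the first failure rather than continue silently. The following sketch wraps the htar calls from the script above in a function. The names archive_dirs and HTAR are hypothetical, and it assumes htar returns a non-zero exit status on failure.

```shell
#!/bin/bash
# Hypothetical variant of the script body above: archive each directory and
# stop at the first failure. HTAR and archive_dirs are illustrative names;
# setting HTAR=echo gives a dry run that only prints the htar arguments.
# Assumption: htar returns a non-zero exit status when it fails.
HTAR=${HTAR:-htar}

archive_dirs() {
    local dest=$1 dir
    shift
    for dir in "$@"; do
        # ${dir%/} drops a trailing slash so the archive name is clean.
        "$HTAR" -cf "$dest/${dir%/}.htar" "$dir" || {
            echo "htar failed for $dir" >&2
            return 1
        }
    done
}

# In the batch script this might be called as:
#   cd $MEMBERWORK/prj123
#   archive_dirs /proj/prj123 viz_output/ compute_data/
```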

Storage Locations

Users are provided with a User Archive directory on HPSS that is located at /home/userid (where userid is your User ID). Additionally, each project is given a Project Archive directory located at /proj/projectid (where projectid is the six-character project ID).

A Note on Bundling Data

HPSS is optimized for larger files, so if you have multiple files that are smaller than 2GB, you should combine them and store a single, larger file. In most cases, this will provide a faster transfer and it will allow HPSS to store the data more efficiently.

The HTAR command is very useful for bundling smaller files, and is often faster than using the conventional tar command and then transferring via HSI. HTAR has an individual file size limit of 64GB, due to the POSIX tar specification. However, HTAR can be used to store and retrieve directories that are in total larger than 64GB, provided that they do not contain any individual files larger than 64GB.

When retrieving a large number of files, HSI can bundle the retrievals if it is given the full list of files up front. This allows HPSS to group the needed files by tape and perform fewer mounts, seeks, rewinds, and unmounts.

The following example creates a list of files and passes it to HSI to retrieve. This method does not preserve directory structure, so it is best used when the directory structure is not needed. The >>& redirection is csh/tcsh syntax for appending both standard output and standard error; in bash, use >> dir0.lst 2>&1 instead:

echo "get << EOFMARKER" > dir0.lst
hsi -q find testdir -type f >>& dir0.lst
echo "EOFMARKER" >> dir0.lst
hsi "out dir0.out ; in dir0.lst"
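
For bash users, the list file can also be produced by a small filter. This is a sketch rather than an official recipe: the function name make_get_list is hypothetical, and it simply wraps whatever paths it reads in the same get block used above.

```shell
#!/bin/bash
# Hypothetical bash helper: read HPSS file paths on standard input (one per
# line) and print an hsi "get" block in the same layout as the list file
# built above.
make_get_list() {
    echo "get << EOFMARKER"
    cat                                   # pass the paths through unchanged
    echo "EOFMARKER"
}

# Example (requires hsi, so it is left commented out):
#   hsi -q find testdir -type f 2>&1 | make_get_list > dir0.lst
#   hsi "out dir0.out ; in dir0.lst"
```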

Classes of Service and Data Redundancy

The HPSS has several Classes of Service (COS) to ensure that files are efficiently stored based on their size. The COS is set automatically based on the size of the file that is being stored.

COS ID   Name (based on file size)   # Tapes
11       NCCS 0MB<16MB               3 copies
12       NCCS 16MB<8GB               RAIT 2+1
13       NCCS 8GB<1TB                RAIT 4+1
14       NCCS >1TB                   RAIT 4+1

 

For files less than 16 MB in size, three copies are written to tape. For files 16MB or greater in size, HPSS supports a Redundant Array of Independent Tapes (RAIT) so there is no need to use multiple copies to ensure file safety in the event of tape failure.
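
To see which class a given file would land in, the table can be expressed as a simple lookup. This is an illustrative sketch only, since HPSS assigns the COS automatically. The function name cos_for_size is hypothetical, the units are assumed to be binary (1 MB = 1024 x 1024 bytes), and a file of exactly 1 TB is assumed to fall into COS 14 because the table leaves that boundary unspecified.

```shell
#!/bin/bash
# Hypothetical lookup mirroring the table above: print the COS ID that a file
# of the given size (in bytes) would fall into.
# Assumptions: MB/GB/TB are binary units, and a file of exactly 1 TB is
# treated as COS 14 (the table does not specify that boundary).
cos_for_size() {
    local bytes=$1
    local mb=$((1024 * 1024))
    local gb=$((1024 * 1024 * 1024))
    local tb=$((1024 * 1024 * 1024 * 1024))
    if   [ "$bytes" -lt $((16 * mb)) ]; then echo 11
    elif [ "$bytes" -lt $((8 * gb)) ];  then echo 12
    elif [ "$bytes" -lt "$tb" ];        then echo 13
    else                                     echo 14
    fi
}

# Example:
#   cos_for_size "$(wc -c < viz_output.htar)"
```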

Neither multiple copies nor RAIT will protect your data if you accidentally delete it. Therefore, avoid commands such as hsi rm */*.