Transferring Data with HSI and HTAR
The hsi and htar utilities provide users with easy-to-use interfaces to their User Archive and Project Archive spaces on the OLCF’s HPSS-based archival storage system.
The hsi utility allows automatic authentication and provides a user-friendly command line and interactive interface to HPSS. HSI is the preferred method of accessing HPSS. Features of HSI include:
- Password security: passwords are not transmitted in clear text over the network.
- Usability in pipelines, shell scripts, and batch jobs.
- Support for command stacking (multiple commands per line).
- Support for interactive or one-liner (i.e., command-line-only) modes.
- Support for abbreviations for most commands and keywords.
- Support for recursion for many common commands (such as those for storing, retrieving, and listing files).
- Extensive online full-screen help.
Issuing the command hsi will start HSI in interactive mode. Alternatively, you can use:
hsi [options] command(s)
…to execute a set of HSI commands and then return. Note that you may need to add /opt/public/bin to your search path to find the HSI executable.
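As a minimal sketch of the search-path adjustment mentioned above, the following POSIX shell snippet appends /opt/public/bin to PATH only if it is not already present. The helper name add_to_path is hypothetical and not part of HSI; whether you need this at all depends on your login environment.

```shell
# Append a directory to PATH only if it is not already listed.
# add_to_path is a hypothetical helper name, not an HSI command.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on PATH: leave it unchanged
        *) PATH="$PATH:$1" ;;     # otherwise append the directory
    esac
    export PATH
}

add_to_path /opt/public/bin
```

Placing a line like this in a shell startup file would make the change persistent; calling the helper twice leaves PATH unchanged the second time.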
hsi commands are similar to ftp commands. For example, hsi get and hsi put are used to retrieve and store individual files, and hsi mget and hsi mput are used to retrieve and store multiple files.
To send a file to HPSS, you might use:
hsi put a.out
To retrieve one, you might use:
hsi get /proj/projectid/a.out
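The command-stacking feature listed earlier places several HSI commands on one line, separated by semicolons, and passes them to hsi as a single quoted argument. The sketch below only builds such a string; stack_hsi_commands is a hypothetical helper name, and the snippet does not contact HPSS.

```shell
# Join HSI commands into one semicolon-separated string.
# stack_hsi_commands is a hypothetical helper, not part of HSI.
stack_hsi_commands() {
    stacked=""
    for cmd in "$@"; do
        if [ -z "$stacked" ]; then
            stacked="$cmd"
        else
            stacked="$stacked; $cmd"
        fi
    done
    printf '%s\n' "$stacked"
}

# Prints: put a.out; get /proj/projectid/a.out
stack_hsi_commands "put a.out" "get /proj/projectid/a.out"
```

The resulting string could then be passed in one-liner mode, for example hsi "$(stack_hsi_commands "put a.out" "get /proj/projectid/a.out")".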
The htar command provides an interface very similar to the traditional tar command found on UNIX systems. It is used from the command line. For example, to store all files in the directory dir1 to a file named allfiles.tar on HPSS, use a command similar to:
htar -cvf allfiles.tar dir1/*
Users are provided with a User Archive directory on HPSS whose location is based on userid (where userid is your User ID). Additionally, each project is given a Project Archive directory whose location is based on projectid (where projectid is the six-character project ID).
Interactive documentation on the hsi command is available through HSI's built-in help.
Transfer File Sizes
HPSS is optimized for larger files, so if you have multiple files that are smaller than 2GB, you should combine them and store a single, larger file. In most cases, this will provide a faster transfer and it will allow HPSS to store the data more efficiently. The HTAR command is very useful for doing this, and is often faster than using the conventional tar command and then transferring via HSI.
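To decide which files are candidates for combining, a script can compare each file's size against the 2GB guideline. The following is a rough sketch under stated assumptions: 2GB is taken to mean 2 × 1024³ bytes, and the function names classify_size and classify_dir are hypothetical. It inspects local files only and does not call hsi or htar.

```shell
# Guideline threshold; 2GB is assumed here to mean 2 * 1024^3 bytes.
THRESHOLD_BYTES=2147483648

# Print "bundle" for sizes below the threshold, otherwise "store-as-is".
classify_size() {
    if [ "$1" -lt "$THRESHOLD_BYTES" ]; then
        echo "bundle"
    else
        echo "store-as-is"
    fi
}

# Classify each regular file directly inside the given directory.
classify_dir() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        size=$(wc -c < "$f" | tr -d ' ')   # byte count, whitespace stripped
        printf '%s\t%s\n' "$(classify_size "$size")" "$f"
    done
}
```

Running classify_dir dir1 prints one line per file; files marked "bundle" are the ones worth combining into a single archive with htar before storing.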
Ideal HSI and HTAR transfer limits are as follows:
| File Size | Logins | Puts | Concurrent Sessions |
|---|---|---|---|
| 2GB – 256GB | < 500 a day | < 500 a day | < 3 |
Using hsi or htar in excess of the guidelines listed in the table above may result in HPSS account termination or delays.
Setting the Number of Copies
Our HPSS supports up to (2) tape copies of a file. By default, files are written with only (1) copy. For critical files that have no backup elsewhere and cannot be easily re-created, you may want to store (2) copies.
You can specify (2) copies by issuing the copies=2 command before any put statements in hsi or as part of your htar command line, as shown below.
For non-interactive HSI (i.e., in a batch script):
    <batch script>
    ...
    hsi "copies=2; put test.file"
    ...

    $ <run batch script>
    put 'test.file' : '/home/username/test.file' ( 56420696 bytes, 30566.2 KBS (cos=6003))
    $
For interactive HSI (i.e., within HSI interactive mode):
    $ hsi
    O:[/home/username]: copies=2
    O:[/home/username]: put test.file
    put 'test.file' : '/home/username/test.file' ( 56420696 bytes, 58627.9 KBS (cos=6003))
    O:[/home/username]: exit
    $
For HTAR:
    $ htar -H copies=2 -cf test.tar .
    HTAR: HTAR SUCCESSFUL
    $
Verifying the Number of Copies
Copies in HPSS are controlled through Classes of Service (COS). Each COS is either a (1)-copy COS or a (2)-copy COS. You can check the COS of a file by issuing the ls -UH command:
    O:[/home/username]: ls -UH test.file.tar
    Mode        Links  Owner  Group  COS   Acct    Where  Size         DateTime     Entry
    -rw-r--r--  1      $USER  users  6057  act001  DISK   50064568320  May 16 2008  test.file.tar
The above example shows that the test.file.tar file is in the 6057 COS. Typically, even COS are (1)-copy and odd COS are (2)-copy. You can verify if a COS is (1)-copy or (2)-copy by issuing the lscos command within HSI:
    ]: lscos
    10 HPSS Classes of Service defined
    COS   Name                    Excl.  Copies  Subsys  Min Size       - Max Size + 1
    ID                            Flags          IDs
    ---------------------------------------------------------------------------------------
    5081  Disk X-Small                   2       ALL     0              - 131,072
    5081  Disk X-Small                   1       ALL     0              - 131,072
    6001  Disk Small 9840                2       ALL     131,072        - 16,777,216
    6001  Disk Small 9840                1       ALL     131,072        - 16,777,216
    6002  Disk Medium 9840               1       ALL     16,777,216     - 536,870,912
    6003  Disk Medium 2-Copy             2       ALL     16,777,216     - 536,870,912
    6054  Disk Large_T 1-Copy            1       ALL     536,870,912    - 8,589,934,592
    6055  Disk Large_T 2-Copy            2       ALL     536,870,912    - 8,589,934,592
    6056  Disk X-Large_T                 1       ALL     8,589,934,592  - 281,474,976,710,656
    6057  Disk X-Large_T 2-Copy          2       ALL     8,589,934,592  - 281,474,976,710,656
    ---------------------------------------------------------------------------------------
    Flags: U/G/A - unavailable to current uid/gid/account
           N - no auto assignment
From the above output, we can tell that COS 6057 is a (2)-copy COS; the copies column has a (2) in it.
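The transfer messages printed by put in the earlier examples also report the class of service, as in (cos=6003). As a small sketch, the hypothetical helper below (extract_cos is not an HSI command) pulls that number out of a saved message so it can be compared against the lscos listing. It assumes the message format shown in this article.

```shell
# Extract the COS number from an HSI transfer message such as:
#   put 'test.file' : '/home/username/test.file' ( 56420696 bytes, 30566.2 KBS (cos=6003))
# extract_cos is a hypothetical helper; it prints nothing if no "(cos=N)" is found.
extract_cos() {
    printf '%s\n' "$1" | sed -n 's/.*(cos=\([0-9][0-9]*\)).*/\1/p'
}

msg="put 'test.file' : '/home/username/test.file' ( 56420696 bytes, 30566.2 KBS (cos=6003))"
extract_cos "$msg"   # prints 6003
```

In the lscos listing above, COS 6003 shows (2) in the copies column, which matches the copies=2 setting used for that transfer.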
Direct Transfers Between HPSS and Remote Systems
Because HSI is a third-party package, clients may be available for remote systems (e.g., your personal workstation). However, the OLCF currently supports access to the HPSS only through HSI clients on the HPC systems. To transfer data directly to or from the OLCF’s HPSS, you will need to use an OLCF resource as a staging system.
For example, to transfer data from your directory on HPSS to a system outside the OLCF, you will need to copy the data in reasonable chunks to an OLCF system using the HSI utility. Once a portion of the data is on an OLCF system, you can use a utility such as BBCP or SFTP/SCP to move the data to the system outside the OLCF.
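The staged approach described above can be sketched as a loop that handles one file at a time: retrieve it from HPSS onto the OLCF system, forward it with scp, then remove the local copy to free space. Everything named here is an assumption for illustration: the remote target user@remote.example.org:/data/incoming/, the helper names run and stage_and_forward, and the explicit "local : HPSS" form of hsi get, which mirrors the format of the put messages shown earlier. By default the sketch only prints the commands it would run.

```shell
# Hypothetical remote destination, for illustration only.
REMOTE="user@remote.example.org:/data/incoming/"
# Default to a dry run that prints commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# For each HPSS path: stage locally, forward off-site, then clean up.
stage_and_forward() {
    for hpss_path in "$@"; do
        local_name=$(basename "$hpss_path")
        run hsi "get $local_name : $hpss_path"
        run scp "$local_name" "$REMOTE"
        run rm -f "$local_name"
    done
}
```

Calling stage_and_forward /proj/projectid/a.out in dry-run mode prints the three commands for that file. Setting DRY_RUN=0 would execute them, so check the printed commands and the available scratch space first.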