2019 OLCF System Decommissions and Notable Changes
As the OLCF begins preparations for Frontier, you should be aware of plans that will impact Titan, Eos, Rhea, and the Atlas filesystem in 2019.
Please pay attention to the following key dates as you plan your science campaigns and data migration for the remainder of the year:
Date | Event |
---|---|
March 01 | The OLCF will no longer accept Director’s Discretionary proposals for the Titan and Eos computational resources |
March 12 | Rhea’s OS upgraded to RHEL7 |
June 30 | Last day to submit jobs to Titan and Eos |
July 16 | Rhea will begin the transition from Lustre to GPFS by mounting the Alpine GPFS filesystem |
August 01 | Titan and Eos will be decommissioned |
August 05 | Atlas becomes read-only. Please do not wait, begin transferring your Lustre data now. |
August 06 | Purge Policy implemented on Alpine, the center-wide GPFS filesystem |
August 13 | Rhea and the DTNs will transition a portion of the scheduled nodes from the Moab scheduler to the Slurm scheduler to aid in the Sep 03 full system transition. |
August 19 | Atlas decommissioned; ALL REMAINING DATA on ATLAS will be PERMANENTLY DELETED |
September 03 | All Rhea and DTN nodes will transition to the Slurm scheduler |
Event Details
March 01, 2019 — DD projects no longer accepted for Titan and Eos
Due to the limited remaining compute time on Titan and Eos, the OLCF will no longer accept Director’s Discretionary projects as of March 01, 2019.
March 12, 2019 — Rhea RHEL7 upgrade
Rhea will be upgraded to RHEL7 on March 12. Please note the following:
- Recompile recommended
- Due to the OS upgrade and other software changes, we recommend recompiling prior to running on the upgraded system.
- FreeNX decommissioning
- FreeNX will no longer be available from Rhea following the upgrade. VNC will remain available from Rhea’s compute nodes.
- More information on using VNC on OLCF systems can be found here.
- Environment Modules transition to Lmod
- The environment module management system will be updated to Lmod, a Lua-based module system for dynamically altering shell environments. Typical use of Lmod is very similar to interacting with modulefiles on other OLCF systems, and the interface is still provided by the module command. However, there are a number of changes, including a hierarchical module structure, that users should be aware of.
- More information on using Lmod on Rhea can be found here.
- Default software version changes
- Default versions of a number of software packages, including compilers, OpenMPI, and CUDA, will be changing during the upgrade. Due to the OS upgrade and software changes, users should rebuild prior to running on the upgraded system.
- The list of new defaults can be found here. If you do not see a required package on the list, please notify [email protected].
- Python runtime management
- The Python runtime will change to python/X.Y.Z-anacondaX-5.3.0. These anaconda modules should be used instead of python/x.y.z to get a complete environment, and conda environments can be used for packages not already installed on the system (see the sketch after this list).
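As a rough illustration of the post-upgrade workflow under Lmod, the sketch below loads an Anaconda-based Python module and builds a personal conda environment for extra packages. The module version shown is a placeholder, not a confirmed default; check module avail python on the upgraded system for the actual names.
# List the Python modules available under Lmod
module avail python
# Load the Anaconda-based runtime instead of plain python/x.y.z (version string is hypothetical)
module load python/3.6.6-anaconda3-5.3.0
# Create and activate a personal conda environment for packages not installed system-wide
conda create -y --name myenv scipy matplotlib
source activate myenv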
June 30, 2019 — Last day to submit jobs on Titan and Eos
The last day to submit batch jobs on Titan and Eos will be June 30, 2019.
July 16, 2019 — Rhea transition from Atlas to Alpine
On July 16, Rhea will begin the process of transitioning from Lustre to GPFS by mounting the new GPFS scratch filesystem, Alpine. Both Atlas and Alpine will remain available from Rhea to aid in the transition. Please note:
- Data should be transferred between Atlas and Alpine from the DTN, not Rhea
- GPFS requires up to 16 GB of memory per node and will reduce the maximum available compute node memory from 128 GB to ~112 GB.
- All data on Atlas will be permanently deleted on August 19.
More information on transferring data from Atlas to Alpine can be found here.
August 01, 2019 — Titan and Eos will be decommissioned
Titan and Eos will be decommissioned on August 01. On this date logins to Titan and Eos will no longer be allowed.
August 05, 2019 — Atlas becomes read-only
In preparation for its decommissioning, the Lustre scratch filesystem, Atlas, will become read-only from all OLCF systems on August 05. Please do not wait; begin transferring your Lustre data now.
- Moving data off-site
- Globus is the suggested tool to move data off-site
- Standard tools such as rsync and scp can also be used through the DTN, but may be slower and require more manual intervention than Globus
- Copying data directly from Atlas (Lustre) to Alpine (GPFS)
- Globus is the suggested tool to transfer needed data from Atlas to Alpine and should be used when transferring large amounts of data.
- Standard tools such as rsync and cp can also be used. The DTN mounts both filesystems and should be used when transferring with rsync and cp tools. These methods should not be used to transfer large amounts of data.
- Copying data to the HPSS archive system
- The hsi and htar utilities can be used to transfer data from the Atlas filesystem to HPSS. The tools can also be used to transfer data from HPSS to the Alpine filesystem.
- Globus is also now available to transfer data directly to the HPSS
More information on transferring data from Atlas to Alpine can be found here.
August 6, 2019 — The OLCF will begin implementing the advertised purge policy on Alpine, the new center-wide GPFS filesystem
The 90-day purge policy for the Alpine GPFS filesystem will be enabled on August 6, 2019. Starting on that date, any file over the purge threshold will be eligible for deletion, even if the file was created before August 6.
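As a rough way to find data at risk before purging begins, the sketch below lists files in a scratch area that have not been modified in 90 days. The path and the use of modification time are assumptions for illustration only; consult the published Alpine purge policy for the exact criteria.
# List files not modified in the last 90 days (path and time criterion are illustrative)
find /gpfs/alpine/scratch/$USER -type f -mtime +90 -ls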
August 13, 2019 — Rhea and DTNs will transition a portion of scheduled nodes to the Slurm batch scheduler
To aid in the migration from Moab to Slurm, 196 Rhea compute nodes and 2 DTN scheduled nodes will migrate from the Moab scheduler to the Slurm batch scheduler.
More information can be found on the Rhea running jobs page.
Task | Moab | LSF | Slurm |
---|---|---|---|
View batch queue | showq | jobstat | squeue |
Submit batch script | qsub | bsub | sbatch |
Submit interactive batch job | qsub -I | bsub -Is $SHELL | salloc |
Run parallel code within batch job | mpirun | jsrun | srun |
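As a minimal sketch of the Slurm workflow on the migrated nodes, the batch script below mirrors a simple Moab job; the account name, partition name, and script name are placeholders, so substitute the values for your project.
#!/bin/bash
# Minimal Slurm batch script (account and partition names are placeholders)
#SBATCH -A ABC123
#SBATCH -J test_job
#SBATCH -N 2
#SBATCH -t 1:00:00
#SBATCH -p batch
# Launch the parallel executable with srun rather than mpirun
srun -n 32 ./a.out
Submit the script with sbatch submit.sl and monitor it with squeue -u $USER, the Slurm equivalents of qsub and showq in the table above.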
August 19, 2019 — All remaining data on Atlas will be permanently deleted
On August 19, data remaining on the Lustre filesystem, Atlas, will no longer be accessible and will be permanently deleted. After this date, the OLCF will not be able to retrieve data remaining on Atlas. Due to the large amount of data on the filesystem, we strongly urge you to start transferring your data now rather than waiting until later in the year. More information on transferring data from Atlas to Alpine can be found here.
September 03, 2019 — Rhea and DTNs will transition all scheduled nodes to the Slurm batch scheduler
All nodes on Rhea and the DTNs will migrate from the Moab scheduler to the Slurm batch scheduler.
More information can be found on the Rhea running jobs page.
Atlas to Alpine Data Transfer
Below are instructions for copying your files from Atlas to Alpine. Three approaches are covered: Globus, HPSS, and the DTN servers with your own scripts. If you would like to back up your files, or you do not want your data to travel over the network outside ORNL, use HPSS. In many cases, however, Globus is the fastest way to transfer files from Atlas to Alpine.
Globus
A video and step-by-step screenshots accompany these instructions; the basic steps for using Globus are:
- Visit www.globus.org and login
- Select your organization if it is in the list; otherwise, create a Globus account
- Search for the endpoint OLCF DTN
- Declare the path to your files
- Open a second panel to declare where you would like your files to be transferred, choose whether you want an encrypted transfer, select your file/folder, and click start
- An activity report will appear; click on it to see the status. When the transfer finishes or fails, you will receive an email
Once the transfer starts you can close your browser; large transfers will take some time, and you will receive an email with the outcome. For any issues, contact [email protected].
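For scripted or repeated transfers, the Globus command-line interface can be used instead of the web interface. It is not covered in the original instructions, so the sketch below is only an outline; the endpoint UUIDs and paths are placeholders you would look up yourself (for example with globus endpoint search).
# Log in and locate the OLCF DTN endpoint (requires the globus-cli package)
globus login
globus endpoint search "OLCF DTN"
# Replace the UUIDs and paths below with the values for your transfer
SRC_EP=<olcf-dtn-endpoint-uuid>
DST_EP=<destination-endpoint-uuid>
globus transfer --recursive --label "Atlas migration" \
    "$SRC_EP:/lustre/atlas/proj-shared/abc123/data" \
    "$DST_EP:/path/at/destination"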
HPSS
Using Globus
OLCF users now have access to new functionality: using Globus to transfer files to HPSS through the endpoint “OLCF HPSS”. Globus imposes a limit of 8 active transfers across all users, and each user is limited to 3 active transfers, so it is better to move a lot of data in each transfer than to spread the same data across many transfers. If a folder contains a mix of files including thousands of small files (less than 1 MB each), it is better to tar the small files first; otherwise, if the files are larger, Globus will handle them directly.
To transfer the files, follow these steps, some of which are identical to the Atlas-to-Alpine instructions above:
- Visit www.globus.org and login
- Choose your organization
- Use your credentials to log in
- Search for the endpoint OLCF DTN
- Declare the path to your files
- Open a second panel to declare where you would like your files to be transferred
- Declare the new endpoint called OLCF HPSS and use the appropriate path for HPSS
- Choose whether you want an encrypted transfer, select your file/folder, and click start
- An activity report will appear; click on it to see the status. When the transfer finishes or fails, you will receive an email
- Clicking on the corresponding transfer in the activity list shows its current status, or its results once it has finished
Using the tools
Currently, HSI and HTAR are offered for archiving data into HPSS and retrieving data from the HPSS archive. For better performance, connect to a DTN and execute the hsi/htar commands from there.
For optimal transfer performance, we recommend sending a file of 768 GB or larger to HPSS. The minimum file size that we recommend sending is 512 MB. HPSS will handle files between 0K and 512 MB, but write and read performance will be negatively affected. For files smaller than 512 MB we recommend bundling them with HTAR to achieve an archive file of at least 512 MB.
When retrieving data from a tar archive larger than 1 TB, we recommend that you pull only the files that you need rather than the full archive. Examples of this will be given in the htar section below.
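As a rough pre-check before archiving, the sketch below reports a directory's total size and counts its small files to help decide whether bundling with htar is needed; the path is a placeholder, and the 1 MB / 512 MB thresholds simply mirror the guidance above.
# Total size of the directory to be archived (placeholder path)
du -sh /lustre/atlas/proj-shared/abc123/run_output
# Count files smaller than 1 MB; if there are many, bundle them with htar before archiving
find /lustre/atlas/proj-shared/abc123/run_output -type f -size -1M | wc -l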
Using HSI
Issuing the command hsi will start HSI in interactive mode. Alternatively, you can use:
hsi [options] command(s)
to execute a set of HSI commands and then return.
To list your files on HPSS, you might use:
hsi ls
HSI get and put commands also retrieve and store multiple files; mget and mput were added for compatibility with ftp. In addition, both get and put can use csh-style wildcard patterns and support recursion, so that entire directory trees can be retrieved or stored with a simple command such as:
hsi get -R hpss_directory
or
hsi put -R local_directory
To send a file to HPSS, you might use:
hsi put a.out
To put a file in a pre-existing directory on HPSS:
hsi "cd MyHpssDir; put a.out"
To retrieve one, you might use:
hsi get /proj/projectid/a.out
Here is a list of commonly used hsi commands.
Command | Function |
---|---|
cd | Change current directory |
get, mget | Copy one or more HPSS-resident files to local files |
cget | Conditional get – get the file only if it doesn’t already exist |
cp | Copy a file within HPSS |
rm, mdelete | Remove one or more files from HPSS |
ls | List a directory |
put, mput | Copy one or more local files to HPSS |
cput | Conditional put – copy the file into HPSS unless it is already there |
pwd | Print current directory |
mv | Rename an HPSS file |
mkdir | Create an HPSS directory |
rmdir | Delete an HPSS directory |
Scenario 1:
Transfer file test.tar from /lustre/atlas/path/ to /proj/projid while the current working directory is /lustre/atlas/path/
hsi "cd /proj/projid; put test.tar"
Scenario 2:
Transfer file test.tar from /lustre/atlas/path/ to /proj/projid while the current working directory is not /lustre/atlas/path/
hsi "put /lustre/atlas/path/test.tar : /proj/projid/test.tar"
Additional HSI Documentation
There is interactive documentation on the hsi command available by running:
hsi help
Additionally, documentation can be found at the Gleicher Enterprises website, including an HSI Reference Manual and man pages for HSI.
Using HTAR
The htar command provides an interface very similar to the traditional tar command found on UNIX systems and is used from the command line. The basic syntax of htar is:
htar -{c|K|t|x|X} -f tarfile [directories] [files]
As with the standard Unix tar utility, the -c, -x, and -t options, respectively, function to create, extract, and list tar archive files. The -K option verifies an existing tarfile in HPSS, and the -X option can be used to re-create the index file for an existing archive.
For example, to store all files in the directory dir1 to a file named allfiles.tar on HPSS, use the command:
htar -cvf allfiles.tar dir1/*
To retrieve these files:
htar -xvf allfiles.tar
htar will overwrite files of the same name in the target directory.
When possible, extract only the files you need from large archives.
To display the names of the files in the project1.tar archive file within the HPSS home directory:
htar -vtf project1.tar
To extract only one file, executable.out, from the project1 directory in the archive file called project1.tar:
htar -xm -f project1.tar project1/executable.out
To extract all files from the project1/src directory in the archive file called project1.tar, and use the time of extraction as the modification time, use the following command:
htar -xm -f project1.tar project1/src
HTAR Limitations
The htar utility has several limitations.
Appending data
You cannot add or append files to an existing archive.
File Path Length
File path names within an htar archive of the form prefix/name are limited to 154 characters for the prefix and 99 characters for the file name. Link names cannot exceed 99 characters.
Size
There are limits to the size and number of files that can be placed in an HTAR archive.
Limit | Value |
---|---|
Individual file size maximum | 68 GB (due to POSIX limit) |
Maximum number of files per archive | 1 million |
For example, when attempting to htar a directory containing a member file larger than 68 GB, the following error message will appear:
[titan-ext1]$ htar -cvf hpss_test.tar hpss_test/
INFO: File too large for htar to handle: hpss_test/75GB.dat (75161927680 bytes)
ERROR: 1 oversize member files found - please correct and retry
ERROR: [FATAL] error(s) generating filename list
HTAR: HTAR FAILED
Scenario 3:
The htar command fails because of the limitations above, so instead we tar the folder ourselves and pipe the output to hsi to create a tar file called test.tar in /proj/projid/:
tar cf - directory_to_compress | hsi "cd /proj/projid; put - : test.tar"
You can retrieve the file with the previously mentioned htar -xvf command (rebuilding its index first with htar -X if needed, since the archive was not created by htar); however, consider extracting only the necessary files.
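Alternatively, since this archive was created with the standard tar command, it can be pulled back with hsi get and unpacked locally; a minimal sketch, assuming the archive lives at /proj/projid/test.tar and you are working in a scratch directory on Alpine:
# Retrieve the archive from HPSS into the current directory
hsi get /proj/projid/test.tar
# List its contents, then extract everything (or name only the files you need)
tar -tf test.tar
tar -xf test.tar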
Additional HTAR Documentation
The HTAR user’s guide, including the HTAR man page, can be found at the Gleicher Enterprises website.
DTN
You can use the Data Transfer Nodes (DTNs) to transfer files with scp and rsync; however, the other approaches described above will likely be faster unless you have prepared an optimized transfer script.
ssh username@dtn.ccs.ornl.gov
From there you have access to both /lustre/atlas and /gpfs/alpine, so you can use a script to transfer your files, as in the sketch below.
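A minimal sketch of such a script, assuming a project area named abc123 on both filesystems; the paths are placeholders that you should replace with your own:
#!/bin/bash
# Run this on a DTN, which mounts both Atlas (Lustre) and Alpine (GPFS)
SRC=/lustre/atlas/proj-shared/abc123/data      # placeholder source path on Atlas
DST=/gpfs/alpine/proj-shared/abc123/data       # placeholder destination path on Alpine
# -a preserves permissions and timestamps, -v is verbose; re-running skips files already copied
rsync -av "$SRC/" "$DST/"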