Employing Data Transfer Nodes

The OLCF provides nodes dedicated to data transfer that are available via dtn.ccs.ornl.gov. These nodes have been tuned specifically for wide-area data transfers, and also perform well on local-area transfers. The OLCF recommends that users employ these nodes for data transfers, since in most cases transfer speed improves and load on computational systems’ login and service nodes is reduced.
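
For example, a wide-area transfer from a remote system can target the DTNs directly. A minimal sketch using standard tools, where the username and file name are illustrative placeholders:

  # Push a file from a remote system to your OLCF User Home area via the DTNs
  $ scp largedatafileA userid@dtn.ccs.ornl.gov:~/

  # rsync works the same way and can resume interrupted transfers
  $ rsync -avP largedatafileA userid@dtn.ccs.ornl.gov:~/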

Filesystems Accessible from DTNs

All OLCF filesystems — the NFS-backed User Home and Project Home areas, the Lustre®-backed User Work and Project Work areas, and the HPSS-backed User Archive and Project Archive areas — are accessible to users via the DTNs. For more information on available filesystems at the OLCF see the Data Management Overview page.
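
Once logged in to a DTN (see Interactive DTN Access below), these areas can be reached as normal paths or, for HPSS, through the hsi utility. A short illustrative session, assuming the project ID proj123 used in the examples below:

  $ ls $HOME                  # NFS-backed User Home area
  $ cd $MEMBERWORK/proj123    # Lustre-backed User Work area
  $ hsi ls                    # HPSS-backed User Archive area via hsi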

Interactive DTN Access

Members of allocated projects are automatically given access to the data transfer nodes. The interactive nodes are accessible for direct login through the dtn.ccs.ornl.gov alias.
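
For example, assuming an OLCF username of userid (a placeholder):

  $ ssh userid@dtn.ccs.ornl.gov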

Batch DTN Access

Batch data transfer nodes can be accessed through the Torque/MOAB queuing system on the dtn.ccs.ornl.gov interactive node. The DTN batch nodes are also accessible from the Titan, Eos, and Rhea batch systems through remote job submission.
This is accomplished with the command qsub -q host script.pbs, which submits the file script.pbs to the batch queue on the specified host. This command can be inserted at the end of an existing batch script to automatically trigger work on another OLCF resource.

Note: DTNs can help you manage your allocation hours efficiently by preventing billable compute resources from sitting idle.

The following scripts show how this technique could be employed. Note that only the first script, retrieve.pbs, would need to be manually submitted; the others will trigger automatically from within the respective batch scripts.

Example Workflow Using DTNs in Batch Mode

The first batch script, retrieve.pbs, retrieves data needed by a compute job. Once the data has been retrieved, the script submits a different batch script, compute.pbs, to run computations on the retrieved data.

To start the workflow, submit the first script to the dtn queue from Titan or Rhea:

$ qsub -q dtn retrieve.pbs
$ cat retrieve.pbs

  #!/bin/bash
  # Batch script to retrieve data from HPSS via DTNs

  # PBS directives
  #PBS -A PROJ123
  #PBS -l walltime=8:00:00

  # Retrieve required data
  cd $MEMBERWORK/proj123 
  hsi get largedatafileA
  hsi get largedatafileB

  # Verification code could go here

  # Submit other batch script to execute calculations on retrieved data
  qsub -q titan compute.pbs

$

The second batch script is submitted from the first to carry out computational work on the data. When the computational work is finished, the batch script backup.pbs is submitted to archive the resulting data.

$ cat compute.pbs

  #!/bin/bash
  # Batch script to carry out computation on retrieved data

  # PBS directives
  #PBS -l walltime=24:00:00 
  #PBS -l nodes=10000
  #PBS -A PROJ123
  #PBS -l gres=atlas1 # or atlas2

  # Launch executable
  cd $MEMBERWORK/proj123 
  aprun -n 160000 ./a.out

  # Submit other batch script to transfer resulting data to HPSS
  qsub -q dtn backup.pbs

$

The final batch script is submitted from the second; it archives the resulting data to HPSS via the hsi utility soon after the data is created.

$ cat backup.pbs

  #!/bin/bash
  # Batch script to back up resulting data

  # PBS directives
  #PBS -A PROJ123
  #PBS -l walltime=8:00:00

  # Store resulting data 
  cd $MEMBERWORK/proj123 
  hsi put largedatafileC
  hsi put largedatafileD

$

Some items to note:

  • Batch jobs submitted to the dtn partition will be executed on a DTN that is accessible exclusively via batch submission. These batch-accessible DTNs have the same configuration as the interactively accessible DTNs; the only difference is how they are reached.
  • The DTNs are not currently a billable resource, i.e., the project specified in a batch job targeting the dtn partition will not be charged for time spent executing in the dtn partition.

Scheduled DTN Queue

  • The walltime limit for jobs submitted to the dtn partition is 24 hours.
  • Users may request a maximum of 4 nodes per batch job.
  • There is a limit of 2 eligible-to-run jobs per user.
  • Jobs in excess of the per-user limit above will be placed into a held state and will become eligible to run when appropriate.
  • The queue allows each user a maximum of 6 running jobs.
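
As a sketch, a dtn batch job that stays within these limits could look like the following; the project ID, node count, and file name are illustrative placeholders, and the script would be submitted with qsub -q dtn as shown above.

  #!/bin/bash
  # Illustrative dtn batch job that respects the queue limits above
  #PBS -A PROJ123
  #PBS -l walltime=24:00:00   # at or below the 24-hour walltime limit
  #PBS -l nodes=1             # up to 4 nodes may be requested

  # Archive a result file to HPSS
  cd $MEMBERWORK/proj123
  hsi put largedatafileC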