Titan Batch Script Examples

This page lists several example batch scripts that can be used to run various types of jobs on XK compute resources.

Warning: Compute nodes can see only the Lustre-backed storage areas; binaries must be executed from within the User Work (i.e., $MEMBERWORK/{projectid}) or Project Work (i.e., $PROJWORK/{projectid}) areas. All data needed by a binary must also exist in these Lustre-backed areas. More information can be found on the Filesystems Available to Compute Nodes page.
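
For example, a minimal sketch of staging a run into User Work before submitting (prj123, my-simulation.exe, and input.dat are placeholders for your own project ID, binary, and input data):

  $ cp my-simulation.exe input.dat $MEMBERWORK/prj123
  $ cd $MEMBERWORK/prj123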
Launching an MPI-only job

Suppose we want to launch a job on 300 Titan nodes, using all 16 CPU cores available on each node. The following example will request (300) nodes for 1 hour and 30 minutes. It will then launch 4,800 (300 x 16) MPI ranks on the allocated cores (one task per core).

  #!/bin/bash
  # File name: my-job-name.pbs
  #PBS -A PRJ123
  #PBS -l walltime=1:30:00
  #PBS -l nodes=300
  #PBS -l gres=atlas1%atlas2

  cd $MEMBERWORK/prj123
   
  aprun -n 4800 my-simulation.exe

The first PBS directive (#PBS -A PRJ123) tells the system scheduler that you’d like to charge this job against the PRJ123 allocation. For example, if you are on the project SCI404, your PBS scripts would instead need to say #PBS -A SCI404. If you are on multiple projects, be careful to double-check that your jobs are submitted against the intended allocation.

To submit the above script from the command line, simply run:

  $ qsub my-job-name.pbs
    123456.nid00004

You can check the status of job number 123456 by running:

  $ showq | grep 123456
    123456   userid    Running   4800   00:00:44   Sat Oct 15 06:18:56
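
The checkjob utility (used later in the Chaining Batch Jobs section) can also be given the job number to show more detail about a single job; its output is omitted here because it varies with the job’s state:

  $ checkjob 123456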
Naming Jobs

Users who submit many jobs to the queue at once may want to consider naming their jobs in order to keep track of which ones are running and which are still being held in the queue. This can be done with the #PBS -N my-job-name option. For example, to name your job P3HT-PCBM-simulation-1:

  #!/bin/bash
  # File name: simulation1.pbs
  #PBS -A PRJ123
  #PBS -N P3HT-PCBM-simulation-1
  #PBS -l walltime=1:30:00
  #PBS -l nodes=300
  #PBS -l gres=atlas1%atlas2

  cd $MEMBERWORK/prj123
   
  aprun -n 4800 my-simulation.exe
Controlling Output

By default, anything your job prints to STDOUT or STDERR is captured in two separate files: job-name.o123456 and job-name.e123456 (where 123456 is your job ID). These files are written to the directory from which you submitted your job with the qsub command.
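
For example, if the simulation1.pbs script from the previous section were submitted and assigned the (hypothetical) job ID 112233, the submission directory would contain something like the following once the job completes:

  $ qsub simulation1.pbs
    112233.nid00004
  $ ls
    simulation1.pbs  P3HT-PCBM-simulation-1.e112233  P3HT-PCBM-simulation-1.o112233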

If you wish to aggregate this output into a single file (which may help you understand where errors occur), you can join these two output streams by using the #PBS -j oe option. For example,

  #!/bin/bash
  # File name: simulation1.pbs
  #PBS -A PRJ123
  #PBS -N P3HT-PCBM-simulation-1
  #PBS -j oe
  #PBS -l walltime=1:30:00
  #PBS -l nodes=300
  #PBS -l gres=atlas1%atlas2

  cd $MEMBERWORK/prj123
   
  aprun -n 4800 my-simulation.exe
Using Environment Modules

By default, the module environment tool is not available in batch scripts. If you need to load modules before launching your jobs (to adjust your $PATH or to make shared libraries available), you will first need to import the module utility into your batch script with the following command:

  source $MODULESHOME/init/<myshell>

where <myshell> is the name of your default shell.

As an example, let’s load the ddt module before launching the following simulation (assuming we are using the bash shell):

  #!/bin/bash
  # File name: simulation.pbs
  #PBS -A PRJ123
  #PBS -N P3HT-PCBM-simulation
  #PBS -j oe
  #PBS -l walltime=1:30:00
  #PBS -l nodes=300
  #PBS -l gres=atlas1%atlas2

  source $MODULESHOME/init/bash
  module load ddt
  cd $MEMBERWORK/prj123
   
  aprun -n 4800 my-simulation.exe

If you are loading a specific programming environment, it is advisable to load it before loading other modules. Some modules behave differently depending on the programming environment and may not load correctly if the programming environment is not specified first.
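
For instance, a rough sketch (the PrgEnv-gnu and cray-hdf5 modules here are only illustrative; substitute whatever your application actually requires):

  source $MODULESHOME/init/bash
  module swap PrgEnv-pgi PrgEnv-gnu    # select the programming environment first
  module load cray-hdf5                # then load libraries that depend on it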

Basic MPI on Partial Nodes

A node’s cores cannot be shared by multiple batch jobs or aprun jobs; therefore, a job must be allocated all cores on a node. However, users do not have to use all of the cores allocated to their batch job. Through aprun options, users can run on all or only some of a node’s cores, and they have some control over which cores are being used.

Reasons for using only a portion of a node’s cores include: increasing the memory available to each task, using one floating point unit per compute unit, and increasing the memory bandwidth available to each task.

Each node contains (2) NUMA nodes. Users can control CPU task layout using the aprun NUMA node flags. For jobs that do not utilize all cores on a node, it may be beneficial to spread a node’s task load over the (2) NUMA nodes using aprun -S. The -j flag can also be used to utilize only one integer core on each compute unit.

The following example will request 4,000 nodes for (8) hours. It will then run a 24,000-task MPI job using (6) of each allocated node’s (16) cores.

  #!/bin/bash
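  # File name: mpi-partial-node-ex.pbs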
  #PBS -A PRJ123
  #PBS -N mpi-partial-node
  #PBS -j oe
  #PBS -l walltime=8:00:00,nodes=4000
  #PBS -l gres=atlas1%atlas2

  cd $MEMBERWORK/prj123
   
  aprun -n 24000 -S 3 -j1 a.out

  $ qsub mpi-partial-node-ex.pbs
    234567.nid00004
  $ showq | grep 234567
    234567   userid   Running   64000   00:00:44   Mon Oct 09 03:11:23

Please note that per Titan’s scheduling policy, the job will be charged for all 4,000 nodes.
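
As a rough illustration (how node-hours convert to your allocation’s charge units depends on current OLCF policy):

  4,000 nodes x 8 hours of walltime = 32,000 node-hours, even though each node runs only 6 of its 16 cores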

Hybrid OpenMP/MPI

The following example batch script will request (3) nodes for (1) hour. It will then run a hybrid OpenMP/MPI job using (3) MPI tasks, each running (16) threads.

  #!/bin/bash
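  # File name: mpi-openmp-ex.pbs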
  #PBS -A PRJ123
  #PBS -N hybrid-test
  #PBS -j oe
  #PBS -l walltime=1:00:00,nodes=3
  #PBS -l gres=atlas1%atlas2

  cd $PROJWORK/prj123
   
  export OMP_NUM_THREADS=16

  aprun -n 3 -N 1 -d 16 mpi-openmp-ex.x

  $ cc -mp mpi-openmp-ex.c -o mpi-openmp-ex.x
  $ qsub mpi-openmp-ex.pbs
    345678.nid00004
  $ showq | grep 345678
    345678   userid   Running   48   00:00:44   Mon Aug 19 21:49:18
Thread Performance Considerations

On Titan, each pair of CPU cores shares a single Floating Point Unit (FPU). This means that arithmetic-laden threads on neighboring CPU cores may contend for the shared FPU, which could lead to performance degradation.

To help avoid this issue, aprun can force only 1 thread to be associated with each core pair by using the -j 1 option. Here’s how we could revise the previous example to allocate only 1 thread per FPU:

  #!/bin/bash
  #PBS -A PRJ123
  #PBS -N hybrid-test
  #PBS -j oe
  #PBS -l walltime=1:00:00,nodes=3
  #PBS -l gres=atlas1%atlas2

  cd $PROJWORK/prj123
   
  export OMP_NUM_THREADS=8

  aprun -n 3 -N 1 -d 8 -j 1 mpi-openmp-ex.x
Launching Several Executables at Once
Warning: Because large numbers of aprun processes can cause other users’ apruns to fail, users are asked to limit the number of simultaneous apruns executed within a batch script.

Users are limited to 50 aprun processes per batch job.

The following example will request 6,000 nodes for (12) hours. It will then run (4) MPI jobs, each simultaneously running on 24,000 cores. The OS will spread each aprun job out such that the jobs do not share nodes.

  #!/bin/bash
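  # File name: multi-job-ex.pbs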
  #PBS -A PRJ123
  #PBS -N multi-job
  #PBS -j oe
  #PBS -l walltime=12:00:00,nodes=6000
  #PBS -l gres=atlas1%atlas2

  cd $MEMBERWORK/prj123

  aprun -n 24000 a.out1 &
  aprun -n 24000 a.out2 &
  aprun -n 24000 a.out3 &
  aprun -n 24000 a.out4 &

  wait

  $ qsub multi-job-ex.pbs
    456789.nid00004
  $ showq | grep 456789
    456789   userid    Running   96000   00:00:44   Thu Oct 07 11:32:52 
Important Considerations for Simultaneous Jobs in a Single Script
  • The aprun instances must be backgrounded
    The & symbols in the example above place each aprun in the background, allowing the OS to place and run each simultaneously. Without placing the apruns in the background, the OS will run them serially, waiting until one completes before starting the next.
  • The batch script must wait for backgrounded processes
    The wait command prevents the batch script from returning until each backgrounded aprun completes. Without the wait, the script would return as soon as each aprun has been started, causing the batch job to end, which kills each of the backgrounded aprun processes.
  • The aprun instances cannot share nodes
    The system will run only one aprun per node at a time; it will not run multiple aprun instances on the same node simultaneously. For example, users cannot run (2) 8-core aprun jobs on the same node. In order to run (2) 8-core aprun instances at the same time, (2) nodes must be allocated, as shown in the sketch below.
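
A minimal sketch of that last point (the file name, executables, and prj123 project ID are placeholders): to run (2) 8-task apruns at the same time, request (2) nodes and background each aprun so they run concurrently, one per node:

  #!/bin/bash
  # File name: two-aprun-ex.pbs
  #PBS -A PRJ123
  #PBS -N two-aprun-test
  #PBS -j oe
  #PBS -l walltime=1:00:00,nodes=2
  #PBS -l gres=atlas1%atlas2

  cd $MEMBERWORK/prj123

  aprun -n 8 a.out1 &   # placed on the first allocated node
  aprun -n 8 a.out2 &   # placed on the second allocated node

  wait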
Chaining Batch Jobs

The following example will

  1. Submit 1.pbs which will be immediately eligible for execution
  2. Submit 2.pbs which will be placed in a held state, not eligible for execution until 1.pbs completes without errors
  $ qsub 1.pbs
    123451
  $ qsub -W depend=afterok:123451 2.pbs
    123452

You can then use the showq and checkjob utilities to view job states:

  $ showq -u userid
    ...
    123451              userid    Running    16
    ...
    123452              userid       Hold    16
    ...
  $ checkjob 123452
    ...
    NOTE:  job cannot run  (dependency 123451 jobsuccessfulcomplete not met)