Using the aprun command

The aprun command is used to run a compiled application program across one or more compute nodes. You use the aprun command to specify application resource requirements, request application placement, and initiate application launch.

The machine’s physical node layout plays an important role in how aprun works. Each Titan compute node contains (2) 8-core NUMA nodes on a single socket (a total of 16 cores).

Note: The aprun command is the only mechanism for running an executable in parallel on compute nodes. To run jobs as efficiently as possible, a thorough understanding of how to use aprun and its various options is paramount.
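
For example, a minimal batch-script sketch (the project ID, directory, and executable name ./a.out are placeholders) that launches 32 MPI tasks across 2 Titan nodes, 16 tasks per node:

    #!/bin/bash
    #PBS -A PROJ123               # placeholder project allocation
    #PBS -l nodes=2               # request 2 compute nodes
    #PBS -l walltime=00:10:00

    cd $MEMBERWORK/proj123        # placeholder scratch directory
    aprun -n 32 -N 16 ./a.out     # 32 MPI tasks total, 16 per node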

OLCF uses a version of aprun with two extensions. One is used to identify which libraries are used by an executable to allow us to better track third party software that is being actively used on the system. The other analyzes the command line to identify cases where users might be able to optimize their application’s performance by using slightly different job layout options. We highly recommend using both of these features; however, if there is a reason you wish to disable one or the other please contact the User Assistance Center for information on how to do that.

Shell Resource Limits

By default, aprun will not forward shell limits set by ulimit for sh/ksh/bash or by limit for csh/tcsh.

To forward these settings to the application launched by aprun, set the environment variable APRUN_XFER_LIMITS to 1: use export APRUN_XFER_LIMITS=1 for sh/ksh/bash or setenv APRUN_XFER_LIMITS 1 for csh/tcsh.
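
For example, in a bash batch script (a sketch; the executable name is a placeholder):

    ulimit -s unlimited             # raise the stack size limit in the batch shell
    export APRUN_XFER_LIMITS=1      # ask aprun to forward shell limits to the application
    aprun -n 16 ./a.out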

Simultaneous aprun Limit

All aprun processes are launched from a small number of shared service nodes. Because large numbers of aprun processes can cause other users’ apruns to fail, users are asked to limit the number of simultaneous apruns executed within a batch script. Users are limited to 50 aprun processes per batch job; attempts to launch apruns over the limit will result in the following error:

apsched: no more claims allowed for this reservation (max 50)
Warning: Users are limited to 50 aprun processes per batch job.
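
For example, a batch script can run several apruns concurrently, well under the limit, by backgrounding them and waiting for completion. A sketch (the executable and input names are placeholders; the batch job must have requested at least 4 nodes so that each aprun has a node available):

    # launch 4 single-node apruns concurrently, then wait for all of them
    for i in 1 2 3 4; do
        aprun -n 16 -N 16 ./a.out input.$i &
    done
    wait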

Single-aprun Process Ensembles with wraprun

In some situations, the simultaneous aprun limit can be overcome by using the utility wraprun. Wraprun has the capacity to run an arbitrary number and combination of qualified MPI or serial applications under a single aprun call.

Note: MPI executables launched under wraprun must be dynamically linked. Non-MPI applications must be launched using a serial wrapper included with wraprun.
Warning: Tasks bundled with wraprun should each consume approximately the same walltime to avoid wasting allocation hours.

Complete information and examples on using wraprun can be found in the wraprun documentation.
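
As an illustrative sketch only (the executable names are placeholders, the availability of a wraprun modulefile is assumed, and the wraprun documentation remains the authoritative reference for its syntax), colon-separated groups bundle several runs under a single aprun:

    module load wraprun                               # assumes a wraprun modulefile is available
    wraprun -n 16 ./app_a.out : -n 16 ./app_b.out     # two 16-task runs under one aprun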

Common aprun Options

The following table lists commonly-used options to aprun. For a more detailed description of aprun options, see the aprun man page.

Option Description
-D Debug; shows the layout aprun will use
-n Number of total MPI tasks (aka ‘processing elements’) for the executable. If you do not specify the number of tasks to aprun, the system will default to 1.
-N Number of MPI tasks (aka ‘processing elements’) per physical node.

Warning: Because each node contains multiple processors/NUMA nodes, the -S option is likely a better option than -N to control layout within a node.
-m Memory required per MPI task. There is a maximum of 2 GB per core; e.g., requesting 2.1 GB will allocate a minimum of two cores per MPI task.
-d Number of threads per MPI task.

Warning: The default value for -d is 1. If you specify OMP_NUM_THREADS but do not give a -d option, aprun will allocate your threads to a single core. Use OMP_NUM_THREADS to specify to your code the number of threads per MPI task; use -d to tell aprun how to place those threads.
-j
For Titan: Specifies the number of CPUs to use per paired-core compute unit. The valid values for -j are 0 (use the system default), 1 (use one integer core), and 2 (use both integer cores; this is the system default).
For Eos: The -j parameter controls Hyper Threading. The valid values for -j are 0 (use the system default), 1 (turn Hyper Threading off), and 2 (turn Hyper Threading on; this is the system default).
-cc This is the cpu_list option. It binds MPI tasks or threads to the specified CPUs. The list is given as a set of comma-separated numbers (0 through 15), each of which specifies a compute unit (core) on the node. The list can also be given as hyphen-separated ranges of numbers, each of which specifies a range of compute units (cores) on the node. See man aprun.
-S Number of MPI tasks (aka ‘processing elements’) per NUMA node. Can be 1, 2, 3, 4, 5, 6, 7, or 8.
-ss Strict memory containment per NUMA node. The default is to allow remote NUMA node memory access. This option prevents memory access of the remote NUMA node.
-r Assign system services associated with your application to a compute core.

If you use fewer than 16 cores per node, you can request that all of the system services be placed on an unused core. This will reduce “jitter” (i.e. application variability) because the daemons will not cause the application to context switch unexpectedly.

To use -r 1, ensure that -N is less than 16 or that -S is less than 8 so that one core remains free for the system services.
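
As a sketch combining several of these options (the executable name and counts are placeholders), the following launches a hybrid MPI/OpenMP job on 2 Titan nodes, with 2 MPI tasks per NUMA node and 4 threads per task:

    export OMP_NUM_THREADS=4                 # tell the code how many threads per MPI task
    aprun -n 8 -S 2 -d 4 -j 2 -ss ./a.out
    #     -n 8  : 8 MPI tasks in total (spread across 2 nodes)
    #     -S 2  : 2 MPI tasks per NUMA node (4 per node)
    #     -d 4  : reserve 4 cores per task for its OpenMP threads
    #     -j 2  : use both integer cores of each paired-core compute unit (the default)
    #     -ss   : keep each task's memory accesses on its own NUMA node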