Parallel Job Execution on Commodity Clusters

Using mpirun

By default, commands will be executed on the job’s primary compute node, sometimes referred to as the job’s head node. The mpirun command is used to execute an MPI executable on one or more compute nodes in parallel.
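
For example, a minimal launch (assuming a.out is an MPI executable and the batch job has been allocated at least one node) starts one rank on each of a node's 16 cores:

$ mpirun -n 16 ./a.out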

mpirun accepts the following common options:

--npernode                  Number of ranks per node
-n                          Total number of MPI ranks
--bind-to none              Allow code to control thread affinity
--map-by ppr:N:node:pe=T    Place N tasks per node, leaving space for T threads
--map-by ppr:N:socket:pe=T  Place N tasks per socket, leaving space for T threads
--map-by ppr:N:socket       Assign tasks by socket, placing N tasks on each socket
--report-bindings           Have MPI report which ranks have been assigned to which nodes / physical cores
Note: If you do not specify the number of MPI tasks to mpirun via -n, the system will default to all available cores allocated to the job.
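
Several of these options are commonly combined. As a sketch (assuming a two-node allocation and an executable named a.out), the following starts 32 ranks, 16 per node, and leaves thread placement up to the application:

$ mpirun -n 32 --npernode 16 --bind-to none ./a.out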
MPI Task Layout

Each compute node on Rhea contains two sockets each with 8 cores. Depending on your job, it may be useful to control task layout within and across nodes.
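
If you would like to confirm this layout yourself, standard Linux tools such as lscpu report the socket, core, and thread counts (run from a shell on the node; output is omitted here since it varies):

$ lscpu | grep -E 'Socket|Core|Thread'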

Default Layout: Sequential

The following will run two copies of a.out, one per core, on the same node:

$ mpirun -np 2 ./a.out

Compute Node
  Socket 0: rank 0 on Core 0, rank 1 on Core 1
  Socket 1: idle

4 cores, 2 cores per socket, 1 node

The following will run a.out on 4 cores, 2 cores per socket, 1 node:

$ mpirun -np 4 --map-by ppr:2:socket ./a.out

Compute Node
  Socket 0: rank 0 on Core 0, rank 1 on Core 1
  Socket 1: rank 2 on Core 0, rank 3 on Core 1

4 cores, 1 core per socket, 2 nodes

The following will run a.out on 4 cores, 1 core per socket, 2 nodes. This can be useful if you need to spread your batch job over multiple nodes to allow each task access to more memory.

$ mpirun -np 4 --map-by ppr:1:socket ./a.out

Compute Node 0
  Socket 0: rank 0 on Core 0
  Socket 1: rank 1 on Core 0
Compute Node 1
  Socket 0: rank 2 on Core 0
  Socket 1: rank 3 on Core 0

The --report-bindings flag can be used to report task layout:

$ mpirun -np 4 --map-by ppr:1:socket --report-bindings hostname
[rhea2:47176] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[rhea2:47176] MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
[rhea4:104150] MCW rank 2 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[rhea4:104150] MCW rank 3 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
$
Thread Layout
Warning: Without controlling affinity, threads may be placed on the same core.
2 MPI tasks, 1 task per node, 16 threads per task, 2 nodes
$ setenv OMP_NUM_THREADS 16
$ mpirun -np 2 --map-by ppr:1:node:pe=16 ./a.out

Compute Node 0
  Socket 0: Task 0, Threads 0-7 on Cores 0-7
  Socket 1: Task 0, Threads 8-15 on Cores 0-7
Compute Node 1
  Socket 0: Task 1, Threads 0-7 on Cores 0-7
  Socket 1: Task 1, Threads 8-15 on Cores 0-7
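
The setenv lines in these examples use csh/tcsh syntax; if your shell is bash, the equivalent launch is:

$ export OMP_NUM_THREADS=16
$ mpirun -np 2 --map-by ppr:1:node:pe=16 ./a.out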

2 MPI tasks, 1 task per socket, 4 threads per task, 1 node
$ setenv OMP_NUM_THREADS 4
$ mpirun -np 2 --map-by ppr:1:socket:pe=4 ./a.out

Compute Node
  Socket 0: Task 0, Threads 0-3 on Cores 0-3
  Socket 1: Task 1, Threads 0-3 on Cores 0-3
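
One way to double-check thread placement, assuming an OpenMP runtime that supports the standard OMP_PROC_BIND and OMP_PLACES variables, is to pin threads explicitly and add --report-bindings to see where each rank landed (same a.out as above):

$ setenv OMP_NUM_THREADS 4
$ setenv OMP_PROC_BIND true
$ setenv OMP_PLACES cores
$ mpirun -np 2 --map-by ppr:1:socket:pe=4 --report-bindings ./a.out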