
Controlling MPI Task Layout Across Many Physical Nodes

Users have (2) ways to control MPI task layout:

  1. Within a physical node
  2. Across physical nodes

This article focuses on how to control MPI task layout across physical nodes.

The default MPI task layout is SMP-style: MPI sequentially places tasks on all virtual cores of one physical node before placing any tasks on the next physical node.

Viewing Multi-Node Layout Order

Task layout can be viewed at launch time by setting the environment variable MPICH_RANK_REORDER_DISPLAY to 1.
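
For example, using the csh-style syntax and the same launch line shown in the examples below, the rank placement is then reported when the job starts (a minimal sketch; a.out stands in for your own executable):

$ setenv MPICH_RANK_REORDER_DISPLAY 1
$ aprun -n 32 ./a.out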

Changing Multi-Node Layout Order

For multi-node jobs, layout order can be changed using the environment variable MPICH_RANK_REORDER_METHOD. See man intro_mpi for more information.
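
The examples below set the variable with csh-style setenv; in a bash or ksh batch script the equivalent is export. As a general reference, the commonly documented Cray MPICH values are 0 for round-robin, 1 for the default SMP-style order, 2 for folded-rank, and 3 for a custom order read from an MPICH_RANK_ORDER file; consult man intro_mpi on the system for the authoritative list.

# csh / tcsh (used in the examples below)
$ setenv MPICH_RANK_REORDER_METHOD 0

# bash / ksh equivalent
$ export MPICH_RANK_REORDER_METHOD=0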

Multi-Node Layout Order Examples
Example 1: Default Layout

The following will run a.out across (32) cores. This requires (2) physical compute nodes.

# On Titan
$ aprun -n 32 ./a.out

# On Eos, Hyper-threading must be disabled:
$ aprun -n 32 -j1 ./a.out

Compute Node 0
  NUMA 0, Cores 0-7:  MPI ranks 0-7
  NUMA 1, Cores 0-7:  MPI ranks 8-15

Compute Node 1
  NUMA 0, Cores 0-7:  MPI ranks 16-23
  NUMA 1, Cores 0-7:  MPI ranks 24-31
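
For reference, a sketch of how this example might look inside a Titan batch script; the project ID, walltime, and working-directory handling are illustrative placeholders, not part of the original example:

#!/bin/bash
# Hypothetical batch script for the default-layout example.
# PROJ123 and the 10-minute walltime are placeholders.
#PBS -A PROJ123
#PBS -l nodes=2
#PBS -l walltime=00:10:00

cd $PBS_O_WORKDIR
aprun -n 32 ./a.out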

Example 2: Round-Robin Layout

The following will place tasks in a round-robin fashion across nodes. This requires (2) physical compute nodes.

$ setenv MPICH_RANK_REORDER_METHOD 0
# On Titan
$ aprun -n 32 ./a.out

# On Eos, Hyper-threading must be disabled:
$ aprun -n 32 -j1 ./a.out

Compute Node 0
  NUMA 0, Cores 0-7:  MPI ranks 0, 2, 4, 6, 8, 10, 12, 14
  NUMA 1, Cores 0-7:  MPI ranks 16, 18, 20, 22, 24, 26, 28, 30

Compute Node 1
  NUMA 0, Cores 0-7:  MPI ranks 1, 3, 5, 7, 9, 11, 13, 15
  NUMA 1, Cores 0-7:  MPI ranks 17, 19, 21, 23, 25, 27, 29, 31

Example 3: Combining Inter-Node and Intra-Node Options

The following combines MPICH_RANK_REORDER_METHOD and aprun's -S option to place tasks on (3) cores per NUMA node within each compute node, while distributing tasks in a round-robin fashion across nodes.

$ setenv MPICH_RANK_REORDER_METHOD 0
$ aprun -n12 -S3 ./a.out

Compute Node 0
  NUMA 0, Cores 0-2:  MPI ranks 0, 2, 4    (Cores 3-7 idle)
  NUMA 1, Cores 0-2:  MPI ranks 6, 8, 10   (Cores 3-7 idle)

Compute Node 1
  NUMA 0, Cores 0-2:  MPI ranks 1, 3, 5    (Cores 3-7 idle)
  NUMA 1, Cores 0-2:  MPI ranks 7, 9, 11   (Cores 3-7 idle)
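
To confirm a layout like this on a live system, the display variable described earlier can be combined with the same launch line (a sketch reusing only options already shown in this article):

$ setenv MPICH_RANK_REORDER_METHOD 0
$ setenv MPICH_RANK_REORDER_DISPLAY 1
$ aprun -n12 -S3 ./a.out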