
Compiling Threaded Codes on Eos


When building threaded codes on Cray machines, you may need to take additional steps to ensure a proper build.

OpenMP
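Each of the examples below compiles the same small source file, test.c. Its contents are not given in this guide; any short OpenMP program will do, for example a minimal sketch like the following, in which each thread reports its ID and the size of the thread team:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Each OpenMP thread prints its ID and the team size. */
    #pragma omp parallel
    printf("Hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
    return 0;
}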

For Intel, use the -openmp option:

$ cc -openmp test.c -o test.x
$ setenv OMP_NUM_THREADS 2
$ aprun -n2 -d2 ./test.x

For PGI, add -mp to the build line:

$ module swap PrgEnv-intel PrgEnv-pgi
$ cc -mp test.c -o test.x
$ setenv OMP_NUM_THREADS 2
$ aprun -n2 -d2 ./test.x

For GNU, add -fopenmp to the build line:

$ module swap PrgEnv-intel PrgEnv-gnu
$ cc -fopenmp test.c -o test.x
$ setenv OMP_NUM_THREADS 2
$ aprun -n2 -d2 ./test.x

For Cray, no additional flags are required:

$ module swap PrgEnv-intel PrgEnv-cray
$ cc test.c -o test.x
$ setenv OMP_NUM_THREADS 2
$ aprun -n2 -d2 ./test.x
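With a test program like the sketch above and the settings shown (two processing elements, two threads each), a run should print four lines in total, along the lines of:

Hello from thread 0 of 2
Hello from thread 1 of 2
Hello from thread 0 of 2
Hello from thread 1 of 2

The interleaving of the lines will vary from run to run.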
Running with OpenMP and PrgEnv-intel

An extra thread created by the Intel OpenMP runtime interferes with the CLE thread-binding mechanism and causes poor performance. To work around this issue, CPU binding should be turned off. This affects only OpenMP codes built under the Intel programming environment.

How CPU binding is turned off depends on how the job is placed on the node. In the following examples, depth is the number of threads per MPI task, controlled by the -d option to aprun, and npes is the number of MPI tasks (processing elements) per socket, controlled by the -n option to aprun. Replace depth and npes with the values you plan to use.

When depth divides evenly into the number of processing elements on a socket (npes), bind each task to a NUMA node:

export OMP_NUM_THREADS="<=depth"
aprun -n npes -d "depth" -cc numa_node a.out

When depth does not divide evenly into the number of processing elements on a socket (npes), turn CPU binding off entirely:

export OMP_NUM_THREADS="<=depth"
aprun -n npes -d "depth" -cc none a.out
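As a concrete illustration (the counts here are chosen for the example only, not as recommendations): with npes = 4 tasks per socket and depth = 2 threads per task, 2 divides 4 evenly, so NUMA-node binding can stay on; with depth = 3 it does not, so binding is turned off:

export OMP_NUM_THREADS=2
aprun -n 4 -d 2 -cc numa_node a.out

export OMP_NUM_THREADS=3
aprun -n 4 -d 3 -cc none a.out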

In the future, a new aprun option is expected to interface more smoothly with OpenMP codes built under the Intel programming environment. This documentation will be updated at that time.