In High Performance Computing (HPC), computational work is performed by jobs. Individual jobs produce data that lend relevant insight into grand challenges in science and engineering. As such, timely and efficient execution of jobs is the primary concern in the operation of any HPC system.

A job on Titan typically comprises a few different components:

  • A batch submission script.
  • A statically-linked binary executable.
  • A set of input files for the executable.
  • A set of output files created by the executable.

And the process for running a job, in general, is to:

  1. Prepare executables and input files.
  2. Write a batch script.
  3. Submit the batch script to the batch scheduler.
  4. Optionally monitor the job before and during execution.

The following sections describe in detail how to create, submit, and manage jobs for execution on Titan.

Login vs. Service vs. Compute Nodes

Cray supercomputers are complex collections of different types of physical nodes/machines. For simplicity, we can think of Titan nodes as existing in one of three categories: login nodes, service nodes, or compute nodes.

Login Nodes

Login nodes are designed to facilitate ssh access into the overall system, and to handle simple tasks. When you first log in, you are placed on a login node. Login nodes are shared by all users of a system, and should only be used for basic tasks such as file editing, code compilation, data backup, and job submission. Login nodes should not be used for memory-intensive nor processing-intensive tasks. Users should also limit the number of simultaneous tasks performed on login nodes. For example, a user should not run ten simultaneous tar processes.

Warning: Processor-intensive, memory-intensive, or otherwise disruptive processes running on login nodes may be killed without warning.

Service Nodes

Memory-intensive tasks, processor-intensive tasks, and any production-type work should be submitted to the machine’s batch system (e.g. to Torque/MOAB via qsub). When a job is submitted to the batch system, the job submission script is first executed on a service node. Any job submitted to the batch system is handled in this way, including interactive batch jobs (e.g. via qsub -I). Often users are under the (false) impression that they are executing commands on compute nodes while typing commands in an interactive batch job. On Cray machines, this is not the case.

Compute Nodes

On Cray machines, when the aprun command is issued within a job script (or on the command line within an interactive batch job), the binary passed to aprun is copied to and executed in parallel on a set of compute nodes. Compute nodes run a Linux microkernel for reduced overhead and improved performance.

Note: On Cray machines, the only way to access the compute nodes is via the aprun command.

Filesystems Available to Compute Nodes

The compute nodes do not mount all filesystems available from the login and service nodes. The Lustre® areas ($MEMBERWORK and $PROJWORK) as well as the /ccs/proj areas are available to compute nodes on OLCF Cray systems. User home directories are not mounted on compute nodes.

Warning: Home directories (/ccs/home/$USER) are not available from the compute nodes.
As a result, job executable binaries and job input files must reside within a Lustre or /ccs/proj work space, e.g. $MEMBERWORK/[projid].

Overview of filesystems available to compute nodes

Type    Mount                                Access      Suggested Use
Lustre  $MEMBERWORK, $PROJWORK, $WORLDWORK   Read/Write  Batch job input and output
NFS     /ccs/proj                            Read-only   Binaries, shared object libraries, Python scripts

Notice: Due to metadata overhead, the NFS areas are the preferred storage locations for shared object libraries and Python scripts.

Shared Object and Python Scripts

Because the /ccs/proj areas are backed up daily and are accessible by all members of a project, they are very useful for sharing code with other project members. Due to metadata overhead, the NFS areas are the preferred storage locations for shared object libraries and Python modules.

The Lustre $MEMBERWORK, $PROJWORK, and $WORLDWORK areas are much larger than the NFS areas and are configured for large data I/O.

/ccs/proj Update Delay and Read-Only Access

Access to /ccs/proj area from compute nodes is read-only. The /ccs/proj areas are not directly mounted on the compute nodes, but rather rely on a periodic rsync to provide access to shared project-centric data. The /ccs/proj areas are mounted for read/write on the login and service nodes, but it may take up to 30 minutes for changes to propagate to the compute nodes.

Note: Due to /ccs/proj areas being read-only on compute nodes, job output must be sent to a Lustre work space.

Home Directory Access Error

Batch jobs can be submitted from the User Home space, but additional steps are required to ensure the job runs successfully. Batch jobs submitted from Home areas should cd into a Lustre work directory prior to invoking aprun in the batch script. An error like the following may be returned if this is not done:

aprun: [NID 94]Exec /lustre/atlas/scratch/userid/a.out failed: chdir /autofs/na1_home/userid
No such file or directory

Writing Batch Scripts

Batch scripts, or job submission scripts, are the mechanism by which a user configures and submits a job for execution. A batch script is simply a shell script that also includes commands to be interpreted by the batch scheduling software (e.g. PBS).

Batch scripts are submitted to the batch scheduler, where they are then parsed for the scheduling configuration options. The batch scheduler places the script in the appropriate queue, where it is designated as a batch job. Once the batch job makes its way through the queue, the script will be executed on a service node within the set of allocated computational resources.

Sections of a Batch Script

Batch scripts are parsed into the following three sections:

  1. The Interpreter Line
    The first line of a script can be used to specify the script’s interpreter. This line is optional. If not used, the submitter’s default shell will be used. The line uses the “hash-bang-shell” syntax: #!/path/to/shell
  2. The Scheduler Options
    The batch scheduler configuration options are preceded by #PBS, making them appear as comments to a shell. PBS will look for #PBS options in a batch script from the script’s first line through the first non-comment line. A comment line begins with #. #PBS options entered after the first non-comment line will not be read by PBS.

    Note: All batch scheduler options must appear at the beginning of the batch script.
  3. The Executable Commands
    The shell commands follow the last #PBS option and represent the main content of the batch job. If any #PBS lines follow executable statements, they will be ignored as comments.

The executable commands section of a script will be interpreted by a shell and can contain multiple lines of executable invocations, shell commands, and comments. When the job’s queue wait time is finished, commands within this section will be executed on a service node (sometimes called a “head node”) from the set of the job’s allocated resources. Under normal circumstances, the batch job will exit the queue after the last line of the script is executed.

An Example Batch Script

 1: #!/bin/bash
 2: #    Begin PBS directives
 3: #PBS -A pjt000
 4: #PBS -N test
 5: #PBS -j oe
 6: #PBS -l walltime=1:00:00,nodes=1500
 7: #PBS -l gres=atlas1%atlas2
 8: #    End PBS directives and begin shell commands
 9: cd $MEMBERWORK/pjt000
10: date
11: aprun -n 24000 ./a.out

The lines of this batch script do the following:

Line Option Description
1 Optional Specifies that the script should be interpreted by the bash shell.
2 Optional Comments do nothing.
3 Required The job will be charged to the “pjt000” project.
4 Optional The job will be named “test”.
5 Optional The job’s standard output and error will be combined.
6 Required The job will request 1,500 compute nodes for 1 hour.
7 Optional The job will require both the atlas1 and atlas2 Lustre® filesystems to be online.
8 Optional Comments do nothing.
9 This shell command changes to the user’s $MEMBERWORK/pjt000 directory.
10 This shell command will run the date command.
11 This invocation will run 24,000 MPI instances of the executable a.out on the compute nodes allocated by the batch system.

Additional Example Batch Scripts

Warning: Compute nodes can see only the Lustre-backed storage areas; binaries must be executed from within the User Work (i.e., $MEMBERWORK/{projectid}) or Project Work (i.e., $PROJWORK/{projectid}) areas. All data needed by a binary must also exist in these Lustre-backed areas. More information can be found in the Filesystems Available to Compute Nodes section.

Launching an MPI-only job

Suppose we want to launch a job on 300 Titan nodes, each using all 16 available CPU cores. The following example will request (300) nodes for 1 hour and 30 minutes. It will then launch 4,800 (300 x 16) MPI ranks on the allocated cores (one task per core).

  #!/bin/bash
  # File name: my-job-name.pbs
  #PBS -A PRJ123
  #PBS -l walltime=1:30:00
  #PBS -l nodes=300
  #PBS -l gres=atlas1%atlas2

  cd $MEMBERWORK/prj123
   
  aprun -n 4800 my-simulation.exe

The first PBS directive (#PBS -A PRJ123) tells the scheduler that this job should be charged against the PRJ123 allocation. For example, if you are a member of project SCI404, your PBS scripts would instead say #PBS -A SCI404. If you are a member of multiple projects, be careful to double check that your jobs launch with the intended allocation.

To invoke the above script from the command line, simply:

  $ qsub my-job-name.pbs
    123456.nid00004

You can check the status of job number 123456 by running:

  $ showq | grep 123456
    123456   userid    Running   4800   00:00:44   Sat Oct 15 06:18:56

Naming Jobs

Users who submit many jobs to the queue at once may want to consider naming their jobs in order to keep track of which ones are running and which are still being held in the queue. This can be done with the #PBS -N my-job-name option. For example, to name your job P3HT-PCBM-simulation-1:

  #!/bin/bash
  # File name: simulation1.pbs
  #PBS -A PRJ123
  #PBS -N P3HT-PCBM-simulation-1
  #PBS -l walltime=1:30:00
  #PBS -l nodes=300
  #PBS -l gres=atlas1%atlas2

  cd $MEMBERWORK/prj123
   
  aprun -n 4800 my-simulation.exe

Controlling Output

By default, when your jobs print data to STDOUT or STDERR, it gets aggregated into two files: job-name.o123456 and job-name.e123456 (where 123456 is your job id). These files are written into the directory from which you submitted your job with the qsub command.

If you wish to aggregate this output into a single file (which may help you understand where errors occur), you can join these two output streams by using the #PBS -j oe option. For example,

  #!/bin/bash
  # File name: simulation1.pbs
  #PBS -A PRJ123
  #PBS -N P3HT-PCBM-simulation-1
  #PBS -j oe
  #PBS -l walltime=1:30:00
  #PBS -l nodes=300
  #PBS -l gres=atlas1%atlas2

  cd $MEMBERWORK/prj123
   
  aprun -n 4800 my-simulation.exe

Using Environment Modules

By default, the module environment tool is not available in batch scripts. If you need to load modules before launching your jobs (to adjust your $PATH or to make shared libraries available), you will first need to import the module utility into your batch script with the following command:

  source $MODULESHOME/init/<myshell>

where <myshell> is the name of your default shell.

As an example, let’s load the ddt module before launching the following simulation (assuming we are using the bash shell):

  #!/bin/bash
  # File name: simulation.pbs
  #PBS -A PRJ123
  #PBS -N P3HT-PCBM-simulation
  #PBS -j oe
  #PBS -l walltime=1:30:00
  #PBS -l nodes=300
  #PBS -l gres=atlas1%atlas2

  source $MODULESHOME/init/bash
  module load ddt
  cd $MEMBERWORK/prj123
   
  aprun -n 4800 my-simulation.exe

If you are loading a specific programming environment, it is advisable to load the programming environment before loading other modules. Some modules behave differently under each programming environment, and may not load correctly if the programming environment is not specified first.
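
As a minimal sketch (assuming the PGI programming environment is currently loaded and that cray-hdf5 is a library your code needs; substitute the modules relevant to your build):

  $ module swap PrgEnv-pgi PrgEnv-gnu   # select the programming environment first
  $ module load cray-hdf5               # then load modules that depend on it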

Basic MPI on Partial Nodes

A node’s cores cannot be shared by multiple batch jobs or aprun jobs; therefore, a job must be allocated all cores on a node. However, users do not have to utilize all of the cores allocated to their batch job. Through aprun options, users have the ability to run on all or only some of a node’s cores and they have some control over which cores are being used.

Reasons for utilizing only a portion of a node’s cores can be: increasing memory available to each task, utilizing one floating point unit per compute unit, and increasing memory bandwidth available to each task.

Each node contains (2) NUMA nodes. Users can control CPU task layout using the aprun NUMA node flags. For jobs that do not utilize all cores on a node, it may be beneficial to spread a node’s task load over the (2) NUMA nodes using aprun -S. The -j option can also be used to utilize only one integer core on each paired-core compute unit.

The following example will request 4,000 nodes for (8) hours. It will then run a 24,000 task MPI job using (6) of each allocated node’s (16) cores.

  #!/bin/bash
  #PBS -A PRJ123
  #PBS -N mpi-partial-node
  #PBS -j oe
  #PBS -l walltime=8:00:00,nodes=4000
  #PBS -l gres=atlas1%atlas2

  cd $MEMBERWORK/prj123
   
  aprun -n 24000 -S 3 -j1 a.out

To submit the script and check the job’s status:

  $ qsub mpi-partial-node-ex.pbs
    234567.nid00004
  $ showq | grep 234567
    234567   userid   Running   64000   00:00:44   Mon Oct 09 03:11:23

Please note that per Titan’s scheduling policy, the job will be charged for all 4,000 nodes.

Hybrid OpenMP/MPI

The following example batch script will request (3) nodes for (1) hour. It will then run a hybrid OpenMP/MPI job using (3) MPI tasks each running (16) threads.

  #!/bin/bash
  #PBS -A PRJ123
  #PBS -N hybrid-test
  #PBS -j oe
  #PBS -l walltime=1:00:00,nodes=3
  #PBS -l gres=atlas1%atlas2

  cd $PROJWORK/prj123
   
  export OMP_NUM_THREADS=16

  aprun -n 3 -N 1 -d 16 mpi-openmp-ex.x

To compile (on a login node), submit, and monitor this example:

  $ cc -mp mpi-openmp-ex.c -o mpi-openmp-ex.x
  $ qsub mpi-openmp-ex.pbs
    345678.nid00004
  $ showq | grep 345678
    345678   userid   Running   48   00:00:44   Mon Aug 19 21:49:18

Thread Performance Considerations

On Titan, each pair of CPU cores shares a single Floating Point Unit (FPU). This means that arithmetic-laden threads on neighboring CPU cores may contend for the shared FPU, which could lead to performance degradation.

To help avoid this issue, aprun can force only 1 thread to be associated with each core pair by using the -j 1 option. Here’s how we could revise the previous example to allocate only 1 thread per FPU:

  #!/bin/bash
  #PBS -A PRJ123
  #PBS -N hybrid-test
  #PBS -j oe
  #PBS -l walltime=1:00:00,nodes=3
  #PBS -l gres=atlas1%atlas2

  cd $PROJWORK/prj123
   
  export OMP_NUM_THREADS=8

  aprun -n 3 -N 1 -d 8 -j 1 mpi-openmp-ex.x

Launching Several Executables at Once

Warning: Because large numbers of aprun processes can cause other users’ apruns to fail, users are asked to limit the number of simultaneous apruns executed within a batch script. Users are limited to 50 aprun processes per batch job.

The following example will request 6,000 nodes for (12) hours. It will then run (4) MPI jobs each simultaneously running on 24,000 cores. The OS will spread each aprun job out such that the jobs do not share nodes.

  #!/bin/bash
  #PBS -A PROJ123
  #PBS -N multi-job
  #PBS -j oe
  #PBS -l walltime=12:00:00,nodes=6000
  #PBS -l gres=atlas1%atlas2

  cd $MEMBERWORK/prj123

  aprun -n 24000 a.out1 &
  aprun -n 24000 a.out2 &
  aprun -n 24000 a.out3 &
  aprun -n 24000 a.out4 &

  wait

To submit the script and check the job’s status:

  $ qsub multi-job-ex.pbs
    456789.nid00004
  $ showq | grep 456789
    456789   userid    Running   96000   00:00:44   Thu Oct 07 11:32:52

Important Considerations for Simultaneous Jobs in a Single Script

  • The aprun instances must be backgrounded
    The & symbols in the example above place each aprun in the background, allowing the OS to place and run them simultaneously. Without placing the apruns in the background, the OS will run them serially, waiting until one completes before starting the next.
  • The batch script must wait for backgrounded processes
    The wait command will prevent the batch script from returning until each backgrounded aprun completes. Without the wait the script will return once each aprun has been started, causing the batch job to end, which kills each of the backgrounded aprun processes.
  • The aprun instances cannot share nodes
    The system will only run one aprun per node; the system will not run multiple aprun instances on the same node at the same time. For example, users cannot run (2) 8-core aprun jobs on the same node. In order to run (2) 8-core aprun instances at the same time, (2) nodes must be allocated.

    Note: aprun disallows multiple instances on a single node. See the wraprun section for details regarding sharing a node’s cores among multiple aprun instances.

Chaining Batch Jobs

The following example will

  1. Submit 1.pbs which will be immediately eligible for execution
  2. Submit 2.pbs which will be placed in a held state, not eligible for execution until 1.pbs completes without errors:

$ qsub 1.pbs
123451
$ qsub -W depend=afterok:123451 2.pbs
123452

You can then use the showq and checkjob utilities to view job states:

$ showq -u userid
...
123451              userid    Running    16
...
123452              userid       Hold    16
...
$ checkjob 123452
...
NOTE:  job cannot run  (dependency 123451 jobsuccessfulcomplete not met)

Batch Scheduler Node Requests

A node’s cores cannot be allocated to multiple jobs. The OLCF charges allocations based upon the computational resources a job makes unavailable to others. Thus, a job is charged for an entire node even if the job uses only one processor core. To simplify the process, users are required to request an entire node through PBS.

Note: Whole nodes must be requested at the time of job submission, and allocations are reduced by core-hour amounts corresponding to whole nodes, regardless of actual core utilization.
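
For illustration (a minimal sketch with placeholder project ID and executable), even a run that uses a single core must request a whole node, and the allocation is charged for that whole node:

  #!/bin/bash
  #PBS -A PRJ123
  #PBS -l nodes=1,walltime=0:30:00

  cd $MEMBERWORK/prj123
  aprun -n 1 ./single-core-task.exe   # uses 1 of the node's 16 cores; the full node is still charged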

Submitting Batch Scripts

Once written, a batch script is submitted to the batch scheduler via the qsub command.

$ cd /path/to/batch/script
$ qsub ./script.pbs

If successfully submitted, a PBS job ID will be returned. This ID is needed to monitor the job’s status with various job monitoring utilities. It is also necessary information when troubleshooting a failed job, or when asking the OLCF User Assistance Center for help.

Note: Always make a note of the returned job ID upon job submission, and include it in help requests to the OLCF User Assistance Center.

Options to the qsub command allow the specification of attributes which affect the behavior of the job. In general, options to qsub on the command line can also be placed in the batch scheduler options section of the batch script via #PBS.
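
For example, the following command-line submission (with illustrative values):

$ qsub -A PRJ123 -l nodes=1,walltime=1:00:00 script.pbs

is equivalent to placing these lines in the batch scheduler options section of script.pbs:

#PBS -A PRJ123
#PBS -l nodes=1,walltime=1:00:00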


Interactive Batch Jobs

Batch scripts are useful for submitting a group of commands, allowing them to run through the queue, then viewing the results at a later time. However, it is sometimes necessary to run tasks within a job interactively. Users are not permitted to access compute nodes nor run aprun directly from login nodes. Instead, users must use an interactive batch job to allocate and gain access to compute resources interactively. This is done by using the -I option to qsub.

Interactive Batch Example

For interactive batch jobs, PBS options are passed through qsub on the command line.

$ qsub -I -A pjt000 -q debug -X -l nodes=3,walltime=30:00

This request will:

Option Description
-I Start an interactive session
-A Charge to the “pjt000” project
-X Enables X11 forwarding. The DISPLAY environment variable must be set.
-q debug Run in the debug queue
-l nodes=3,walltime=30:00 Request 3 compute nodes for 30 minutes (you get all cores per node)

After running this command, you will have to wait until enough compute nodes are available, just as in any other batch job. However, once the job starts, you will be given an interactive prompt on the head node of your allocated resource. From here commands may be executed directly instead of through a batch script.

Debugging via Interactive Jobs

A common use of interactive batch is to aid in debugging efforts. Interactive access to compute resources allows the ability to run a process to the point of failure; however, unlike a batch job, the process can be restarted after brief changes are made without losing the compute resource allocation. This may help speed the debugging effort because a user does not have to wait in the queue in between each run attempts.
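
A typical interactive debugging session might look like the following sketch (the project ID, node count, and executable name are placeholders):

$ qsub -I -A PRJ123 -q debug -l nodes=1,walltime=1:00:00
# ...wait in the queue; a prompt then opens on the job's head node...
$ cd $MEMBERWORK/prj123
$ aprun -n 16 ./a.out      # run to the point of failure
# make brief changes, then rerun aprun without losing the allocation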

Note: To tunnel a GUI from an interactive batch job, the -X PBS option should be used to enable X11 forwarding.

Choosing an Interactive Job’s nodes Value

Because interactive jobs must sit in the queue until enough resources become available, it is useful to base the nodes request on the number of currently unallocated nodes in order to shorten the queue wait time. The showbf command (i.e., “show backfill”) can be used to see resource limits that would allow your job to be immediately back-filled (and thus started) by the scheduler. For example, the snapshot below shows that 802 nodes are currently free.

$ showbf
Partition   Tasks   Nodes   StartOffset    Duration       StartDate
---------   -----   -----   ------------   ------------   --------------
ALL         4744    802     INFINITY       00:00:00       HH:MM:SS_MM/DD

See showbf --help for additional options.


Common Batch Options to PBS

The following table summarizes frequently-used options to PBS:

Option Use Description
-A #PBS -A <account> Causes the job time to be charged to <account>. The account string, e.g. pjt000, is typically composed of three letters followed by three digits and optionally followed by a subproject identifier. The utility showproj can be used to list your valid assigned project ID(s). This option is required by all jobs.
-l #PBS -l nodes=<value> Maximum number of compute nodes. Jobs cannot request partial nodes.
#PBS -l walltime=<time> Maximum wall-clock time. <time> is in the format HH:MM:SS.
#PBS -l partition=<partition_name> Allocates resources on specified partition.
-o #PBS -o <filename> Writes standard output to <filename> instead of <job script>.o$PBS_JOBID. $PBS_JOBID is an environment variable created by PBS that contains the PBS job identifier.
-e #PBS -e <filename> Writes standard error to <filename> instead of <job script>.e$PBS_JOBID.
-j #PBS -j {oe,eo} Combines standard output and standard error into the standard error file (eo) or the standard out file (oe).
-m #PBS -m a Sends email to the submitter when the job aborts.
#PBS -m b Sends email to the submitter when the job begins.
#PBS -m e Sends email to the submitter when the job ends.
-M #PBS -M <address> Specifies email address to use for -m options.
-N #PBS -N <name> Sets the job name to <name> instead of the name of the job script.
-S #PBS -S <shell> Sets the shell to interpret the job script.
-q #PBS -q <queue> Directs the job to the specified queue. This option is not required to run in the default queue on any given system.
-V #PBS -V Exports all environment variables from the submitting shell into the batch job shell (see the note below).
-X #PBS -X Enables X11 forwarding. The -X PBS option should be used to tunnel a GUI from an interactive batch job.
Note: Because the login nodes differ from the service nodes, using the ‘-V’ option is not recommended. Users should create the needed environment within the batch job.

Further details and other PBS options may be found through the qsub man page.


Batch Environment Variables

PBS sets multiple environment variables at submission time. The following PBS variables are useful within batch scripts:

Variable Description
$PBS_O_WORKDIR The directory from which the batch job was submitted. By default, a new job starts in your home directory. You can get back to the directory of job submission with cd $PBS_O_WORKDIR. Note that this is not necessarily the same directory in which the batch script resides.
$PBS_JOBID The job’s full identifier. A common use for PBS_JOBID is to append the job’s ID to the standard output and error files.
$PBS_NUM_NODES The number of nodes requested.
$PBS_JOBNAME The job name supplied by the user.
$PBS_NODEFILE The name of the file containing the list of nodes assigned to the job. Used sometimes on non-Cray clusters.
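
A brief sketch showing how these variables might be used within a batch script (the project ID and executable are placeholders):

  #!/bin/bash
  #PBS -A PRJ123
  #PBS -l nodes=1,walltime=0:10:00

  cd $MEMBERWORK/prj123
  echo "Job $PBS_JOBID ($PBS_JOBNAME) was submitted from $PBS_O_WORKDIR"
  aprun -n 16 ./a.out > run.$PBS_JOBID.log 2>&1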

Modifying Batch Jobs

The batch scheduler provides a number of utility commands for managing submitted jobs. See each utility’s man page for more information.

Removing and Holding Jobs

qdel

Jobs in the queue in any state can be stopped and removed from the queue using the command qdel.

$ qdel 1234

qhold

Jobs in the queue in a non-running state may be placed on hold using the qhold command. Jobs placed on hold will not be removed from the queue, but they will not be eligible for execution.

$ qhold 1234

qrls

Once on hold the job will not be eligible to run until it is released to return to a queued state. The qrls command can be used to remove a job from the held state.

$ qrls 1234

Modifying Job Attributes

qalter

Non-running jobs in the queue can be modified with the PBS qalter command. Among other operations, the qalter utility can be used to do the following:

Modify the job’s name:

$ qalter -N newname 130494

Modify the number of requested nodes:

$ qalter -l nodes=12 130494

Modify the job’s walltime:

$ qalter -l walltime=01:00:00 130494

Note: Once a batch job moves into a running state, the job’s walltime cannot be increased.

Monitoring Batch Jobs

PBS and Moab provide multiple tools to view queue, system, and job status. Below are the most common and useful of these tools.

Job Monitoring Commands

showq

The Moab utility showq can be used to view a more detailed description of the queue. The utility will display the queue in the following states:

State Description
Active These jobs are currently running.
Eligible These jobs are currently queued awaiting resources. Eligible jobs are shown in the order in which the scheduler will consider them for allocation.
Blocked These jobs are currently queued but are not eligible to run. A job may be in this state because the user has more jobs that are “eligible to run” than the system’s queue policy allows.

To see all jobs currently in the queue:

$ showq

To see all jobs owned by userA currently in the queue:

$ showq -u userA

To see all jobs submitted to partitionA:

$ showq -p partitionA

To see all completed jobs:

$ showq -c

Note: To improve response time, the Moab utilities (showstart, checkjob) display a cached result. The cache updates every 30 seconds. Because the cached result is displayed, you may see the following message:

--------------------------------------------------------------------
NOTE: The following information has been cached by the remote server
      and may be slightly out of date.
--------------------------------------------------------------------

checkjob

The Moab utility checkjob can be used to view details of a job in the queue. For example, if job 736 is a job currently in the queue in a blocked state, the following can be used to view why the job is in a blocked state:

$ checkjob 736

The return may contain a line similar to the following:

BlockMsg: job 736 violates idle HARD MAXJOB limit of X for user (Req: 1 InUse: X)

This line indicates the job is in the blocked state because the owning user has reached the limit for jobs in the “eligible to run” state.

qstat

The PBS utility qstat will poll PBS (Torque) for job information. However, qstat does not know about Moab’s blocked and eligible states. Because of this, the showq Moab utility (see above) will provide a more accurate view of the batch queue state. To show all queued jobs:

$ qstat -a

To show details about job 1234:

$ qstat -f 1234

To show all currently queued jobs owned by userA:

$ qstat -u userA

Titan Batch Queues

Queues are used by the batch scheduler to aid in the organization of jobs. Users typically have access to multiple queues, and each queue may allow different job limits and have different priorities. Unless otherwise notified, users have access to the following queues on Titan:

Name Usage Description Limits
batch No explicit request required Default; most production work runs in this queue. See the Titan Scheduling Policy for details.
killable #PBS -q killable Opportunistic; jobs start even if they will not complete before the onset of a scheduled outage.
debug #PBS -q debug Quick-turnaround; short jobs for software development, testing, and debugging.

The batch Queue

The batch queue is the default queue for production work on Titan. Most work on Titan is handled through this queue.

The killable Queue

At the start of a scheduled system outage, a queue reservation is used to ensure that no jobs are running. In the batch queue, the scheduler will not start a job if it expects that the job would not complete (based on the job’s user-specified max walltime) before the reservation’s start time. In contrast, the killable queue allows the scheduler to start a job even if it will not complete before a scheduled reservation.

Note: If your job can perform usable work within a (1) hour timeframe and is tolerant of abrupt termination, this queue may allow you to take advantage of idle resources available prior to a scheduled outage.

The debug Queue

The debug queue is intended to provide faster turnaround times for the code development, testing, and debugging cycle. For example, interactive parallel work is an ideal use for the debug queue.

Warning: Users who misuse the debug queue may have further access to the queue denied.

More detailed information on any of the batch scheduler queues can be found on the Titan Scheduling Policy page.


Job Execution on Titan

Once resources have been allocated through the batch system, users can:

  • Run commands in serial on the resource pool’s primary service node
  • Run executables in parallel across compute nodes in the resource pool

Serial Execution

The executable portion of a batch script is interpreted by the shell specified on the first line of the script. If a shell is not specified, the submitting user’s default shell will be used. This portion of the script may contain comments, shell commands, executable scripts, and compiled executables. These can be used in combination to, for example, navigate file systems, set up job execution, run executables, and even submit other batch jobs.

Warning: On Titan, each batch job is limited to 200 simultaneous processes. Attempting to open more simultaneous processes than the limit will result in No space left on device errors.

Parallel Execution

By default, commands in the job submission script will be executed on the job’s primary service node. The aprun command is used to execute a binary on one or more compute nodes within a job’s allocated resource pool.

Note: On Titan, the only way to access a compute node is via the aprun command within a batch job.

Using the aprun command

The aprun command is used to run a compiled application program across one or more compute nodes. You use the aprun command to specify application resource requirements, request application placement, and initiate application launch. The machine’s physical node layout plays an important role in how aprun works. Each Titan compute node contains (2) 8-core NUMA nodes on a single socket (a total of 16 cores).

Note: The aprun command is the only mechanism for running an executable in parallel on compute nodes. To run jobs as efficiently as possible, a thorough understanding of how to use aprun and its various options is paramount.

OLCF uses a version of aprun with two extensions. One is used to identify which libraries are used by an executable to allow us to better track third party software that is being actively used on the system. The other analyzes the command line to identify cases where users might be able to optimize their application’s performance by using slightly different job layout options. We highly recommend using both of these features; however, if there is a reason you wish to disable one or the other please contact the User Assistance Center for information on how to do that.

Shell Resource Limits

By default, aprun will not forward shell limits set by ulimit for sh/ksh/bash or by limit for csh/tcsh. To pass these settings to your batch job, you should set the environment variable APRUN_XFER_LIMITS to 1 via export APRUN_XFER_LIMITS=1 for sh/ksh/bash or setenv APRUN_XFER_LIMITS 1 for csh/tcsh.
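
For example, to raise the core-file size limit and forward shell limits to the compute nodes (bash syntax; the limit chosen is illustrative):

  ulimit -c unlimited               # set the desired limit in the batch script
  export APRUN_XFER_LIMITS=1        # forward shell limits to the compute nodes
  aprun -n 16 ./a.out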

Simultaneous aprun Limit

All aprun processes are launched from a small number of shared service nodes. Because large numbers of aprun processes can cause other users’ apruns to fail, users are asked to limit the number of simultaneous apruns executed within a batch script. Users are limited to 50 aprun processes per batch job; attempts to launch apruns over the limit will result in the following error:

apsched: no more claims allowed for this reservation (max 50)

Warning: Users are limited to 50 aprun processes per batch job.


Single-aprun Process Ensembles with wraprun

Wraprun is a utility that enables independent execution of multiple MPI applications under a single aprun call. It borrows from aprun MPMD syntax and also contains some wraprun specific syntax. The latest documentation can be found on the wraprun development README.

In some situations, the simultaneous aprun limit can be overcome by using the utility wraprun. Wraprun has the capacity to run an arbitrary number and combination of qualified MPI or serial applications under a single aprun call.

Note: MPI executables launched under wraprun must be dynamically linked. Non-MPI applications must be launched using a serial wrapper included with wraprun.

Warning: Tasks bundled with wraprun should each consume approximately the same walltime to avoid wasting allocation hours.

Using Wraprun

By default wraprun applications must be dynamically linked. Wraprun itself depends on Python and is compatible with python/2.7.X and python/3.X but requires the user to load their preferred python environment module before loading wraprun. Below is an example of basic wraprun use for the applications foo.out and bar.out.

$ module load dynamic-link
$ cc foo.c -o foo.out
$ cc bar.c -o bar.out
$ module load python wraprun
$ wraprun -n 80 ./foo.out : -n 160 ./bar.out

foo.out and bar.out will run independently under a single aprun call.

In addition to the standard process placement flags available to aprun, the --w-cd flag can be set to change the current working directory for each executable:

$ wraprun -n 80 --w-cd /foo/dir ./foo.out : -n 160 --w-cd /bar/dir ./bar.out

This is particularly useful for legacy FORTRAN applications that use hard coded input and output file names.

Multiple instances of an application can be placed on a node using the comma-separated PES syntax PES1,PES2,…,PESN, for instance:

$ wraprun -n 2,2,2 ./foo.out [ : other tasks...] 

would launch 3 two-process instances of foo.out on a single node.

In this case, the number of allocated nodes must be at least equal to the sum of processes in the comma-separated list of processing elements, divided by the maximum number of processes per node.

This may also be combined with the --w-cd flag:

$ wraprun -n 2,2,2 --w-cd /foo/dir1,/foo/dir2,/foo/dir3 ./foo.out [ : other tasks...] 

For non-MPI executables, a wrapper application, serial, is provided. This wrapper ensures that all executables will run to completion before aprun exits. To use it, place serial in front of your application and arguments:

$ wraprun -n 1 serial ./foo.out -foo_args [ : other tasks...] 

The stdout/err of each task run under wraprun will be directed to its own unique file in the current working directory, with names of the form:

${PBS_JOBID}_w_${TASK_ID}.out
${PBS_JOBID}_w_${TASK_ID}.err

Recommendations and Limitations

It is recommended that applications be dynamically linked. On Titan this can be accomplished by loading the dynamic-link module before invoking the Cray compile wrappers CC, cc, and ftn.

The library may be statically linked although this is not fully supported.

All executables must reside in a compute node visible filesystem, e.g. Lustre. The underlying aprun call made by wraprun enforces the aprun ‘no-copy’ (‘-b’) flag.

wraprun works by intercepting all MPI function calls that contain an MPI_Comm argument. If an application calls an MPI function with an MPI_Comm argument that is not included in src/split.c, the results are undefined.

Common aprun Options

The following table lists commonly-used options to aprun. For a more detailed description of aprun options, see the aprun man page.

Option Description
-D Debug; shows the layout aprun will use
-n Number of total MPI tasks (aka ‘processing elements’) for the executable. If you do not specify the number of tasks to aprun, the system will default to 1.
-N Number of MPI tasks (aka ‘processing elements’) per physical node.

Warning: Because each node contains multiple processors/NUMA nodes, the -S option is likely a better option than -N to control layout within a node.
-m Memory required per MPI task. There is a maximum of 2 GB per core; i.e., requesting 2.1 GB will allocate a minimum of two cores per MPI task.
-d Number of threads per MPI task.

Warning: The default value for -d is 1. If you specify OMP_NUM_THREADS but do not give a -d option, aprun will allocate your threads to a single core. Use OMP_NUM_THREADS to specify to your code the number of threads per MPI task; use -d to tell aprun how to place those threads.
-j
For Titan: Number of CPUs to use per paired-core compute unit. The -j parameter specifies the number of CPUs to be allocated per paired-core compute unit. The valid values for -j are 0 (use the system default), 1 (use one integer core), and 2 (use both integer cores; this is the system default).
For Eos: The -j parameter controls Hyper Threading. The valid values for -j are 0 (use the system default), 1 (turn Hyper Threading off), and 2 (turn Hyper Threading on; this is the system default).
-cc This is the cpu_list option. It binds MPI tasks or threads to the specified CPUs. The list is given as a set of comma-separated numbers (0 through 15), each of which specifies a compute unit (core) on the node. The list can also be given as hyphen-separated ranges of numbers, each of which specifies a range of compute units (cores) on the node. See man aprun.
-S Number of MPI tasks (aka ‘processing elements’) per NUMA node. Can be 1, 2, 3, 4, 5, 6, 7, or 8.
-ss Strict memory containment per NUMA node. The default is to allow remote NUMA node memory access. This option prevents memory access of the remote NUMA node.
-r Assign system services associated with your application to a compute core. If you use fewer than 16 cores, you can request that all system services be placed on an unused core. This will reduce “jitter” (i.e. application variability) because the daemons will not cause the application to context switch unexpectedly. Use -r 1, ensuring that -N is less than 16 or -S is less than 8.

XK7 CPU Description

Each Titan compute node contains (1) AMD Opteron™ 6274 (Interlagos) CPU. Each CPU contains (2) dies. Each die contains (4) “bulldozer” compute units and a shared L3 cache. Each compute unit contains (2) integer cores (and their L1 cache), a shared floating point scheduler, and a shared L2 cache. To aid in task placement, each die is organized into a NUMA node. Each compute node contains (2) NUMA nodes. Each NUMA node contains a die’s L3 cache and its (4) compute units (8 cores). This configuration is shown graphically below.

(Figure: Opteron 6274 CPU schematic)

Controlling MPI Task Layout Within a Physical Node

Users have (2) ways to control MPI task layout:

  1. Within a physical node
  2. Across physical nodes

This article focuses on how to control MPI task layout within a physical node.

Understanding NUMA Nodes

Each physical node is organized into (2) 8-core NUMA nodes. NUMA is an acronym for “Non-Uniform Memory Access”. You can think of a NUMA node as a division of a physical node that contains a subset of processor cores and their high-affinity memory. Applications may use resources from one or both NUMA nodes. The default MPI task layout is SMP-style. This means MPI will sequentially allocate all cores on one NUMA node before allocating tasks to another NUMA node.

Spreading MPI Tasks Across NUMA Nodes

Each physical node contains (2) NUMA nodes. Users can control MPI task layout using the aprun NUMA node flags. For jobs that do not utilize all cores on a node, it may be beneficial to spread a physical node’s MPI task load over the (2) available NUMA nodes via the -S option to aprun.

Note: Jobs that do not utilize all of a physical node’s processor cores may see performance improvements by spreading MPI tasks across NUMA nodes within a physical node.

Example 1: Default NUMA Placement

Job requests (2) processor cores without a NUMA flag. Both tasks are placed on the first NUMA node.

$ aprun -n2 ./a.out
Rank 0, Node 0, NUMA 0, Core 0
Rank 1, Node 0, NUMA 0, Core 1

Example 2: Specific NUMA Placement

Job requests (2) processor cores with aprun -S. A task is placed on each of the (2) NUMA nodes:

$ aprun -n2 -S1 ./a.out
Rank 0, Node 0, NUMA 0, Core 0
Rank 1, Node 0, NUMA 1, Core 0

The following table summarizes common NUMA node options to aprun:

Option Description
-S Processing elements (essentially a processor core) per NUMA node. Specifies the number of PEs to allocate per NUMA node. Can be 1, 2, 3, 4, 5, 6, 7, or 8.
-ss Strict memory containment per NUMA node. The default is to allow remote NUMA node memory access. This option prevents memory access of the remote NUMA node.

Advanced NUMA Node Placement

Example 1: Grouping MPI Tasks on a Single NUMA Node

Run a.out on (8) cores. Place (8) MPI tasks on (1) NUMA node. In this case the aprun -S option is optional:

$ aprun -n8 -S8 ./a.out

Compute Node
NUMA 0: ranks 0-7 on cores 0-7
NUMA 1: idle

Example 2: Spreading MPI tasks across NUMA nodes

Run a.out on (8) cores. Place (4) MPI tasks on each of (2) NUMA nodes via aprun -S.

$ aprun -n8 -S4 ./a.out

Compute Node
NUMA 0: ranks 0-3 on cores 0-3
NUMA 1: ranks 4-7 on cores 0-3

Example 3: Spread Out MPI Tasks Across Paired-Core Compute Units

The -j option can be used for codes to allow one task per paired-core compute unit. Run a.out on (8) cores; (4) cores per NUMA node; but only (1) core on each paired-core compute unit:

$ aprun -n8 -S4 -j1 ./a.out

Compute Node
NUMA 0: ranks 0-3 on cores 0, 2, 4, 6 (one core per paired-core compute unit)
NUMA 1: ranks 4-7 on cores 0, 2, 4, 6

To see MPI rank placement information on the nodes, set the PMI_DEBUG environment variable to 1.
For csh/tcsh:

$ setenv PMI_DEBUG 1

For bash:

$ export PMI_DEBUG=1

Example 4: Assign System Services to an Unused Compute Core

The -r option can be used to assign system services associated with your application to a compute core. If you use fewer than 16 cores, you can request that all of the system services be placed on an unused core. This will reduce “jitter” (i.e. application variability) because the daemons will not cause the application to context switch unexpectedly. Use -r 1, ensuring that -N is less than 16 or -S is less than 8. The following example runs a.out on (8) cores, (4) cores per NUMA node, with only (1) core used on each paired-core compute unit; the node’s last core is reserved for system services only:

$ aprun -n8 -S4 -j1 -r1 ./a.out

Compute Node
NUMA 0: ranks 0-3 on cores 0, 2, 4, 6
NUMA 1: ranks 4-7 on cores 0, 2, 4, 6; core 7 (the node's last core) is reserved for system services

Controlling MPI Task Layout Across Many Physical Nodes

Users have (2) ways to control MPI task layout:

  1. Within a physical node
  2. Across physical nodes

This article focuses on how to control MPI task layout across physical nodes. The default MPI task layout is SMP-style. This means MPI will sequentially allocate all virtual cores on one physical node before allocating tasks to another physical node.

Viewing Multi-Node Layout Order

Task layout can be seen by setting MPICH_RANK_REORDER_DISPLAY to 1.
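
For example (bash syntax):

$ export MPICH_RANK_REORDER_DISPLAY=1
$ aprun -n 32 ./a.out     # the rank-to-node placement is printed at launch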

Changing Multi-Node Layout Order

For multi-node jobs, layout order can be changed using the environment variable MPICH_RANK_REORDER_METHOD. See man intro_mpi for more information.

Multi-Node Layout Order Examples

Example 1: Default Layout

The following will run a.out across (32) cores. This requires (2) physical compute nodes.

# On Titan
$ aprun -n 32 ./a.out

# On Eos, Hyper-threading must be disabled:
$ aprun -n 32 -j1 ./a.out

Compute Node 0
NUMA 0: ranks 0-7 on cores 0-7
NUMA 1: ranks 8-15 on cores 0-7
Compute Node 1
NUMA 0: ranks 16-23 on cores 0-7
NUMA 1: ranks 24-31 on cores 0-7

Example 2: Round-Robin Layout

The following will place tasks in a round robin fashion. This requires (2) physical compute nodes.

$ setenv MPICH_RANK_REORDER_METHOD 0
# On Titan
$ aprun -n 32 ./a.out

# On Eos, Hyper-threading must be disabled:
$ aprun -n 32 -j1 ./a.out

Compute Node 0
NUMA 0: ranks 0, 2, 4, 6, 8, 10, 12, 14 on cores 0-7
NUMA 1: ranks 16, 18, 20, 22, 24, 26, 28, 30 on cores 0-7
Compute Node 1
NUMA 0: ranks 1, 3, 5, 7, 9, 11, 13, 15 on cores 0-7
NUMA 1: ranks 17, 19, 21, 23, 25, 27, 29, 31 on cores 0-7

Example 3: Combining Inter-Node and Intra-Node Options

The following combines MPICH_RANK_REORDER_METHOD and -S to place tasks on three cores per NUMA node within a node, and in a round-robin fashion across nodes.

$ setenv MPICH_RANK_REORDER_METHOD 0
$ aprun -n12 -S3 ./a.out

Compute Node 0
NUMA 0: ranks 0, 2, 4 on cores 0-2
NUMA 1: ranks 6, 8, 10 on cores 0-2
Compute Node 1
NUMA 0: ranks 1, 3, 5 on cores 0-2
NUMA 1: ranks 7, 9, 11 on cores 0-2


Controlling Thread Layout Within a Physical Node

Titan supports threaded programming within a compute node. Threads may span across both processors within a single compute node, but cannot span compute nodes. Users have a great deal of flexibility in thread placement. Several examples are shown below.

Note: Threaded codes must use the -d (depth) option to aprun.

The -d option to aprun specifies the number of threads per MPI task. Under previous CNL versions this option was not required. Under the current CNL version, the number of cores used is calculated by multiplying the value of -d by the value of -n.

Warning: Without the -d option, all threads will be started on the same processor core. This can lead to performance degradation for threaded codes.

Thread Layout Examples

The following examples are written for the bash shell. If using csh/tcsh, you should change export OMP_NUM_THREADS=x to setenv OMP_NUM_THREADS x wherever it appears.

Example 1: (2) MPI tasks, (16) Threads Each

This example will launch (2) MPI tasks, each with (16) threads. This spans (2) compute nodes and requires a nodes request of (2):

$ export OMP_NUM_THREADS=16
$ aprun -n2 -d16 a.out

Rank 0, Thread 0, Node 0, NUMA 0, Core 0 <-- MASTER
Rank 0, Thread 1, Node 0, NUMA 0, Core 1 <-- slave
Rank 0, Thread 2, Node 0, NUMA 0, Core 2 <-- slave
Rank 0, Thread 3, Node 0, NUMA 0, Core 3 <-- slave
Rank 0, Thread 4, Node 0, NUMA 0, Core 4 <-- slave
Rank 0, Thread 5, Node 0, NUMA 0, Core 5 <-- slave
Rank 0, Thread 6, Node 0, NUMA 0, Core 6 <-- slave
Rank 0, Thread 7, Node 0, NUMA 0, Core 7 <-- slave
Rank 0, Thread 8, Node 0, NUMA 1, Core 0 <-- slave
Rank 0, Thread 9, Node 0, NUMA 1, Core 1 <-- slave
Rank 0, Thread 10,Node 0, NUMA 1, Core 2 <-- slave
Rank 0, Thread 11,Node 0, NUMA 1, Core 3 <-- slave
Rank 0, Thread 12,Node 0, NUMA 1, Core 4 <-- slave
Rank 0, Thread 13,Node 0, NUMA 1, Core 5 <-- slave
Rank 0, Thread 14,Node 0, NUMA 1, Core 6 <-- slave
Rank 0, Thread 15,Node 0, NUMA 1, Core 7 <-- slave
Rank 1, Thread 0, Node 1, NUMA 0, Core 0 <-- MASTER
Rank 1, Thread 1, Node 1, NUMA 0, Core 1 <-- slave
Rank 1, Thread 2, Node 1, NUMA 0, Core 2 <-- slave
Rank 1, Thread 3, Node 1, NUMA 0, Core 3 <-- slave
Rank 1, Thread 4, Node 1, NUMA 0, Core 4 <-- slave
Rank 1, Thread 5, Node 1, NUMA 0, Core 5 <-- slave
Rank 1, Thread 6, Node 1, NUMA 0, Core 6 <-- slave
Rank 1, Thread 7, Node 1, NUMA 0, Core 7 <-- slave
Rank 1, Thread 8, Node 1, NUMA 1, Core 0 <-- slave
Rank 1, Thread 9, Node 1, NUMA 1, Core 1 <-- slave
Rank 1, Thread 10,Node 1, NUMA 1, Core 2 <-- slave
Rank 1, Thread 11,Node 1, NUMA 1, Core 3 <-- slave
Rank 1, Thread 12,Node 1, NUMA 1, Core 4 <-- slave
Rank 1, Thread 13,Node 1, NUMA 1, Core 5 <-- slave
Rank 1, Thread 14,Node 1, NUMA 1, Core 6 <-- slave
Rank 1, Thread 15,Node 1, NUMA 1, Core 7 <-- slave

Example 2: (2) MPI tasks, (6) Threads Each

This example will launch (2) MPI tasks, each with (6) threads, placing (1) MPI task per NUMA node. This requires (1) physical compute node and a nodes request of (1):

$ export OMP_NUM_THREADS=6
$ aprun -n2 -d6 -S1 a.out

Compute Node
NUMA 0: Rank 0, Threads 0-5 on cores 0-5
NUMA 1: Rank 1, Threads 0-5 on cores 0-5

Example 3: (4) MPI tasks, (2) Threads Each

This example will launch (4) MPI tasks, each with (2) threads. Place only (1) MPI task [and its (2) threads] on each NUMA node. This requests (2) physical compute nodes and requires a nodes request of (2), even though only (8) cores are actually being used:

$ export OMP_NUM_THREADS=2
$ aprun -n4 -d2 -S1 a.out

Rank 0, Thread 0, Node 0, NUMA 0, Core 0 <-- MASTER
Rank 0, Thread 1, Node 0, NUMA 0, Core 1 <-- slave
Rank 1, Thread 0, Node 0, NUMA 1, Core 0 <-- MASTER
Rank 1, Thread 1, Node 0, NUMA 1, Core 1 <-- slave
Rank 2, Thread 0, Node 1, NUMA 0, Core 0 <-- MASTER
Rank 2, Thread 1, Node 1, NUMA 0, Core 1 <-- slave
Rank 3, Thread 0, Node 1, NUMA 1, Core 0 <-- MASTER
Rank 3, Thread 1, Node 1, NUMA 1, Core 1 <-- slave

Example 4: (2) MPI tasks, (4) Threads Each, Using only (1) core per compute unit

The -j option can be used to allow use of only one core per paired-core compute unit. This example will launch (2) MPI tasks, each with (4) threads, placing only (1) MPI task [and its (4) threads] on each NUMA node. One core per paired-core compute unit sits idle. This requires (1) physical compute node and a nodes request of (1), even though only (8) cores are actually being used:

$ export OMP_NUM_THREADS=4
$ aprun -n2 -d4 -S1 -j1 a.out

Compute Node
NUMA 0: Rank 0, Threads 0-3 on cores 0, 2, 4, 6
NUMA 1: Rank 1, Threads 0-3 on cores 0, 2, 4, 6

Example 5: (2) MPI tasks, (8) Threads Each, Using only (1) core per compute unit

The -j option can be used to allow use of only one core per paired-core compute unit. This example will launch (2) MPI tasks, each with (8) threads. One core per paired-core compute unit will sit idle. This requires (2) physical compute nodes and a nodes request of (2), even though only (16) cores are actually being used:

$ export OMP_NUM_THREADS=8
$ aprun -n2 -d8 -N1 -j1 a.out

Compute Node 0
NUMA 0: Rank 0, Threads 0-3 on cores 0, 2, 4, 6
NUMA 1: Rank 0, Threads 4-7 on cores 0, 2, 4, 6
Compute Node 1
NUMA 0: Rank 1, Threads 0-3 on cores 0, 2, 4, 6
NUMA 1: Rank 1, Threads 4-7 on cores 0, 2, 4, 6

The -cc option can be used to control the placement of threads or tasks on specific processing units. To accomplish the same layout shown above with -cc:

$ export OMP_NUM_THREADS=8
$ aprun -n2 -d8 -N1 -cc 0,2,4,6,8,10,12,14 ./a.out


Running Accelerated Applications on Titan

Each of Titan’s 18,688 compute nodes contains an NVIDIA K20X accelerator; the login and service nodes do not. As such, the only way to reach a node containing an accelerator is through aprun. For more details on the types of nodes that constitute Titan, please see the Login vs. Service vs. Compute Nodes section above. No additional steps are required to access the GPU beyond what is required by the acceleration technique used. Titan does possess a few unique accelerator characteristics that are discussed below.

Accelerator Modules

Access to the CUDA framework is provided through the cudatoolkit module. This module provides access to NVIDIA tools such as nvcc, as well as libraries such as the CUDA runtime. When the cudatoolkit module is loaded, shared linking will be enabled in the Cray compiler wrappers CC, cc, and ftn. The craype-accel-nvidia35 module will load the cudatoolkit as well as set several accelerator options used by the Cray toolchain.
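
For example:

$ module load cudatoolkit              # provides nvcc and the CUDA runtime libraries
# or, to also set the Cray toolchain accelerator options:
$ module load craype-accel-nvidia35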

Compiling and Linking CUDA

When compiling on Cray machines, the Cray compiler wrappers (e.g. cc, CC, and ftn) work in conjunction with the modules system to link in needed libraries such as MPI; it is therefore recommended that the Cray compiler wrappers be used to compile CPU portions of your code.

To generate compiler-portable code, it is necessary to compile CUDA C and any code containing CUDA runtime calls with NVCC. The resulting NVCC-compiled object code must then be linked with object code compiled with the Cray wrappers. NVCC performs GNU-style C++ name mangling on compiled functions, so care must be taken when compiling and linking codes.

The section below briefly covers this technique. For complete examples, please see our tutorial on Compiling Mixed GPU and CPU Code.

C++ Host Code

When linking C++ host code with NVCC compiled code, the C++ code must use GNU-compatible name mangling. This is controlled through compiler specific wrapper flags.

PGI Compiler

$ nvcc -c GPU.cu
$ CC --gnu CPU.cxx GPU.o

Cray, GNU, and Intel Compilers

$ nvcc -c GPU.cu
$ CC CPU.cxx GPU.o

C Host Code

NVCC name mangling must be disabled if the compiled code is to be linked with C code. This requires the extern "C" function qualifier be used on functions compiled with NVCC but called from cc-compiled code.

extern "C" void GPU_function()
{
...
}
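
A minimal compile-and-link sketch (the file names are hypothetical):

$ nvcc -c GPU.cu            # defines the extern "C" GPU_function
$ cc CPU.c GPU.o            # CPU.c declares and calls GPU_function()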

Fortran: Simple

To allow C code to be called from Fortran, one method requires that the C function name be modified. NVCC name mangling must be disabled if it is to be linked with Fortran code. This requires the extern "C" function qualifier. Additionally, function names must be lowercase and end in an underscore character (i.e., _ ).

NVCC Compiled

extern "C" void gpu_function_()
{
...
}

ftn Compiled

call gpu_function()
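
The corresponding compile-and-link step might look like the following sketch (file names are hypothetical):

$ nvcc -c GPU.cu            # defines the extern "C" gpu_function_
$ ftn main.f90 GPU.o        # main.f90 contains "call gpu_function()"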

Fortran: ISO_C_BINDING

ISO_C_BINDING provides Fortran a greater interoperability with C and removes the need to modify the C function name. Additionally the ISO_C_BINDING guarantees data type compatibility.

NVCC Compiled

extern "C" void gpu_function()
{
...
}

ftn Compiled

module gpu
  INTERFACE
    subroutine gpu_function() BIND (C, NAME='gpu_function')
      USE ISO_C_BINDING
      implicit none
    end subroutine gpu_function
  END INTERFACE
end module gpu

...

call gpu_function()

CUDA Proxy

The default GPU compute mode for Titan is exclusive process. In this mode, many threads within a process may access the GPU context. To allow multiple processes access to the GPU context, such as multiple MPI tasks on a single node accessing the GPU, the CUDA proxy server was developed. Once enabled, the CUDA proxy server transparently manages work issued to the GPU context from multiple processes.

Warning: Currently, GPU memory between processes accessing the proxy is not guarded, meaning process i can access memory allocated by process j. The proxy SHOULD NOT be used to share memory between processes, and care should be taken to ensure that processes access only GPU memory they have allocated themselves.

How to Enable

To enable the proxy server, set the following environment variable before invoking aprun:

$ export CRAY_CUDA_PROXY=1
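
For context, the following minimal batch-script sketch shows where this setting fits relative to aprun when running multiple MPI ranks per node against a single GPU. The project ID, node count, walltime, and executable name are placeholders.

#!/bin/bash
#PBS -A ABC123
#PBS -l nodes=1,walltime=0:30:00

cd $MEMBERWORK/abc123

# Enable the CUDA proxy server before invoking aprun
export CRAY_CUDA_PROXY=1

# All 16 ranks on the node may now share the node's K20X through the proxy
aprun -n 16 -N 16 ./a.out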

Issues

Currently, GPU debugging and profiling are not supported when the proxy is enabled. On Titan, specifying the qsub flag -l feature=gpudefault will switch the compute mode from exclusive process to the CUDA default mode. In the default mode, debugging and profiling are available, and multiple MPI ranks will be able to access the GPU. However, the default compute mode is not recommended on Titan: in the default compute mode, approximately 120 MB of device memory is consumed per process accessing the GPU, and inconsistent behavior may be encountered under certain conditions.

GPUDirect: CUDA-enabled MPICH

Cray’s implementation of MPICH2 allows GPU memory buffers to be passed directly to MPI function calls, eliminating the need to manually copy GPU data to the host before passing data through MPI. Several examples of using this feature are given below.

How to Enable

To enable GPUDirect, set the following environment variables before invoking aprun:

  $ export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
  $ export MPICH_RDMA_ENABLED_CUDA=1
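
Within a batch script, these settings are placed before the aprun call. The sketch below is a minimal illustration; the project ID, node count, walltime, and executable name are placeholders.

#!/bin/bash
#PBS -A ABC123
#PBS -l nodes=2,walltime=0:30:00

cd $MEMBERWORK/abc123

# Enable GPUDirect before invoking aprun
export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
export MPICH_RDMA_ENABLED_CUDA=1

# The application may now pass GPU buffers directly to MPI calls
aprun -n 2 -N 1 ./a.out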

Optimizations

Several optimizations for improving performance are given below. These optimizations are highly application dependent and may require some trial-and-error tuning to achieve best results.

Pipelining

Pipelining allows overlapping of GPU-to-GPU MPI messages and may improve message-passing performance for large, bandwidth-bound messages. Setting the environment variable MPICH_G2G_PIPELINE=N allows a maximum of N GPU-to-GPU messages to be in flight at any given time. The default value of MPICH_G2G_PIPELINE is (16), and messages under (8) kilobytes in size are never pipelined.
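
For example, the following sketch raises the pipeline depth in a batch script before launching; the value 64 and the task count are illustrative, not recommendations, and should be tuned per application.

# Allow up to 64 GPU-to-GPU messages in flight (illustrative value)
export MPICH_G2G_PIPELINE=64
aprun -n 16 ./a.out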

Nemesis

Applications using asynchronous MPI calls may benefit from enabling the MPICH asynchronous progress feature. Setting the MPICH_NEMESIS_ASYNC_PROGRESS=1 environment variable enables additional threads to be spawned to progress the MPI state.

This feature requires that the thread level be set to multiple: MPICH_MAX_THREAD_SAFETY=multiple.

This feature works best when used in conjunction with core specialization: aprun -r N, which allows for N CPU cores to be reserved for system services.
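
Putting these together, a minimal batch-script sketch might look like the following; the task counts and executable name are placeholders, assuming a 4-node batch job.

# Enable asynchronous progress threads and require the "multiple" thread level
export MPICH_NEMESIS_ASYNC_PROGRESS=1
export MPICH_MAX_THREAD_SAFETY=multiple

# Reserve (1) core per node for system services via core specialization,
# leaving 15 cores per node for the application
aprun -r 1 -n 60 -N 15 ./a.out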

Example

Several examples are provided in our GPU Direct Tutorial.

Microbenchmark

The following benchmarks were performed with cray-mpich2/6.1.1.

[Benchmark figures]

Pinned Memory

Memory bandwidth between the CPU (host) and GPU (device) can be increased through the use of pinned, or page-locked, host memory. Additionally, pinned memory allows for asynchronous memory copies.

To transfer memory between the host and device, the device driver must know that the host memory is pinned. If it is not, the memory will first be copied into a pinned buffer and then transferred, effectively lowering copy bandwidth. For this reason, pinned memory usage is recommended on Titan.

[Figure: K20X Bandwidth]

Job Resource Accounting

The hybrid nature of Titan’s accelerated XK7 nodes mandated a new approach to node allocation and job charge units. For the sake of resource accounting, each Titan XK7 node is defined as possessing (30) total cores (i.e., (16) CPU cores + (14) GPU core equivalents). Jobs consume charge units in “Titan core-hours”, and each Titan node consumes (30) such units per hour. As in years past, jobs on the Titan system are scheduled in full-node increments; a node’s cores cannot be allocated to multiple jobs. Because the OLCF charges based on what a job makes unavailable to other users, a job is charged for an entire node even if it uses only one core on that node. To simplify the process, users are required to request whole nodes through PBS. Notably, codes that do not take advantage of GPUs will have only (16) CPU cores available per node; however, allocation requests, and the units charged, will be based on (30) cores per node.

Note: Whole nodes must be requested at the time of job submission, and associated allocations are reduced by (30) core-hours per node, regardless of actual CPU or GPU core utilization.

Titan Core-Hour Calculation

The Titan core-hour charge for each batch job will be calculated as follows:

Titan core-hours = nodes requested * 30 * ( batch job endtime - batch job starttime )

Where batch job starttime is the time the job moves into a running state, and batch job endtime is the time the job exits a running state. A batch job’s usage is calculated solely on requested nodes and the batch job’s start and end time. The number of cores actually used within any particular node within the batch job is not used in the calculation. For example, if a job requests 64 nodes through the batch script, runs for an hour, uses only 2 CPU cores per node, and uses no GPU cores, the job will still be charged for 64 * 30 * 1 = 1,920 Titan core-hours.

Note: Projects are allocated time on Titan in units of “Titan core-hours”. Other OLCF systems are allocated in units of “core-hours”.

Viewing Allocation Utilization

Utilization is calculated daily using batch jobs which complete between 00:00 and 23:59 of the previous day. For example, if a job moves into a run state on Tuesday and completes Wednesday, the job’s utilization will be recorded Thursday. Only batch jobs which write an end record are used to calculate utilization. Batch jobs which do not write end records due to system failure or other reasons are not used when calculating utilization. Each user may view usage for projects of which they are members using the command-line tool showusage or via the My OLCF site.

On the Command Line via showusage

The showusage utility can be used to view your usage from January 01 through midnight of the previous day. For example:

$ showusage
Usage on titan:
                                  Project Totals          <userid>
 Project      Allocation        Usage    Remaining          Usage
_________________________|___________________________|_____________
 <YourProj>    2000000   |   123456.78   1876543.22  |     1560.80

The -h option will list more usage details.

On the Web via My OLCF

More detailed metrics may be found on each project’s usage section of the My OLCF site. The following information is available for each project:

  • YTD usage by system, subproject, and project member
  • Monthly usage by system, subproject, and project member
  • YTD usage by job size groupings for each system, subproject, and project member
  • Weekly usage by job size groupings for each system and subproject
  • Batch system priorities by project and subproject
  • Project members

The My OLCF site is provided to aid in the utilization and management of OLCF allocations. If you have any questions or have a request for additional data, please contact the OLCF User Assistance Center.


Titan Scheduling Policy

Note: This details an official policy of the OLCF, and must be agreed to by the following persons as a condition of access to or use of OLCF computational resources:

  • Principal Investigators (Non-Profit)
  • Principal Investigators (Industry)
  • All Users

Title: Titan Scheduling Policy Version: 13.02

In a simple batch queue system, jobs run in a first-in, first-out (FIFO) order. This often does not make effective use of the system. A large job may be next in line to run. If the system is using a strict FIFO queue, many processors sit idle while the large job waits to run. Backfilling would allow smaller, shorter jobs to use those otherwise idle resources, and with the proper algorithm, the start time of the large job would not be delayed. While this does make more effective use of the system, it indirectly encourages the submission of smaller jobs.

The DOE Leadership-Class Job Mandate

As a DOE Leadership Computing Facility, the OLCF has a mandate that a large portion of Titan’s usage come from large, leadership-class (aka capability) jobs. To ensure the OLCF complies with DOE directives, we strongly encourage users to run jobs on Titan that are as large as their code will warrant. To that end, the OLCF implements queue policies that enable large jobs to run in a timely fashion.

Note: The OLCF implements queue policies that encourage the submission and timely execution of large, leadership-class jobs on Titan.

The basic priority-setting mechanism for jobs waiting in the queue is the time a job has been waiting relative to other jobs in the queue. However, several factors are applied by the batch system to modify the apparent time a job has been waiting. These factors include:

  • The number of nodes requested by the job.
  • The queue to which the job is submitted.
  • The 8-week history of usage for the project associated with the job.
  • The 8-week history of usage for the user associated with the job.
Note: The command line utility $ mdiag -p can be used to see the individual factors contributing to a job’s priority.

If your jobs require resources outside these queue policies, please complete the relevant request form on the Special Requests page. If you have any questions or comments on the queue policies below, please direct them to the User Assistance Center.

Job Priority by Processor Count

Jobs are aged according to the job’s requested processor count (older age equals higher queue priority). Each job’s requested processor count places it into a specific bin. Each bin has a different aging parameter, which all jobs in the bin receive.

Bin   Min Nodes   Max Nodes   Max Walltime (Hours)   Aging Boost (Days)
1     11,250      --          24.0                   15
2     3,750       11,249      24.0                   5
3     313         3,749       12.0                   0
4     126         312         6.0                    0
5     1           125         2.0                    0

FairShare Scheduling Policy

FairShare, as its name suggests, tries to push each user and project towards their fair share of the system’s utilization: in this case, 5% of the system’s utilization per user and 10% of the system’s utilization per project. To do this, the job scheduler adds (30) minutes priority aging per user and (1) hour of priority aging per project for every (1) percent the user or project is under its fair share value for the prior (8) weeks. Similarly, the job scheduler subtracts priority in the same way for users or projects that are over their fair share. For instance, a user who has personally used 0.0% of the system’s utilization over the past (8) weeks who is on a project that has also used 0.0% of the system’s utilization will get a (12.5) hour bonus (5 * 30 min for the user + 10 * 1 hour for the project). In contrast, a user who has personally used 0.0% of the system’s utilization on a project that has used 12.5% of the system’s utilization would get no bonus (5 * 30 min for the user – 2.5 * 1 hour for the project).

batch Queue Policy

The batch queue is the default queue for production work on Titan. Most work on Titan is handled through this queue. It enforces the following policies:

  • Limit of (4) eligible-to-run jobs per user.
  • Jobs in excess of the per user limit above will be placed into a held state, but will change to eligible-to-run at the appropriate time.
  • Users may have only (2) jobs in bin 5 running at any time. Any additional jobs will be blocked until one of the running jobs completes.
Note: The eligible-to-run state is not the running state. Eligible-to-run jobs have not started and are waiting for resources. Running jobs are actually executing.

killable Queue Policy

At the start of a scheduled system outage, a queue reservation is used to ensure that no jobs are running. In the batch queue, the scheduler will not start a job if it expects that the job would not complete (based on the job’s user-specified max walltime) before the reservation’s start time. In contrast, the killable queue allows the scheduler to start a job even if it will not complete before a scheduled reservation. It enforces the following policies:

  • Jobs will be killed if still running when a system outage begins.
  • The scheduler will stop scheduling jobs in the killable queue (1) hour before a scheduled outage.
  • Maximum-jobs-per-user limits are the same as, and counted in conjunction with, the batch queue.
  • Any killed jobs will be automatically re-queued after a system outage completes.

debug Queue Policy

The debug queue is intended to provide faster turnaround times for the code development, testing, and debugging cycle. For example, interactive parallel work is an ideal use for the debug queue. It enforces the following policies:

  • Production jobs are not allowed.
  • Maximum job walltime of (1) hour.
  • Limit of (1) job per user regardless of the job’s state.
  • Jobs receive a (2)-day priority aging boost for scheduling.
Warning: Users who misuse the debug queue may have further access to the queue denied.

Allocation Overuse Policy

Projects that overrun their allocation are still allowed to run on OLCF systems, although at a reduced priority. Like the adjustment for the number of processors requested above, this is an adjustment to the apparent submit time of the job. However, this adjustment has the effect of making jobs appear much younger than jobs submitted under projects that have not exceeded their allocation. In addition to the priority change, these jobs are also limited in the amount of wall time that can be used. For example, consider that job1 is submitted at the same time as job2. The project associated with job1 is over its allocation, while the project for job2 is not. The batch system will consider job2 to have been waiting for a longer time than job1. Additionally, projects that are at 125% of their allocated time will be limited to only one running job at a time. The adjustment to the apparent submit time depends upon the percentage that the project is over its allocation, as shown in the table below:

% Of Allocation Used   Priority Reduction   Number Eligible-to-Run   Number Running
< 100%                 0 days               4 jobs                   unlimited jobs
100% to 125%           30 days              4 jobs                   unlimited jobs
> 125%                 365 days             4 jobs                   1 job

System Reservation Policy

Projects may request to reserve a set of processors for a period of time through the reservation request form, which can be found on the Special Requests page. If the reservation is granted, the reserved processors will be blocked from general use for a given period of time. Only users that have been authorized to use the reservation can utilize those resources. Since no other users can access the reserved resources, it is crucial that groups given reservations take care to ensure the utilization on those resources remains high. To prevent reserved resources from remaining idle for an extended period of time, reservations are monitored for inactivity. If activity falls below 50% of the reserved resources for more than (30) minutes, the reservation will be canceled and the system will be returned to normal scheduling. A new reservation must be requested if this occurs. Since a reservation makes resources unavailable to the general user population, projects that are granted reservations will be charged (regardless of their actual utilization) a CPU-time equivalent to (# of cores reserved) * (length of reservation in hours).


Aprun Tips

The following tips may help you diagnose errors and improve job runtime.

Layout Suggestion: Avoiding Floating-Point Contention

Note: Because the layout of tasks within a node may negatively impact performance, you may receive an aprun warning notice if we detect that the specified aprun layout does not spread the tasks evenly over the node.

An aprun wrapper will parse the given layout options and return a warning if tasks are not spread evenly over a node’s compute units and/or NUMA nodes. You may see a warning similar to the following if the wrapper detects a possible non-optimal layout:

 APRUN usage: requested less processes than cores (-N 2) without using -j 1 to avoid floating-point unit contention 

Each Titan compute node contains (1) AMD Opteron™ 6274 (Interlagos) CPU. Each CPU contains (2) dies, each die contains (4) “bulldozer” compute units, and each compute unit contains (2) integer cores and a shared floating point scheduler. By default, aprun will place 16 processes on a node; in this layout, pairs of processes placed on the same compute unit contend for the compute unit’s floating point scheduler. If your code is floating point intensive, sharing the floating point scheduler may degrade performance. You can override this behavior using the aprun options -j and -S to control process layout. The following examples do not use all cores on a node, yet by default the tasks still share compute units’ floating point schedulers. The examples assume:

  • 16 cores per node
  • 4 nodes allocated to batch job: #PBS -l nodes=4

aprun -n16 -S2 ./a.out

Problem:
Not all cores on the node are used, but the tasks will be placed on the first compute unit of each NUMA node. With the default layout, 3 compute units on each NUMA node will sit idle while the tasks on the first compute unit contend for its floating point scheduler.
Suggestion:
Using the -j1 aprun flag, the tasks will be spread out such that only one integer core on each compute unit is used. This prevents contention for each compute unit’s floating point scheduler.

aprun -n16 -S2 -j1 ./a.out
Note: When using the -j flag, a portion of a node’s integer cores will sit idle. Batch jobs cannot share nodes; a batch job will be charged for an entire node (30 core-hours per node) regardless of actual CPU or GPU core utilization.

aprun -n16 -N4 ./a.out

Problem:
Not all cores on the node are used, but the tasks will be placed on the first two compute units of the node’s first NUMA node. With the default layout, the node’s second NUMA node will sit idle and each pair of tasks will share a floating point scheduler.
Suggestion:
Using the -S and -j1 aprun flags, the tasks will be spread out such that both NUMA nodes on each node are used and only one integer core on each compute unit is used. This prevents contention for each compute unit’s floating point scheduler.

aprun -n16 -S2 -j1 ./a.out

Error: Requesting more resources than have been allocated

It is possible to ask aprun to utilize more cores than have been allocated to the batch job. Attempts to over-allocate a batch job’s reserved nodes may result in the following message:

  claim exceeds reservation's CPUs

The following examples result in over-allocation attempts. The examples assume:

  • 16 cores per node
  • 4 nodes allocated to batch job: #PBS -l nodes=4

aprun -n128 ./a.out

Problem:
There are not enough cores allocated to the batch job to fulfill the request: 4 nodes with 16 cores per node provide only 64 cores, while 128 tasks were requested.
Corrections:
Request more nodes:

#PBS -lnodes=8
aprun -n128 ./a.out

Request fewer tasks:

aprun -n64 ./a.out

aprun -n32 -N8 -S2 ./a.out

Problem:
There are enough cores allocated (64) to fulfill the task request (-n32), and there are enough nodes allocated to run 8 tasks per node (-N8 * 4 nodes). However, -S2 requests that aprun run only 2 tasks per NUMA node. Since there are only 2 NUMA nodes per node, only 4 tasks can be placed per node (4 tasks * 4 nodes < 32).
Corrections:
The -N is not needed when -S is used. You could remove the -N flag and increase the number of tasks per NUMA node by increasing -S from 2 to 4:

 aprun -n32 -S4 -j1  ./a.out

You could remove the -N flag and increase the number of nodes allocated to the batch job:

#PBS -lnodes=8
aprun -n32 -S2 ./a.out

For more information on Titan’s aprun options and layout examples, see the Job Execution section of Titan’s user guide.

Working Around Node Failure

With the large number of nodes on Titan, you may occasionally experience node failures. Node failures may surface in the following ways:

  • If a node fails between batch job allocation and the first aprun, you may see the following error:
      claim exceeds reservation's CPUs
    Note: This most often occurs when attempting to run on more resources than were allocated to the batch job. See the requesting more resources than have been allocated section for more information on this message when not related to node failure.
  • If a node fails during an aprun job, the aprun process should terminate.

The following steps may be useful when dealing with node failure:

  1. Request more nodes than are required by aprun.
  2. Add a loop around aprun to check for success (this check is code specific) and re-run the aprun process on the additional allocated nodes upon error.

The following is a pseudo code example:

#PBS -lnodes=(a few more than you need)

while (not successful)

aprun -n (exact number you need) ./a.out

sleep 120

end while

The loop’s purpose is to re-run the aprun process on the extra nodes if the initial aprun does not succeed. Upon completion of aprun, unless the success check determines that aprun completed successfully, aprun will be re-run. If aprun fails due to a node issue, the re-run allows the system to place the tasks on one of the extra nodes instead of the troubled node. This process may allow the job to work through a node issue without exiting the batch system and re-entering the batch queue. Its success depends on how well you can tailor the success test to your code.
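
As a concrete illustration of the pseudo code above, the bash sketch below retries aprun a fixed number of times. The project ID, node counts, executable name, and the success test (here, aprun’s exit status plus a completion marker assumed to be written by the application) are placeholders that must be tailored to your code.

#!/bin/bash
#PBS -A ABC123
#PBS -l nodes=6,walltime=2:00:00   # a few more nodes than the 4 that aprun needs

cd $MEMBERWORK/abc123

success=0
for attempt in 1 2 3; do
    aprun -n 64 ./a.out > run.out 2>&1   # exact number of tasks you need
    rc=$?
    # Code-specific success test: check aprun's exit status and look for a
    # completion marker (placeholder string) written by the application.
    if [ $rc -eq 0 ] && grep -q "RUN COMPLETE" run.out; then
        success=1
        break
    fi
    sleep 120
done

if [ $success -ne 1 ]; then
    echo "aprun did not complete successfully after 3 attempts" >&2
    exit 1
fi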

Working Around Hanging Apruns

This simple script demonstrates how to kill a job that is not making forward progress and start again without going back to the scheduling queue. A job may not be making forward progress for a variety of reasons including hardware and software failures. The script does not help a user in identifying the root cause of why a job is hung. The goal of this script is to help a user do the following things:

  1. Detect that a job is hung (i.e., the job is not making forward progress).
  2. Kill the hung job.
  3. Restart the application without having to go back to the scheduling queue.

As documented in the script, the detection is fairly straightforward: the script watches an output file to which the job is expected to write periodically. If there have been no writes to the file for a certain time period, the script tags the job as hung and takes further action. Two variables must be set appropriately for this step to work correctly: OUTFILE and WINDOW. The OUTFILE variable is the output file that the script watches periodically. The WINDOW variable is the longest time interval (in minutes) after which the script tags the job as hung if there have been no writes to OUTFILE. Currently, the WINDOW variable is set to 120 minutes, but it can be changed as needed.

If a job is detected to be hung, then the script automatically kills the job by obtaining its APID without user intervention.

Finally, the script automatically attempts to restart the job by relaunching aprun with the same parameters. For this to work correctly, the user is advised to allocate a couple more nodes than are used in the aprun command, as illustrated in the script. The number of restart attempts can be changed by adjusting the loop iteration counter ("for i in `seq 1 4`;"). If the user does not allocate a few spare nodes, the application will not restart correctly if there is a hardware problem with one of the allocated nodes.

#!/bin/bash
#PBS -l nodes=2,walltime=0:10:00
#PBS -j oe
#PBS -m e
#PBS -A ABC123

WINDOW=120 # 2 hour window of no activity/progress, while loop checks the file every minute
USER=`whoami`
BINDIR="/lustre/atlas1/abc123/scratch/${USER}/apkill"
cd $BINDIR

for i in `seq 1 4`;
do
    aprun -n 1 ./a.out $i &  # the trailing "&" is essential so that the monitoring code below executes while aprun runs; it watches the temporary output file
    #echo "loop = $i"

#####################################################################################################################
    # Snippet to be moved to application PBS/qsub script:
    # Make sure to set the USER and WINDOW variables the same as above, or as appropriate.
    # Flow: record the current length (number of lines) of the temporary output file and check every minute for updates.
    # If the file is being updated, stay in the while loop; if it has not been updated for a long duration (2 hours), run apkill.
#####################################################################################################################
    OUTFILE="$PBS_JOBID.OU"
    OUTLEN=`wc -l $OUTFILE | awk '{print $1}'`
    #echo "outlen = $OUTLEN"
    TIME=0;
    while true
    do
        sleep 60; # sleep in number of seconds
        OUTLEN_NEW=`wc -l $OUTFILE | awk '{print $1}'`
        #echo "len = $OUTLEN and $OUTLEN_NEW"
        if [ $OUTLEN -eq $OUTLEN_NEW ]; then
            TIME=`expr $TIME + 1`
        else
            TIME=0;
            OUTLEN=$OUTLEN_NEW
        fi

        #echo "len after = $OUTLEN and $OUTLEN_NEW"

        APID=`apstat | grep $USER | tail -n 1 | awk '{print $1}'`
        #echo "apid = $APID"
        if [ -n "$APID" ]; then
            if [ $TIME -gt $WINDOW ]; then
                apkill $APID
                #echo "apkill = $APID"
            fi
        else
            break # break the while loop if there is no APID found
        fi
    done
#####################################################################################################################
    #end of snippet to be moved to application pbs script
#####################################################################################################################

done
wait

Troubleshooting and Common Errors

Below are some error messages that you may encounter. Check back often, as this section will be updated.

Error Message: claim exceeds reservation’s nodes

This may occur if the batch job did not request enough nodes. In this case, request more nodes in the #PBS -lnodes line of your batch script.

In some cases, the error may occur if a troubled node is allocated to the job.

In this case, it may be useful to request more nodes than are required by aprun and to add a loop around aprun that checks for success (this check is code specific) and re-runs aprun on the additional allocated nodes upon error. See the Working Around Node Failure section above for a pseudo-code example and a fuller discussion of this approach.

Error Message: MPICH2 ERROR

Error Message:  MPICH2 ERROR [Rank 65411] [job id 2526230] [Thu May 16 04:17:23 2013] [c18-3c1s6n0] [nid07084] - MPID_nem_gni_check_localCQ(): GNI_CQ_EVENT_TYPE_POST had error (SOURCE_SSID_DREQ:MDD_INV)

Recommendation: Resubmit the job. Possible causes of this error include system issues. If the error recurs, you may have an error in your code.

Error Message: Received node event ec_node_failed

Error Message:  [NID 04228] 2013-05-10 08:08:28 Apid 2509826 killed. Received node event ec_node_failed for nid 4313

Explanation: Sometimes a node will be in an unstable state but the system will still consider it to be up. When a job runs on it and fails, the system software sees that failure and then marks the node down.

Recommendation: Resubmit the job. Possible causes of this error include system issues. If the error recurs, you may have an error in your code.

Error Message: CUDA_ERROR_LAUNCH_FAILED

Error Message:  ACC: craylibs/libcrayacc/acc_hw_nvidia.c:548 CRAY_ACC_ERROR -  cuStreamSynchronize returned CUDA_ERROR_LAUNCH_FAILED[NID 11408] 2013-05-10 01:57:46 Apid 2508898: initiated application termination

Recommendation: Try resubmitting the job. Possible causes of this error include system issues. If the error recurs, you may have race conditions or other errors in your code. Contact user support if you have questions.

Error Message:  CUDA driver error 700

Recommendation: Try resubmitting the job. Possible causes of this error include system issues. If the error recurs, you may have race conditions or other errors in your code. Contact user support if you have questions.

Cray HSN detected criticalerror

[NID ####] 2013-01-01 23:55:00 Apid #######: Cray HSN detected criticalerror 0x40c[ptag 249]. Please contact admin for details. Killing pid ###(@@##)

Recommendation:  This could be a problem with the user code, MPI library, or compiler.

0x40c decodes to (GHAL_ERROR_MASK_FMA:HT_BAD_NP_REQUEST)

This can happen when a code has corrupted memory in some way that leads to loads or stores targeting the FMA window. The error can show up whenever the application has generated an invalid address to load from or store to. Typically this would cause a segmentation fault, since the process would take a page fault and the kernel would see that the virtual address involved in the load/store was invalid, raising SIGSEGV. However, on Gemini there are 3 FMA windows, each 1 GB in size, mapped into the process address space. These windows are very large and often sit near mmaps created by glibc malloc. If the offending address lies within one of these windows, you get these kgni kill-type errors. Note that, because of the way the hardware works, even a store into the window will trigger an error, since the window must be prepared for accepting stores out onto the network, and the routines that do this preparation are not available to end-user applications. If the number of nodes used is on the order of 10, we recommend that the user enable core dumps and try a debug run with:

export MPICH_NEMESIS_NETMOD=tcp

This will tell MPI to use TCP instead of the low-level Gemini interface. Performance will be worse, but the code should segfault and dump core instead of being killed by the Gemini driver.

To enable core dumps, place one of the following commands in your batch script before the aprun call:
ulimit -c unlimited (if you're using sh/ksh/bash)
limit coredumpsize unlimited (if you're using csh/tcsh)

You may want to first recompile your code and add the “-g” option to compile commands. This will enable debugging information and will make it easier to pinpoint the source of the problem.
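
For example, recompiling with debugging information through the Cray wrappers might look like the following (source file names are placeholders):

$ cc -g -O2 my_code.c -o a.out
$ ftn -g -O2 my_code.f90 -o a.out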

The module list command will show you the current versions of MPI and the compilers that you have loaded.

CRAY_CUDA_MPS=1 Segmentation Faults

Issue: When using OpenCL, setting CRAY_CUDA_MPS=1 (turning on the proxy) results in segmentation faults, no matter what is done, until the node is released.

Recommendation: It is a known issue that CRAY_CUDA_MPS=1 is incompatible with OpenCL. Do not use CRAY_CUDA_MPS=1 with OpenCL.

Possible DNS Spoofing Detected on DTNs

Error: You try to log in to dtn.ccs.ornl.gov and receive this message:

[home2][02:11:32][~]$ssh dtn
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: POSSIBLE DNS SPOOFING DETECTED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The RSA host key for dtn has changed,
and the key for the according IP address 160.91.202.138
is unchanged. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
Offending key for IP in /ccs/home/suzanne/.ssh/known_hosts:97
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
b3:31:ac:44:83:2b:ce:37:cc:23:f4:be:7a:40:83:85.
Please contact your system administrator.
Add correct host key in /ccs/home/user/.ssh/known_hosts to get rid of this message.
Offending key in /ccs/home/user /.ssh/known_hosts:106
RSA host key for dtn has changed and you have requested strict checking.
Host key verification failed.

Reason: We have just changed dtn.ccs.ornl.gov to point to dtn03 and dtn04 rather than dtn01 and dtn02. The ssh client will notice that the key signatures for dtn.ccs.ornl.gov no longer match those that were stored in your /ccs/home/user/.ssh/known_hosts file.

Resolution: You must remove the dtn key signatures from /ccs/home/user/.ssh/known_hosts. From OLCF resources like home.ccs.ornl.gov, you may do this by issuing

ssh-keygen -R dtn

You may also need to do this on your desktop machine if you log directly into the DTNs. If your desktop does not have ssh-keygen, you can manually remove all of the dtn signatures from your desktop’s ~/.ssh/known_hosts with vi or any text editor.