Compiling on Crest
Several compilers are available on Crest, including:
- XL: the IBM XL compilers.
- IBM LLVM: the LLVM compiler infrastructure.
- PGI: the Portland Group compiler suite.
- GCC: the GNU Compiler Collection.
- PathScale ENZO: the PathScale ENZO compiler suite.
Compiling with IBM XL
The IBM XL compiler suite provides the XL C/C++ compiler (xlC) and the XL Fortran compiler (xlf). The following versions of the XL compilers are currently the default on Crest:
xlf/15.1.4
xlC/13.1.4
The following commands should be used to invoke the XL compilers:
- Compile C source files: xlc, xlc_r
- Compile C++ source files: xlC, xlC_r
- Compile Fortran source files:
  - FORTRAN 77 files: xlf, xlf_r
  - Fortran 90 files: xlf90, xlf90_r
  - Fortran 95 files: xlf95, xlf95_r
  - Fortran 2003 files: xlf2003, xlf2003_r
  - Fortran 2008 files: xlf2008, xlf2008_r
The commands with the _r suffix are thread-safe versions of the compilers and should be used to create threaded applications.
To compile a serial code, users can rely on the commands described above. For example, a serial application (C or Fortran 90) would be compiled as follows:
xlc my_program.c -o serial_prog
xlf90 my_program.f90 -o serial_prog
To compile a parallel application, users instead need to use the mpcc, mpCC, and mpfort compiler wrappers. For example, an MPI program (C or Fortran 90) would be compiled with:
mpcc my_mpi_program.c -o par_prog
mpfort my_mpi_program.f90 -o par_prog
Compiling with IBM LLVM
The C/C++ front-end of LLVM is provided via Clang. The following commands should be used to invoke the IBM LLVM compilers:
- Compile C source files: clang
- Compile C++ source files: clang++
Compiling with PGI
An alpha release of the PGI compiler suite is available on Crest through the pgi modules. The PGI compiler provides C/C++ and Fortran interfaces. To invoke the PGI compilers you can use the following commands:
- Compile C source files: pgcc
- Compile C++ source files: pgc++
- Compile Fortran source files: pgfortran
Compiling with GCC
The GNU Compiler Collection (GCC) is installed on Crest at the system level, so users can access it directly without loading a module. The default version currently installed is GCC 4.9.3. GCC 5.3.1 is also installed and can be accessed via the gcc-5 command.
The following commands can be used to invoke the GCC compilers:
- Compile C source files: gcc, gcc-5
- Compile C++ source files: g++, g++-5
- Compile Fortran source files: gfortran, gfortran-5
In addition, a pre-release version of GCC 6.0.0 is now available on Crest and can be accessed by loading the gcc/6.0.0-20160128 module.
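Because several GCC versions coexist on Crest, it can help to confirm which one actually built a binary. The following is a minimal sketch in C that relies only on the predefined macros __GNUC__, __GNUC_MINOR__, and __GNUC_PATCHLEVEL__; the function name gcc_version_string is hypothetical and is not part of the Crest documentation.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the Crest docs): writes the version of the
 * compiler that built this file into buf as "major.minor.patch".
 * Returns the number of characters snprintf would have written.
 * Note: other compilers that emulate GCC may also define these macros.
 */
int gcc_version_string(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
#if defined(__GNUC__)
    return snprintf(buf, len, "%d.%d.%d",
                    __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#else
    /* Compiler does not define the GCC version macros. */
    return snprintf(buf, len, "unknown");
#endif
}
```

Called from a program built once with gcc and once with gcc-5, the function should report the two different versions.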
Compiling with PathScale ENZO
The PathScale ENZO compiler suite is available on Crest via the pathscale module. To invoke the PathScale compilers, you can use:
- Compile C source files: pathcc
- Compile C++ source files: pathCC
- Compile Fortran 90 source files: pathf90
- Compile Fortran 95 source files: pathf95
For more details on how to use the PathScale ENZO compiler, please see the PathScale ENZO User Guide.
Compiling OpenMP applications on Crest
All compilers on Crest provide support for OpenMP. The compiler flags needed to build OpenMP applications are listed in the table below.
Compiler | Compiler Flags | Additional Flags/Notes |
---|---|---|
IBM XL | -qsmp=omp | Offload OpenMP 4 directives to the GPU: -qoffload |
IBM LLVM | -fopenmp=libomp | Include Path: -I<full-path-prefix>/clang-coral/omprtl/ |
PGI | -mp | |
GCC | -fopenmp | The -fopenmp flag is also required for the linker. |
PathScale | -mp -device=kepler | |
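As an illustration of the kind of source these flags apply to, here is a minimal sketch in C that sums integers with an OpenMP reduction. The function names parallel_sum and max_threads are hypothetical. The _OPENMP guard is an assumption made so that the same file still builds serially when no OpenMP flag is given.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>   /* only pulled in when an OpenMP flag is passed */
#endif

/* Hypothetical helper: returns 1 + 2 + ... + n using an OpenMP reduction.
 * Without an OpenMP flag the pragma is ignored and the loop runs serially. */
long parallel_sum(long n)
{
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

/* Hypothetical helper: number of threads the OpenMP runtime may use,
 * or 1 when the file was built without OpenMP support. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

With GCC, for instance, a file containing these functions would be both compiled and linked with -fopenmp, as noted in the table.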
Compiling OpenACC codes on Crest
On Crest, OpenACC support is available when using the PGI, GCC, or PathScale compilers. The table below includes the flags needed to compile OpenACC applications as well as additional flags available for specific compilers.
Compiler | Compiler Flags | Additional Flags |
---|---|---|
PGI | -acc | With CUDA managed memory: -acc -ta=tesla:managed. On multicore with PGI 16.4+: -acc -ta=multicore |
GCC | -fopenacc | |
PathScale | -acc -device=kepler | |
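A minimal sketch of an OpenACC kernel in C that these flags would build is shown here. The function name saxpy and the choice of data clauses are illustrative assumptions and do not come from the Crest documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: computes y[i] = a * x[i] + y[i] for i in [0, n).
 * The directive asks the compiler to offload the loop; x is copied to the
 * device, y is copied to the device and back. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

When the file is built without an OpenACC flag, the directive is typically ignored and the loop runs serially on the host, which is a convenient way to check results before offloading.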
Optimizing applications on Crest
The different compilers on Crest provide a diverse range of features. The following tables include information on the different optimizations available for each compiler.
Optimizations for IBM XL
Optimization Levels | |
---|---|
Disable all optimizations (default) | -qstrict |
Local optimizations | |
Global optimizations | -O2 |
Additional aggressive optimizations | -O3 |
Maximize performance (default with -O3) | -qmaxmem=-1 |
Additional Optimizations | |
Record Optimizations | -qsaveopt |
High Order Transformations | -qhot |
Floating point accuracy | |
Enable generation of code that follows IEEE arithmetic | -qstrict=ieeefp |
Optimizations for IBM LLVM
Architecture | |
---|---|
Generate instructions specific to POWER | -target powerpc64le-ibm-linux-gnu -mcpu=pwr8 |
Optimization Levels | |
Disable all optimizations (default) | -O0 |
Local optimizations | -O1 |
Global optimizations | -O2 |
Additional aggressive optimizations | -O3 |
Additional Optimizations | |
Vectorization | Enabled by default (-fno-vectorize to disable) |
Vector width | -force-vector-width=<n> |
Unrolling | -force-vector-unroll=<n> |
Optimizations for PGI
Optimization Levels | |
---|---|
Disable all optimizations | -O0 |
Local optimization | -O1 |
Global optimization (enables vectorization) | -O2 |
Aggressive global optimization | -O3 |
Hoist guarded invariant floating point expressions | -O4 |
Maximize performance | -fast |
Additional Optimizations | |
Huge pages | -Msmartalloc=huge |
Autoparallelize loops | -Mconcur |
Enable vectorization | -Mvect |
Interprocedural Optimization | -Mipa=fast,inline |
CUDA Fortran | -Mcuda |
Prefetch instructions | -Mvect=prefetch |
Profile guided optimization | -Mpfi -Mpfo |
Unroll loops | -Munroll |
Floating point accuracy | |
Generate relaxed precision code | -Mfprelaxed |
Perform floating point operations in conformance with IEEE standard | -Kieee |
Optimizations for GCC
Optimization Levels | |
---|---|
Disable all optimizations (default) | -O0 |
Local optimizations | -O1 |
Global optimizations | -O2 |
Additional aggressive optimizations | -O3 |
Maximize performance | -Ofast |
Additional Optimizations | |
Enable unrolling | -funroll-all-loops |
Generate prefetch instructions for loops | -fprefetch-loop-arrays |
Inline string operations | -minline-all-stringops |
Profile guided optimization | -fprofile-generate -fprofile-use |
Turn off partial redundancy elimination | -fno-tree-pre |
Vectorization | -ftree-vectorize |
Floating point accuracy | |
Enable generation of code that follows IEEE arithmetic | -mieee-fp |
Enable faster, less precise math operations | -ffast-math |
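To see why the floating-point accuracy flags matter, consider compensated (Kahan) summation, which depends on operations being evaluated in exactly the order written. The sketch below uses hypothetical function names and is not taken from the Crest documentation. Under options that allow reassociation, such as -ffast-math, a compiler may simplify the compensation term to zero, in which case both functions can return the same value.

```c
#include <assert.h>
#include <stddef.h>

/* Plain left-to-right summation: small terms can be lost to rounding. */
double naive_sum(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += v[i];
    }
    return sum;
}

/* Kahan compensated summation: c carries the rounding error of each add.
 * Algebraically c is always zero, which is why reassociating
 * optimizations may remove it. */
double kahan_sum(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double sum = 0.0;
    double c = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double y = v[i] - c;   /* apply the correction from the last step */
        double t = sum + y;    /* low-order bits of y may be lost here */
        c = (t - sum) - y;     /* recover what was lost */
        sum = t;
    }
    return sum;
}
```

For the input {1.0, 2^-53, 2^-53}, strict IEEE double-precision arithmetic gives 1.0 for naive_sum and 1.0 + 2^-52 for kahan_sum. Comparing the two results with and without the relaxed-math flag is one way to check whether a given build preserves the compensation.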
Optimizations for PathScale ENZO
Optimization Levels | |
---|---|
Disable all optimizations | -O0 |
Local optimizations | -O1 |
Global optimizations (default) | -O2 |
Additional aggressive optimizations | -O3 |
Maximize performance | -Ofast |
Additional Optimizations | |
Autoparallelization | -apo |
Feedback directed optimization | -fb-create -fb-opt |
Interprocedural Analysis and Optimizations | -ipa |
Loop nest optimizations, vectorization, prefetch, fission, fusion | -LNO:fission=<n> -LNO:fusion=<n> |
AMP++ | pathamp -device=kepler |
Prefetch (disabled by default) | -LNO:prefetch -LNO:prefetch_ahead |
Floating point accuracy | |
Floating point accuracy | -fp-accuracy |
Additional Resources
The following resources provide more information on the IBM Power architecture as well as performance optimization and tuning techniques for IBM Power systems:
- IBM Power Systems S812L and S822L Technical Overview and Introduction
- Performance Optimization and Tuning Techniques for IBM Power Systems Processors Including IBM POWER8
Job Execution on Crest
Crest uses the IBM LSF scheduler to manage and execute jobs. For more information about the LSF scheduler, please review the Crest user information page.
Once resources have been allocated for a job, users can run serial applications directly on the command line, or launch a parallel application via the poe command.
Running in an interactive job
When an interactive job starts, users are given a shell on the first compute node of their corresponding allocation. For this reason, serial jobs can be started simply by launching the executable on the command line:
crest-login1 $ bsub -Is $SHELL
Job <11138> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on crest1.ccs.ornl.gov>>
crest1 $ ./serial_prog
The poe command will use this information to launch the job in parallel across all allocated tasks. In the example below, 24 tasks are requested for 30 minutes inside an interactive job: