
Compiling on Crest

Several compilers are available on Crest, including:

  • XL: the IBM XL compilers.
  • IBM LLVM: the LLVM compiler infrastructure.
  • PGI: the Portland Group compiler suite.
  • GCC: the GNU Compiler Collection.
  • PathScale ENZO: the PathScale ENZO compiler suite.
Note: Some of the compiler versions are alpha or beta releases that are still under active development. Correctness and performance results obtained with these compilers should not be shared without explicit consent from the corresponding vendors.

Compiling with IBM XL

The IBM XL compiler suite includes the XL C/C++ compiler (xlC) and the XL Fortran compiler (xlf). The following versions of the XL compilers are currently the default on Crest:

xlf/15.1.4
xlC/13.1.4

The following commands should be used to invoke the XL compilers:

  • Compile C source files: xlc, xlc_r
  • Compile C++ source files: xlC, xlC_r
  • Compile Fortran source files:
    • FORTRAN 77 files: xlf, xlf_r
    • Fortran 90 files: xlf90, xlf90_r
    • Fortran 95 files: xlf95, xlf95_r
    • Fortran 2003 files: xlf2003, xlf2003_r
    • Fortran 2008 files: xlf2008, xlf2008_r

The commands with the _r suffix are thread-safe versions of the compiler and should be used to create threaded applications.

Serial codes can be compiled directly with the commands described above. For example, a serial C or Fortran 90 application would be compiled as follows:

xlc my_program.c -o serial_prog
xlf90 my_program.f90 -o serial_prog

To compile a parallel (MPI) application, use the mpcc, mpCC, and mpfort compiler wrappers instead. For example, an MPI program written in C or Fortran 90 would be compiled with:

mpcc my_mpi_program.c -o par_prog
mpfort my_mpi_program.f90 -o par_prog

Compiling with IBM LLVM

The C/C++ front-end of LLVM is provided via Clang. The following commands should be used to invoke the IBM LLVM compilers:

  • Compile C source files: clang
  • Compile C++ source files: clang++

Compiling with PGI

An alpha release of the PGI compiler suite is available on Crest through the pgi modules. The suite provides C, C++, and Fortran compilers, which are invoked with the following commands:

  • Compile C source files: pgcc
  • Compile C++ source files: pgc++
  • Compile Fortran source files: pgfortran

Compiling with GCC

The GNU Compiler Collection (GCC) is installed at the system level on Crest, so users can access it directly without loading a module. The default version is GCC 4.9.3. GCC 5.3.1 is also installed and can be accessed via the gcc-5 command.

The following commands can be used to invoke the GCC compilers:

  • Compile C source files: gcc, gcc-5
  • Compile C++ source files: g++, g++-5
  • Compile Fortran source files: gfortran, gfortran-5

In addition, a pre-release version of GCC 6.0.0 is available on Crest and can be accessed by loading the gcc/6.0.0-20160128 module.

Compiling with PathScale ENZO

The PathScale ENZO compiler suite is available on Crest via the pathscale module. To invoke the PathScale compilers, use the following commands:

  • Compile C source files: pathcc
  • Compile C++ source files: pathCC
  • Compile Fortran 90 source files: pathf90
  • Compile Fortran 95 source files: pathf95

For more details on how to use the PathScale ENZO compiler, please see the PathScale ENZO User Guide.

Compiling OpenMP applications on Crest

All compilers on Crest provide support for OpenMP. The compiler flags needed to build OpenMP applications are listed below, with additional flags and notes for specific compilers.

  • IBM XL: -qsmp=omp
    • To offload OpenMP 4 directives to the GPU: -qoffload
  • IBM LLVM: -fopenmp=libomp -omptargets=nvptx64sm_35-nvidia-linux
    • Include path: -I<full-path-prefix>/clang-coral/omprtl/
  • PGI: -mp
  • GCC: -fopenmp
    • The -fopenmp flag is also required for the linker.
  • PathScale: -mp -device=kepler

Compiling OpenACC codes on Crest

On Crest, OpenACC support is available when using the PGI, GCC, or PathScale compilers. The flags needed to compile OpenACC applications are listed below, along with additional flags available for specific compilers.

  • PGI: -acc
    • With CUDA managed memory: -acc -ta=tesla:managed
    • On multicore with PGI 16.4+: -acc -ta=multicore
  • GCC: -fopenacc
  • PathScale: -acc -device=kepler

Optimizing applications on Crest

The compilers on Crest provide a diverse range of optimization features. The following tables list the optimization flags available for each compiler.

Optimizations for IBM XL

Optimization Levels
Disable all optimizations (default) -O0 (equivalent to -qnoopt)
Local optimizations
Global optimizations -O2
Additional aggressive optimizations -O3
Remove the memory limit on optimizations (default with -O3) -qmaxmem=-1
Additional Optimizations
Record Optimizations -qsaveopt
High Order Transformations -qhot
Floating point accuracy
Enable generation of code that follows IEEE arithmetic -qstrict=ieeefp

Optimizations for IBM LLVM

Architecture
Generate instructions specific to POWER -target powerpc64le-ibm-linux-gnu -mcpu=pwr8
Optimization Levels
Disable all optimizations (default) -O0
Local optimizations -O1
Global optimizations -O2
Additional aggressive optimizations -O3
Additional Optimizations
Vectorization Enabled by default (-fno-vectorize to disable)
Vector width -mllvm -force-vector-width=<n>
Unrolling -mllvm -force-vector-unroll=<n>

Optimizations for PGI

Optimization Levels
Disable all optimizations -O0
Local optimization -O1
Global optimization (enables vectorization) -O2
Aggressive global optimization -O3
Hoist guarded invariant floating point expressions -O4
Maximize performance -fast
Additional Optimizations
Huge pages -Msmartalloc=huge
Autoparallelize loops -Mconcur
Enable vectorization -Mvect
Interprocedural Optimization -Mipa=fast,inline
CUDA Fortran -Mcuda
Prefetch instructions -Mvect=prefetch
Profile guided optimization -Mpfi -Mpfo
Unroll loops -Munroll
Floating point accuracy
Generate relaxed precision code -Mfprelaxed
Perform floating point operations in conformance with IEEE standard -Kieee

Optimizations for GCC

Optimization Levels
Disable all optimizations (default) -O0
Local optimizations -O1
Global optimizations -O2
Additional aggressive optimizations -O3
Maximize performance -Ofast
Additional Optimizations
Enable unrolling -funroll-all-loops
Generate prefetch instructions for loops -fprefetch-loop-arrays --param prefetch-latency=300 (300-700)
Inline string operations -minline-all-stringops
Profile guided optimization -fprofile-generate -fprofile-use
Turn off partial redundancy elimination -fno-tree-pre
Vectorization -ftree-vectorize
Floating point accuracy
Enable generation of code that follows IEEE arithmetic -mieee-fp
Enable faster, less precise math operations -ffast-math

Optimizations for PathScale ENZO

Optimization Levels
Disable all optimizations -O0
Local optimizations -O1
Global optimizations (default) -O2
Additional aggressive optimizations -O3
Maximize performance -Ofast
Additional Optimizations
Autoparallelization -apo
Feedback directed optimization -fb-create -fb-opt
Interprocedural Analysis and Optimizations -ipa
Loop nest optimizations, vectorization, prefetch, fission, fusion -LNO:fission=<n> -LNO:fusion=<n>
AMP++ pathamp -device=kepler
Prefetch (disabled by default) -LNO:prefetch -LNO:prefetch_ahead
Floating point accuracy
Floating point accuracy -fp-accuracy

Additional Resources

The following resources provide more information on the IBM Power architecture as well as performance optimization and tuning techniques for IBM Power systems:

Job Execution on Crest

Crest uses the IBM LSF scheduler to manage and execute jobs. For more information about the LSF scheduler, please review the Crest user information page.

Once resources have been allocated for a job, users can run serial applications directly on the command line, or launch a parallel application via the poe command.

Running in an interactive job

When an interactive job starts, users are given a shell on the first compute node of their corresponding allocation. For this reason, serial jobs can be started simply by launching the executable on the command line:

crest-login1 $ bsub -Is $SHELL
Job <11138> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on crest1.ccs.ornl.gov>>
crest1 $ ./serial_prog

For parallel jobs, users must request the number of tasks during the job submission step. The poe command will use this information to launch the job in parallel across all allocated tasks. In the example below, 24 tasks are requested for 30 minutes inside an interactive job:

crest-login1 $ bsub -W 30 -n 24 -Is $SHELL
Job <11141> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on crest2.ccs.ornl.gov>>
crest1 $ poe ./par_prog
ATTENTION: 0031-393 Ignoring -resd/MP_RESD specified for batch job
ATTENTION: 0031-408 24 tasks allocated by Resource Manager, continuing...
Rank: 20 NID: crest4 Total: 24
Rank: 21 NID: crest4 Total: 24
Rank: 22 NID: crest4 Total: 24
Rank: 2 NID: crest1 Total: 24
Rank: 23 NID: crest4 Total: 24
Rank: 3 NID: crest1 Total: 24
Rank: 8 NID: crest1 Total: 24
Rank: 9 NID: crest1 Total: 24
Rank: 15 NID: crest1 Total: 24
Rank: 16 NID: crest1 Total: 24
Rank: 17 NID: crest1 Total: 24
Rank: 0 NID: crest1 Total: 24
Rank: 1 NID: crest1 Total: 24
Rank: 4 NID: crest1 Total: 24
Rank: 5 NID: crest1 Total: 24
Rank: 6 NID: crest1 Total: 24
Rank: 7 NID: crest1 Total: 24
Rank: 10 NID: crest1 Total: 24
Rank: 11 NID: crest1 Total: 24
Rank: 12 NID: crest1 Total: 24
Rank: 13 NID: crest1 Total: 24
Rank: 14 NID: crest1 Total: 24
Rank: 18 NID: crest1 Total: 24
Rank: 19 NID: crest1 Total: 24