Several GPU-accelerated libraries are provided on OLCF systems. Usage for the most common accelerated libraries is outlined below.
The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures. For C and Fortran code that currently uses LAPACK, this should be a relatively simple port that does not require CUDA knowledge.
This module is currently only compatible with the GNU programming environment:
$ module switch PrgEnv-pgi PrgEnv-gnu
$ module load cudatoolkit magma
To link in the MAGMA library while on Titan:
$ cc -lcuda -lmagma source.c
Linking in MAGMA on Rhea is a bit different: the Titan compiler wrapper takes care of some of the extra flags, whereas on Rhea they must be passed explicitly. For example:
$ nvcc $MAGMA_INC $MAGMA_LIB -L$ACML_DIR/gfortran64/lib -lmagma -lcublas -lacml source.c
This also requires that a BLAS library, such as ACML, be loaded:
$ module load acml
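To illustrate how close a MAGMA call stays to its LAPACK counterpart, the following is a minimal host-only sketch that solves a small dense system with magma_dgesv. It assumes a MAGMA release that ships the magma.h header together with magma_init and magma_finalize (newer releases may use a different header name); solve_dense is a hypothetical helper name, and the 2x2 system is made up for illustration.

```cuda
// Sketch: solve A x = b with MAGMA's LAPACK-style CPU interface (host pointers).
#include <stdio.h>
#include <stdlib.h>
#include <magma.h>  // assumption: header name used by the installed MAGMA release

// solve_dense is a hypothetical helper. A is n x n in column-major order
// (as in LAPACK); b is overwritten with the solution. Returns LAPACK-style info.
int solve_dense(magma_int_t n, double *A, double *b)
{
    magma_int_t info = 0;
    magma_int_t *ipiv = (magma_int_t *)malloc(n * sizeof(magma_int_t));
    if (ipiv == NULL) return -1;

    magma_init();  // a real application would initialize once, not per call
    // LAPACK equivalent: dgesv_(&n, &nrhs, A, &lda, ipiv, b, &ldb, &info);
    magma_dgesv(n, 1, A, n, ipiv, b, n, &info);
    magma_finalize();

    free(ipiv);
    return (int)info;
}

int main(void)
{
    // Column-major storage of [[2, 1], [1, 3]] and right-hand side [3, 5].
    double A[4] = {2.0, 1.0, 1.0, 3.0};
    double b[2] = {3.0, 5.0};

    int info = solve_dense(2, A, b);
    if (info != 0) {
        fprintf(stderr, "magma_dgesv failed, info = %d\n", info);
        return EXIT_FAILURE;
    }
    printf("x = [%g, %g]\n", b[0], b[1]);
    return EXIT_SUCCESS;
}
```

Apart from the magma_ prefix, the by-value scalar arguments, and the init/finalize pair, the argument list matches dgesv, which is why no CUDA code appears in the source.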
For a comprehensive user manual, please see the MAGMA Documentation. A knowledgeable MAGMA User Forum is also available for personalized help. To see MAGMA in action, see the following two PGI articles, which include full example code of MAGMA usage with PGI Fortran: Using MAGMA With PGI Fortran and Using GPU-enabled Math Libraries with PGI Fortran.
cuBLAS and cuSPARSE are NVIDIA-provided GPU linear algebra routines optimized for dense and sparse use, respectively. If your program currently uses BLAS routines, integration should be straightforward and minimal CUDA knowledge is needed. Although primarily designed for use in C/C++ code, Fortran bindings are available.
cuBLAS is accessed through the cublas header and needs to be linked against the cublas library. cuSPARSE has its own header (cusparse.h) and library (-lcusparse). For cuBLAS:
$ module load cudatoolkit
$ cc -lcublas source.c
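As a rough illustration of what such a source file might contain, here is a minimal sketch of a BLAS-style SAXPY (y = alpha * x + y) done through cuBLAS. It assumes the v2 API header cublas_v2.h from the CUDA toolkit; gpu_saxpy is a hypothetical helper name, and the input values are arbitrary.

```cuda
// Sketch: y = alpha * x + y computed on the GPU with cuBLAS (host-only code).
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>  // assumption: the v2 cuBLAS API from the CUDA toolkit

// gpu_saxpy is a hypothetical helper. Returns 0 on success.
int gpu_saxpy(int n, float alpha, const float *x, float *y)
{
    cublasHandle_t handle;
    float *d_x = NULL, *d_y = NULL;
    int ok;

    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return 1;

    ok = cudaMalloc((void **)&d_x, n * sizeof(float)) == cudaSuccess
      && cudaMalloc((void **)&d_y, n * sizeof(float)) == cudaSuccess
      // Copy both host vectors to the device with unit stride.
      && cublasSetVector(n, sizeof(float), x, 1, d_x, 1) == CUBLAS_STATUS_SUCCESS
      && cublasSetVector(n, sizeof(float), y, 1, d_y, 1) == CUBLAS_STATUS_SUCCESS
      // Same arguments as BLAS saxpy, plus the handle; alpha is passed by pointer.
      && cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1) == CUBLAS_STATUS_SUCCESS
      // Copy the result back into y on the host.
      && cublasGetVector(n, sizeof(float), d_y, 1, y, 1) == CUBLAS_STATUS_SUCCESS;

    cudaFree(d_x);  // cudaFree(NULL) is a no-op, so this is safe on early failure
    cudaFree(d_y);
    cublasDestroy(handle);
    return ok ? 0 : 1;
}

int main(void)
{
    float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float y[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    int i;

    if (gpu_saxpy(4, 2.0f, x, y) != 0) {
        fprintf(stderr, "cuBLAS SAXPY failed\n");
        return EXIT_FAILURE;
    }
    for (i = 0; i < 4; ++i) printf("%g ", y[i]);
    printf("\n");
    return EXIT_SUCCESS;
}
```

The only CUDA-specific parts are the device allocations and the host/device copies; the compute call itself mirrors the BLAS routine it replaces.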
The CUBLAS and CUSPARSE user guides are available to download from NVIDIA; these guides provide complete function listings as well as example code. The NVIDIA SDK provides sample code and can be accessed using the instructions below. An example of using CUBLAS with PGI Fortran is available in Using GPU-enabled Math Libraries with PGI Fortran.
Running the examples:
Obtain an interactive job and load the appropriate modules:
$ qsub -I -A[projID] -lwalltime=00:30:00,nodes=1
$ module switch PrgEnv-pgi PrgEnv-gnu
$ module load cudatoolkit nvidia-sdk
Copy the example files:
$ cd $MEMBERWORK/[projid]
$ cp -r $NVIDIA_SDK_PATH/CUDALibraries .
$ cd CUDALibraries
Now each example can be executed:
$ cd bin/linux/release
$ aprun simpleCUBLAS
Cray’s LibSciACC provides GPU-enabled BLAS and LAPACK routines. LibSciACC provides two interfaces, automatic and manual. The automatic interface is largely transparent to the programmer: LibSciACC determines whether a call is likely to benefit from GPU acceleration and, if so, takes care of accelerating the routine and the associated memory management. The manual interface provides an API to manage accelerator resources, giving the programmer more control.
It is recommended that the craype-accel-nvidia35 module be used to manage LibSciACC. The LibSciACC automatic interface is currently compatible with the Cray and GNU programming environments:
$ module switch PrgEnv-pgi PrgEnv-cray
$ module load craype-accel-nvidia35
LibSciACC will automatically be linked in when using the Cray-provided compiler wrappers:
$ cc source.c
$ ftn source.f90
The man page intro_libsci_acc provides detailed usage information. The environment variable $LIBSCI_ACC_EXAMPLES_DIR specifies a directory containing several C and Fortran example codes.
CUFFT provides a set of optimized GPU fast Fourier transform routines that are provided by NVIDIA as part of the CUDA toolkit. The CUFFT library provides an API similar to FFTW for managing accelerated FFTs. The CUFFTW interface provides an FFTW3 interface to CUFFT to aid in porting existing applications.
The cudatoolkit module will append the include and library directories required by CUFFT. When using NVCC or the GNU programming environment, the library can then be linked with -lcufft:
$ module load cudatoolkit
$ cc -lcufft source.c
$ nvcc -lcufft source.c
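The plan-then-execute pattern that CUFFT shares with FFTW can be sketched as follows. This is a minimal host-only example, not a tuned implementation: fft_roundtrip is a hypothetical helper name and the 8-point ramp input is arbitrary. It runs a forward and an inverse single-precision complex transform in place and rescales the result, since CUFFT transforms are unnormalized.

```cuda
// Sketch: 1D complex-to-complex forward + inverse FFT with CUFFT (host-only code).
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>
#include <cufft.h>

// fft_roundtrip is a hypothetical helper. On success (return 0) h_data holds
// IFFT(FFT(h_data)) / n, which should match the input up to rounding error.
int fft_roundtrip(cufftComplex *h_data, int n)
{
    cufftComplex *d_data = NULL;
    cufftHandle plan;
    size_t bytes = sizeof(cufftComplex) * (size_t)n;
    int ok, i;

    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) return 1;
    // Create a plan for one 1D transform of length n, as with an FFTW plan.
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS) {
        cudaFree(d_data);
        return 1;
    }

    ok = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice) == cudaSuccess
      // Execute in place: forward, then inverse.
      && cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) == CUFFT_SUCCESS
      && cufftExecC2C(plan, d_data, d_data, CUFFT_INVERSE) == CUFFT_SUCCESS
      && cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cufftDestroy(plan);
    cudaFree(d_data);
    if (!ok) return 1;

    // CUFFT does not normalize, so divide by n after the round trip.
    for (i = 0; i < n; ++i) {
        h_data[i].x /= (float)n;
        h_data[i].y /= (float)n;
    }
    return 0;
}

int main(void)
{
    cufftComplex data[8];
    float max_err = 0.0f;
    int i;

    for (i = 0; i < 8; ++i) { data[i].x = (float)i; data[i].y = 0.0f; }
    if (fft_roundtrip(data, 8) != 0) {
        fprintf(stderr, "CUFFT round trip failed\n");
        return EXIT_FAILURE;
    }
    for (i = 0; i < 8; ++i) {
        float err = fabsf(data[i].x - (float)i) + fabsf(data[i].y);
        if (err > max_err) max_err = err;
    }
    printf("max round-trip error: %g\n", max_err);
    return EXIT_SUCCESS;
}
```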
NVIDIA provides comprehensive CUFFT documentation, including example code. An example of using CUFFT with Fortran through the ISO_C_BINDING interface is also available, and the OLCF provides an OpenACC and CUFFT interoperability tutorial.
CURAND is an NVIDIA-provided random number generator library. CURAND provides both a host-launched interface and a device-inlineable interface. Multiple pseudorandom and quasirandom algorithms are supported.
The cudatoolkit module will append the include and library directories required by CURAND. When using NVCC or the GNU programming environment, the library can then be linked with -lcurand:
$ module load cudatoolkit
$ module switch PrgEnv-pgi PrgEnv-gnu
$ cc -lcurand source.c
$ nvcc -lcurand source.cu
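A minimal sketch of the host-launched interface is shown below: the host creates a generator, seeds it, and asks CURAND to fill a device buffer. The helper name fill_uniform, the seed, and the buffer size are illustrative assumptions; the default pseudorandom generator is used for brevity.

```cuda
// Sketch: fill a buffer with uniform random floats using the CURAND host API.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <curand.h>

// fill_uniform is a hypothetical helper. On success (return 0) h_out holds
// n single-precision values drawn uniformly from (0, 1].
int fill_uniform(float *h_out, size_t n, unsigned long long seed)
{
    curandGenerator_t gen;
    float *d_out = NULL;
    int ok;

    if (cudaMalloc((void **)&d_out, n * sizeof(float)) != cudaSuccess) return 1;
    if (curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT) != CURAND_STATUS_SUCCESS) {
        cudaFree(d_out);
        return 1;
    }

    ok = curandSetPseudoRandomGeneratorSeed(gen, seed) == CURAND_STATUS_SUCCESS
      // The numbers are generated directly into device memory.
      && curandGenerateUniform(gen, d_out, n) == CURAND_STATUS_SUCCESS
      // Copy them back only because this sketch prints them on the host.
      && cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;

    curandDestroyGenerator(gen);
    cudaFree(d_out);
    return ok ? 0 : 1;
}

int main(void)
{
    float values[8];
    int i;

    if (fill_uniform(values, 8, 1234ULL) != 0) {
        fprintf(stderr, "CURAND generation failed\n");
        return EXIT_FAILURE;
    }
    for (i = 0; i < 8; ++i) printf("%f\n", values[i]);
    return EXIT_SUCCESS;
}
```

In a real application the generated values would normally stay on the device and be consumed by a kernel or another library call, avoiding the copy back to the host.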
NVIDIA provides comprehensive CURAND documentation, including example code. For an example of using the CURAND host library, please see the Accelerator Interoperability Tutorial.
Thrust is a CUDA-accelerated C++ template library modeled after the Standard Template Library (STL). Thrust provides a high-level host interface for GPU data management as well as an assortment of accelerated algorithms. Even if your application does not currently use the STL, the easy access Thrust provides to many optimized accelerated algorithms makes it worth a look.
The cudatoolkit module will append the include directories required by Thrust. Thrust is a header-only library, so when using NVCC or the GNU programming environment no additional link flag is needed:
$ module load cudatoolkit
$ module switch PrgEnv-pgi PrgEnv-gnu
$ CC source.cpp
$ nvcc source.cu
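To give a feel for the STL-like interface, here is a minimal sketch intended for the nvcc build line: it copies a std::vector to the GPU, sorts and sums it there, and copies the sorted data back. The helper name sort_and_sum and the sample values are illustrative assumptions.

```cuda
// Sketch: sort and reduce on the GPU with Thrust containers and algorithms.
#include <cstdio>
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/copy.h>

// sort_and_sum is a hypothetical helper. It sorts values in ascending order
// (on the device) and returns the sum of all elements.
int sort_and_sum(std::vector<int> &values)
{
    // Constructing a device_vector from host iterators copies host -> device.
    thrust::device_vector<int> d_values(values.begin(), values.end());

    // These calls look like std::sort / std::accumulate but run on the GPU.
    thrust::sort(d_values.begin(), d_values.end());
    int sum = thrust::reduce(d_values.begin(), d_values.end(), 0);

    // Copy the sorted data device -> host.
    thrust::copy(d_values.begin(), d_values.end(), values.begin());
    return sum;
}

int main()
{
    int raw[] = {5, 3, 9, 1};
    std::vector<int> values(raw, raw + 4);

    int sum = sort_and_sum(values);
    for (size_t i = 0; i < values.size(); ++i) std::printf("%d ", values[i]);
    std::printf("\nsum = %d\n", sum);
    return 0;
}
```

No explicit cudaMalloc or cudaMemcpy calls appear; the device_vector constructor, destructor, and thrust::copy handle allocation and transfers.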
NVIDIA provides comprehensive Thrust documentation, including example code. For an example of using Thrust, please see the Accelerator Interoperability II Tutorial. The GitHub page provides access to the Thrust source code, examples, and information on how to obtain help.