
GPU Accelerated Libraries

Due to the performance benefits that come with GPU computing, many scientific libraries now offer accelerated versions. If your program contains BLAS or LAPACK function calls, GPU-accelerated versions may be available. The MAGMA, CULA, cuBLAS, cuSPARSE, and LibSciACC libraries provide optimized GPU linear algebra routines that require only minor changes to existing code. These libraries require little understanding of the underlying GPU hardware, and the performance enhancements are transparent to the end developer.

For more general libraries, such as Trilinos and PETSc, you will want to visit the appropriate software development site to examine the current status of GPU integration. For Trilinos, please see the latest documentation at the Sandia Trilinos page. Similarly, Argonne’s PETSc Documentation has a page containing the latest GPU integration information.

MAGMA

The MAGMA project aims to develop a dense linear algebra library similar to LAPACK, but for heterogeneous/hybrid architectures. For C and Fortran code currently using LAPACK, this should be a relatively simple port and does not require CUDA knowledge.

Use

This module is currently only compatible with the GNU programming environment:

$ module switch PrgEnv-pgi PrgEnv-gnu
$ module load cudatoolkit magma

To link in the MAGMA library while on Titan:

$ cc -lcuda -lmagma source.c

Linking in MAGMA on Rhea is a bit different because, unlike on Titan, there is no compiler wrapper to take care of the extra flags for you. For example:

$ nvcc $MAGMA_INC $MAGMA_LIB -L$ACML_DIR/gfortran64/lib -lmagma -lcublas -lacml source.c

This also requires that a BLAS library, such as ACML, be loaded:

$ module load acml
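
As a minimal sketch of what such a port can look like, the hypothetical source.c below performs an LU factorization with MAGMA's CPU-interface magma_dgetrf routine in place of LAPACK's dgetrf. The matrix size and data are placeholders, and the header name may differ between MAGMA versions:

/* Minimal MAGMA sketch: LU factorization of an n x n matrix.           */
/* Assumes the cudatoolkit and magma modules are loaded as shown above. */
/* Note: newer MAGMA releases use the header magma_v2.h instead.        */
#include <stdio.h>
#include <stdlib.h>
#include <magma.h>

int main(void)
{
    magma_int_t n = 1024;                 /* example problem size (placeholder) */
    magma_int_t lda = n, info = 0;
    magma_int_t *ipiv = malloc(n * sizeof(magma_int_t));
    double *A = malloc((size_t)lda * n * sizeof(double));

    for (magma_int_t i = 0; i < lda * n; i++)
        A[i] = rand() / (double)RAND_MAX;     /* fill with dummy data */

    magma_init();                             /* initialize MAGMA and the GPU */
    magma_dgetrf(n, n, A, lda, ipiv, &info);  /* LAPACK-style LU, GPU accelerated */
    magma_finalize();

    printf("magma_dgetrf returned info = %d\n", (int)info);
    free(ipiv);
    free(A);
    return 0;
}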
Resources

For a comprehensive user manual, please see the MAGMA Documentation. A knowledgeable MAGMA User Forum is also available for personalized help. To see MAGMA in action, see the following two PGI articles that include full example code of MAGMA usage with PGI Fortran: Using MAGMA With PGI Fortran and Using GPU-enabled Math Libraries with PGI Fortran.

CULA

CULA is a GPU-accelerated linear algebra library, mimicking LAPACK, that utilizes NVIDIA CUDA. For C and Fortran code currently using LAPACK, this should be a relatively simple port and does not require CUDA knowledge.

Use

CULA is accessed through the cula-dense module; for linking, it is convenient to load the cudatoolkit module as well:

$ module load cula-dense cudatoolkit

To link in the CULA library:

$ cc -lcula_core -lcula_lapack source.c
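
As a minimal sketch, the hypothetical source.c below solves a single linear system with the LAPACK-like culaSgesv routine. The problem size and data are placeholders; check $CULA_ROOT/doc for the exact header and routine names in the installed version:

/* Minimal CULA sketch: solve A x = b with the LAPACK-like culaSgesv. */
#include <stdio.h>
#include <stdlib.h>
#include <cula_lapack.h>   /* older CULA releases may use cula.h instead */

int main(void)
{
    int n = 512, nrhs = 1;                        /* example sizes (placeholders) */
    culaFloat *A = malloc((size_t)n * n * sizeof(culaFloat));
    culaFloat *b = malloc((size_t)n * sizeof(culaFloat));
    culaInt *ipiv = malloc((size_t)n * sizeof(culaInt));
    for (int i = 0; i < n * n; i++) A[i] = rand() / (float)RAND_MAX;
    for (int i = 0; i < n; i++)     b[i] = 1.0f;

    culaStatus status = culaInitialize();         /* attach to the GPU */
    if (status == culaNoError)
        status = culaSgesv(n, nrhs, A, n, ipiv, b, n);  /* GPU-accelerated solve */
    if (status != culaNoError)
        printf("CULA error: %s\n", culaGetStatusString(status));
    culaShutdown();

    free(A); free(b); free(ipiv);
    return 0;
}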
Resources

A comprehensive CULA Programmer's Guide is available that covers everything you need to know to use the library. Once the module is loaded, you can find up-to-date documentation in the $CULA_ROOT/doc directory and examples in $CULA_ROOT/examples. An example of using CULA with PGI Fortran is available in Using GPU-enabled Math Libraries with PGI Fortran.

Running the examples:

Obtain an interactive job and load the appropriate modules:

$ qsub -I -A[projID] -lwalltime=00:30:00,nodes=1
$ module load cuda cula-dense

Copy the example files:

$ cd $MEMBERWORK/[projid]
$ cp -r $CULA_ROOT/examples .
$ cd examples

Now each example can be built and executed:

$ cd basicUsage
$ make build64
$ aprun basicUsage
cuBLAS/cuSPARSE

cuBLAS and cuSPARSE are NVIDIA-provided BLAS GPU routines optimized for dense and sparse use, respectively. If your program currently uses BLAS routines, integration should be straightforward, and minimal CUDA knowledge is needed. Although primarily designed for use in C/C++ code, Fortran bindings are available.

Use

cuBLAS and cuSPARSE are accessed through their respective headers, and the corresponding library must be linked in; for cuBLAS:

$ module load cudatoolkit
$ cc -lcublas source.c
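
As a minimal sketch, the hypothetical source.c below performs a single-precision AXPY update with cuBLAS. The vector length is a placeholder, and depending on the compiler wrapper you may also need to link the CUDA runtime explicitly (for example, -lcudart):

/* Minimal cuBLAS sketch: y = alpha*x + y on the GPU. */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    int n = 1 << 20;                      /* example vector length (placeholder) */
    float alpha = 2.0f;
    size_t bytes = n * sizeof(float);

    float *h_x = malloc(bytes), *h_y = malloc(bytes);
    for (int i = 0; i < n; i++) { h_x[i] = 1.0f; h_y[i] = 2.0f; }

    float *d_x, *d_y;
    cudaMalloc((void **)&d_x, bytes);
    cudaMalloc((void **)&d_y, bytes);
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);   /* accelerated SAXPY */
    cublasDestroy(handle);

    cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expected 4.0)\n", h_y[0]);

    cudaFree(d_x); cudaFree(d_y);
    free(h_x); free(h_y);
    return 0;
}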
Resources

The cuBLAS and cuSPARSE user guides are available to download from NVIDIA; these guides provide complete function listings as well as example code. The NVIDIA SDK provides sample code and can be accessed using the instructions below. An example of using cuBLAS with PGI Fortran is available in Using GPU-enabled Math Libraries with PGI Fortran.

Running the examples:

Obtain an interactive job and load the appropriate modules:

$ qsub -I -A[projID] -lwalltime=00:30:00,nodes=1
$ module switch PrgEnv-pgi PrgEnv-gnu
$ module load cudatoolkit nvidia-sdk

Copy the example files:

$ cd $MEMBERWORK/[projid]
$ cp -r $NVIDIA_SDK_PATH/CUDALibraries .
$ cd CUDALibraries

Now each example can be executed:

$ cd bin/linux/release
$ aprun simpleCUBLAS
LibSciACC

Cray’s LibSciACC provides GPU-enabled BLAS and LAPACK routines. LibSciACC provides two interfaces: automatic and manual. The automatic interface is largely transparent to the programmer: LibSciACC will determine whether a call is likely to benefit from GPU acceleration and, if so, will take care of accelerating the routine and the associated memory management. The manual interface provides an API to manage accelerator resources, giving more control to the programmer.

Use

It is recommended that the craype-accel-nvidia35 module be used to manage LibSciACC. The LibSciACC automatic interface is currently compatible with the Cray and GNU programming environments:

$ module switch PrgEnv-pgi PrgEnv-cray
$ module load craype-accel-nvidia35

LibSciACC will automatically be linked in when using the Cray-provided compiler wrappers:

$ cc source.c
$ ftn source.f90
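
As a minimal sketch of the automatic interface, the hypothetical source.c below makes an ordinary DGEMM call through the Fortran-style BLAS symbol provided by LibSci; no accelerator-specific code is required, and LibSciACC decides at run time whether to offload the multiplication. The matrix size is a placeholder:

/* Minimal LibSciACC sketch (automatic interface): a standard DGEMM call. */
#include <stdio.h>
#include <stdlib.h>

/* Fortran-style BLAS interface; symbol naming may differ by compiler. */
void dgemm_(const char *transa, const char *transb,
            const int *m, const int *n, const int *k,
            const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb,
            const double *beta, double *c, const int *ldc);

int main(void)
{
    int n = 2048;                                  /* example size (placeholder) */
    double alpha = 1.0, beta = 0.0;
    double *A = malloc((size_t)n * n * sizeof(double));
    double *B = malloc((size_t)n * n * sizeof(double));
    double *C = malloc((size_t)n * n * sizeof(double));
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; }

    /* C = alpha*A*B + beta*C, potentially executed on the accelerator */
    dgemm_("N", "N", &n, &n, &n, &alpha, A, &n, B, &n, &beta, C, &n);

    printf("C[0] = %f\n", C[0]);
    free(A); free(B); free(C);
    return 0;
}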
Resources

The man page intro_libsci_acc provides detailed usage information. The environment variable $LIBSCI_ACC_EXAMPLES_DIR specifies a directory containing several C and Fortran example codes.

cuFFT

cuFFT is a set of optimized GPU fast Fourier transform routines provided by NVIDIA as part of the CUDA Toolkit. The cuFFT library provides an API similar to FFTW for managing accelerated FFTs. The cuFFTW interface provides an FFTW3-compatible interface to cuFFT to aid in porting existing applications.

Use

The cudatoolkit module will append the include and library directories required by cuFFT. When using nvcc or the GNU programming environment, the library can then be linked in:

$ module load cudatoolkit
$ cc -lcufft source.c
$ nvcc -lcufft source.c
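
As a minimal sketch, the hypothetical source.c below plans and executes a 1-D complex-to-complex forward transform in place on the GPU; the transform size and zeroed input are placeholders:

/* Minimal cuFFT sketch: in-place 1-D complex-to-complex forward FFT. */
#include <stdio.h>
#include <cuda_runtime.h>
#include <cufft.h>

int main(void)
{
    int nx = 1024;                           /* example transform size (placeholder) */
    cufftComplex *d_data;
    cudaMalloc((void **)&d_data, nx * sizeof(cufftComplex));
    cudaMemset(d_data, 0, nx * sizeof(cufftComplex));    /* dummy input */

    cufftHandle plan;
    cufftPlan1d(&plan, nx, CUFFT_C2C, 1);                 /* plan a single 1-D C2C FFT */
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);    /* execute on the GPU */
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(d_data);
    printf("cuFFT transform complete\n");
    return 0;
}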
Resources

NVIDIA provides comprehensive documentation, including example code, available here. For an example of using cuFFT with Fortran through the ISO_C_BINDING interface, please see the following example. The OLCF provides an OpenACC and cuFFT interoperability tutorial.

cuRAND

cuRAND is an NVIDIA-provided random number generator library. It provides both a host-launched interface and a device interface that can be called from within CUDA kernels. Multiple pseudorandom and quasirandom algorithms are supported.

Use

The cudatoolkit module will append the include and library directories required by cuRAND. When using nvcc or the GNU programming environment, the library can then be linked in:

$ module load cudatoolkit
$ module switch PrgEnv-pgi PrgEnv-gnu
$ cc -lcurand source.c
$ nvcc -lcurand source.cu
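
As a minimal sketch of the host-launched interface, the hypothetical source.c below fills a device array with uniformly distributed single-precision numbers; the count and seed are placeholders:

/* Minimal cuRAND sketch: host API generating uniform random numbers on the GPU. */
#include <stdio.h>
#include <cuda_runtime.h>
#include <curand.h>

int main(void)
{
    size_t n = 1 << 20;                           /* example count (placeholder) */
    float *d_data;
    cudaMalloc((void **)&d_data, n * sizeof(float));

    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);  /* example seed (placeholder) */
    curandGenerateUniform(gen, d_data, n);             /* generate n uniforms on the GPU */

    float sample;
    cudaMemcpy(&sample, d_data, sizeof(float), cudaMemcpyDeviceToHost);
    printf("first random value: %f\n", sample);

    curandDestroyGenerator(gen);
    cudaFree(d_data);
    return 0;
}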
Resources

NVIDIA provides comprehensive documentation, including example code, available here. For an example of using the cuRAND host library, please see the following OLCF tutorial.

Thrust

Thrust is a CUDA-accelerated C++ template library modeled after the Standard Template Library (STL). Thrust provides a high-level host interface for GPU data management as well as an assortment of accelerated algorithms. Even if your application does not currently use the STL, the easy access Thrust provides to many optimized, accelerated algorithms is worth a look.

Use

The cudatoolkit module will append the include directories required by Thrust. Because Thrust is a header-only template library, no additional library needs to be linked when using nvcc or the GNU programming environment:

$ module load cudatoolkit
$ module switch PrgEnv-pgi PrgEnv-gnu
$ CC source.cpp
$ nvcc source.cu
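
As a minimal sketch, the hypothetical source.cu below copies data to the device, sorts it, and reduces it with Thrust; the vector length and random input are placeholders:

// Minimal Thrust sketch: sort a device vector and sum it.
#include <cstdio>
#include <cstdlib>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>

int main(void)
{
    const int n = 1 << 20;                     // example size (placeholder)
    thrust::host_vector<int> h_vec(n);
    for (int i = 0; i < n; i++)
        h_vec[i] = std::rand() % 1000;         // dummy data

    thrust::device_vector<int> d_vec = h_vec;  // copy host data to the GPU
    thrust::sort(d_vec.begin(), d_vec.end());  // accelerated sort
    int sum = thrust::reduce(d_vec.begin(), d_vec.end(), 0);  // accelerated reduction

    std::printf("smallest = %d, sum = %d\n", (int)d_vec[0], sum);
    return 0;
}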
Resources

NVIDIA provides comprehensive documentation, including example code, available here. For an example of using Thrust, please see the following OLCF tutorial. The GitHub page allows access to the Thrust source code, examples, and information on how to obtain help.