Titan

Development Tools

GPU Enabled Programming

Titan will be configured with a broad set of tools to facilitate GPU acceleration of both new and existing applications. These tools fall into three categories based on their implementation methodology: GPU-accelerated libraries, accelerator compiler directives, and low-level GPU languages. The three approaches are not mutually exclusive, and each has pros and cons that must be weighed for each program. In addition to these tools, Titan will support a wide variety of performance and debugging tools to help ensure that, however you choose to implement GPU acceleration, your program uses the GPU efficiently.


Accelerator Compiler Directives

Accelerator compiler directives allow the compiler, guided by the programmer, to take care of low-level accelerator work. One of the main benefits of a directives-based approach is an easier and faster transition of existing code compared to low-level GPU languages. Additional benefits include performance enhancements that are transparent to the end developer and greater portability between current and future many-core architectures.

Tool Name | Description
PGI Accelerator | C and Fortran accelerator directive set for PGI compilers
CAPS HMPP Workbench | Compiler and runtime environment that adds HMPP accelerator directive support to GNU, PGI, and Intel compilers
Cray GPU Compiler | Coming soon
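
As a rough illustration of the directives-based approach, the sketch below marks an ordinary C loop as a candidate for offload. The function name saxpy_directive is hypothetical, and the "#pragma acc region" spelling is an assumption based on the PGI Accelerator model; the exact directive and any data clauses needed to describe the array bounds should be checked against your compiler's documentation. A compiler without accelerator support ignores the pragma and runs the loop on the CPU, which is part of what makes an incremental transition possible.

```c
#include <assert.h>
#include <stddef.h>

/* Scale-and-add: y[i] = a * x[i] + y[i] for i in [0, n).
 * saxpy_directive is a hypothetical name used only for this sketch. */
void saxpy_directive(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* Assumed PGI Accelerator-style directive: asks the compiler to generate
     * accelerator code for the enclosed loop. Compilers that do not recognize
     * the pragma ignore it, so the loop still runs correctly on the host.
     * With pointer arguments, a real build may also need data clauses that
     * tell the compiler how much of x and y to transfer. */
    #pragma acc region
    {
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }
}
```

The loop body is unchanged from the serial version; only the directive is new, so the same source can be built with or without accelerator support.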

GPU Accelerated Libraries

Due to the performance benefits that come with GPU computing, many scientific libraries are now offering accelerated versions. If your program contains BLAS or LAPACK function calls, GPU-accelerated versions may be available. Magma, CULA, cuBLAS, and cuSPARSE libraries provide optimized GPU linear algebra routines that require only minor changes to existing code. These libraries require little understanding of the underlying GPU hardware, and performance enhancements are transparent to the end developer.

Library Name | Description
Magma | Dense linear algebra library similar to LAPACK for heterogeneous architectures
CULA | GPU-accelerated linear algebra library with a LAPACK-like interface
cuBLAS/cuSPARSE | CUDA-based BLAS implementations provided by NVIDIA, optimized for dense (cuBLAS) and sparse (cuSPARSE) operations
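
To give a sense of what "minor changes to existing code" can look like, here is a minimal sketch that computes y = alpha * x + y either with a plain host loop or with cuBLAS. The function name saxpy_blas and the USE_CUBLAS build flag are hypothetical and exist only for this example; the cuBLAS calls follow the handle-based cublas_v2.h interface, which is assumed to be the version installed. Without the flag, the code builds and runs as ordinary C.

```c
#include <assert.h>
#include <stddef.h>
#ifdef USE_CUBLAS              /* hypothetical build flag, not part of cuBLAS */
#include <cuda_runtime.h>
#include <cublas_v2.h>
#endif

/* Computes y = alpha * x + y for n elements.
 * Returns 0 on success, -1 if a GPU-side step fails. */
int saxpy_blas(int n, float alpha, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);
    if (n == 0)
        return 0;
#ifdef USE_CUBLAS
    cublasHandle_t handle;
    float *d_x = NULL, *d_y = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    int ok;

    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS)
        return -1;

    ok = cudaMalloc((void **)&d_x, bytes) == cudaSuccess
      && cudaMalloc((void **)&d_y, bytes) == cudaSuccess
      /* Copy both vectors from host memory to the device. */
      && cublasSetVector(n, sizeof(float), x, 1, d_x, 1) == CUBLAS_STATUS_SUCCESS
      && cublasSetVector(n, sizeof(float), y, 1, d_y, 1) == CUBLAS_STATUS_SUCCESS
      /* The library routine replaces the hand-written loop. */
      && cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1) == CUBLAS_STATUS_SUCCESS
      /* Copy the result back to the host. */
      && cublasGetVector(n, sizeof(float), d_y, 1, y, 1) == CUBLAS_STATUS_SUCCESS;

    cudaFree(d_x);
    cudaFree(d_y);
    cublasDestroy(handle);
    return ok ? 0 : -1;
#else
    /* Host fallback: the loop an existing CPU code would already contain. */
    for (int i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
    return 0;
#endif
}
```

Callers see the same signature in both builds. The GPU path adds only data movement around a single library call, although for a vector operation this small the transfer cost would likely outweigh any speedup; the pattern pays off for larger, compute-heavy routines.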

For more general libraries, such as Trilinos and PETSc, visit each project's development site to check out the current status of GPU integration. For Trilinos, please see the latest documentation. PETSc has a web page containing the latest GPU integration information.

Low-Level GPU Languages

For complete control over the GPU, Titan will support C for CUDA, PGI’s CUDA Fortran, and OpenCL. These languages and language extensions allow explicit control but are generally more cumbersome than directive-based approaches, and code written in them must be maintained to stay up to date with the latest performance guidelines. Substantial changes to code structure may be needed, and in-depth knowledge of the underlying hardware is often necessary for best performance.

Tool Name | Description
C for CUDA | C99-based language with NVIDIA extensions, the basis for low-level NVIDIA GPU programming
PGI CUDA Fortran | Low-level Fortran interface for the CUDA API
OpenCL | C99-based standard created for cross-platform heterogeneous many-core architectures

GPU Performance Tools

To ensure efficient use of Titan, a full suite of performance and analysis tools will be offered. These tools range from static code analysis to full runtime heterogeneous tracing.

Tool Name | Description
NVIDIA Compute Profiler | Command-line interface for runtime GPU profiling and analysis
PGI Graphical Performance Profiler | GUI for runtime CPU and GPU profiling
CAPS HMPP Wizard | GUI application that provides specific tuning advice for common compute kernels
Vampir/Vampir Trace | Provides full runtime CPU and GPU tracing; includes a GUI for analyzing data
TAU | Full suite of tools for profiling, tracing, and visualizing program execution