OLCF User Assistance Center

Can't find the information you need below? Need advice from a real person? We're here to help.

OLCF support consultants are available to respond to your emails and phone calls from 9:00 a.m. to 5:00 p.m. EST, Monday through Friday, exclusive of holidays. Emails received outside of regular support hours will be addressed the next business day.

Accelerator Performance Tools

See this article in context within the following user guides: Titan

To ensure efficient use of Titan, a full suite of performance and analysis tools is offered. These tools provide a wide range of support, from static code analysis to full runtime heterogeneous tracing.

NVPROF

NVIDIA’s command line profiler, NVPROF, provides profiling for CUDA codes. No special steps are needed when compiling your code. The profiler includes tracing capability as well as the ability to provide many performance metrics, including FLOPS. The profiler data can be saved and imported into the NVIDIA visual profiler for easier analysis.

Running

To use NVPROF, the cudatoolkit module must be loaded and PMI daemon forking must be disabled. To view the output in the NVIDIA Compute Visual Profiler, X11 forwarding must be enabled.

The aprun -b flag is currently required to use NVPROF. Because -b bypasses transferring the executable to the compute nodes, your executable must reside on a file system that is visible to the compute nodes.

$ module load cudatoolkit
$ export PMI_NO_FORK=1

Although NVPROF doesn’t provide MPI-aggregated data, the %h and %p output file modifiers can be used to create a separate output file for each host and process.

$ aprun -b -n16 nvprof -o output.%h.%p ./gpu.out 

A variety of metrics and events can be captured by the profiler. For example, to collect the number of double-precision floating-point operations, use the following:

$ aprun -b -n16 nvprof --metrics flops_dp -o output.%h.%p ./gpu.out 
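
The NVPROF steps above can be collected into a shell function for use in a batch script. This is a minimal sketch, not an OLCF-provided tool: the function name run_nvprof and the DRY_RUN switch (which prints the aprun command instead of running it) are hypothetical, and the function assumes the cudatoolkit module is already loaded and the executable resides on a compute-node-visible file system. Any arguments after the executable are passed through to NVPROF.

```shell
# Hypothetical helper: launch an MPI job under NVPROF with one output
# file per host (%h) and per process (%p).
# Assumes `module load cudatoolkit` has already been run.
# Set DRY_RUN=1 to print the aprun command instead of executing it.
run_nvprof() {
    nranks=$1   # number of processes passed to aprun -n
    exe=$2      # executable on a compute-node-visible file system
    shift 2     # remaining arguments (e.g. --metrics flops_dp) go to nvprof

    # NVPROF requires PMI daemon forking to be disabled.
    export PMI_NO_FORK=1

    # -b is currently required when running under NVPROF.
    ${DRY_RUN:+echo} aprun -b -n "$nranks" nvprof "$@" -o 'output.%h.%p' "$exe"
}
```

For example, run_nvprof 16 ./gpu.out --metrics flops_dp issues the same command as the double-precision example above.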

To see a list of all available metrics and events, use the following:

$ aprun -b nvprof --query-metrics
$ aprun -b nvprof --query-events 
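
Because the available metrics depend on the GPU and CUDA version, it can help to confirm a metric exists before submitting a long run. The hypothetical has_metric function below (not part of NVPROF) searches the --query-metrics listing for an exact metric name. It assumes the metric name appears as a whole word in that listing, and it reads both stdout and stderr rather than assuming which stream NVPROF uses for the listing.

```shell
# Hypothetical helper: exit status 0 if NVPROF lists the given metric
# name in its --query-metrics output, non-zero otherwise.
# Assumes the cudatoolkit module is loaded and PMI_NO_FORK=1 is exported.
has_metric() {
    # Merge stderr into stdout so the listing is searched whichever
    # stream it is written to; -w matches the name as a whole word only,
    # so "flops_dp" does not match "flops_dp_add".
    aprun -b nvprof --query-metrics 2>&1 | grep -qw -- "$1"
}
```

For example:

$ has_metric flops_dp && aprun -b -n16 nvprof --metrics flops_dp -o output.%h.%p ./gpu.out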

To view the output in the NVIDIA Visual Profiler, please see NVIDIA’s Visual Profiler documentation.

Resources

The nvprof user guide is available on the NVIDIA Developer Documentation Site and provides comprehensive coverage of the profiler’s usage and features.

NVIDIA Command Line Profiler

NVIDIA’s Command Line Profiler provides run-time profiling of CUDA and OpenCL code. No special steps are needed when compiling your code, and any code that uses CUDA or OpenCL, including compiler directives and accelerator libraries, can be profiled. The profile data can be collected in .txt or .csv format and viewed with the NVIDIA Visual Profiler.

Running

To view the output in the NVIDIA Compute Visual Profiler, X11 forwarding must be enabled. To use the NVIDIA Command Line Profiler, the cudatoolkit module must be loaded:

$ module load cudatoolkit

Although the command line profiler doesn’t provide MPI-aggregated data, the HOSTNAME environment variable and the %p output file modifier can be used to create separate output files for each host and process. The %h modifier is not available in the command line profiler.

$ export COMPUTE_PROFILE=1
$ export COMPUTE_PROFILE_LOG=$MEMBERWORK/[projid]/output.$HOSTNAME.%p
$ aprun gpu.out
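
The settings above can also be wrapped in a function for a batch script. This minimal sketch additionally exports COMPUTE_PROFILE_CSV=1, the command line profiler's switch for CSV output; the function name run_cmd_profiler, its log-directory argument, and the DRY_RUN switch (print the aprun command instead of running it) are hypothetical.

```shell
# Hypothetical helper: run an executable under the NVIDIA Command Line
# Profiler, writing one CSV log per host and process into a directory.
# Set DRY_RUN=1 to print the aprun command instead of executing it.
run_cmd_profiler() {
    logdir=$1   # e.g. $MEMBERWORK/[projid]
    exe=$2
    mkdir -p "$logdir"

    export COMPUTE_PROFILE=1       # enable the profiler
    export COMPUTE_PROFILE_CSV=1   # write CSV instead of plain text
    # $HOSTNAME is expanded by the shell when this line runs;
    # %p is expanded by the profiler to each process ID.
    export COMPUTE_PROFILE_LOG="$logdir/output.$HOSTNAME.%p"

    ${DRY_RUN:+echo} aprun "$exe"
}
```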

To view the output in the NVIDIA Visual Profiler please see the NVIDIA Visual Profiler Users Guide.

Resources

The NVIDIA Command Line Profiler User Guide is available on NVIDIA’s Developer Documentation Site and provides comprehensive coverage of the profiler’s usage and features.

Score-P

Score-P is a software system that provides a measurement infrastructure for profiling and event trace recording of MPI, SHMEM, OpenMP/Pthreads, and hybrid parallel applications as well as CUDA-enabled applications. Trace files, in OTF2 format, can be visualized using Vampir.

Running

To use Score-P, the scorep module must be loaded; for GPU traces, the cudatoolkit module must be loaded as well:

$ module load cudatoolkit
$ module load scorep

Once the modules are loaded, the program must be recompiled by prefixing the original compiler command with scorep:

$ scorep cc source.c
$ scorep --cuda nvcc source.cu

To see all available options for instrumentation:

$ scorep --help

For example, to generate trace files instead of profiles:

$ export SCOREP_ENABLE_PROFILING=false 
$ export SCOREP_ENABLE_TRACING=true
$ aprun instrumented_binary.out
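
The tracing settings above can be combined into a function that also names the experiment directory through SCOREP_EXPERIMENT_DIRECTORY, which makes the resulting traces.otf2 file easy to locate. This is a sketch under stated assumptions: run_scorep_trace and the DRY_RUN switch are hypothetical, and the SCOREP_CUDA_ENABLE=yes setting is an assumption to verify against the installed Score-P version.

```shell
# Hypothetical helper: run a Score-P-instrumented binary with tracing
# enabled (profiling disabled), writing into a named experiment directory.
# Set DRY_RUN=1 to print the aprun command instead of executing it.
run_scorep_trace() {
    expdir=$1   # experiment directory for this run
    nranks=$2
    exe=$3

    export SCOREP_ENABLE_PROFILING=false
    export SCOREP_ENABLE_TRACING=true
    export SCOREP_EXPERIMENT_DIRECTORY="$expdir"
    # Assumption: records CUDA activity for binaries built with
    # `scorep --cuda`; check accepted values for the installed version
    # with `scorep-info config-vars --full`.
    export SCOREP_CUDA_ENABLE=yes

    ${DRY_RUN:+echo} aprun -n "$nranks" "$exe"
}
```

After the run, the OTF2 trace is written under the chosen directory (for example, scorep-trace/traces.otf2) and can be loaded into Vampir.
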
Resources

Additional information can be found on the OLCF Score-P software page. The Score-P User Manual provides comprehensive coverage of Score-P usage.

Vampir

Vampir is a graphical user interface for analyzing OTF trace data. For small traces, all analysis may be done on the user’s local machine running a local copy of Vampir. For larger traces, the GUI can be run on the user’s local machine while the analysis is done by VampirServer running on the parallel machine.

Use

The easiest way to get started is to launch the Vampir GUI from an OLCF compute resource; however, a slow network connection may limit usability.

$ module load vampir
$ vampir
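
When analyzing Score-P traces, each run writes an experiment directory (named scorep-* by default) containing a traces.otf2 file. The hypothetical latest_trace function below, which is not part of Vampir or Score-P, prints the newest such file in a directory so it can be handed to Vampir. It assumes the default Score-P directory naming and that the installed vampir accepts a trace file on its command line.

```shell
# Hypothetical helper (not part of Vampir or Score-P): print the newest
# scorep-*/traces.otf2 file under a directory (default: current directory).
# Returns a non-zero status if no trace is found.
latest_trace() {
    search_dir=${1:-.}
    newest=""
    for trace in "$search_dir"/scorep-*/traces.otf2; do
        # An unmatched glob stays literal, so skip paths that do not exist.
        [ -e "$trace" ] || continue
        if [ -z "$newest" ] || [ "$trace" -nt "$newest" ]; then
            newest=$trace
        fi
    done
    [ -n "$newest" ] || return 1
    printf '%s\n' "$newest"
}
```

For example:

$ vampir "$(latest_trace)"
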
Resources

Additional information may be found on the OLCF Vampir software page. The Vampir User Manual provides comprehensive coverage of Vampir usage.

TAU

TAU provides profiling and tracing tools for C, C++, Fortran, and GPU hybrid programs. Generated profiles can be viewed in the included ParaProf GUI, and traces can be displayed in Vampir.

Use

A simple GPU profiling example could be performed as follows:

$ module switch PrgEnv-pgi PrgEnv-gnu
$ module load tau cudatoolkit
$ nvcc source.cu -o gpu.out

Once the CUDA code has been compiled, tau_exec -cuda can be used to profile it at run time:

$ aprun tau_exec -cuda ./gpu.out

The resulting profile data can then be viewed using ParaProf:

$ paraprof
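
To keep the profile.* files from different runs apart, the tau_exec run above can be wrapped in a function that points TAU's PROFILEDIR environment variable at a dedicated directory. This is a minimal sketch: run_tau_cuda and the DRY_RUN switch are hypothetical, and it assumes the tau and cudatoolkit modules are already loaded.

```shell
# Hypothetical helper: profile a CUDA executable with tau_exec -cuda,
# collecting TAU's profile.* files in a dedicated directory.
# Assumes `module load tau cudatoolkit` has already been run.
# Set DRY_RUN=1 to print the aprun command instead of executing it.
run_tau_cuda() {
    profdir=$1
    nranks=$2
    exe=$3
    mkdir -p "$profdir"

    # TAU writes its profile files to the directory named by PROFILEDIR.
    export PROFILEDIR="$profdir"

    ${DRY_RUN:+echo} aprun -n "$nranks" tau_exec -cuda "$exe"
}
```

After the run, change into that directory and start paraprof there, or use TAU's pprof command for a text summary.
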
Resources

Additional information may be found on the OLCF TAU software page. The TAU documentation website contains a complete User Guide, Reference Guide, and even video tutorials.

CrayPAT

CrayPAT is a profiling tool that provides information on application performance. CrayPAT is used for basic profiling of serial, multiprocessor, multithreaded, and GPU accelerated programs.

Use

A simple GPU profiling example could be performed as follows:

With PrgEnv-Cray:

$ module load craype-accel-nvidia35
$ module load perftools

With a PrgEnv other than PrgEnv-cray:

$ module load cudatoolkit
$ module load perftools

Compiling, instrumenting, running, and reporting:

$ nvcc -g -c cuda.cu
$ cc cuda.o launcher.c -o gpu.out
$ pat_build -u gpu.out
$ export PAT_RT_ACC_STATS=all
$ aprun ./gpu.out+pat
$ pat_report gpu.out+*.xf
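
The CrayPAT workflow (instrument with pat_build, run the instrumented gpu.out+pat binary, then report with pat_report) can be scripted as one function. This is a sketch, not a CrayPAT feature: craypat_gpu_profile and the DRY_RUN switch are hypothetical, the executable is assumed to be in the current directory, and the .xf files are assumed to follow the gpu.out+pat+... naming matched by the gpu.out+*.xf pattern above.

```shell
# Hypothetical helper: instrument an executable with CrayPAT, run the
# instrumented copy (<exe>+pat) with accelerator statistics enabled,
# and report on the resulting .xf data.
# Assumes perftools (plus cudatoolkit or craype-accel-nvidia35) is loaded
# and that the executable is in the current directory.
# Set DRY_RUN=1 to print the commands instead of executing them.
craypat_gpu_profile() {
    exe=$1
    nranks=$2
    run=${DRY_RUN:+echo}

    $run pat_build -u "$exe"            # creates "$exe+pat"
    export PAT_RT_ACC_STATS=all         # collect accelerator statistics
    $run aprun -n "$nranks" "./$exe+pat"
    # The instrumented run writes files matching "$exe+pat+*.xf".
    $run pat_report "$exe"+pat+*.xf
}
```
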
Resources

More information can be found on the CrayPAT software page. For more details on linking nvcc and compiler wrapper compiled code please see our tutorial on Compiling Mixed GPU and CPU Code.