Profiling CUDA Code with NVPROF
NVIDIA’s command-line profiler, nvprof, provides profiling for CUDA codes. No extra compiling steps are required to use nvprof. The profiler includes tracing capability as well as the ability to gather many performance metrics, including FLOPS. The profiler's output can be saved and imported into the NVIDIA Visual Profiler for additional graphical analysis.
To use nvprof, the cuda module must be loaded.
summit> module load cuda
... jsrun -n1 -a1 -g1 nvprof ./hello_world_gpu ...
Although nvprof doesn’t provide aggregated MPI data, the %h and %p output file modifiers can be used to create separate output files for each host and process.
... jsrun -n1 -a1 -g1 nvprof -o output.%h.%p ./hello_world_gpu ...
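As a rough illustration of the file names this produces, the sketch below mimics the substitution in plain shell: %h becomes the host name and %p becomes the process ID. The helper expand_profile_name is hypothetical and is not part of nvprof; it only shows the naming pattern.

```shell
# Illustration only: mimic how "output.%h.%p" is expanded.
# expand_profile_name is a hypothetical helper, not an nvprof feature.
expand_profile_name() {
    template=$1   # file name template, e.g. output.%h.%p
    host=$2       # value substituted for %h (host name)
    pid=$3        # value substituted for %p (process ID)
    printf '%s\n' "$template" | sed -e "s/%h/$host/g" -e "s/%p/$pid/g"
}

# Show the name a process on this host would get, using this shell's PID.
expand_profile_name 'output.%h.%p' "$(uname -n)" "$$"
```

Each process in a multi-process run therefore writes its own file, which can be inspected individually.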
The profiler can capture many metrics and events. For example, to output the number of double-precision floating-point operations, you may use the following:
... jsrun -n1 -a1 -g1 nvprof --metrics flops_dp -o output.%h.%p ./hello_world_gpu ...
To see a list of all available metrics and events, use the following:
summit> nvprof --query-metrics
summit> nvprof --query-events
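The metric list can be long, so it may help to filter it. The following is a minimal sketch, assuming the list is printed one entry per line with the metric name on that line; filter_flop_metrics is a hypothetical helper name.

```shell
# Hypothetical helper: keep only lines that mention "flop" (case-insensitive).
filter_flop_metrics() {
    grep -i 'flop'
}

# Run the query only where nvprof is actually available.
if command -v nvprof >/dev/null 2>&1; then
    nvprof --query-metrics | filter_flop_metrics
else
    echo "nvprof not found; load the cuda module first" >&2
fi
```

The same pattern works with --query-events and a different search term.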
While using nvprof on the command line is a quick way to gain insight into your CUDA application, a full visual profile is often even more useful. For information on how to view the output of nvprof in the NVIDIA Visual Profiler, see the NVIDIA Documentation.
Score-P
Score-P is a performance evaluation tool for large-scale parallel applications. It provides a measurement infrastructure for profiling, event trace recording, and online analysis of High Performance Computing applications. Score-P allows users to instrument and record the behavior of sequential, multi-process (MPI, SHMEM), thread-parallel (OpenMP, Pthreads), and accelerator-based (CUDA, OpenCL) applications, as well as hybrid parallel applications. Profile data, in CUBE4 format, can be viewed with CUBE or cube_stat. The Score-P trace files, in OTF2 format, can be visualized using Vampir.
For detailed information about using Score-P on Summit and the builds available, please see the Score-P Software Page.
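Whether a run produces a CUBE4 profile, an OTF2 trace, or both is typically chosen at run time through Score-P environment variables. The fragment below is a sketch under the assumption that the application was already built with Score-P instrumentation (usually by prefixing the compiler command with scorep); the directory name is a hypothetical example.

```shell
# Standard Score-P run-time settings; set them before launching the instrumented binary.
export SCOREP_ENABLE_PROFILING=true    # write a profile in CUBE4 format
export SCOREP_ENABLE_TRACING=true      # also write a trace in OTF2 format (for Vampir)

# Hypothetical directory name, chosen here for illustration only.
export SCOREP_EXPERIMENT_DIRECTORY=scorep_hello_world

echo "Score-P output will be written to $SCOREP_EXPERIMENT_DIRECTORY"
```

Tracing is off by default because trace files can grow large, so it is often worth starting with a profile and enabling tracing only for the runs to be examined in Vampir.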
Vampir
Vampir is a software performance analysis tool focused on highly parallel applications. It presents a unified view of an application run, including information on the programming paradigms used, such as MPI, OpenMP, Pthreads, CUDA, OpenCL, and OpenACC. It also incorporates performance data from hardware performance counters and other sources. Its many interactive displays offer insight into performance behavior and reveal bottlenecks in applications. Vampir’s highly scalable analysis server and visualization engine enable interactive navigation through large amounts of detailed performance data.
Use Score-P to generate performance recordings for Vampir.
For detailed information about using Vampir on Summit and the builds available, please see the Vampir Software Page.