
Score-P


Description

Website: Score-P

Score-P is a performance evaluation tool for large-scale parallel applications. It provides a measurement infrastructure for profiling, event trace recording, and online analysis of High Performance Computing applications. Score-P allows users to instrument and record the behavior of sequential, multi-process (MPI, SHMEM), thread-parallel (OpenMP, Pthreads), and accelerator-based (CUDA, OpenCL) applications, as well as hybrid parallel applications. Profile data, in CUBE4 format, can be viewed with CUBE or cube_stat. Score-P trace files, in OTF2 format, can be visualized with Vampir.

Usage

Score-P Workflow

Here is an overview of the Score-P workflow. Each step is explained in greater detail in the sections below.

  1. Instrument your application with Score-P
  2. Perform a measurement run with profiling enabled
  3. Perform profile analysis with CUBE or cube_stat
  4. Use scorep-score to define an appropriate filter
  5. Perform a measurement run with tracing enabled and the filter applied
  6. Perform in-depth analysis on the trace data with Vampir

Instrument your Application

In order to instrument an application, you need to recompile the application using the Score-P instrumentation command, which is added as a prefix to the original compile and link command lines. For each of the compiler wrappers:

C        scorep cc
C++      scorep CC
Fortran  scorep ftn

Example:

$ module load scorep
$ scorep cc -o test test.c

The Score-P instrumentation command will use the compilers of your loaded programming environment by default. If you switch the PrgEnv, reload the scorep module.

$ module unload scorep
$ module switch PrgEnv-pgi PrgEnv-gnu
$ module load scorep

Usually the Score-P instrumenter scorep is able to detect the programming paradigm automatically from the set of compile and link options given to the compiler. In some cases, however, such as CUDA applications, scorep needs to be made aware of the programming paradigm in order to apply the correct instrumentation.
To see all available options for instrumentation:

$ scorep --help
 This is the Score-P instrumentation tool. The usage is:
 scorep <options> <original command>
 
Common options are:
  ...
  --compiler    Enables compiler instrumentation.
                By default, it disables pdt instrumentation.
  --nocompiler  Disables compiler instrumentation.
  --user        Enables user instrumentation.
  --cuda        Enables cuda instrumentation.
  ...

Example for instrumenting the C++ part of a CUDA application with user instrumentation enabled:

$ scorep --cuda --user CC

For CMake and autotools based build systems it is recommended to use the scorep-wrapper script instances (see below). The intended usage is to replace the application's compiler and linker with the corresponding wrapper at configuration time, so that they are used at build time. Because Score-P instrumentation during the cmake or configure steps is likely to fail, the wrapper script lets you disable instrumentation by setting the variable SCOREP_WRAPPER to OFF.

For CMake based build systems it is recommended to configure in the following way:

$ SCOREP_WRAPPER=OFF cmake .. \
     -DCMAKE_C_COMPILER=scorep-cc \
     -DCMAKE_CXX_COMPILER=scorep-CC \
     -DCMAKE_Fortran_COMPILER=scorep-ftn

For autotools based build systems it is recommended to configure as follows:

$ SCOREP_WRAPPER=OFF ../configure \
     CC=scorep-cc \
     CXX=scorep-CC \
     FC=scorep-ftn \
     --disable-dependency-tracking

Note: SCOREP_WRAPPER=OFF disables the instrumentation only in the environment of the configure or cmake command. Subsequent calls to “make” are not affected and will instrument the application as expected.

To pass options to the scorep command in order to diverge from the default instrumentation or to activate CUDA instrumentation, use the variable SCOREP_WRAPPER_INSTRUMENTER_FLAGS at make time:

$ make SCOREP_WRAPPER_INSTRUMENTER_FLAGS="--cuda"

The wrapper also allows you to pass flags to the wrapped compiler call by using the variable SCOREP_WRAPPER_COMPILER_FLAGS.
Example of a “make” command using both Score-P instrumentation flags and additional compiler flags:

$ make SCOREP_WRAPPER_INSTRUMENTER_FLAGS="--cuda" SCOREP_WRAPPER_COMPILER_FLAGS="-g -O2"

This will result in the execution of:

scorep --cuda <your-compiler> -g -O2

Note: If “make install” re-links the application and you are not using the default instrumentation, you need to pass the Score-P instrumentation and compiler flags again, as in the “make” examples above.

Perform a measurement run with profiling enabled

Measurements are configured via environment variables:

$ scorep-info config-vars --full
SCOREP_ENABLE_PROFILING
[...]
SCOREP_ENABLE_TRACING
[...]
SCOREP_TOTAL_MEMORY
Description: Total memory in bytes for the measurement system
[...]
SCOREP_EXPERIMENT_DIRECTORY
Description: Name of the experiment directory
[...]

On Titan, by default, profiling is enabled and tracing is disabled.

Here is an example for generating a profile using the environment variables and aprun:

$ export SCOREP_ENABLE_PROFILING=true 
$ export SCOREP_ENABLE_TRACING=false
$ export SCOREP_EXPERIMENT_DIRECTORY=profile
$ aprun instrumented_binary.out

Perform analysis on profile data

Profile performance analysis can be done with CUBE or cube_stat.

CUBE is a profile analysis tool for displaying performance data of parallel programs. It can be run on the login nodes if you are using X forwarding. You may need a remote client, such as NoMachine, if the GUI responds too slowly. Score-P generates files of the form profile.cubex for this purpose.

Call-path profile analysis with CUBE:

$ cube profile/profile.cubex

Please see CUBE User Guide for a more detailed description.

For a quick top-N text-based flat profile analysis, you can also use the tool cube_stat.

Flat profile analysis with cube_stat for the top 3 most time consuming functions:

$ cube_stat -t 3 -p profile/profile.cubex 

Output:

cube::Region            NumberOfCalls       ExclusiveTime  InclusiveTime
binvcrhs               522844416.000000       200.939958    200.939958
!$omp do @z_solve.f:52   51456.000000         159.719801    321.887996
!$omp do @y_solve.f:52   51456.000000         147.645683    302.313644

Define an appropriate filter with scorep-score

scorep-score is a tool that estimates the size of an OTF2 trace from a CUBE4 profile, as well as the effect of filters. Its main purpose is to help define appropriate filters for a tracing run based on a profile.

To invoke scorep-score with detailed output for every recorded function, provide the -r option and the filename of a CUBE4 profile as argument:

$ scorep-score -r profile/profile.cubex

Output:

Estimated aggregate size of event trace:                   40GB
Estimated requirements for largest trace buffer (max_buf): 10GB
Estimated memory requirements (SCOREP_TOTAL_MEMORY):       10GB
(warning: The memory requirements can not be satisfied by Score-P to avoid
 intermediate flushes when tracing. Set SCOREP_TOTAL_MEMORY=4G to get the
 maximum supported memory or reduce requirements using USR regions filters.)

Flt type     max_buf[B]        visits time[s] time[%] time/visit[us]  region
     ALL 10,690,196,070 1,634,070,493 1081.30   100.0           0.66  ALL
     USR 10,666,890,182 1,631,138,069  470.23    43.5           0.29  USR
     OMP     22,025,152     2,743,808  606.80    56.1         221.15  OMP
     COM      1,178,450       181,300    2.36     0.2          13.04  COM
     MPI        102,286         7,316    1.90     0.2         260.07  MPI

     USR  3,421,305,420   522,844,416  144.46    13.4           0.28  matmul_sub
     USR  3,421,305,420   522,844,416  102.40     9.5           0.20  matvec_sub
     ...

The first line of the output estimates the total size of the trace, aggregated over all processes. This information is useful for estimating the disk space required. In the given example, the estimated total size of the event trace is 40GB.

The second line estimates the memory space required by a single process for the trace. The memory that Score-P reserves on each process at application start must be large enough to hold the process’ trace in memory, because intermediate flushes heavily disturb measurements. In addition to the trace itself, Score-P requires some memory to maintain internal data structures, so it also estimates the total amount of memory required on each process. The memory size per process that Score-P reserves is set via the environment variable SCOREP_TOTAL_MEMORY. In the given example, the per-process memory is about 10GB. A Titan node provides 32GB of RAM, so with 16 processes per node you have to reduce the per-process memory to 2GB by defining a filter.

When defining a filter, it is recommended to exclude short, frequently called functions from measurement, because they require a lot of buffer space (represented by a high value under max_buf) and incur a high measurement overhead. MPI functions and OpenMP constructs cannot be filtered. Thus, it is usually a good approach to exclude regions of type USR, starting at the top of the list, until you have reduced the trace to your needs.

The example below excludes the functions matmul_sub and matvec_sub from the trace:

$ cat scorep.filter
SCOREP_REGION_NAMES_BEGIN
 EXCLUDE
   matmul_sub
   matvec_sub
SCOREP_REGION_NAMES_END
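
Filter rules also accept shell-like wildcards, which is convenient when many small functions share a naming pattern (the pattern below is hypothetical):

```
SCOREP_REGION_NAMES_BEGIN
 EXCLUDE
   *_sub
SCOREP_REGION_NAMES_END
```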

You can use scorep-score to test the effect of your filter on the trace size. To do so, pass -f followed by the file name of your filter:

$ scorep-score profile/profile.cubex -f scorep.filter

Perform a measurement run with tracing enabled and the filter applied

Below, the filter given above is applied by exporting the SCOREP_FILTERING_FILE variable. Furthermore, SCOREP_TOTAL_MEMORY is set according to the per-process memory estimated by scorep-score. This run generates files of the form traces.otf2, which can be analyzed with Vampir.

$ export SCOREP_ENABLE_PROFILING=false
$ export SCOREP_ENABLE_TRACING=true
$ export SCOREP_EXPERIMENT_DIRECTORY=trace
$ export SCOREP_TOTAL_MEMORY=2GB
$ export SCOREP_FILTERING_FILE=scorep.filter

$ aprun instrumented_binary.out

Perform analysis on the trace data with Vampir

Vampir provides a graphical interface for analyzing the traces.otf2 files generated with Score-P.

For very small trace files in the trace directory, log in with X forwarding enabled and issue the following command:

$ vampir trace/traces.otf2 

However, this is not recommended for most trace files. Instead, for better performance, please use the server/client version; see the Vampir documentation for further information.

For a more detailed description, see the Score-P documentation.

Supported features based on scorep/3.0

Titan                        PGI  GNU  INTEL  CRAY
MPI instrumentation           x    x     x     x
OpenMP instrumentation        x    x     x     x
Pthreads instrumentation      x    x     x     x
CUDA instrumentation          x    x     x     x
OpenCL instrumentation        x    x     x
OpenACC instrumentation       x
Cray-SHMEM instrumentation    x    x     x     x
TAU instrumentation           x    x     x     x
PAPI counter                  x    x     x     x
Sampling                      x    x     x     x
Memory Recording              x    x     x     x

Eos                          PGI  GNU  INTEL  CRAY
MPI instrumentation           x    x     x     x
OpenMP instrumentation        x    x     x     x
Pthreads instrumentation      x    x     x     x
CUDA instrumentation
OpenCL instrumentation
OpenACC instrumentation
Cray-SHMEM instrumentation    x    x     x     x
TAU instrumentation           x    x     x     x
PAPI counter                  x    x     x     x
Sampling                      x    x     x     x
Memory Recording              x    x     x     x

Rhea                         PGI  GNU  INTEL
MPI instrumentation           x    x     x
OpenMP instrumentation        x    x     x
Pthreads instrumentation      x    x     x
CUDA instrumentation
OpenCL instrumentation
OpenACC instrumentation
SHMEM instrumentation         x    x     x
TAU instrumentation           x    x     x
PAPI counter                  x    x     x
Sampling                      x    x     x
Memory Recording              x    x     x

Contact

Please contact Ronny Brendel (on-site contact person) with any questions.

Furthermore, you may contact Score-P Support or OLCF User Support.

Versions

Available Versions

System Application/Version
Titan scorep/3.0
Eos scorep/3.0
Rhea scorep/3.0