Arm MAP (part of the Arm Forge suite, alongside DDT) is a profiler for parallel, multithreaded, or single-threaded C, C++, and Fortran codes. It provides in-depth analysis and pinpoints bottlenecks down to the source line. Unlike many profilers, it is designed to profile pthreads, OpenMP, and MPI codes alike. MAP aims to be simple to use: there is no need to instrument each source file or to configure anything.
Linking your program with the MAP Sampler (for Cray systems)
To collect information about your program, you must link it with the MAP sampling libraries. When your program uses shared libraries, MAP can do this automatically at runtime.
On Cray systems, the map-link-static and map-link-dynamic modules can help with this:
- module load forge
- module load map-link-static # or map-link-dynamic
- Re-compile or re-link your program.
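On a Cray system, the steps above might look like the following sketch (module names can vary by site, and the link command is a placeholder for your own build's link step):

```shell
# Load Forge and the MAP link-helper module
module load forge
module load map-link-static    # or map-link-dynamic for shared-library builds

# Re-run your existing link step so the MAP sampler libraries are picked up,
# e.g. via your makefile:
make myapp
```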
Do I need to recompile?
There’s no need to instrument your code with Arm MAP, so there’s no strict requirement to recompile. However, if your binary wasn’t compiled with the -g compiler flag, MAP won’t be able to show you source-line information, so recompiling would be beneficial.
Note: If using the Cray compiler, you may wish to use -G2 instead of -g, because -g disables most optimizations with that compiler, which can significantly change runtime performance; -G2 provides debug information without doing so.
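For example, the debug flags might be added like this (the `cc` compiler wrapper and `myapp` target are illustrative; substitute your own build commands):

```shell
# Most compilers: -g adds the source-line information MAP needs
cc -g -O2 -o myapp myapp.c

# Cray compiler: -G2 keeps debug info without disabling optimizations
cc -G2 -o myapp myapp.c
```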
Generating a MAP output file
Arm MAP can generate a profile using the GUI or the command line. The GUI option should look familiar to existing users of DDT, whereas the command line option may offer the smoothest transition when moving from an existing launch configuration.
MAP profiles are small in size, and there’s generally no configuration required other than your existing aprun command line.
To generate a profile with MAP, take an existing queue submission script and modify it to include the following:
source $MODULESHOME/init/bash # may already be included if using modules
module load forge
Then prefix your aprun command with map --profile, so that:
aprun -n 128 -N 8 ./myapp a b c
becomes:
map --profile aprun -n 128 -N 8 ./myapp a b c
Once your job has completed running, the program’s working directory should contain a timestamped .map file such as myapp_1p_1t_2016-01-01_12-00.map.
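Putting the pieces together, a modified submission script might look like the following sketch (the scheduler directives, node counts, and paths are placeholders for your site's setup):

```shell
#!/bin/bash
#PBS -l nodes=16          # placeholder resource request
#PBS -l walltime=01:00:00

# Ensure the module command is available, then load Forge
source $MODULESHOME/init/bash
module load forge

cd $PBS_O_WORKDIR

# The original aprun line, prefixed with "map --profile";
# a timestamped .map file is written to the working directory
map --profile aprun -n 128 -N 8 ./myapp a b c
```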
Profiling a subset of your application
To profile only a subset of your application, you can either use the --start-after=TIME option and its related command-line options (see map --help for more information), or use the MAP API to have your code tell MAP when to start and stop sampling, as detailed in the Arm Forge user guide.
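For example, to skip an initialization phase and begin sampling 300 seconds into the run (the option name is taken from map --help; check the options your installed version supports):

```shell
# Start sampling 300 seconds after launch, skipping startup/initialization
map --profile --start-after=300 aprun -n 128 -N 8 ./myapp a b c
```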
Viewing a MAP profile
Once you have collected a profile, you can view it using the map command, either by launching the GUI and choosing “Load Profile Data File”, or by specifying the file on the command line, e.g. map myapp_1p_1t_2016-01-01_12-00.map
(The above requires an SSH connection with X11 forwarding, or another remote-graphics setup.)
An alternative that provides a local, native GUI (for Linux, OS X, or Windows) is to install the Arm Forge Remote Client on your local machine. This client is able to load and view profiles locally (useful when working offline), or remotely (which avoids the need to copy the profile data and corresponding source code to your local machine).
The remote client can be used for both Arm DDT and Arm MAP. For more information on how to install and configure the remote client, see the remote client setup page.
For more information see the Arm Forge user guide (also available via the “Help” menu in the MAP GUI).
Additional Arm MAP resources
The Score-P measurement infrastructure is a highly scalable and easy-to-use tool suite for profiling, event tracing, and online analysis of HPC applications. Score-P supports analyzing C, C++, and Fortran applications that make use of multi-processing (MPI, SHMEM), thread parallelism (OpenMP, pthreads), and accelerators (CUDA, OpenCL, OpenACC), as well as combinations of these.
For detailed information about using Score-P on Titan and the builds available, please see the Score-P Software Page.
Vampir is a software performance visualizer focused on highly parallel applications. It presents a unified view on an application run including the use of programming paradigms like MPI, OpenMP, PThreads, CUDA, OpenCL and OpenACC. It also incorporates file I/O, hardware performance counters and other performance data sources. Various interactive displays offer detailed insight into the performance behavior of the analyzed application. Vampir’s scalable analysis server and visualization engine enable interactive navigation of large amounts of performance data. Score-P and TAU generate OTF2 trace files for Vampir to visualize.