forge

Overview

Arm Forge brings together Arm DDT and Arm MAP to enable both debugging and profiling in a single package. See the Support section below for help on how to use each tool, or the Arm Forge User Guide for more information.

Support

Usage

Arm DDT

Arm DDT is an advanced debugging tool used for scalar, multi-threaded, and large-scale parallel applications. In addition to traditional debugging features (setting breakpoints, stepping through code, examining variables), DDT also supports attaching to already-running processes and memory debugging. In-depth debugging guidance is beyond the scope of this guide; for that, see the Arm Forge User Guide.

Additional DDT Articles

In addition to the information below, the following articles can help you perform specific tasks with DDT:

Launching DDT

The first step in using DDT is to launch the DDT GUI. You can either launch it on the remote machine using X11 forwarding, or run the Arm Forge Remote Client on your local machine and connect it to the remote machine. The remote client provides a native GUI (for Linux / OS X / Windows) that should be far more responsive than X11, but requires a little extra setup. It is also useful if you don't have a preconfigured X11 server. To get started using the remote client, follow the Forge Remote Client setup guide. To use X11 forwarding, run the following in a terminal:
$ ssh -X user@<host>.ccs.ornl.gov
$ module load forge
$ ddt &

Running your job

Once you have launched a DDT GUI, you can initiate a debugging session from a batch job script or the terminal using DDT's "Reverse Connect" functionality. This connects the debug session launched from the batch script to an already-running GUI. This is the most flexible method of launching, and allows re-use of existing environment setup or logic contained in scripts. This method can also be easily modified to launch DDT from an interactive batch session.
  • Copy or modify an existing job script (or start an interactive session).
  • Include the following near the top of your job script:
    module load forge
  • Finally, prefix the jsrun/mpirun with ddt --connect, e.g.:
    $ jsrun -n 1024 ./myprogram
    becomes:
    $ ddt --connect jsrun -n 1024 ./myprogram
After you submit this script to the batch system and it is scheduled, a prompt will appear in the DDT GUI asking if you would like to debug this job.
[Figure: DDT Reverse Connect prompt]
Once you accept, you can configure some final options before launching your program.
[Figure: DDT Reverse Connect run dialog]
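As an illustration of the prefixing step, the sketch below wraps the launch command in a small shell function so that the same job script can run with or without the debugger. The function name launch and the variables DEBUG and DRY_RUN are hypothetical conveniences, not part of DDT or the batch system; only the ddt --connect prefix comes from the steps above.

```shell
#!/bin/bash
# Sketch only: launch, DEBUG and DRY_RUN are hypothetical names, not part
# of DDT or the batch system. In a real job script, load the Forge module
# first, as described in the steps above.

launch() {
    # With DEBUG=1 (the default here), prefix the command for Reverse Connect.
    if [ "${DEBUG:-1}" = "1" ]; then
        set -- ddt --connect "$@"
    fi
    # With DRY_RUN=1 (the default here), print the command instead of running it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

launch jsrun -n 1024 ./myprogram
```

With the defaults, the script only prints the command it would run. Setting DRY_RUN=0 in the job script executes it, and setting DEBUG=0 runs the program without the debugger, so the script does not need to be edited between debug and production runs.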

Offline Debugging

In addition to debugging interactively, DDT also supports "Offline Debugging". This can be particularly useful if your job takes a long time to schedule (and you're not sure if you'll be available when it runs). DDT will execute your program under the debugger and write a plain text or HTML report for you to inspect at your convenience. To run your program with DDT's Offline Debugging, edit your existing job script and modify your launch command such that:
$ jsrun -n 1024 ./myprogram
would become:
$ ddt --offline=output.html jsrun -n 1024 ./myprogram
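When the same script is submitted repeatedly, a fixed report name such as output.html is overwritten on each run. The sketch below builds a per-job report name instead. It assumes an LSF system, where the scheduler sets LSB_JOBID inside a batch job; the function name offline_cmd, the ddt-report- prefix and the nojob fallback are hypothetical.

```shell
# Sketch only: offline_cmd and the report naming scheme are hypothetical.
offline_cmd() {
    # LSB_JOBID is set by LSF inside a batch job; fall back to "nojob" elsewhere.
    report="ddt-report-${LSB_JOBID:-nojob}.html"
    # Print the full launch command; run the printed command to start the job.
    printf '%s\n' "ddt --offline=${report} $*"
}

offline_cmd jsrun -n 1024 ./myprogram
```

The function only prints the command, which makes it easy to check the report name before using it in a job script.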

Replacing printf / debug statements with tracepoints

Adding a quick debug statement is often a tempting next step when trying to debug an issue, but repeated compile/run cycles can quickly become time-consuming. Rather than adding logging statements, you can add tracepoints inside DDT. Tracepoints have the following advantages over debug statements:
  • No source code modification - this means there's no need to recompile, and no need to track down and remove logging statements after debugging.
  • Scalability - variables can be collected and summarized over thousands of processes without worrying about where/how to store the output, or how to sift through the data afterwards.
  • Variables are automatically compared across processes. Variables with differing values across processes are highlighted, and sparklines are included to give a quick graphical representation of the distribution of values.
For more information on tracepoints (including how to use them with interactive debugging), please see the Arm Forge User Guide. Section 6.14, Tracepoints, covers tracepoints in general, while the command line syntax can be found in section 15, DDT: Offline Debugging.
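For offline runs, a tracepoint can be requested on the command line. The sketch below assumes the --trace-at=LOCATION,VAR1,VAR2,... form from the user guide's Offline Debugging section; confirm the exact syntax with ddt --help for your installed version. The helper name trace_flag, the location solver.c:42 and the variable names rank and residual are hypothetical.

```shell
# Sketch only: trace_flag, solver.c:42, rank and residual are hypothetical.
# The function body runs in a subshell ( ... ) so the IFS change stays local.
trace_flag() (
    location="$1"   # source location, e.g. file:line
    shift           # remaining arguments are the variable names to record
    IFS=','         # makes "$*" join the variable names with commas
    printf '%s\n' "--trace-at=${location},$*"
)

# Print an offline debugging command that records two variables at one line.
printf '%s\n' "ddt --offline=output.html $(trace_flag solver.c:42 rank residual) jsrun -n 1024 ./myprogram"
```

Because no source changes are involved, moving the tracepoint to another line or adding a variable only requires changing the arguments to the helper and resubmitting the job.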

Attaching to a running job

You can also use DDT to connect to an already-running job. To do so, you must be connected to the system on which the job is running.
  1. Configure your attach hosts file (found in Options → System → Attach hosts file):
    • On Summit, an example can be found in <install-dir>/summit.nodes.
    • On other systems, this can be left blank, and later edited to include the node running your job launcher (e.g. jsrun/mpirun).
  2. Start DDT on the remote system (e.g. using the remote client, or running module load forge and then ddt).
  3. Select the option to "Attach to an already running program".
  4. In that dialog box, make sure the appropriate MPI implementation is selected. If not, click the "Change MPI" button and select the proper one.
  5. If the node which launched jsrun/mpirun is not listed after the word "Hosts", click on "Choose Hosts".
    • Click "Add".
    • Type the host name in the resulting dialog box and click "OK".
    • Click "OK" to return.
  6. Once DDT has finished scanning, your job should appear in the "Automatically-detected jobs" tab. Select it and click the "Attach" button.

Starting DDT with X11 from an interactive-batch job

Starting DDT from within an interactive job gives you the advantage of running repeated debug sessions with different configurations while only waiting in the queue once.
  1. Start your interactive-batch job - e.g. bsub -XF -Is $SHELL ... (-XF enables X11 forwarding).
  2. Run module load forge.
  3. Prefix your existing launch command with ddt.
    • e.g. ddt jsrun -n6 -g1 ./myapp
  4. Complete remaining configuration from the run dialog, and click "Run".

Letting DDT submit your job to the queue

This method can be useful when using the Forge Remote Client and when your program doesn't have a complex existing launch script.
  1. Run module load forge.
  2. Run ddt.
  3. When the GUI starts, click the "Run and debug a program" button.
  4. In the DDT "Run" dialog, ensure the "Submit to Queue" box is checked.
  5. Optionally select a queue template file (by clicking "Configure" by the "Submit to Queue" box). If your typical job scripts are basically only a jsrun/mpirun command, the default is fine. If your scripts are more complex, you'll need to create your own template file. The default file can be a good start. If you need help creating one, contact help@olcf.ornl.gov or see the Forge User Guide.
  6. Click the "Parameters" button by the "Submit to Queue" box.
  7. In the resulting dialog box, select an appropriate walltime limit, account, and queue. Then click "OK".
  8. Enter your executable in the "Application" box, enter any command line options your executable takes on the "Arguments" line, and select an appropriate number of processes and threads.
  9. Click "Submit". Your job will be submitted to the queue and your debug session will start once the job begins to run. While it's waiting to start, DDT will show a dialog box containing showq output.

Arm MAP

Arm MAP (part of the Arm Forge suite, with DDT) is a profiler for parallel, multithreaded, or single-threaded C, C++, Fortran, and F90 codes. It provides in-depth analysis and pinpoints bottlenecks down to the source line. Unlike most profilers, it is designed to profile pthreads, OpenMP, or MPI in parallel and threaded code. MAP aims to be simple to use: there's no need to instrument each source file or to configure anything.

Preparing your application

In order to get the best profiling results with MAP:
  • Compile with "-g". This allows MAP to show the corresponding source code alongside your application's profile. This can also help MAP to unwind stacks, particularly on non-x86 systems like Summit.
  • If you wish to use line-level CUDA profiling, also ensure "-g" is passed as a flag to nvcc. (See "CUDA Profiling" below for more information.)
  • In general, keep existing optimization flags.

Generating a MAP output file

Arm MAP can generate a profile using the GUI or the command line. The GUI option should look familiar to existing users of DDT, whereas the command line option may offer the smoothest transition when moving from an existing launch configuration. MAP profiles are small in size, and there's generally no configuration required other than your existing jsrun/mpirun command line. To generate a profile using MAP, take an existing queue submission script and modify it to include the following:
module load forge
Then prefix your launch command with map --profile, such that:
jsrun -n 128 ./myapp a b c
would become:
map --profile jsrun -n 128 ./myapp a b c
Once your job has completed running, the program's working directory should contain a timestamped .map file such as myapp_1p_1t_2016-01-01_12-00.map.
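After several profiling runs, a working directory can accumulate many .map files. The sketch below picks the most recently modified one. The function name latest_map_profile is hypothetical, and the approach relies on file names without newlines, such as the timestamped name shown above.

```shell
# Sketch only: latest_map_profile is a hypothetical helper name.
latest_map_profile() {
    # $1 = directory to search (defaults to the current directory).
    # ls -t lists files by modification time, newest first; head keeps the first.
    # If no .map file exists, ls fails quietly and nothing is printed.
    ls -t "${1:-.}"/*.map 2>/dev/null | head -n 1
}

profile="$(latest_map_profile .)"
if [ -n "$profile" ]; then
    echo "Most recent profile: $profile"
else
    echo "No .map files found in the current directory"
fi
```

The resulting path can then be passed to the map command to open that profile.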

CUDA Profiling

By default, MAP will display time where the host is waiting on the GPU (as purple in the activity timeline) and time spent executing particular kernels. Additionally, MAP can perform line-level CUDA profiling with the "--cuda-kernel-analysis" flag. This requires that CUDA code is compiled with "nvcc -lineinfo". Note that this can add significant overhead to your profiling runs. See the Arm Forge User Guide for more information.

Profiling a subset of your application

To profile only a subset of your application, you can either use the --start-after=TIME option and its related command line options (see map --help for more information), or use the API to have your code tell MAP when to start and stop sampling, as detailed in the Arm Forge User Guide.
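As a sketch of the command line route, the helper below adds --start-after to the usual map --profile prefix only when a delay is requested. It assumes TIME is given in seconds; map --help is the authority on units and on the related options. The function name map_cmd and the variable START_AFTER are hypothetical.

```shell
# Sketch only: map_cmd and START_AFTER are hypothetical names.
map_cmd() {
    # If START_AFTER is set, ask MAP to delay sampling by that long
    # (assumed to be seconds; check map --help).
    if [ -n "${START_AFTER:-}" ]; then
        set -- "--start-after=${START_AFTER}" "$@"
    fi
    # Print the full launch command; run the printed command to start profiling.
    printf '%s\n' "map --profile $*"
}

START_AFTER=60
map_cmd jsrun -n 128 ./myapp a b c
```

Skipping a long initialization phase this way keeps the profile focused on the main computation without changing the source code. The API route is the better fit when the region of interest is defined by program logic rather than by elapsed time.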

Viewing a MAP profile

Once you have collected a profile, you can view it using the map command, either by launching MAP and choosing "Load Profile Data File", or by specifying the file on the command line, e.g.:
map ./myapp_1p_1t_2016-01-01_12-00.map
(The above requires an SSH connection with X11 forwarding, or another remote graphics setup.) An alternative that provides a local, native GUI (for Linux, OS X, or Windows) is to install the Arm Forge Remote Client on your local machine. This client can load and view profiles locally (useful when working offline) or remotely (which avoids the need to copy the profile data and corresponding source code to your local machine). The remote client can be used for both Arm DDT and Arm MAP. For information on how to install and configure it, see the remote client setup page. For more on MAP itself, see the Arm Forge User Guide (also available via the "Help" menu in the MAP GUI).

Additional Arm MAP resources

Builds

TITAN

  • 6.0.3-47102
  • 6.0.5-47435
  • 6.0.5-rc1
  • 6.0.6
  • 6.0.6-47644
  • 6.1
  • 6.1.1
  • 6.1-47845
  • 7.0
  • 7.0.1
  • 7.0.5
  • 7.1

EOS

  • 6.0.3-47102
  • 7.0.1
  • 7.0.3
  • 7.0.5

RHEA

  • 6.0.3
  • 7.0.1
  • 7.0.5
  • 7.1