Fixing Memory Leaks with Arm DDT Leak Reports

Memory leaks occur when memory is allocated but never freed. They are particularly problematic when the allocations are large or frequent. Over time, leaks can degrade performance or, worse, cause the program to fail.

DDT’s memory debugging features allow analysis of allocated heap memory, both interactively, using the GUI, and non-interactively, using DDT’s “offline debugging” mode.

The steps below show how to generate a leak report, use it to pinpoint leaks, and eliminate them. Unlike conventional interactive debugging, these reports can be created during a batch job, so you do not need to be present when your job is scheduled.

Source Code

The source code for this example can be downloaded here. The source code is contained within a git repository with tagged versions “initial”, “fix-1” and “fix-2”.

In addition, the “leak-reports” folder contains example leak reports for the different versions, and two queue submission files are included to launch the example program with and without DDT.

Linking with DDT’s Memory Debugging Library

The first step towards creating a memory leak report is to link your program with DDT’s memory debugging library. This will intercept calls to memory allocation and release functions (such as malloc and free) and record their location in your program.
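To make the idea of intercepting allocation calls concrete, here is a minimal sketch of the bookkeeping involved. It is not DDT's implementation: LeakTracker, AllocationRecord and the "example.cpp" call sites are hypothetical names, and a real memory debugging library interposes on malloc and free themselves rather than requiring explicit wrapper calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical record kept for each live allocation.
struct AllocationRecord {
    std::size_t size;
    std::string site;  // caller location, e.g. "file.cpp:line"
};

// Hypothetical tracker, for illustration only. A real memory debugging
// library hooks malloc/free directly, so no source changes are needed.
class LeakTracker {
public:
    // Allocate memory and remember where the request came from.
    void* allocate(std::size_t size, const std::string& site) {
        void* p = std::malloc(size);
        if (p != nullptr) {
            live_[p] = AllocationRecord{size, site};
        }
        return p;
    }

    // Forget the allocation, then hand the memory back.
    void release(void* p) {
        live_.erase(p);
        std::free(p);
    }

    // Anything still recorded at exit has not been freed.
    std::size_t leaked_bytes() const {
        std::size_t total = 0;
        for (const auto& entry : live_) {
            total += entry.second.size;
        }
        return total;
    }

private:
    std::map<void*, AllocationRecord> live_;
};

// Two allocations, one release: the tracker reports the other as leaked.
std::size_t demo_leaked_bytes() {
    LeakTracker tracker;
    void* freed = tracker.allocate(1024, "example.cpp:10");
    void* kept = tracker.allocate(2048, "example.cpp:20");
    assert(freed != nullptr && kept != nullptr);
    tracker.release(freed);
    std::size_t leaked = tracker.leaked_bytes();
    tracker.release(kept);  // tidy up after taking the measurement
    return leaked;
}
```

Calling demo_leaked_bytes() from a test or from main returns the size of the allocation that was still live when the measurement was taken.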

Note: Manual linking is required only for Cray systems, or when using static linking.

Linking with DDT’s memory debugging library can be automated by loading the ddt-memdebug module. After loading your usual compilation environment, load the following modules:

$ module load forge
$ module load ddt-memdebug

Then re-link your program. How this is done will vary depending on your build system, but it’s often sufficient to delete the application binary, and have make regenerate it. For our example:

$ rm mandel
$ make

Launching with DDT

The next step is to launch the program with DDT. As of DDT 5.0, we can prefix our aprun command with the appropriate DDT command.

In our example, we can edit submit.qsub to first load the DDT module:

$ source $MODULESHOME/init/bash # Only required if used in a batch script
$ module load forge

and then modify our aprun command so that:

$ aprun -n 16 ./mandel

becomes:

$ ddt --offline=leak-report.html --mem-debug=fast --check-bounds=off aprun -n 4 ./mandel

The DDT arguments used are as follows:

--offline=leak-report.html: This tells DDT to run in non-interactive “offline” mode, and save the output to leak-report.html

--mem-debug=fast: This enables the memory debugging options in DDT and uses the “fast” preset. (The “fast” preset runs the fewest memory checks, in order to reduce overhead.)

--check-bounds=off: This disables bounds checking (or “guard pages”) in DDT. While bounds checking can be useful when tracking down invalid memory accesses, disabling it reduces the runtime and memory overhead.

(The download bundle also contains a pre-modified version of submit.qsub named submit-ddt.qsub)

Now we submit our batch job:

$ qsub -A <projectID> submit-ddt.qsub

Interpreting the Output

Once the job has finished, copy the output file (leak-report.html) to your local machine and open it with a web browser. (Alternatively, open leak-reports/initial.html from the source code download.)

Scrolling down to the leak report section, we should see something like this:

Initial Leak Report Chart

For scalability reasons, DDT will limit the report to the 8 ranks with the greatest memory leakage (this can be controlled with the --leak-report-top-ranks command line argument).

In the example shown we can see that rank 0 has leaked more memory than the others, and that most of the allocations were created by the Packet::allocate function. Clicking the bar chart item for rank 0 will display a table below showing details of the allocations:

Initial Leak Report Allocations Table

This table shows allocations grouped by the backtrace taken when the allocation was made, along with source code snippets. This information can be used to identify code paths leading to the largest leaks.

In our example, the first row of the table represents a single, 16MB allocation, whereas the second row represents 92 smaller allocations, totaling 14.72MB. All of these allocations share the same allocation site (as noted by “#0 Packet::allocate() (packet.cpp:91)” at the top of each stack), but have taken different paths through the code to get there (as shown by different entries further down the stack).
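The grouping in the table can be sketched in a few lines. This is only an illustration of the idea, not how DDT builds its report: the Allocation and Group types are hypothetical, the second backtrace string is a placeholder, and the byte counts assume decimal megabytes, so that 92 allocations totaling 14.72MB come to 160,000 bytes each.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one leaked allocation.
struct Allocation {
    std::string backtrace;  // call path at the time of allocation
    std::size_t bytes;
};

// Hypothetical summary row: how many allocations share a backtrace.
struct Group {
    std::size_t count = 0;
    std::size_t bytes = 0;
};

// Allocations with identical backtraces collapse into one table row.
std::map<std::string, Group> group_by_backtrace(
        const std::vector<Allocation>& allocations) {
    std::map<std::string, Group> groups;
    for (const Allocation& a : allocations) {
        Group& g = groups[a.backtrace];
        g.count += 1;
        g.bytes += a.bytes;
    }
    return groups;
}

// Rebuild the two rows described above from individual allocations.
std::map<std::string, Group> demo_groups() {
    std::vector<Allocation> allocations;
    // One large allocation reached through Packet::stitch.
    allocations.push_back({"Packet::allocate <- Packet::stitch", 16000000});
    // 92 smaller allocations reached through a different (placeholder) path.
    for (int i = 0; i < 92; ++i) {
        allocations.push_back({"Packet::allocate <- (other caller)", 160000});
    }
    std::map<std::string, Group> groups = group_by_backtrace(allocations);
    assert(groups.size() == 2);
    return groups;
}
```

Both rows share the same innermost frame, which is why the allocation site is the natural place to start looking, while the differing outer frames explain why the table keeps them as separate rows.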

Once we have the allocation site (the Packet::allocate member function), the next step is figuring out why this allocation isn’t freed. From the source code snippet, we can see that the allocation is assigned to the iterations variable.

Reading through the Packet class, we can determine that the iterations allocation is owned by the Packet class, and yet the Packet::~Packet destructor doesn’t contain code to free the allocation. The simple fix here is to add free(pointer); to the destructor, where pointer is the iterations member. (See “git show fix-1” for more details.)
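As a rough illustration of this kind of fix, the sketch below shows a class that owns a malloc'd buffer and releases it in its destructor. It is a simplified, hypothetical stand-in rather than the example's real Packet class: the int element type, the allocate signature and the deleted copy operations are all assumptions, and “git show fix-1” remains the authoritative change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, simplified stand-in for the example's Packet class.
class Packet {
public:
    Packet() : iterations(nullptr) {}

    // The fix: the class owns the buffer, so the destructor frees it.
    // free(nullptr) is a no-op, so an unallocated Packet is also safe.
    ~Packet() { std::free(iterations); }

    // Copying would lead to a double free, so it is disabled here.
    Packet(const Packet&) = delete;
    Packet& operator=(const Packet&) = delete;

    // Assumed signature: allocate room for `count` iteration values.
    bool allocate(std::size_t count) {
        std::free(iterations);  // release any earlier buffer first
        iterations = static_cast<int*>(std::malloc(count * sizeof(int)));
        return iterations != nullptr;
    }

private:
    int* iterations;  // element type is an assumption
};

// Allocates twice; neither buffer outlives the Packet.
bool demo_packet() {
    Packet packet;
    bool first = packet.allocate(1000);
    bool second = packet.allocate(2000);
    assert(first && second);
    return first && second;
}  // ~Packet runs here and frees the remaining buffer
```

Freeing in the destructor ties the lifetime of the buffer to the lifetime of the object, which is why the next question in the walkthrough is whether the Packet object itself is ever destroyed.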

After making the fix, running “make” will recompile the code.

Another Leak!

After fixing our leak and recompiling, the next step is to verify the leak is gone, by resubmitting our job and generating a new leak report.

Opening the newly-generated leak-report.html (or leak-reports/fix-1.html from the source code download) shows the following:

Leak Report Bar Chart (After Fix 1)

While we’ve fixed one of our leak sites, rank 0 is still leaking around 16MB of memory. Clicking the bar chart item will again show us the allocation details.

Leak Report Allocations Table (After Fix 1)

Here we see that the remaining leak was created from a single allocation (again, made in Packet::allocate).

As we’ve already fixed the leak in the Packet class, we should check that the Packet object itself is being correctly freed. We can do this by methodically working our way up the stack.

Following the backtrace, we see that Packet::allocate is called by Packet::stitch, which is in turn called by PacketFactory::stitch. The source code snippet for PacketFactory::stitch shows us that Packet::stitch is being called on the packet member object, so let’s verify that this is being freed.

With a little reading of the source code (e.g. packetfactory.h), we can see that packet is a plain object member of the PacketFactory class, so when a PacketFactory object is freed, packet should be freed too.
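The C++ rule relied on here can be checked with a small, hypothetical experiment: a plain member's destructor runs only when its owning object is destroyed. The types below are simplified stand-ins that borrow the example's names, and the counter exists only for the demonstration.

```cpp
#include <cassert>
#include <utility>

// Counts Packet destructor calls, for demonstration only.
static int packets_destroyed = 0;

// Simplified, hypothetical stand-ins for the example's classes.
struct Packet {
    ~Packet() { ++packets_destroyed; }
};

struct PacketFactory {
    virtual ~PacketFactory() {}
    Packet packet;  // plain object member, destroyed with its owner
};

// Returns the destructor count before and after deleting the factory.
std::pair<int, int> demo_member_lifetime() {
    packets_destroyed = 0;
    PacketFactory* factory = new PacketFactory;
    int before = packets_destroyed;  // member is alive while the owner is
    delete factory;                  // destroying the owner destroys packet
    int after = packets_destroyed;
    assert(before == 0 && after == 1);
    return std::make_pair(before, after);
}
```

If the delete were missing, the Packet destructor would never run, and any memory it is responsible for would be reported as leaked even though the Packet class itself is correct.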

Let’s jump up another level to find the origin of the PacketFactory object itself. strategy1 (found in mandel.cpp) shows the PacketFactory is actually an instance of the derived class SimplePacketFactory, named factory.

Hopping up one final time to main, we can see that the factory object is passed to strategy1 as a dereferenced pointer (*factory), and is dynamically allocated a little further up the function.

The rest of the main function is relatively simple, and we can see that factory isn’t directly freed (or passed to any additional function calls where it could be freed). Now that we’ve found the source of the leak, there are a few options to fix it:

We could rewrite the code to avoid the dynamic allocation entirely, or wrap the pointer in a C++ auto_ptr (or, in C++11, a unique_ptr), but the simplest solution here is to add “delete factory” once we have finished using the factory (i.e. after the switch statement). See “git show fix-2” for more details.
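Two of these options are sketched below for comparison. The class bodies and the strategy1 signature are assumptions made for illustration, the destructor counter exists only for the demonstration, and the real change is the one shown by “git show fix-2”. The sketch also assumes the base class has a virtual destructor, which deleting a derived object through a base pointer requires.

```cpp
#include <cassert>
#include <memory>

// Counts factory destructor calls, for demonstration only.
static int factories_destroyed = 0;

// Simplified, hypothetical stand-ins for the example's classes.
struct PacketFactory {
    // Virtual so that deleting through a base pointer is well defined.
    virtual ~PacketFactory() { ++factories_destroyed; }
};

struct SimplePacketFactory : PacketFactory {};

// Assumed signature: the factory is passed by reference, as *factory.
void strategy1(PacketFactory& factory) { (void)factory; }

// Option 1: explicit delete once the factory is no longer needed.
int demo_manual_delete() {
    factories_destroyed = 0;
    PacketFactory* factory = new SimplePacketFactory;
    strategy1(*factory);
    delete factory;
    assert(factories_destroyed == 1);
    return factories_destroyed;
}

// Option 2 (C++11): unique_ptr deletes the factory when it leaves scope.
int demo_unique_ptr() {
    factories_destroyed = 0;
    {
        std::unique_ptr<PacketFactory> factory(new SimplePacketFactory);
        strategy1(*factory);
    }  // factory is destroyed here, on every exit path from the block
    assert(factories_destroyed == 1);
    return factories_destroyed;
}
```

The explicit delete is the smallest change to existing code, while the smart pointer also covers early returns and exceptions, at the cost of requiring C++11.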

After making our final change, let’s recompile and generate one last report to verify our leak has been fixed.

Opening leak-report.html (or leak-reports/fix-2.html from the source code download), we see:

Leak Report Bar Chart (After Fix 2)

The chart may initially look busier than our other reports, but the total leaked memory is now only 16.75 kB, and the functions responsible are from various system libraries outside of our control.

We’ve now successfully rid our program of the two memory leaks, and improved the correctness of our code. We can also be more confident that our program (at least for the current configuration) is leak-free.