Science - Written by Christie Thiessen on May 21, 2015

Using the Forest to See the Trees


The software structure of CLM at the subroutine level. Each circle represents an individual subroutine, and the area of each circle is proportional to the time spent in that subroutine. Image credit: Wang, D. et al. “Environmental Modelling & Software” 55 (2014)


ORNL researcher uses global climate model to explore regional events

Climate modelers work to untangle complex webs of cause and effect

Every few years, unusual weather brings torrential rainfall and warm, nutrient-poor water to the coasts of Peru and Ecuador, devastating local fishing economies. Although this might seem like a local event, the climate pattern behind it, known as the El Niño–Southern Oscillation, has global effects. Typically, the next winter is much warmer in western North America, wetter in the southeastern United States, and drier in the Pacific Northwest. Rain and temperature changes throughout Africa and Australia, as well as tropical cyclones off the coast of Japan, can also be tied to El Niño occurrences.

Large-scale modeling plays an important role in understanding the convoluted relationships between climate events and weather systems. What may look, at first glance, like a jumble of random information can come beautifully into focus when viewed through the proper model.

The US Department of Energy (DOE) is pushing researchers to integrate observation and modeling to create a better picture of Earth’s climate as a whole. One researcher committed to this goal is Dali Wang of DOE’s Oak Ridge National Laboratory (ORNL).

Wang is a research staff member with ORNL’s Environmental Sciences Division as well as the lab’s Climate Change Science Institute. With a background in both computer and environmental science, he understands the goals and expectations of each side.

“There is a gap between experimental science and modeling science,” Wang said. “They each have a different set of goals. When DOE would say, ‘I want you to collaborate with each other,’ people would say, ‘We would like to do that, but the reality is too complicated.’”

Wang’s solution is to take a massive model, pull out the relevant portion, and rewrap it with a user-friendly interface. This environment, known as a scientific function-testing platform, would make it easier for scientists to create accurate computational experiments to mimic what they observe in the real world.

“We get a lot of buy-in from field scientists, from modelers. We have the support, but I’m facing this huge software system,” Wang said. “We are dealing with this kind of legacy code, and it’s complicated. At some point nobody has a clear understanding of the whole picture. That’s the reality.”

The model that Wang is focusing on is the Community Land Model (CLM), which simulates the interaction between the land surface and the atmosphere. CLM is actively being developed under the Accelerated Climate Modeling for Energy (ACME) project to support DOE’s climate and environmental research.

A function-testing platform such as Wang’s is designed to simplify the software while maintaining accuracy, but streamlining the massive program is not easy. The CLM modeling system consists of more than 1,800 source files and more than 350,000 lines of source code.

Two existing classes of computational tools, compilers and debuggers, make developing the functional testing platform more feasible. Compilers and debuggers are already widely used by the computational science community, but Wang is putting them to use in novel ways.

Compilers are like translators for computer programs. They take the instructions written by a human programmer and turn them into the ones and zeros that allow a computer to do its job. But Wang wants to go further.

“The first time I got into the CLM code, I was shocked,” Wang said. “It took me a couple months to figure out the relationship between subroutines and components. I kept thinking about how this would be a big problem for modelers and computer science. So, how can we find out a way to help us both understand the structure of this software and make it easier for the people that do science?”

Field researchers who come to Wang are usually interested in specific components of the huge, multilayered CLM. If two subroutines are designed to represent a closely connected physical or chemical process, there should be strong connections between them in the code. Compiler-based analysis can recognize these patterns and extract only the scientific code that is relevant.
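As a rough illustration of the extraction idea, the sketch below treats a program's call graph as a Python dictionary and collects every subroutine transitively reachable from a chosen entry point; that set approximates the code that has to be pulled out for a component to run on its own. The subroutine names and the graph are hypothetical placeholders written by hand. The article says Wang's platform obtains this kind of structure through compiler analysis of the real CLM source, and this sketch does not reproduce that analysis.

```python
from collections import deque

# Hypothetical call graph: each subroutine maps to the subroutines it calls.
# In a real workflow this would be generated by compiler analysis, not typed in.
CALL_GRAPH = {
    "land_driver": ["canopy_fluxes", "soil_temperature", "write_history"],
    "canopy_fluxes": ["photosynthesis", "leaf_boundary_layer"],
    "photosynthesis": ["qsat"],
    "leaf_boundary_layer": [],
    "soil_temperature": ["qsat", "tridiagonal_solver"],
    "tridiagonal_solver": [],
    "qsat": [],
    "write_history": ["io_utils"],
    "io_utils": [],
}


def extraction_set(graph, entry):
    """Return every subroutine reachable from `entry` (including itself),
    i.e., the code that must travel with it when it is extracted."""
    seen = {entry}
    queue = deque([entry])
    while queue:  # breadth-first traversal of the call graph
        current = queue.popleft()
        for callee in graph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


if __name__ == "__main__":
    print(sorted(extraction_set(CALL_GRAPH, "canopy_fluxes")))
    # → ['canopy_fluxes', 'leaf_boundary_layer', 'photosynthesis', 'qsat']
```

Note that `soil_temperature` is left behind even though it shares the utility `qsat` with the extracted component; shared helpers like this are one reason untangling legacy code takes careful analysis.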

“We use compilers to understand the structure and easily extract code, but the first thing we want to do when we pull a software component out of the system is make sure it behaves exactly like it should when it’s in the system,” Wang said. “That’s where DDT comes into play.”
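One common way to check that an extracted component behaves as it did inside the full system is capture and replay: record a subroutine's inputs and outputs while the whole model runs, then feed the recorded inputs to the standalone copy and compare the outputs. The sketch below does this with a Python decorator on a toy function. The kernel, its formula, the toy driver loop, and the tolerance are all hypothetical; in the article, DDT plays the recording role on the running model, and nothing here uses DDT.

```python
import math

# (args, result) pairs captured while the "full model" runs.
CAPTURED = []


def capture(func):
    """Wrap a subroutine so every call's inputs and output are recorded."""
    def wrapper(*args):
        result = func(*args)
        CAPTURED.append((args, result))
        return result
    return wrapper


@capture
def leaf_temp_in_system(air_temp_k, net_radiation):
    # Hypothetical physics kernel as it runs inside the full model.
    return air_temp_k + 0.01 * net_radiation


def leaf_temp_extracted(air_temp_k, net_radiation):
    # The same kernel after being pulled out into a standalone harness.
    return air_temp_k + 0.01 * net_radiation


def run_full_model(steps):
    """Toy stand-in for the full simulation: calls the kernel each step."""
    for step in range(steps):
        leaf_temp_in_system(290.0 + step, 400.0 - 10.0 * step)


def replay_matches(extracted, records, rel_tol=1e-12):
    """True if the extracted kernel reproduces every captured output."""
    return all(
        math.isclose(extracted(*args), expected, rel_tol=rel_tol)
        for args, expected in records
    )


if __name__ == "__main__":
    run_full_model(5)
    print(len(CAPTURED), replay_matches(leaf_temp_extracted, CAPTURED))  # 5 True
```

Recording exact inputs avoids having to rebuild the full model state around the extracted code, and the tolerance leaves room for harmless floating-point differences, such as those from different compiler optimization settings.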

Programmers have relied on debuggers for years to find glitches and bugs in their code. Wang is using Allinea’s DDT debugger not just to find errors in the code but also as an event generator.

“If I have a whole simulation system (such as CLM), I would like to create an event that I’m interested in—say, a drought during an African summer—and see how my system reacts to it,” Wang said. “DDT has the capability to help me trigger this kind of event and track the simulation system’s responses.”

The tool effectively takes snapshots of what’s going on while the code is running. When a researcher applies a change to one subroutine, DDT tracks the inflow and outflow of various parts of the system to see which other parts were affected. Each time that same change is made to that same component, the result should be the same.
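The event-generator idea can be illustrated with a toy, deterministic soil-moisture model: run a baseline, rerun with an injected drought (precipitation forced to zero for part of the run), snapshot the state after every step, and list which state variables diverged from the baseline. Rerunning the perturbed case and comparing snapshots checks the property that the same change should give the same result. The model, its variables, and the drought window are hypothetical stand-ins; in the article this is done by DDT on the running CLM code, and this sketch does not use DDT's interface.

```python
def step(state, precip):
    """Advance a toy land-surface state by one time step (hypothetical physics)."""
    s = dict(state)  # new dict; air_temp is carried along unchanged
    s["soil_water"] = max(0.0, state["soil_water"] + precip - 0.5 * state["evap_rate"])
    # Evaporation falls off as the soil dries out.
    s["evap_rate"] = 0.1 * s["soil_water"]
    return s


def run(steps, drought_window=None):
    """Run the model, forcing precipitation to zero for steps inside
    drought_window = (start, stop). Returns a snapshot after every step."""
    state = {"soil_water": 10.0, "evap_rate": 1.0, "air_temp": 300.0}
    snapshots = []
    for t in range(steps):
        precip = 1.0
        if drought_window and drought_window[0] <= t < drought_window[1]:
            precip = 0.0  # injected event
        state = step(state, precip)
        snapshots.append(dict(state))
    return snapshots


def affected_variables(baseline, perturbed):
    """Names of state variables that differ from the baseline at any step."""
    return sorted(
        {k for b, p in zip(baseline, perturbed) for k in b if b[k] != p[k]}
    )


if __name__ == "__main__":
    base = run(20)
    drought = run(20, drought_window=(5, 10))
    print(affected_variables(base, drought))  # ['evap_rate', 'soil_water']
    # Same change, same result: a deterministic rerun matches exactly.
    print(drought == run(20, drought_window=(5, 10)))  # True
```

Here `air_temp` is untouched because the toy model does not couple it to the soil; in a real land model, the interesting question is precisely which such links exist.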

Although the model itself has the capability to track basic inputs and outputs, using DDT in this way allows for much more complex analysis. Wang compares it to a doctor ordering an MRI instead of taking a patient’s temperature.

This is a crucial part of a successful scientific function-testing platform. By definition, function testing examines the performance of a system in action. It gives researchers and modelers a detailed look at what the software is doing as it’s doing it.

The Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility, funded Allinea to make DDT scale to systems the size of the OLCF’s Jaguar and Titan supercomputers.

“DDT’s ability to work on sequential cores is essential for my current work,” Wang said. “Its ability to work on millions of cores is also essential and irreplaceable for my future work.”

Wang’s research, from revamping the user interface to improving performance analysis, ties into the idea of Model-Experimental Coupling (MODEX), an approach that has been heavily endorsed by DOE with the aim of improving predictive understanding. Implementing experiments can be incredibly time-consuming. MODEX is a much faster method that uses models to visualize what should happen in a real-world scenario before using those predictions to develop more efficient studies. A functional testing platform makes it easier to use the observational data from those experiments to improve the model, creating a positive feedback loop that benefits everyone.

Wang says he’s looking to the future and trying to make the best products for all parties involved.

“We have so many investments in traditional computing tools that we can build upon,” Wang said. “If we want to advance computational science, we need science-friendly tools that can tackle the software system complexity in a more understandable and friendly way.”

Related Publications:
Wang, D., Y. Xu, P. Thornton, A. King, C. Steed, L. Gu, and J. Schuchart. “A functional test platform for the Community Land Model.” Environmental Modelling & Software 55 (2014): 25–31.

Wang, D., W. Wu, T. Janjusic, Y. Xu, C. Iversen, P. Thornton, and M. Krassovski. “Scientific Functional Testing Platform for Environmental Models: An Application to Community Land Model,” Proceedings of the International Workshop on Software Engineering for High Performance Computing in Science, 37th International Conference on Software Engineering, Florence, Italy, May 16–24, 2015 (in press).

—Christie Thiessen

Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.