OLCF Director of Science Jack Wells spoke recently to the annual Bio-IT World Conference & Expo in Boston, sharing Oak Ridge National Laboratory’s supercomputing experience.
Now in its 11th year, Bio-IT hosted over 2,500 biomedical researchers and drug development representatives from more than 30 countries. The conference aims to advance informatics in the life sciences and to showcase the field's achievements.
By incorporating information technology tools into biomedical research, the Bio-IT community has grown rapidly, pushing into areas such as personalized medicine and evidence-based medicine. Wells was asked to present case studies of applications using Titan to give this community a better sense of the problems that can be tackled by leadership computing.
“It is important to bring knowledge about the leadership computing user programs to communities that are not working with us,” Wells explained. “Our participation in this meeting is an example of our outreach and the community’s interests.”
In his presentation, “Accelerating Bioscience and Technology with Titan, the World’s Fastest Supercomputer,” Wells explained that petascale computing is drastically speeding up early science results. Thanks to its hybrid architecture, which combines GPUs (originally created to accelerate computer gaming) with traditional CPUs, Titan has shown a tenfold increase in performance over its supercomputing predecessor, Jaguar.
“This new computing power is enabling new science applications,” Wells noted. “Researchers in a variety of fields need to hear about our impact so that they can dream and come up with the problems that they might want to solve on Titan.”
One powerful tool available to Titan users in biology, materials science, and nanotechnology is an application known as LAMMPS, for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS is a molecular dynamics code that simulates the movement of atoms through time.
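At its core, a molecular dynamics code advances every atom's position and velocity in many small time steps, recomputing interatomic forces along the way. The sketch below is a minimal, generic illustration in C, not LAMMPS source, and the function and parameter names are invented; it shows one step of the standard velocity Verlet scheme for that kind of update. A production code such as LAMMPS performs the same bookkeeping for millions of atoms in three dimensions, with sophisticated force fields and domain decomposition across many compute nodes.

```c
#include <stddef.h>

/* One velocity-Verlet time step for n atoms in one dimension.
 * x = positions, v = velocities, f = forces, dt = time step. */
void md_step(size_t n, double dt, const double *mass,
             double *x, double *v, double *f,
             void (*compute_forces)(size_t n, const double *x, double *f))
{
    for (size_t i = 0; i < n; ++i) {
        v[i] += 0.5 * dt * f[i] / mass[i];   /* half-kick with the old forces */
        x[i] += dt * v[i];                   /* drift to the new positions    */
    }
    compute_forces(n, x, f);                 /* forces at the new positions   */
    for (size_t i = 0; i < n; ++i)
        v[i] += 0.5 * dt * f[i] / mass[i];   /* second half-kick              */
}
```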
Wells pointed to two studies in which LAMMPS is being used for biomedical research on Titan.
In one, Titan is helping ORNL researchers simulate the effects of sunlight on organic photovoltaic materials—materials that can generate electricity when exposed to sunlight—in hopes of creating a lightweight, highly flexible, low-cost source of renewable energy.
The other study involves the dewetting of liquid crystal films, or the ability of liquid crystals to self-assemble into complex solid structures. The outcome, researchers envision, is a film-like structure that acts as a biomedical sensor able to detect bacteria, antibodies, or other specific structures within the body.
Titan is also accelerating new drug discovery. Recently ORNL computational biophysicists used virtual high-throughput screening software to simulate the docking of 2 million molecular compounds against a targeted cellular receptor—a feat performed in 72 hours on Titan that would have taken months, or even longer, with conventional test-tube methods. This research should eventually lead to more efficient drugs, with fewer side effects, delivered in a fraction of the current time to market. (https://www.olcf.ornl.gov/2012/10/18/big-computing-cures-big-pharma/)
Wells sees ample opportunity for Titan to power biological research discoveries in the future.
“Thanks to the opportunities that Titan enables, we expect to grow new user partnerships within the Bio-IT community from disciplines such as genetics, drug design, and molecular biology and biophysics,” he said.—by Jeremy Rumsey
Staff members from the OLCF recently made a trip to the West Coast to both attend and contribute to the world’s leading conference on graphics processing units, or GPUs.
The OLCF’s Titan supercomputer, currently ranked as the world’s fastest, uses GPUs to help crunch the massive amounts of data it processes in solving some of the world’s most pressing scientific challenges, from climate change to materials to astrophysics, to name a few.
The GPU Technology Conference (GTC) took place in San Jose, California, from March 18-21.
GTC advances global awareness of GPU computing, computer graphics, visualization, game development, mobile computing, and cloud computing, and their importance to the future of science and innovation. Through world-class education, including hundreds of hours of technical sessions, tutorials, panel discussions, and moderated roundtables, GTC brings together thought leaders from a wide range of fields, including high-performance computing.
OLCF representatives included Director of Science Jack Wells, Director of Operations Jim Rogers, and User Assistance Specialists Suzanne Parete-Koon and Fernanda Foertter.
Foertter and Wells gave a talk describing the OLCF, Titan, and the early science phase now under way on the machine. In that phase, a select group of research teams has been granted early access to Titan’s GPUs in an effort to maximize the value of future users’ allocations.
“The feedback was great. We met a lot of new people who wanted to collaborate on heterogeneous computing training. We also met with users like Balint Joo and with NVIDIA collaborators and counterparts,” said Foertter, adding that these relationships will only help users gain even more from their time on Titan.
On Titan’s hybrid CPU/GPU architecture, the GPUs crunch millions of simpler operations at once, freeing the traditional CPUs to sort through the more complex math. With this unique hardware combination, Titan is capable of a peak performance of 27 petaflops, allowing researchers to simulate more complex systems in shorter timeframes, a key factor in achieving scientific breakthroughs.
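That division of labor can be illustrated with a directive-based approach to GPU programming. The fragment below is a generic sketch, not OLCF or NVIDIA code, and the function and parameter names are invented: a large, simple, data-parallel loop is marked for the accelerator with an OpenACC directive, while the surrounding control logic stays on the CPU.

```c
#include <stdlib.h>

/* Sketch: offload a large, uniform, data-parallel update to the GPU.
 * The CPU owns setup and control flow; the accelerator repeats the same
 * lightweight arithmetic across millions of array elements. */
void scale_and_shift(size_t n, double a, double b,
                     const double *restrict in, double *restrict out)
{
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (size_t i = 0; i < n; ++i)
        out[i] = a * in[i] + b;   /* millions of identical, simple operations */
}
```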
“GTC had a number of hands-on labs featuring tutorials about how to expose parallelism in Fortran and C codes for the purpose of generating better kernels for the GPU,” said Parete-Koon. “We immediately used the knowledge we gained to develop a set of our own tutorials for seminars that we recently taught for the University of Tennessee Knoxville’s Computational Physics Class.”—by Leah Moore
INCITE enables transformational advances in science and technology for computationally intensive, large-scale research projects through large allocations of computer time and supporting resources at the Argonne and Oak Ridge Leadership Computing Facility (LCF) centers, operated by the US Department of Energy (DOE) Office of Science.
INCITE seeks research projects suited to capability computing: production simulations, including ensembles, that use a large fraction of the LCF systems or require their unique architectural infrastructure, in support of high-impact work that cannot be performed anywhere else and that addresses some of the toughest challenges in science and engineering.
INCITE is currently soliciting research proposals for awards of time on the 27-petaflop Cray XK7 “Titan” and the 10-petaflop IBM Blue Gene/Q “Mira” beginning in calendar year (CY) 2014. Average awards per project for CY 2014 are expected to be on the order of tens to hundreds of millions of core-hours. Proposals may cover up to three years.
The INCITE program is open to US- and non-US-based researchers and research organizations needing large allocations of computer time, supporting resources, and data storage. Applications undergo a two-phase review process to identify projects with the greatest potential for impact and a demonstrable need for leadership-class systems to deliver solutions to grand challenges.
To submit an application, please visit http://proposals.doeleadershipcomputing.org for details about the proposal requirements. Applications will be accepted only electronically starting April 15, 2013. Proposals will be accepted until a call deadline of 11:59 p.m. EDT on Friday, June 28, 2013. Awards are expected to be announced in November 2013.
The Argonne and Oak Ridge Leadership Computing Facilities will be hosting two proposal writing webinars on April 25th and May 14th. Visit https://www.olcf.ornl.gov/training-event/incite-proposal-writing-webinar to register.
For more information on the INCITE program and a list of previous awards, visit http://www.doeleadershipcomputing.org/.
The Oak Ridge Leadership Computing Facility’s (OLCF’s) High-Performance Computing Operations team leader, Don Maxwell, received a lifetime achievement award from Adaptive Computing as part of their first annual Adaptie Awards.
The provider of HPC workload management software acknowledged Maxwell’s contributions to the greater HPC industry and to the advancement of their management software Moab, which is now widely used on Cray supercomputing platforms.
Maxwell began his career with the Department of Energy more than 25 years ago at the Y-12 National Security Complex in Oak Ridge, Tennessee, before joining Oak Ridge National Laboratory in 2000. He was integral to the transition from IBM systems to the Cray XT series, including Jaguar, which ranked as the world’s most powerful supercomputer on the November 2009 Top500 list, and Jaguar’s successor Titan, which currently holds the number one spot.
To assist users competing for coveted time on Jaguar, Maxwell and the OLCF Operations team worked with Cray and Adaptive Computing in 2006 to port the Moab management system to Jaguar’s Cray platform. As Jaguar evolved over the years, so did Maxwell’s work with Moab.
“Life without Moab would be very difficult,” Maxwell said. “Moab provides the ‘job scheduling’ function for the supercomputer. It allows users to submit jobs to run on the machine at some point in the future without further intervention, and it allows the OLCF to prioritize larger jobs that are crucial to our mission for the Department of Energy.”
Lifetime achievement award notwithstanding, Maxwell said the relationship with Adaptive Computing is not over; the two will stay in close collaboration as Moab is repurposed for Titan.
“We have some things to work on going forward that I think are exciting for the product,” Maxwell said in his acceptance speech.
His speech was delivered via video during MoabCon 2013, Adaptive Computing’s annual user conference in Park City, Utah. Maxwell stayed behind at ORNL, committed to work on Titan that will define the next chapter in his celebrated career. —Katie Elyce Freeman
ORNL team analyzes frustrated matter in stellar explosions
The collapsing iron core of a giant star is a violent and confusing scene.
Iron cannot fuel the nuclear fusion that has kept the star going for millions of years, so no energy is available to support the core against its own gravity. This is the event that triggers the star’s death in a stunning explosion.
The collapse is over in a few thousandths of a second. In it, the innermost core, a portion half the mass of our sun, falls at up to 70 percent of the speed of light until it becomes so dense that a teaspoon would weigh more than 300 Empire State Buildings.
Then the core bounces—and what a bounce it is. The resulting shock wave takes the star, which may be up to 40 times as massive as the sun, and blows it into space, leaving behind a new-born neutron star or a black hole.
This is a core-collapse supernova, one of the universe’s most impressive calamities, and one of its most valuable. Stars are the universe’s element factories and core-collapse supernovas are the distribution channels. We can thank them for most of the elements in our environment, and in our bodies.
A battle of forces
The supernova illustrates all of nature’s forces, from the gravity that extends across the universe to the nuclear force that binds and arranges the smallest building blocks of matter. It also illustrates what happens when these forces battle one another for dominance.
The University of Tennessee-Knoxville’s (UTK’s) Jirina Stone and Helena Pais have adopted the most detailed and accurate model to date of one such battle. Using the lab’s Jaguar supercomputer, they simulated matter at the core’s bounce, when the shock wave starts to develop. The core at that moment is a thick soup of neutrons, protons, and electrons, and up to 20 percent of the matter is believed to form into cylinders, sheets, bubbles, and other odd forms.
This is nuclear pasta—a form of frustrated matter—a rare ordering found within the supernova and in the neutron star that will be all that remains when it is over. For a split second as the star blows outward there will be a layer of these strange forms in the core about 100 kilometers from the center.
The forces battling for supremacy in nuclear pasta are the Coulomb force and surface tension. Coulomb energy refers to the attraction or repulsion of electrically charged particles (think magnets); in this case it pushes the positively charged protons apart. Surface tension, on the other hand, is the internal pull that draws liquids into the smallest possible surface area (think water droplets); in this case it entices the neutrons and protons to form coherent structures.
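One schematic way to see the competition (a textbook-style liquid-drop argument, not the team's full Skyrme-Hartree-Fock calculation): for clusters of characteristic size r at a fixed average density, the surface energy per unit volume falls off roughly as a/r while the Coulomb energy grows roughly as b·r², for positive constants a and b. Minimizing the total e(r) = a/r + b·r² gives a/r = 2·b·r² at the optimum, that is, the preferred configuration carries about twice as much surface energy as Coulomb energy. When the two terms are this evenly matched, neither compact spheres nor uniform matter wins outright, and the intermediate rod, slab, and bubble shapes of the pasta phase become competitive.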
Stone and Pais published their results in the October 12, 2012, issue of Physical Review Letters. The work also plays an important role in Pais’s doctoral thesis from UTK.
“Imagine what happens,” said Stone. “When the neutrons and protons are close enough that the strength of the Coulomb force and nuclear forces are comparable, the Coulomb force is telling all the nucleons to get apart, while the surface forces are saying, ‘Come on, I want to keep you together.’ Because the Coulomb interaction is just repulsive, but the nuclear interaction is both repulsive and attractive, there is a competition between these controversial effects. One is saying, go apart, and the other is saying, no, stay here.”
As a result, this layer of the exploding star is dominated by odd shapes, resembling penne, lasagna, and bowties, as well as spheres and other not-quite-pasta shapes. Below the pasta is a very dense liquid of mainly neutrons. Above it is a particle gas of free neutrons, heavy nuclei and electrons.
30 million processor hours
The team used an approach called Skyrme-Hartree-Fock, with Pais handling the calculations. All told, the project used 30 million processor hours on Jaguar, with each 12-hour run making use of 45,000 processor cores. The adaptation of the original code for use in this mode was provided by Reuben Budiardja of UTK’s National Institute for Computational Sciences, while ORNL’s Eric Lingerfelt helped implement the VisIt visualization package.
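As a back-of-envelope check on those figures (simple arithmetic, not numbers from the paper): a 12-hour run on 45,000 cores consumes about 45,000 × 12 ≈ 540,000 processor hours, so the 30-million-hour campaign corresponds to on the order of 55 runs at that scale.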
“‘Hartree-Fock’ is the mathematical method that makes a description of systems made of many quantum particles possible to the best approximation,” Stone explained, “as exact treatment of such a system is currently beyond reach of the fastest computers in the world. The Skyrme interaction is a prescription of how nucleons talk to each other.”
The model, called “3D-SHF-EOS,” was developed by Will Newton, former graduate student of Stone’s at England’s Oxford University and now an assistant professor at Texas A&M–Commerce. It assumes that all matter under the same temperature and density will behave the same, so it can be described using representative cells.
“What we do in the model is say that the matter in the star can be modeled by a sequence of cubic cells—little cubes—which are connected to each other through something we call ‘periodic boundary conditions,’” Stone explained. “This means you go from one cell to another cell to another cell, and you just reproduce the situation in one cell to the other and so forth.”
Each cell is a tiny cube, 25 quadrillionths of a meter on a side. By modeling a range of these cubes and calculating what happens to the matter under different temperature and density conditions, the project provides all the information needed by a supernova simulation to accurately describe the pasta phase of nuclear matter.
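The periodic boundary conditions Stone describes can be made concrete with a short, generic sketch in C (illustrative only, not the team's 3D-SHF-EOS code; the function names are invented): when a coordinate leaves one face of the cubic cell it re-enters through the opposite face, so a single small cell stands in for an effectively infinite, repeating medium.

```c
#include <math.h>

/* Wrap a coordinate back into a cubic cell of side L (periodic boundary). */
double wrap(double x, double L)
{
    return x - L * floor(x / L);      /* result lies in [0, L) */
}

/* Minimum-image separation along one axis: the distance to the nearest
 * periodic copy of a neighboring particle, never larger than L/2 in magnitude. */
double min_image(double dx, double L)
{
    return dx - L * round(dx / L);
}
```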
“People before us suspected there was pasta,” Stone said. “People would assume a certain formation, like the sheet or a rod, and they would calculate the distribution in the cell and look at whether the energy of that cell is lower or higher than the energy of the uniform matter (no pasta), which they could also calculate.
“What we do is assume absolutely nothing. We evolve shapes. This is our main contribution, because we evolve these formations without actually any assumptions about what should evolve under what conditions. Moreover, we actually show the onset of the pasta and the dissolution of pasta and can study the conditions at which the pasta appears and disappears.”
The confirmation that nuclear pasta exists in the stellar environment adds to observations of frustrated matter in other areas—such as solid state physics, magnetism and biology—that also exhibit strange orderings of matter under the influence of conflicting forces.
In addition, Stone’s and Pais’s work has at least two purposes beyond the fascination of the pasta itself. First, it is a valuable input into simulations of the exploding star. Second, it helps explain the cooling of the neutron star that is left behind.
A matter of neutrinos
Each of these has to do with neutrinos. Neutrinos are tiny, elementary, usually inconsequential particles that normally go on their way without interacting with other matter. They have no electrical charge. Their mass is very small. Billions go through your body each second and do nothing along the way.
In the collapsed core of a supernova, however, neutrinos are a major byproduct of the collapse. They are so numerous and the matter around them so dense that the neutrinos power the explosion that blasts the star into space. Along the way, they cool the remnant neutron star by taking energy away from the collapsed core.
Exactly how the neutrinos do their thing depends on the makeup of the matter they’re blasting through. This is where the model developed by Stone and Pais may be especially valuable.
“Neutrinos play a big role in the core-collapse supernova,” Stone said. “The density at the bounce is such that the neutrinos for some fraction of a millisecond cannot get out. So, in principle, at first the neutrinos are trapped in the matter, but when the bounce releases the matter a little bit, the neutrinos will be able to pass through the core to propagate the shock. They carry a lot of energy.
“At that point it matters what the composition of the star is, because the neutrinos behave differently if they pass through homogeneous matter or if they pass through these complicated structures. And this may affect the shock.”
The output of the project is information that will refine the input conditions for supernova simulations currently used.
“We will give them the numbers which they need,” Stone noted. “Pressure, chemical potentials, all the numbers they need. So this calculation has been done once forever, if you like.”
Pais, Helena, and Jirina R. Stone. “Exploring the Nuclear Pasta Phase in Core-Collapse Supernova Matter.” Physical Review Letters 109.15 (2012): 151101.
Prospective Titan users gathered in Knoxville, Tennessee, February 19-21, for the East Coast Titan Users and Developers Workshop and Users Meeting. Hosted by the Oak Ridge Leadership Computing Facility (OLCF), the workshop mirrors a January event held in California. In total 65 attendees were on-site while 51 logged in to participate remotely.
Attendees picked up skills for running on Titan, from parallelization techniques to debugging. The first day of the workshop was a users’ meeting, which introduced users to the OLCF and Titan basics. The rest of the workshop focused on preparing users to work on Titan’s CPU-GPU architecture. A live webcast was provided for those unable to attend.
Long-time OLCF user Stephane Ethier, a computational physicist from Princeton Plasma Physics Laboratory, said of his experience with the live webcast, “The presentations were excellent and very useful for all levels of users. Introductions to the various methods of programming GPUs were especially relevant.”
New this year to the training workshops was a focus on hands-on activities. Attendees reported via feedback form that they found the activities helpful in preparing them to work at the OLCF. The Titan training team plans to incorporate more activities in future workshops and to post in-depth tutorials for users to view online.
A popular topic at the workshop was CUDA, a parallel computing platform and programming model created by NVIDIA that runs on the GPUs in Titan. There was a lecture on the subject as well as a hands-on lab featuring CUDA.
Users reported that they left the workshop prepared to work on Titan in the future and with greater interest in the topics covered. —by Leah Moore
The next era in high-performance computing is here. On Monday, March 11, researchers from a wide spectrum of scientific disciplines were granted access to the Titan supercomputer’s graphics processing units (GPUs).
Managed by the Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory, Titan is currently the world’s fastest supercomputer and is unique among its peers due to its hybrid architecture—a combination of GPUs, traditionally used in video games, and the more conventional central processing units (CPUs) that have served as number crunchers in computers for decades. The complementary combination of CPUs and GPUs will allow Titan to perform more computing operations using less power than previous generations of high-performance computers.
On Monday morning, researchers were able to take advantage of 8,972 GPU-enabled compute nodes. This critical step marks the beginning of a new role for GPUs—helping computational science to tackle some of our most complex challenges in energy, climate, biochemistry, and fundamental physics, to name a few.
While Titan has not yet completed the full suite of acceptance tests (a series of agreed upon benchmarks that demonstrates the machine’s readiness for use), it has successfully passed both the functionality and performance phases of the acceptance test suite.
The project schedule calls for completing acceptance testing by June of 2013, a schedule the OLCF expects to meet.
Titan has a theoretical peak performance of more than 27 petaflops, or more than 27,000 trillion calculations per second, enabling researchers to achieve unparalleled accuracy in their simulations and achieve research breakthroughs more rapidly than ever before.
Because they handle hundreds of calculations simultaneously, GPUs can churn through many more operations than CPUs can in a given time, yet they draw only modestly more electricity. By relying on its 299,008 CPU cores to guide simulations and allowing its Tesla K20X GPUs, which are based on NVIDIA’s Kepler architecture, to do the heavy lifting, Titan is approximately ten times more powerful than its predecessor, Jaguar, while occupying the same space and drawing essentially the same level of power.
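As a rough consistency check on those figures (arithmetic based on the commonly cited Titan configuration, not numbers from this article): 299,008 CPU cores at 16 cores per node corresponds to 18,688 compute nodes, each paired with one K20X GPU. At the K20X's roughly 1.3 teraflops of peak double-precision performance, the GPUs alone account for on the order of 24–25 petaflops, with the CPUs supplying the remaining 2–3 petaflops of the 27-petaflop total.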
In order to achieve the goal of exascale computing, a game-changing technology is necessary. Many in the supercomputing arena believe that GPUs are exactly that, and Titan is the proving ground for this technology.
“On the first day of having access to the GPUs on Titan, users of the GPU-enabled nodes consumed 500,000 Titan core-hours,” said OLCF’s Director of Science Jack Wells.
When in November 2012 the Oak Ridge Leadership Computing Facility (OLCF) stood up its latest and greatest machine, Titan, it seemed to many that the hard work was done.
Not so. While launching a machine on the scale of Titan, now ranked as the world’s most powerful computer, is certainly an achievement in itself, at the end of the day the hardware is only as good as the software that runs on it.
Because Titan is among the first supercomputing systems to use a hybrid architecture—one that combines traditional central processing units (CPUs) with the graphics processing units (GPUs) common in video game systems—getting scientific applications to scale to all of Titan’s nearly 300,000 compute cores is no small feat. And no matter how solid any one application is, that scaling up is certain to introduce bugs that can greatly hamper its (and Titan’s) productivity.
When an application containing hundreds of thousands of lines of code is running across 300,000 cores, spotting such bugs is a tricky business. It is a problem of sheer numbers, and one the OLCF anticipated: the center knew it would need a new class of tool to let applications run smoothly once scaled up to Titan.
For that reason, OLCF staff began working with software developer Allinea on Titan’s predecessor, Jaguar, in preparation for Titan’s launch. The product of this relationship is Allinea’s distributed debugging tool, or Allinea DDT, created precisely for the world’s leadership computing systems. With the assistance of OLCF staff, Allinea was able to customize its large-scale debugger for Titan’s hybrid architecture, enabling the supercomputer’s first users to easily scale to large portions of the machine and assisting the OLCF during Titan’s critical acceptance phase.
“Part of the mission of the Titan project is to provide a comprehensive programming ecosystem that allows researchers to be as productive as possible,” said Joshua S. Ladd, the tools project technical officer during the OLCF3 Project. “A major component of that ecosystem is the debugger.”
Currently Allinea representatives are working with Oak Ridge National Laboratory’s (ORNL’s) Application Performance Tools Group to extend Allinea DDT to a scale 40-plus times greater than previous high-end debugging tools, and they are making serious headway.
“Before we joined this project, tools weren’t capable of getting anywhere near the size of the hardware,” noted Allinea’s COO David Lecomber. “The problem was that a debugging tool might do 5,000 or 10,000 parallel tasks if it was lucky, when the machines and applications wanted to write things that could do 200,000 plus. So the tools just got beaten up by the hardware.”
With Allinea DDT, however, the times, they are a-changin’.
What’s in a debugger?
A supercomputer needs a super debugger. Supercomputing applications typically assign each process to a single, separate processor, meaning an application running on 200,000 processors will most likely be executing 200,000 simultaneous processes.
Traditionally a developer will contend with bugs by inserting “print” statements at strategic points in the code. These statements tell the application to display the status of each process at that point in the program’s execution—information such as the value of a variable. By running a test problem, the developer can compare each answer with an expected answer and thereby isolate specific problems in the code.
Each process will respond to the print statement with a one-line answer; thus, an application with thousands of processes will display thousands of lines through which the coder must then sift. This method gets more difficult as the number of processes grows, and it becomes impossible beyond a certain point.
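The approach looks something like the fragment below (a generic, hypothetical sketch, not code from any OLCF application): every MPI rank prints its own status line, which is manageable at a handful of ranks and hopeless at 200,000.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Stand-in for a real per-rank computation. */
    double local_result = 1.0 + 0.000001 * rank;

    /* Traditional debugging: every process reports its state.  On a
     * leadership-class run this produces hundreds of thousands of lines
     * that the developer must then sift through by hand. */
    printf("rank %d of %d: local_result = %f\n", rank, nranks, local_result);

    MPI_Finalize();
    return 0;
}
```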
With Allinea DDT, though, developers can quickly pinpoint any failures because it gives them a single view of every process in a parallel job, along with exactly what line of code is being executed. Furthermore, the debugger works with applications written in the most common supercomputing languages: Fortran, C, and C++.
“Allinea DDT is tightly integrated into the Cray programming environment. We worked with Allinea Software to ensure that,” said Ladd. “All you really need to do is load the Allinea DDT module and type ‘ddt’ on the command line to fire up the GUI, and you’re ready to go. And the GUI is just point and click with a mouse.”
With this revolutionary tool, researchers can now more easily focus on their scientific goals without worrying about locating bugs across hundreds of thousands of lines of code, and the results are beginning to show.
Accelerating science one bug at a time
With both Titan and Allinea DDT, supercomputing is in uncharted territory.
“The combination of Titan’s size and hybrid architecture with GPUs provides Allinea with a formidable testing ground to grow, develop, and refine the Allinea DDT tool,” said the OLCF’s Hai Ah Nam, a staff scientist who works with researchers to help them get the most out of their time on Titan.
It’s a symbiotic relationship: “During [Titan’s] acceptance, we used Allinea DDT to help us find bugs in our codes. But as early users of Allinea DDT on Titan, we in turn found bugs in Allinea DDT that could be fixed before release to the larger user community,” said Nam. Allinea DDT is really paying off because it represents the only tool with its scaling capacity for Titan’s hybrid CPU/GPU architecture, she said. And Nam should know. She frequently proves Allinea DDT’s value in her role as a scientific computing liaison for the INCITE program.
Her most recent project involved an application, Bigstick, intended to describe the properties of the atomic nuclei of a variety of substances. This basic science research contributes to many fields, including energy and medical research. Although Nam had a good understanding of the algorithm, she had only a few days to work with the researcher.
“He was having trouble with this ‘Heisenbug’ (Heisenbugs are bugs that mysteriously vanish whenever you try to ‘observe’ them, typically with a ‘printf,’ because you’ve altered latencies between interprocessor communications) that only showed up when he scaled to a certain number of processors,” she said. “And it happened sporadically.”
By the time the researcher got through his first print statement and looked at one part of his code, Nam had figured out the problem with Allinea DDT. “I stunned him by finding his problem so quickly,” she said. “I was able to do it in one sitting, about an hour. I suspect it would have taken him at least a couple of weeks.”
Nam can empathize with scientists who want to focus on their research rather than learning yet another software application. “At small scales I could printf myself out of a problem,” she admits. “But now, with codes running on tens to hundreds of thousands of cores, I have to use Allinea DDT if I want to solve the problem quickly.”
The beginning of a beautiful relationship
“It was a big deal when Allinea Software came here in 2009, and they were able to start Allinea DDT on all of Jaguar’s 225,000 cores,” Ladd said.
Since its inception on Jaguar, the distinguished debugger has helped the OLCF solve some unusual challenges. Ladd and his team used the program to debug an open-source implementation of the Message Passing Interface (MPI) middleware. The work was at a very large scale, a half-million lines of code running on 100,000 to 225,000 cores.
“Even your typical nuclear supernova application is not that size,” said Ladd. “Debugging inside MPI is a vast universe of complexity that touches all aspects of a supercomputer—the network, the CPU, and the memory. All of these factors can conspire to cause problems at scale.
“By having the ability to step through the code, we could identify and resolve issues that I don’t think we would have been able to without Allinea DDT.”
Debugging also gets tricky when code has errors but still runs. To address this problem, Allinea Software is collaborating with VisIt—open-source software used to visualize large scientific data sets. A visual inspection enables researchers to look at a picture of the data, click on different cells, and inspect the process generating the data.
“So let’s say the output is a video of a star exploding,” said Ladd. “As that star explodes, if there are all kinds of weird asymmetries, you probably have some bug in your math. With a visualized debugging tool, if it doesn’t look like you expected, you go through the process to determine if you’ve got a bug in your code, or if you’ve discovered something new.”
Overall, Ladd describes the working relationship with Allinea Software as a gratifying partnership: “I think it has been rewarding for the Allinea Software folks to see their baby running at this scale, and it’s been rewarding for us to have it as a productive contributor in our tools suite . . . Titan is really cutting-edge technology, and it’s even more exciting because it’s not immediately clear what kind of issues users are going to run into when porting their code to the GPUs. To help encourage researchers to use the GPU accelerators, they must have the most powerful and effective tools at their disposal, tools like Allinea DDT. We’re excited for users to run into bugs on the GPUs to see this tool in action.”
By creating the fastest supercomputer with the best “supertools” to support it, the OLCF has created a solid launch pad for breakthrough discoveries.
Just ask ORNL’s Markus Eisenbach. He works with an application known as WL-LSMS, which provides first-principles calculations of properties that are important for the understanding of materials such as steels, iron-nickel alloys, and advanced permanent magnets that will help drive future electric motors and generators. Titan is helping Eisenbach’s team improve the calculations of a material’s magnetic states as they vary by temperature.
However, as with many scientific endeavors, the task is easier said than done, especially when transferring the research from a traditional CPU-based system to Titan’s revolutionary hybrid platform.
Thanks to Allinea DDT, however, the transition has been smoother than previously thought possible. When Eisenbach’s team began its work on Titan, a funny thing happened.
Whenever the team’s application scaled to roughly 14,000 cores, or about two-thirds of the machine, the GPU-based version of WL-LSMS mysteriously crashed. The traditional, CPU-based version, on the other hand, ran fine.
“It was puzzling,” said Eisenbach, noting that the different versions weren’t noticeably different when it came to scaling. Because at such a scale researchers like Eisenbach can’t decipher all of the data in an application’s core file, it was difficult to pinpoint the error in the code.
While Allinea DDT didn’t precisely pinpoint the source of the bug, it did help Eisenbach’s team narrow the range of possibilities to a very small region of the code and “inspired [its] intuition” of what was happening. It turned out the code was running out of OpenMP stack space. “Allinea DDT saved a tremendous amount of time,” said Eisenbach.
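That class of failure can be reproduced with a generic sketch like the one below (an illustrative assumption, not WL-LSMS code): a large array declared inside an OpenMP parallel region lives on each thread's private stack, and once the per-thread working set outgrows that stack the program crashes, often only at certain problem sizes or thread counts. Common remedies are enlarging the thread stack (for example via the OMP_STACKSIZE environment variable) or moving the buffer to the heap.

```c
#include <stdlib.h>
#include <omp.h>

#define N 2000000   /* ~16 MB of doubles: larger than a typical default thread stack */

void risky(void)
{
    #pragma omp parallel
    {
        double scratch[N];                 /* private copy on each thread's stack: likely overflow */
        scratch[0] = omp_get_thread_num();
        (void)scratch[0];
    }
}

void safer(void)
{
    #pragma omp parallel
    {
        double *scratch = malloc(N * sizeof *scratch);   /* heap storage avoids the stack limit */
        if (scratch) {
            scratch[0] = omp_get_thread_num();
            free(scratch);
        }
    }
}
```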
While stories like Eisenbach’s and Nam’s are indeed a testament to the OLCF/Allinea partnership, they are surely just the beginning. As more and more researchers begin scaling their codes to Titan’s revolutionary architecture, Allinea DDT will have plenty of work to do.
The Oak Ridge Leadership Computing Facility (OLCF) took high-performance computing training to the user for the first time when experts from Oak Ridge National Laboratory (ORNL) traveled to California for the Titan Users and Developers Workshop.
The conference was held January 29-31 in the San Francisco Bay Area, where the OLCF has many current and prospective users. The goal of the three-day event was to provide hands-on training to prepare users and developers to work on ORNL’s Titan system, which gets its leadership power by combining GPUs with CPUs.
The workshop moved from introductory topics to more advanced material. Day 1 focused on basics such as properly logging into the system, compiling code, running applications, and using scientific libraries. By the third day, users and developers were learning more complex material such as GPU architecture and application programming.
Attendees participated in lab activities to practice what they were taught and get experience with applications and programming techniques.
“The hands-on activities gave the users a chance to get involved with the material they were learning, which will help them when they do their own work on Titan,” said Ashley Barker, user assistance and outreach group leader for the OLCF.
The conference also featured presentations by the OLCF’s Michael Brown and Wayne Joubert, who worked on preparing codes for Titan. They shared lessons learned in the programming process so that new developers could become more familiar with the system and know what pitfalls to avoid.
“By the end of the workshop, potential Titan users gained a great starting point for their projects,” Barker said. “They understand Titan’s architecture and the programming environment, which gives them a boost starting out.”—by Leah Moore
While much is made of today’s melting Arctic ice cap, the phenomenon is not a new one.
In fact, about 22,000 years ago the Earth’s great ice sheets began to decline, slowly at first, but gradually more rapidly. Given the growing concerns about today’s shrinking glaciers and polar ice caps, scientists are very interested in knowing what happened the last time the Earth shed much of its ice. A better understanding of natural climate change should greatly assist in advancing our understanding of man-made climate change.
Researchers agree that a rapid, natural release of CO2 about 17,000 years ago led to a rise in global temperatures which further encouraged deglaciation in both the Northern and Southern Hemispheres, though at very different rates. What started the ball rolling 22,000 years ago, however, was until recently still a mystery.
Now researchers from the University of Wisconsin-Madison (UW-Madison), Harvard University, Oregon State University, and the National Center for Atmospheric Research (NCAR) have discovered the trigger for the beginning of the last great deglaciation. Over three years the team ran transient, or continuous, simulations on Oak Ridge National Laboratory’s (ORNL’s) Jaguar supercomputer to create the first physics-based test of hemispheric deglaciation. Their culprit: a combination of increased insolation (solar radiation reaching the Earth’s surface) caused by changes in the Earth’s orbit, and ocean circulation.
The simulations, conducted by Feng He and Zhengyu Liu of UW-Madison and Bette Otto-Bliesner of NCAR, help to recreate the climate during the first half of the last deglaciation period and identify why temperatures and deglaciation rates differed between the hemispheres. The research builds on earlier simulations performed at ORNL and featured in Science in 2009 and Nature in 2012. Their latest finding detailing ocean circulation as the primary cause of early deglacial warming in the Southern Hemisphere appears in the February 7 issue of Nature.
The research is part of a larger initiative that has succeeded in obtaining a mean global temperature record for the past 21,000 years, giving scientists a much-needed tool with which to compare carbon dioxide levels and temperatures across the world. Jaguar, managed by the Oak Ridge Leadership Computing Facility (OLCF), has since given way to Titan, currently recognized as the fastest computer in the world. And results from the simulations are shedding welcome light on our current climate conundrum.
And quite possibly our past. The team also believes that this same mechanism may apply to all of the major deglaciations for the last 450,000 years.
Two poles, one current
The Milankovitch theory holds that variations in the Earth’s orbit around the sun drive the growth and decline of glaciers by changing the amount of insolation reaching and warming a given area. Starting about 21,000 years ago, changes in the orbit of the Earth produced a warming of the summers in the Northern Hemisphere, leading to a general planet-wide warming.
Geologic data show that about 19,000 years ago, Northern Hemisphere glaciers began to melt, and sea levels rose. Melting glaciers dumped so much freshwater into the ocean that it slowed a system of currents known as the Atlantic meridional overturning circulation (AMOC) that transports heat throughout the world. This ocean conveyor belt is particularly important in the Atlantic where it flows northward across the equator, stealing Southern Hemisphere heat and exporting it to the Northern Hemisphere. The AMOC then sinks in the North Atlantic and returns southward in the deep ocean. A large pulse of glacial meltwater, however, can place a freshwater lid over the North Atlantic and halt this sinking, backing up the entire conveyor belt.
The team’s simulations showed a weakening of the AMOC due to the increase in glacial melt beginning about 19,000 years ago, which decreased ocean heat transport, keeping heat in the Southern Hemisphere and cooling the Northern Hemisphere. Essentially, when the AMOC broke down, the south heated up as the north cooled down, a phenomenon known to climatologists as “the bipolar seesaw.”
This, in turn, led to an enormous release of CO2 from primarily beneath the ocean, which then greatly accelerated the warming of the globe and, by extension, deglaciation. “When CO2 came out, everything changed,” said He, referring to the uncertain events preceding rising CO2 levels as the “Holy Grail of glacial theory.”
Essentially, said He, the timeline for the Earth’s last deglaciation is as follows: from 22,000 to 19,000 years ago, Northern Hemisphere insolation triggered its gradual warming as a result of the large increase in high-latitude spring–summer insolation and strong sensitivity of the land-dominated northern high latitudes to insolation forcing; from 19,000 to 17,000 years ago the AMOC phenomenon described in the team’s latest paper primarily accounts for early Southern Hemisphere warming and deglaciation; and the rise in CO2 starting around 17,000 years ago brought about the final, most drastic stages.
A little help goes a long way
To date the simulations have consumed more than 14 million processor hours on Jaguar. The Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program, jointly managed by the leadership computing facilities at Argonne and Oak Ridge National Laboratories, awarded the allocations.
The team’s weapon of choice in deciphering the climate of the last 20,000-plus years: the Community Climate System Model (CCSM), a climate model that includes fully coupled atmospheric, land, ocean, and sea ice component models. The CCSM, first introduced in 1996, grew out of the Community Climate Model, created by NCAR in 1983 as a freely available global atmospheric model for use by the wider climate research community.
Since then CCSM has evolved into the world’s leading tool for climate simulation, regularly contributing to the United Nations’ Intergovernmental Panel on Climate Change reports. “The simulation reproduces the Southern Hemisphere proxy records beautifully. A good model is the result of many people’s efforts,” said He.
The OLCF has given the project nearly four continuous years of access, allowing the team to run climate simulations spanning 22,000 years and produce nearly 300 terabytes of data. “We have the resources to stage all data online for analysis,” said the OLCF’s Valentine Anantharaj, who worked with the team to make sure they got the most from their time on Jaguar. Anantharaj now works with users on the ten-fold more powerful Titan system, and according to him the OLCF represents a valuable end-to-end resource capability: “Our facility supports a scientific workflow that enables our users to run their simulations, do their analyses, and visualize and archive the results.”
The combination of a talented team of researchers, a refined CCSM climate model, and a center with the expertise and capabilities of the OLCF is beginning to leave its footprint across thousands of years, from the beginning of the last deglaciation 22,000 years ago to our decisions today that will affect our climate future for years to come.
The OLCF has previously highlighted the team’s research related to the 2009 Science paper and the 2012 Nature paper.