In 2017, the Oak Ridge Leadership Computing Facility celebrated 25 years of leadership in high-performance computing. This article is part of a series summarizing a dozen significant contributions to science enabled by OLCF resources. The full report is available here.
Metals and alloys are ubiquitous, so we may barely notice them. However, new discoveries and optimizations of these materials are essential to modern life, contributing to energy, transportation, medical, manufacturing, and information technologies.
With the introduction of massively parallel computing at ORNL, a team led by Materials Theory Group leader Malcolm Stocks saw an opportunity to greatly increase our understanding of material properties. Over the last quarter-century, the team has leveraged supercomputing to improve predictive materials modeling, enabling them to model larger systems of many thousands of atoms, disordered structures that are difficult to predict yet important to performance, and the finite temperature properties of magnets.
The Science
When the Center for Computational Sciences (CCS) was founded at ORNL in 1992, scientists were already using theory and computation to predict the properties of materials based on their atomic and electronic structure, but their calculations were limited by computational power. Researchers could compute only a small number of atoms at a time and were restricted to materials with neatly ordered structures, making it difficult to model the range of metals and alloys used in industrial and technological applications.
One of the “Grand Challenges” recognized by the teams of researchers who wrote the 1991 proposal that led to increased funding for HPC and the creation of CCS was the need to simulate increasingly complex materials from first principles. Because they are based on the fundamental laws of quantum mechanics that govern the behavior of electrons and atomic nuclei, first-principles calculations require an immense amount of computing power. However, physicists and materials scientists needed more than just a bigger, faster computer to advance materials structure modeling. The ORNL team quickly realized that the predictive methods developed to compute the underlying electronic structure of metals and alloys did not map well onto parallel machines: with conventional techniques, the computational effort grows roughly as the cube of the number of atoms, so every jump in system size, from tens to hundreds to thousands of atoms, makes the cost balloon far faster than added processors can absorb. So ORNL researchers took a different approach.
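To make the scaling problem concrete, the short Python sketch below compares how the work of a conventional, cubically scaling electronic structure method grows with atom count against a linearly scaling ("order-N") approach of the kind LSMS would later provide. The unit costs are arbitrary and purely illustrative.

# Illustrative comparison of how cost grows with the number of atoms N.
# Conventional electronic structure methods scale roughly as N**3, while an
# order-N method keeps the work per atom constant. Unit costs are arbitrary.

def conventional_cost(n_atoms: int) -> float:
    """Relative cost of a method that scales as the cube of the atom count."""
    return float(n_atoms) ** 3

def order_n_cost(n_atoms: int) -> float:
    """Relative cost of a method whose per-atom work stays constant."""
    return float(n_atoms)

for n in (10, 100, 1000):
    ratio = conventional_cost(n) / order_n_cost(n)
    print(f"{n:>5} atoms: cubic scaling costs {ratio:,.0f}x more than order-N")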
Using real-space multiple scattering theory, which describes how electrons propagate through a solid, ORNL researchers developed the Locally Self-consistent Multiple Scattering (LSMS) electronic structure code specifically for large numbers of parallel processors. LSMS tames the runaway cost of computing the electronic structure of many atoms by assigning each atom to its own compute node. Each node then computes the effect of the neighboring atoms on its own atom, work that maps naturally onto a parallel machine. Computing the local electronic structure in parallel kept the cost per atom roughly constant, allowing system sizes to grow while maintaining accuracy.
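The sketch below illustrates this decomposition in plain, serial Python. It is not the LSMS code itself: the random atom positions, the fixed cutoff radius standing in for the local interaction zone, and the placeholder local_solve function are illustrative assumptions. The point it makes is that each atom defines an independent task whose inputs are only a bounded neighborhood of atoms, which is why the per-atom problems can be handed to separate processors and the cost per atom stays roughly constant.

import random

# Schematic of the LSMS-style decomposition: each atom is an independent
# local problem whose inputs are only the atoms inside a surrounding
# neighborhood (here, a simple cutoff sphere). In the real code each such
# problem runs on its own compute node; here we loop over them serially.

CUTOFF = 2.5   # illustrative neighborhood radius (arbitrary units)
BOX = 10.0     # illustrative edge length of the simulation box
random.seed(42)

def neighbors_within(center, atoms, cutoff):
    """Indices of atoms, excluding the center itself, inside the cutoff sphere.

    A production code would use cell lists so this lookup stays cheap per atom;
    the brute-force loop here is only for clarity.
    """
    cx, cy, cz = center
    found = []
    for i, (x, y, z) in enumerate(atoms):
        d2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
        if 0.0 < d2 <= cutoff ** 2:
            found.append(i)
    return found

def local_solve(center, neighborhood):
    """Placeholder for the per-atom electronic structure solve.

    In LSMS this is where multiple scattering theory is applied to the atoms
    in the local zone; here we simply return the neighborhood size.
    """
    return len(neighborhood)

# An illustrative random configuration of 500 atoms in the box.
atoms = [(random.uniform(0, BOX), random.uniform(0, BOX), random.uniform(0, BOX))
         for _ in range(500)]

# One independent task per atom: trivially parallel by construction.
results = []
for center in atoms:
    zone = [atoms[i] for i in neighbors_within(center, atoms, CUTOFF)]
    results.append(local_solve(center, zone))

print(f"{len(atoms)} atoms, average neighborhood size: "
      f"{sum(results) / len(results):.1f} atoms")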
“In order to do these kinds of models, you need a first-principles electronic structure method that will scale to a few thousand atoms, minimum. LSMS has that capability.” —Malcolm Stocks, Oak Ridge National Laboratory
The Legacy
First presented in a Physical Review Letters article in 1995 and based on computations carried out on the 1,024-node Intel Paragon XP/S-150 MP supercomputer, LSMS was used to calculate the total energy of copper by modeling as many as 1,024 atoms, a system size previously inaccessible to first-principles methods. The ORNL team demonstrated not only that the code could produce accurate energies but also that the computing time scaled linearly with system size. Subsequent calculations that exploited the full power of the XP/S-150 produced insights into many materials systems, including disordered alloys, metallic magnets, and magnetic interfaces.
ORNL researchers continue to develop LSMS for ever more powerful supercomputer architectures. In 2000 it became the first science application to perform at more than one teraflop, running on a Cray T3E outside the laboratory. On the OLCF’s Cray XT5 Jaguar in 2008, LSMS became the second application to exceed one petaflop, and it was later adapted for GPUs in preparation for the 27-petaflop, hybrid CPU–GPU Titan supercomputer that came online in 2012. Team members working on LSMS development have received two ACM Gordon Bell Prizes for peak performance in parallel computing: first in 1998 and again in 2009, when the application was combined with advanced statistical mechanics techniques to simulate the stability of alloys and magnets at finite temperature.
By leading the way into new territory time and again, LSMS has revealed the behavior of magnetic and electronic systems at unprecedented scales and provided an accurate framework for materials modeling that has spread beyond the HPC community into mainstream use in research and industry.
Related Publication: Wang, Y., et al. “Order-N Multiple Scattering Approach to Electronic Structure Calculations.” Physical Review Letters 75 (1995).