The “Pioneering Frontier” series features stories profiling the many talented ORNL employees behind the construction and operation of the OLCF’s incoming exascale supercomputer, Frontier. The HPE Cray system is scheduled for delivery in 2021, with full user operations in 2022.
Whenever David Grant wakes up in the middle of the night wondering whether he remembered to take care of one important detail at work, that detail could be nearly anything. From correctly setting the alarm threshold on a pressure sensor to ensuring that construction contracts precisely convey design specifications, Grant has many different tasks on his mind each day.
As the high-performance computing (HPC) lead engineer in the Laboratory Modernization Division (LMD) at the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL), Grant is responsible for making sure every new supercomputer system that’s installed has the cooling it requires to reliably operate 24/7. And with the Oak Ridge Leadership Computing Facility’s (OLCF’s) upcoming Frontier—the nation’s first exascale-class supercomputer, capable of a billion billion floating point operations per second—Grant must check and recheck many important design details before its power switch is flipped on in late 2021.
“Our mechanical systems help enable the missions that are going on in the ORNL data centers. Not wanting there to be any mistakes makes me look repeatedly at the same thing from multiple angles, multiple times over the course of a year,” Grant said. “I’m constantly trying to turn over rocks and see what could go wrong and what could be improved or made better.”
Aside from occasionally sacrificing a few hours of sleep, Grant says he feels fortunate to be in his position: overseeing the design and construction of new mechanical systems and assisting with the technical aspects of operating existing ones. He must make sure not only that these systems efficiently and reliably meet the needs of each supercomputer but also that they’re built within budget and on time.
“My view is that I’m an owner’s representative trying to get the best value for the owner, who is ultimately the American taxpayer,” Grant said. “So I’m there creating and steering designs—looking for ways to add the most value to all the different aspects that are under the mechanical umbrella.”
That umbrella covers a lot of different machinery, but it primarily consists of all the cooling infrastructure that will allow a supercomputer to function. To have that infrastructure in place when the supercomputer arrives, Grant must design it years before the supercomputer physically exists. Currently being built by HPE Cray, Frontier will feature Cray’s new EX architecture with high-performance AMD EPYC™ CPU and AMD Radeon Instinct GPU technology. The cooling system has no fans in the computer itself but instead relies on heat exchangers and pumps that circulate thousands of gallons of water per minute. The design allows the machine to be cooled with much warmer water than previous supercomputers required, resulting in lower operating costs.
Although the design team started drawing up Frontier’s data center plans in late 2017, the first cabinets are not scheduled to arrive at ORNL until this summer. So how did they spec out infrastructure designs that far ahead? Lots of meetings with a broad set of experts.
Initially, a series of “scoping” meetings was held with leaders from the OLCF (a DOE Office of Science User Facility at ORNL) who are in charge of procuring the new system and setting its mission goals. After HPE Cray was awarded the contract, its representatives also attended the meetings to share their design specifications for the future machine.
With that information in hand, an integrated design team was assembled, including ORNL LMD electrical engineers and mechanical engineers, ORNL Mechanical and Medium Voltage Utilities’ staff and engineers, outside electrical and mechanical design/build contractors, an architect, and a third-party commissioning agent. The entire team collaborated to determine what would be needed to make Frontier operational, but Grant keeps his eye on the big picture, guiding both design and construction to ensure that the technical aspects of the mechanical systems remain sound. This approach pays off when major changes arrive late in a project, as happened here when the required flow rate of water to the computer increased by 40 percent.
“When I’m putting my design hat on, I’m looking outside of ORNL at the other DOE labs and other HPC centers to see what they’re doing and bring back best practices,” Grant said. “Within ORNL, I’m looking at how to write construction scoping documents, contractual documents to our contractors to make sure that the intent of the designs are effectively communicated. I also consult with Mechanical Utilities and Craft to see what has been done well and what could be done better, as they have a deep well of hands-on knowledge about operating and maintaining the infrastructure. Then I’m watching the contractors during construction to make sure that the things that they’re doing meet the intent of the scoping documents and will ultimately provide what we need for the mission.”
With that much on his to-do list, it’s a wonder that Grant doesn’t lose even more sleep. Fortunately, his wife and two small children help keep him busy with things to do outside of his job. And, since he’s been working from home about 50 percent of the time during the pandemic, he’s been able to spend more time with them.
But Grant has discovered one significant drawback to working out of his own house: as a mechanical engineer, he’s constantly aware of all the home-improvement issues he ought to be tackling. The fact that their house is a 100-year-old Craftsman bungalow means it’s been an ongoing project ever since he and his wife bought it in 2006.
“It was originally a duplex house, two units, and we turned it back into a single-family home. We’ve basically touched every square inch of the house, turned it back to the way it originally was—it’s got the big 6-foot-wide hallway all the way down the middle with almost 10-foot ceilings. So it was a big project at the time but we got through it. I wouldn’t do it again.”
Well, except for those 50-pound solar panels he’s been installing on the roof (for which he built a trolley-and-pulley system onto his extension ladder), as well as the new insulation he’s been putting in. Conserving energy has long been a theme in his professional life, too, even at his first job out of college at a start-up company called IdleAire Technologies, which produced electrification units for truck stops that allowed truck drivers to turn off their engines while resting to save fuel. Once the company started facing business difficulties, Grant was laid off in 2008, and that was when he saw the job listing for ORNL.
“I knew I wanted to be in energy efficiency. Where that was going to be—where I was going to end up—I didn’t know. But I feel very fortunate that I landed where I did and have enjoyed it very much,” Grant said.
UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.