- The Titan supercomputer at Oak Ridge National Laboratory in Tennessee debuted as the fastest computer in the world in 2012 and remained in the top 10 for 7 years, providing billions of core-hours of total computing time to researchers from around the globe. The system will be decommissioned on August 1 and its data center space retrofitted for a new supercomputer.
- NVIDIA Kepler GPUs are connected to Jaguar’s AMD 16-core Opteron processors. A total of 18,688 GPUs, one per node, were installed as part of Titan’s unique hybrid architecture.
- Titan’s cabinet art is installed. Both the front- and back-row cabinets display collages that include scientific visualizations produced using Jaguar and previous OLCF systems.
- High-Performance Computing (HPC) engineer Tom Papatheodore supports research teams attending the 2018 OLCF GPU Hackathon. OLCF staff members have hosted and co-organized dozens of GPU hackathons to equip the user community with GPU basics and optimizations that improve code performance on Titan and other accelerated HPC architectures.
- OLCF’s Kevin Bivens demonstrates the Tiny Titan “mini-supercomputer” to students visiting the ORNL Traveling Science Fair. To teach aspiring scientists the basics of parallel computing, OLCF staff members developed an educational classroom computer called Tiny Titan. Built from inexpensive Raspberry Pi processors and operated by a game console controller, Tiny Titan has traveled to schools, science festivals, and even Capitol Hill to demonstrate the power of supercomputing.
- On Titan, an ORNL team led by Jeremy Smith performed its largest biological simulation to explain why a structural component of plants called lignin is so potent in blocking the enzymes that break down cellulose, the primary ingredient of cellulosic ethanol, during biofuel processing. Further, the team identified a pathway to circumvent the problem, finding that amorphous, or less-ordered, cellulose fibers interact less with lignin and therefore are more accessible to enzymes.
- To contribute to the design of ice-resistant wind turbine blades, General Electric (GE) researchers used Titan to simulate hundreds of water droplets, each containing one million molecules. The simulations ran at least 200 times faster than pre-GPU estimates, permitting GE to study the formation of individual ice molecules. Visualization by Mike Matheson (ORNL).
- In 2017, a team led by Thomas Jordan of the Southern California Earthquake Center used Titan to calculate the first 1-hertz CyberShake physics-based seismic hazard model for Central California. This CyberShake seismic hazard map shows the magnitude of shaking for the Los Angeles region, defined by the amount of change of a surface or structure in a 2-second period. The map provides engineers with vital information needed to design more seismically safe structures.
- A team of researchers led by C.S. Chang of Princeton Plasma Physics Laboratory used Titan to predict how ITER—the world’s largest experimental magnetic fusion reactor, currently under construction in France—will withstand the extreme heat involved in extracting exhaust. Image credit: ITER Organization
The Cray XK7 Titan supercomputer operated by the Oak Ridge Leadership Computing Facility (OLCF) at the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL) will be decommissioned on August 1 and disassembled for recycling. Performing up to 27 quadrillion calculations per second, Titan ranked as one of the world’s top 10 fastest supercomputers from its debut as No. 1 in 2012 until June 2019. Through more than 26 billion core hours of computing time, Titan has served hundreds of research teams around the world working on today’s most urgent scientific challenges.
A new generation of supercomputing
In 2009, the high-performance computing community had surpassed the petascale barrier only a year earlier, achieving more than one quadrillion calculations per second on two DOE supercomputers: Roadrunner at Los Alamos National Laboratory and Jaguar at ORNL. CPU upgrades to the 2.3-petaflop Cray XT5 Jaguar, managed and operated by the OLCF, made it the world’s fastest system on the November TOP500 list that year.
However, science never sleeps, and the OLCF was already planning its next supercomputer. This second-generation petascale system would need to be about 10 times more powerful than Jaguar to meet the growing computational needs of researchers working on complex problems in materials science, biology, physics, and other research domains. Even more challenging than increasing speed, the OLCF’s next supercomputer would need to meet DOE goals for cost and energy efficiency, meaning it would need to do 10 times the work while consuming roughly the same amount of energy.
Enter Titan, a new generation of supercomputer with a revolutionary architecture that combined AMD 16-core Opteron CPUs and NVIDIA Kepler accelerated processors known as GPUs, which tackled computationally intensive math problems while the CPUs efficiently directed tasks. When the system debuted at No. 1 in 2012, Titan delivered 10 times the performance of Jaguar, reaching a peak performance of 27 petaflops.
“Choosing a GPU-accelerated system was considered a risky choice,” said OLCF Program Director Buddy Bland. “A DOE independent project review committee insisted that we demonstrate that our users would be able to effectively use Titan for the broad range of modeling and simulation applications we support. We spent 6 months working with Cray, NVIDIA, and our users to convince the reviewers, DOE, and ourselves that GPUs would deliver what we needed. Yes, there was risk, but we developed effective ways to manage the risks and educate both our staff and users in how to use the system. The result has been a remarkably productive system that has led the way for many GPU-accelerated systems.”
Now in its seventh year of operation, Titan will be decommissioned to make room for a new scale of supercomputer: OLCF’s 2021 exascale system, Frontier.
Decommissioning details
OLCF is retrofitting 20,000 square feet of data center space that includes Titan, its Atlas file system, and the Cray XC30 Eos cluster, all of which will be decommissioned in August. Many OLCF system users are moving from Titan to the facility’s IBM AC922 Summit supercomputer, which was launched in 2018 and has its own data center space.
While researchers have continued to run large projects on Titan during the first half of 2019, June 30 will be the last day users can submit jobs to Titan or Eos (which is also 7 years old) in preparation for the August 1 decommissioning of both systems. Atlas will be decommissioned on August 15. The Rhea cluster will still be available to users but will transition from mounting the Atlas Lustre file system to mounting the Alpine GPFS file system that also supports Summit.
“Titan has run its course,” said Operations Manager Stephen McNally. “The components of Titan are now 7 years old, and it’s really impressive that users have been successfully producing high-impact science results since the system became available to them. But the reality is, in electronic years, Titan is ancient. Think of what a cell phone was like 7 years ago compared to the cell phones available today. Technology advances rapidly, including supercomputers.”
Decommissioning a computer of Titan’s size requires collaboration among onsite staff, facility vendors, and users. OLCF staff are supporting users who need to complete runs, save data, or transition their projects to Summit and other resources.
“We’ve communicated shutdown deadlines to users so they can be prepared while still getting high-quality research done,” McNally said. “One big task for users has been cleaning up 32 petabytes of data and moving data from Atlas to other storage systems.”
Electricians will safely shut down the 9-megawatt-capacity system, and Cray staff will disassemble and recycle Titan’s electronics, metal components, and cabinets (which predate Titan, having originally housed Jaguar).
“People ask why we can’t split up Titan and donate sets of cabinets to different research groups, but the answer is that it’s simply not worth the cost to a data center or university of powering and cooling even fragments of Titan,” McNally said. “Titan’s value lies in the system as a whole.”
OLCF users are encouraged to review detailed system deadlines here.
UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://science.energy.gov.