Titan Cray XK7

Titan will unlock secrets of the universe from the smallest to the largest scales, including the transition of states in a quantum magnet, the ability of promising new drugs to disassemble damaging fibrils in the brains of Alzheimer’s sufferers, the confinement and dispersion of small molecules within carbon nanostructures, the behavior of neutrons in a nuclear reactor core, long-term climate forecasting, and the mechanism through which a collapsing stellar core blows the star into space.


Titan Overview

The Oak Ridge Leadership Computing Facility (OLCF) is home to Titan, the nation’s most powerful supercomputer for open science.

Titan is a hybrid-architecture Cray XK7 system with a theoretical peak performance exceeding 27,000 trillion calculations per second (27 petaflops). It contains both advanced 16-core AMD Opteron central processing units (CPUs) and NVIDIA Kepler graphics processing units (GPUs). GPUs are energy-efficient, high-performance chips originally developed for gaming systems. The combination of these two technologies allows Titan to achieve 10 times the speed and 5 times the energy efficiency of its predecessor, the Jaguar supercomputer, while using only modestly more energy and occupying the same physical footprint.

Titan features 18,688 compute nodes, a total system memory of 710 terabytes, and Cray’s high-performance Gemini network. Its 299,008 CPU cores guide simulations while the accompanying GPUs handle hundreds of calculations simultaneously. The system delivers decreased time to solution, increased model complexity, and greater realism in simulations.
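The headline figures above are internally consistent and can be recovered from the per-node specifications. A quick sketch using only the numbers quoted in this document:

```python
# Sanity-check Titan's headline figures from its node-level specs
# (all constants are taken from the document's own tech specs).
NODES = 18_688           # compute nodes
CORES_PER_NODE = 16      # AMD Opteron cores per node
CPU_MEM_GB = 32          # CPU (DDR3) memory per node
GPU_MEM_GB = 6           # GPU (GDDR5) memory per node

total_cores = NODES * CORES_PER_NODE
total_mem_tb = NODES * (CPU_MEM_GB + GPU_MEM_GB) / 1000  # decimal TB, as quoted

print(total_cores)          # 299008
print(round(total_mem_tb))  # 710
```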

Titan is enabling researchers across the scientific arena to acquire unparalleled accuracy in their simulations and achieve research breakthroughs more rapidly than ever before. OLCF simulations have improved the safety and performance of nuclear power plants, turbomachinery, and aircraft; aided understanding of climate change; sped development of new drugs and advanced materials; and guided design of the ITER international fusion reactor. Researchers have used OLCF systems to model supernovas, hurricanes, biofuels, neurodegenerative diseases, and clean combustion for power and propulsion.

Titan users have access to data analysis and visualization resources that include the Eos and Rhea systems and the Exploratory Visualization Environment for REsearch in Science and Technology, or EVEREST. Users also have access to file systems—like Spider for immediate data storage, with over 1,000 gigabytes per second of aggregate data bandwidth and more than 30 petabytes of storage capacity, and the High Performance Storage System (HPSS) for archival data storage—to manage the floods of data that Titan’s simulations generate. All of these resources are available through high-performance networks including ESnet’s upgraded 100 gigabit per second links.
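The bandwidth figures above put rough bounds on how quickly simulation output can be written and moved. A back-of-envelope sketch, assuming a hypothetical 100-terabyte output (the dataset size is illustrative; the bandwidths are those quoted above):

```python
# Back-of-envelope I/O timing for a hypothetical 100 TB simulation output.
DATASET_TB = 100             # illustrative dataset size, not from the document
SPIDER_GB_PER_S = 1_000      # Spider aggregate bandwidth (GB/s)
ESNET_GBIT_PER_S = 100       # upgraded ESnet link (gigabits per second)

spider_seconds = DATASET_TB * 1_000 / SPIDER_GB_PER_S            # write to Spider
esnet_seconds = DATASET_TB * 1_000 / (ESNET_GBIT_PER_S / 8)      # move off-site

print(spider_seconds)   # 100.0 s at full aggregate bandwidth
print(esnet_seconds)    # 8000.0 s, roughly 2.2 hours
```

The gap between the two numbers is why large-scale centers pair a fast parallel file system for immediate output with archival storage like HPSS, rather than streaming everything off-site.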

Computational scientists gain access to OLCF’s cutting-edge facilities and support systems through three programs that allocate millions of processor hours. The Innovative and Novel Computational Impact on Theory and Experiment program, or INCITE, supports large-scale, high-impact projects that make concurrent use of at least 20 percent of Titan’s cores. The Advanced Scientific Computing Research Leadership Computing Challenge program, or ALCC, primarily aids research that supports the energy mission of the Department of Energy’s Office of Science, emphasizing high-risk, high-reward endeavors. And the OLCF’s Director’s Discretionary program helps new high-performance computing users explore topics of national importance.

Research challenges remain, but Titan is helping launch a new era for science and engineering as computing approaches the exascale, or a million trillion calculations a second.

More Information

For more information on the Titan project, please visit http://olcf.ornl.gov/titan.


For support on Titan, please visit http://www.olcf.ornl.gov/support.

Tech Specs

Titan System Configuration
Architecture: Cray XK7
Processor: 16-core AMD Opteron
Cabinets: 200
Nodes: 18,688
Cores/node: 16
Total cores: 299,008 Opteron cores
Memory/node: 32 GB (CPU) + 6 GB (GPU)
Memory/core: 2 GB
Interconnect: Gemini
GPUs: 18,688 NVIDIA Kepler K20X
Speed: 27 PF (peak)
Square footage: More than 5,000 square feet, including perimeter access requirements

Code List

Each entry lists the code name and science domain, followed by a description, an example science problem, the programming model used for acceleration, libraries (where applicable), performance information, and a point of contact.

LAMMPS (Molecular Science)
Description: A general molecular dynamics code based on statistical mechanics, applicable to bioenergy problems. http://lammps.sandia.gov/
Example science problem: Coarse-grained molecular dynamics simulation of bulk heterojunction polymer blend films used, e.g., within organic photovoltaic devices.
Programming model for acceleration: OpenCL or CUDA
Performance: Speedup is 1X to 7.4X on 900 nodes, comparing XK7 to XE6. The performance variation depends strongly on the number of atoms per node. The algorithm runs in mixed precision on the GPU and double precision on the CPU.
Point of contact: Mike Brown, ORNL

WL-LSMS (Materials Science)
Description: Wang-Landau (WL) Linear Scaling Multiple Scattering (LSMS): a first-principles density functional theory code (local density approximation) used to study magnetic materials.
Example science problem: Simulation of the magnetic phase transition in nickel.
Programming model for acceleration: CUDA, or CUDA and libraries
Libraries: GPU: CULA, LibSciACC, cuBLAS; CPU: BLAS, LAPACK
Performance: XK7 vs. XE6 speedup is 3.5X. Benchmark runs: 321 WL walkers, 1,024 atoms.
Point of contact: Markus Eisenbach, ORNL

S3D (Combustion Science)
Description: Direct numerical simulation of compressible, reacting flows for combustion science.
Example science problem: Temporal jet simulation of dimethyl ether combustion.
Programming model for acceleration: OpenACC
Performance: XK7 vs. XE6 speedup is 2X.
Point of contact: Ramanan Sankaran, ORNL

CAM-SE (Climate Change Science)
Description: Community Atmosphere Model – Spectral Elements. http://earthsystemcog.org/projects/dcmip-2012/cam-se
Example science problem: High-resolution atmospheric climate simulation using CAM5 physics and the MOZART chemistry package.
Programming model for acceleration: CUDA Fortran
Point of contact: Matt Norman, ORNL

DENOVO (3D Neutron Radiation Transport for Nuclear Reactors)
Description: A three-dimensional, massively parallel, deterministic radiation transport code capable of solving both shielding and criticality problems on high-performance computing platforms.
Example science problem: Reactor eigenvalue problem.
Programming model for acceleration: CUDA
Performance: XK7 (CPU+GPU) vs. XE6 (CPU+CPU) speedup is 3.8X for the Denovo sweep only, on nearly 18,000 nodes.
Point of contact: Tom Evans, ORNL; Wayne Joubert, ORNL

QMCPACK (Electronic Structure via Quantum Monte Carlo)
Description: A continuum quantum Monte Carlo (QMC) framework used for many-body electronic structure calculations of molecules, solids, and nanostructures such as quantum dots.
Example science problem: Equations of state of solids, defect formation energies, binding energies.
Programming model for acceleration: CUDA
Libraries: HDF5, libxml2, FFTW, BLAS/LAPACK, Boost
Performance: XK7 (GPU) vs. XE6 speedup of 2X for 500 electrons. The GPU runs mainly in single precision; the CPU runs mainly in double precision.
Point of contact: Jeongnim Kim, ORNL

AWP-ODC (Anelastic Wave Propagation in Seismology, Solid-Earth Science)
Description: A community seismic wave propagation code used by SCEC, employing a staggered-grid finite difference method with a coarse-grained anelasticity implementation.
Example science problem: Large-scale, high-resolution regional earthquake simulations and seismic hazard analysis using strain Green tensors.
Programming model for acceleration: CUDA
Performance: XK7 (GPU) vs. XE6 speedup of 2.12X; performance varies with problem size.
Point of contact: Yifeng Cui, SDSC

NAMD (Molecular Science)
Description: A molecular dynamics simulation package written using the Charm++ parallel programming model, noted for its parallel efficiency and often used to simulate large systems (millions of atoms).
Example science problem: Molecular dynamics of a 100-million-atom chromatophore benchmark.
Programming model for acceleration: CUDA
Performance: On 768 nodes, XK7 vs. XE6 speedup is 1.8X, and XK7 vs. XK7 without GPUs is 3.4X.
Point of contact: Jim Phillips, UIUC

CP2K (Chemical Sciences)
Description: A freely available (GPL) program, written in Fortran 95, for atomistic and molecular simulations of solid-state, liquid, molecular, and biological systems. It provides a general framework for different methods: density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW); classical pair and many-body potentials; semi-empirical (AM1, PM3, MNDO, MNDOd, PM6) Hamiltonians; Quantum Mechanics/Molecular Mechanics (QM/MM) hybrid schemes relying on the Gaussian Expansion of the Electrostatic Potential (GEEP); and the RI-MP2 many-body approach.
Example science problem: RI-MP2 calculation of benzene on 200 XK7 nodes.
Programming model for acceleration: CUDA
Performance: On 200 nodes, XK7 vs. XE6 speedup is 2X.
Point of contact: Joost VandeVondele, ETH

DCA+ (Condensed Matter Physics)
Description: A code that uses the Dynamic Cluster Approximation (DCA) to study models of high-temperature superconductors, such as the two-dimensional Hubbard model and extensions that allow studies of disorder effects and nanoscale inhomogeneities.
Programming model for acceleration: CUDA
Performance: XK7 vs. XE6 speedup is 4.4X.
Point of contact: Thomas Schulthess, ETH
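The XK7-vs-XE6 speedups reported above range from roughly 1X to over 7X, and much of that spread follows from how much of each application's runtime the GPU port actually accelerates. A simple Amdahl's-law sketch illustrates the effect; the fractions and per-kernel speedup below are illustrative assumptions, not measurements from any of the codes listed:

```python
# Amdahl's-law sketch: why whole-application GPU speedups vary so widely.
# f is the fraction of runtime accelerated on the GPU, s is the speedup of
# that fraction; both values here are hypothetical for illustration.
def overall_speedup(f: float, s: float) -> float:
    """Whole-application speedup when a fraction f of runtime is sped up by s."""
    return 1.0 / ((1.0 - f) + f / s)

for f in (0.5, 0.9, 0.99):
    print(f, round(overall_speedup(f, s=10.0), 2))
# Even with a 10X-faster kernel, accelerating half the runtime yields only
# ~1.8X overall; ~0.99 of the runtime must be accelerated to approach 10X.
```

This is also consistent with the LAMMPS note that speedup depends strongly on atoms per node: problem size shifts how much of the runtime the accelerated kernels dominate.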