What is Summit?
Summit is Oak Ridge National Laboratory’s (ORNL’s) latest leadership-class supercomputer, located at the ORNL Oak Ridge Leadership Computing Facility (OLCF), a US Department of Energy Office of Science User Facility.
Dedicated to open science, Summit will drive scientific study and discovery as one of the world’s most powerful computers by enabling five to 10 times the computational performance of its predecessor, Titan.
A Summit node consists of two IBM Power9 CPUs, six NVIDIA V100 GPUs, NVLink for high-speed CPU-GPU and GPU-GPU communication, over half a terabyte of memory, and a large burst buffer for efficient I/O. Tying together about 4,600 of these nodes is a non-blocking fat-tree network of Mellanox EDR InfiniBand, which provides high-performance internode communication and file system access.
With the V100, Summit enables unprecedented parallelism for traditional high-performance computing while also delivering to the scientific community one of the largest systems for artificial intelligence and deep learning. It’s a tool to solve some of the world’s most challenging problems, regardless of scientific domain, and to prepare computational science for future exascale systems.
What kind of data storage will be available on Summit?
In addition to other OLCF storage systems, such as the High-Performance Storage System for archival purposes, Summit will include a 250PB IBM Spectrum Scale file system. This parallel file system, named Alpine, will deliver roughly 2.5TB/s of sequential I/O and 2.2TB/s of random I/O.
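To give a feel for what these figures imply, the sketch below estimates how long it would take to stream a given volume of data to Alpine at the quoted peak rates. It is a back-of-the-envelope calculation only: it assumes decimal units (1PB = 1,000TB) and that a single workload could sustain the full aggregate bandwidth, which jobs sharing the file system will not see in practice. The function and constant names are illustrative and not part of any OLCF tool.

```python
# Back-of-the-envelope transfer times at Alpine's quoted peak rates.
# Assumptions (not from the FAQ): decimal units, and the full aggregate
# bandwidth being available to a single workload.

ALPINE_CAPACITY_PB = 250.0    # quoted capacity
SEQUENTIAL_TB_PER_S = 2.5     # quoted sequential I/O rate
RANDOM_TB_PER_S = 2.2         # quoted random I/O rate
TB_PER_PB = 1000.0            # decimal conversion (assumption)


def transfer_seconds(volume_tb, rate_tb_per_s):
    """Return the ideal time in seconds to move volume_tb at rate_tb_per_s."""
    if rate_tb_per_s <= 0:
        raise ValueError("rate must be positive")
    return volume_tb / rate_tb_per_s


if __name__ == "__main__":
    capacity_tb = ALPINE_CAPACITY_PB * TB_PER_PB
    for label, rate in (("sequential", SEQUENTIAL_TB_PER_S),
                        ("random", RANDOM_TB_PER_S)):
        hours = transfer_seconds(capacity_tb, rate) / 3600.0
        print(f"Filling {ALPINE_CAPACITY_PB:.0f}PB at the {label} rate: "
              f"about {hours:.1f} hours")
```

Under these idealized assumptions, writing the entire 250PB at the sequential rate would take on the order of a day, which puts the quoted bandwidth in proportion to the capacity.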
What is a burst buffer?
The burst buffer is an intermediate, high-speed layer of storage positioned between the application and the parallel file system (PFS). It absorbs the bulk data produced by the application at a rate four to five times faster than the PFS while seamlessly draining that data to the PFS in the background. Consequently, the burst buffer expedites I/O and allows the application to return to computation sooner. The burst buffer is built from non-volatile memory devices, which offer several desirable properties, such as high I/O throughput, low access latency, and high reliability.
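The effect described above can be sketched with a simple timing model. The example below compares how long an application is blocked when it writes a checkpoint directly to the PFS with how long it is blocked when it writes to the burst buffer first. The checkpoint size and the PFS bandwidth available to the job are hypothetical values chosen for illustration; only the four-to-five-times speedup range comes from the text above, and the function name is an assumption rather than part of any Summit API.

```python
# Simple timing model for a checkpoint written through a burst buffer.
# The checkpoint size and PFS rate are hypothetical; only the 4x-5x
# speedup range is taken from the FAQ text.


def checkpoint_times(checkpoint_gb, pfs_gb_per_s, speedup):
    """Return (direct_s, buffered_s, drain_s) for one checkpoint.

    direct_s   -- time the application is blocked writing straight to the PFS
    buffered_s -- time it is blocked writing to the burst buffer instead
    drain_s    -- background time needed to move the data on to the PFS
    """
    if checkpoint_gb < 0 or pfs_gb_per_s <= 0 or speedup <= 0:
        raise ValueError("size must be non-negative; rates must be positive")
    direct_s = checkpoint_gb / pfs_gb_per_s
    buffered_s = checkpoint_gb / (pfs_gb_per_s * speedup)
    drain_s = direct_s  # draining still proceeds at the PFS rate
    return direct_s, buffered_s, drain_s


if __name__ == "__main__":
    # Hypothetical job: 1,000GB checkpoint, 100GB/s of PFS bandwidth.
    for speedup in (4.0, 5.0):
        direct, buffered, drain = checkpoint_times(1000.0, 100.0, speedup)
        print(f"speedup {speedup:.0f}x: blocked {buffered:.1f}s instead of "
              f"{direct:.1f}s; background drain takes {drain:.1f}s")
```

In this model the drain time is unchanged; the benefit is that the drain overlaps with computation. The model also suggests that the application gains the most when the interval between checkpoints is at least as long as the drain time, so that the buffer has emptied before the next burst arrives.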
What is NVLink?
GPUs in Titan are attached to each node by a traditional PCIe interface, which limits how fast the CPU memory system can be accessed. For increased performance, Summit makes use of NVIDIA’s NVLink interconnect for CPU-GPU and GPU-GPU communication. NVLink provides two links between each pair of processors it connects, each link with a peak bandwidth of 25GB/s in each direction. Together, the two links support a peak bidirectional bandwidth of 100GB/s (2 links × 25GB/s × 2 directions), and they are vital to the performance of accelerated applications on Summit.
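The bandwidth figures above follow from simple arithmetic, which the short sketch below spells out. It also estimates the ideal time to copy a buffer in one direction. The 10GB buffer size is a hypothetical example, the names are illustrative, and real transfers will fall short of these peak numbers.

```python
# Peak NVLink bandwidth arithmetic using the figures quoted in the FAQ.
# The 10GB buffer size used below is a hypothetical example.

LINKS_PER_CONNECTION = 2                 # links between two processors
GB_PER_S_PER_LINK_PER_DIRECTION = 25.0   # peak rate of one link, one way


def peak_bandwidth_gb_per_s(links, per_link_gb_per_s, directions):
    """Return the aggregate peak bandwidth across links and directions."""
    return links * per_link_gb_per_s * directions


def copy_seconds(size_gb, bandwidth_gb_per_s):
    """Return the ideal time to move size_gb at the given peak bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    one_way = peak_bandwidth_gb_per_s(
        LINKS_PER_CONNECTION, GB_PER_S_PER_LINK_PER_DIRECTION, 1)
    both_ways = peak_bandwidth_gb_per_s(
        LINKS_PER_CONNECTION, GB_PER_S_PER_LINK_PER_DIRECTION, 2)
    print(f"Peak one-way bandwidth: {one_way:.0f}GB/s")
    print(f"Peak bidirectional bandwidth: {both_ways:.0f}GB/s")
    print(f"Ideal one-way copy of 10GB: {copy_seconds(10.0, one_way):.2f}s")
```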
How does the Unified Memory feature help?
The faster data movement that comes with NVLink, coupled with another feature known as Unified Memory, will simplify GPU accelerator programming. Unified Memory allows the programmer to treat the CPU and GPU memories as one block of memory. The programmer can operate on the data without worrying about whether it resides in the CPU’s or GPU’s memory.
What compilers will be available on Summit?
IBM XL, PGI, LLVM, GCC, NVIDIA CUDA Stack
What performance tools will be available on Summit?
MAP, Open|SpeedShop, TAU, HPCToolkit, VAMPIR/Score-P, Parallel Performance Toolkit, nvprof, gprof
What debugging tools will be available on Summit?
DDT, pdb, cuda-gdb, cuda-memcheck, valgrind, memcheck, helgrind, STAT
How much power will Summit consume? How does this compare to Titan?
Summit’s peak power consumption will be about 15 MW.
When will users get general access to Summit?
The plan of record is to provide early access to Summit for Early Science projects in late 2018 and to make Summit available to the OLCF User Programs starting in calendar year 2019.
When will Titan be retired?
The plan of record is to keep Titan available for users for a period of time after Summit enters production.
What is the Center for Accelerated Application Readiness (CAAR)?
The OLCF has created the Center for Accelerated Application Readiness, or CAAR, to help prepare codes for future generation systems. CAAR has established 13 partnership teams to prepare scientific applications for highly effective use on Summit. The partnership teams, consisting of the core developers of the application and staff from the OLCF, will receive support from the IBM/NVIDIA Center of Excellence at Oak Ridge National Laboratory and have access to multiple computational resources. For more information about CAAR, please visit olcf.ornl.gov/caar/.
What is CORAL?
CORAL is the collaboration among the two DOE Office of Science Leadership Computing Facility centers, the Oak Ridge Leadership Computing Facility and the Argonne Leadership Computing Facility, and a National Nuclear Security Administration laboratory, Lawrence Livermore National Laboratory (LLNL), to procure leadership computing systems for their respective sites in support of national security and scientific discovery. CORAL is an acronym for Collaboration of Oak Ridge, Argonne, and Livermore. For more information about CORAL, please visit the CORAL fact sheet.
What does ‘Leadership Computing’ mean?
The DOE Office of Science provides a portfolio of national high-performance computing facilities housing some of the world’s most advanced supercomputers. These leadership computing facilities enable world-class research for significant advances in science.
The Oak Ridge Leadership Computing Facility (OLCF) was established at Oak Ridge National Laboratory in 2004 with the mission of accelerating scientific discovery and engineering progress by providing outstanding computing and data management resources to high-priority research and development projects.