Publishing Notice Prior to publishing work performed on Summit, please contact email@example.com.
Summit is an IBM system located at the Oak Ridge Leadership Computing Facility. With a theoretical peak double-precision performance of approximately 200 PF, it is one of the most capable systems in the world for a wide range of traditional computational science applications. It is also one of the “smartest” computers in the world for deep learning applications with a mixed-precision capability in excess of 3 EF.
The basic building block of Summit is the IBM Power System AC922 node. Each of the approximately 4,600 compute nodes on Summit contains two IBM POWER9 processors and 6 NVIDIA Volta V100 accelerators and provides a theoretical double-precision capability of approximately 40 TF. Each POWER9 processor is connected via dual NVLINK bricks, each capable of a 25GB/s transfer rate in each direction. Nodes contain 512 GB of DDR4 memory for use by the POWER9 processors and 96 GB of High Bandwidth Memory (HBM2) for use by the accelerators. Additionally, each node has 1.6TB of non-volatile memory that can be used as a burst buffer.
The POWER9 processor is built around IBM’s SIMD Multi-Core (SMC). The processor provides 22 SMCs with separate 32kB L1 data and instruction caches. Pairs of SMCs share a 512kB L2 cache and a 10MB L3 cache. SMCs support Simultaneous Multi-Threading (SMT) up to a level of 4, meaning each physical core supports up to 4 hardware threads.
The POWER9 processors and V100 accelerators are cooled with cold plate technology. The remaining components are cooled through more traditional methods, although exhaust is passed through a back-of-cabinet heat exchanger prior to being released back into the room. Both the cold plate and heat exchanger operate using medium temperature water which is more cost-effective for the center to maintain than chilled water used by older systems.
On Summit, there are three major types of nodes you will encounter: Login, Launch, and Compute. While all of these are identical in terms of hardware, they differ considerably in their intended use.
|Login||When you connect to Summit, you’re placed on a login node. This is the place to write/edit/compile your code, manage data, submit jobs, etc. You should never launch parallel jobs from a login node nor should you run threaded jobs on a login node. Login nodes are shared resources that are in use by many users simultaneously.|
|Launch||When your batch script (or interactive batch job) starts running, it will execute on a Launch Node. (If you are/were a user of Titan, these are similar in function to service nodes on that system). All commands within your job script (or the commands you run in an interactive job) will run on a launch node. Like login nodes, these are shared resources so you should not run multiprocessor/threaded programs on Launch nodes. It is appropriate to launch the jsrun command from launch nodes.|
|Compute||Most of the nodes on Summit are compute nodes. These are where your parallel job executes. They’re accessed via the jsrun command.|
Although the nodes are logically organized into different types, they all contain the same hardware. As a result of this homogeneous architecture there is not a need to cross-compile when building on a login node. Since login nodes have the same hardware resources as compute nodes, any tests that are run by your build process (especially by utilities such as
cmake) will have access to the same type of hardware that is on compute nodes and should not require intervention that might be required on non-homogeneous systems.
Summit nodes are connected to a dual-rail EDR InfiniBand network providing a node injection bandwidth of 23 GB/s. Nodes are interconnected in a Non-blocking Fat Tree topology. This interconnect is a three-level tree implemented by a switch to connect nodes within each cabinet (first level) along with Director switches (second and third level) that connect cabinets together.
Summit is connected to an IBM Spectrum Scale™ filesystem providing 250PB of storage capacity with a peak write speed of 2.5 TB/s. Summit also has access to the center-wide NFS-based filesystem (which provides user and project home areas) and has access to the center’s High Performance Storage System (HPSS) for user and project archival storage.
Summit is running Red Hat Enterprise Linux (RHEL) version 7.4.
The IBM POWER9 processor supports Hardware Threads. Each of the POWER9’s physical cores has 4 “slices”. These slices provide Simultaneous Multi Threading (SMT) support within the core. Three SMT modes are supported: SMT4, SMT2, and SMT1. In SMT4 mode, each of the slices operates independently of the other three. This would permit four separate streams of execution (i.e. OpenMP threads or MPI tasks) on each physical core. In SMT2 mode, pairs of slices work together to run tasks. Finally, in SMT1 mode the four slices work together to execute the task/thread assigned to the physical core. Regardless of the SMT mode used, the four slices share the physical core’s L1 instruction & data caches.