Login vs. Service vs. Compute Nodes
Cray supercomputers are complex collections of different types of physical nodes. For simplicity, Titan's nodes can be thought of as falling into one of three categories: login nodes, service nodes, or compute nodes.
Login nodes are designed to facilitate ssh access into the overall system and to handle simple tasks. When you first log in, you are placed on a login node. Login nodes are shared by all users of a system and should only be used for basic tasks such as file editing, code compilation, data backup, and job submission. Login nodes should not be used for memory-intensive or processor-intensive tasks. Users should also limit the number of simultaneous tasks they run on login nodes. For example, a user should not run ten simultaneous tar processes.
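As an illustration of limiting simultaneous tasks, the sketch below archives several directories one at a time instead of starting all the tar processes at once. The directory names (run01 through run03) and the temporary working directory are hypothetical and exist only so the example is self-contained.

```shell
#!/bin/bash
# Hypothetical stand-ins for real data directories.
workdir=$(mktemp -d)
mkdir -p "$workdir/run01" "$workdir/run02" "$workdir/run03"

archived=0
for dir in "$workdir"/run*/; do
    name=$(basename "$dir")
    # No trailing '&': each tar finishes before the next one starts,
    # so only one archive process is active on the login node at a time.
    tar -cf "$workdir/$name.tar" -C "$workdir" "$name"
    archived=$((archived + 1))
done

echo "Archived $archived directories sequentially"
```

Backgrounding each tar with a trailing & would instead start every archive at the same moment, which is the pattern to avoid on a shared login node.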
Memory-intensive tasks, processor-intensive tasks, and any production-type work should be submitted to the machine’s batch system (e.g. to Torque/MOAB via qsub). When a job is submitted to the batch system, the job submission script is first executed on a service node.
Any job submitted to the batch system is handled in this way, including interactive batch jobs (e.g. via qsub -I). Users are often under the (false) impression that the commands they type in an interactive batch job are executed on compute nodes. On Cray machines, this is not the case: those commands run on a service node.
On Cray machines, when the aprun command is issued within a job script (or on the command line within an interactive batch job), the binary passed to aprun is copied to and executed in parallel on a set of compute nodes. Compute nodes run a lightweight Linux kernel for reduced overhead and improved performance.
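The split between service nodes and compute nodes can be seen with a small batch script. This is a minimal sketch, not an official template: the project ID PRJ123, the node count, and the walltime are placeholder assumptions, and the fallback branch exists only so the script also runs on a machine without aprun.

```shell
#!/bin/bash
#PBS -A PRJ123
#PBS -l nodes=1
#PBS -l walltime=00:05:00

# Everything in the script body runs on a service node.
script_host=$(uname -n)
echo "Batch script is running on: $script_host"

if command -v aprun >/dev/null 2>&1; then
    # Only the program handed to aprun runs on the compute nodes,
    # so this prints a compute node's name rather than the service node's.
    aprun_found=yes
    aprun -n 1 uname -n
else
    # Not a Cray system: nothing is launched on compute nodes.
    aprun_found=no
    echo "aprun not found; nothing was launched on compute nodes"
fi
```

Submitted with qsub, the first uname -n should report a service node and the one launched through aprun should report a compute node. The same holds when the two commands are typed by hand inside an interactive batch job started with qsub -I.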