Compiling and Node Types
Titan is composed of several types of nodes:
- Login nodes running traditional Linux
- Service nodes running traditional Linux
- Compute nodes running the Cray Node Linux (CNL) microkernel
The type of work you are performing will dictate the type of node for which you build your code.
Compiling for Compute Nodes (Cross Compilation)
Titan compute nodes are the nodes that carry out the vast majority of computation on the system. Compute nodes are running the CNL microkernel, which is markedly different than the OS running on the login and service nodes. Most code that runs on Titan will be built targeting the compute nodes.
All parallel codes should run on the compute nodes. Compute nodes are accessible only by invoking
aprun within a batch job. To build codes for the compute nodes, you should use the Cray compiler wrappers:
titan-ext$ cc code.c
titan-ext$ CC code.cc
titan-ext$ ftn code.f90
The cc, CC, and ftn compiler wrappers should be used when compiling and linking source code for use on Titan compute nodes.
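Once built, an executable is launched on the compute nodes with aprun from inside a batch script. A minimal sketch follows; the project ID, resource request, task count, and program name are placeholders, not values from this article:

```shell
#!/bin/bash
# Hypothetical minimal batch script; PROJ123 and ./a.out are placeholders.
#PBS -A PROJ123
#PBS -l nodes=1
#PBS -l walltime=00:10:00

# Move to the directory the job was submitted from, then launch on
# the compute nodes; only aprun-launched tasks run on compute nodes.
cd $PBS_O_WORKDIR
aprun -n 16 ./a.out
```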
On Titan, and Cray machines in general, statically linked executables perform better and are easier to launch. Depending on the module files you load, certain Cray-provided modules and libraries (such as cudart) may employ and configure dynamic linking automatically; the following warnings do not apply to them, as any necessary dynamic linking of these libraries will be configured automatically by the Cray compiler wrappers.
If you must use shared object libraries, you will need to copy all necessary libraries to a Lustre scratch area ($PROJWORK) or the NFS /ccs/proj area and then update your LD_LIBRARY_PATH environment variable to include this directory. Due to the metadata overhead, /ccs/proj is the suggested area for shared objects and Python modules. For example, the following command appends your project’s NFS home area to LD_LIBRARY_PATH in bash:
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/ccs/proj/[projid]
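As a sketch of how you might collect the needed shared objects in one place, the ldd tool lists an executable's dynamic dependencies. The destination directory and the use of /bin/ls as a stand-in executable below are illustrative only; on Titan the destination would be a directory under your project's /ccs/proj area:

```shell
# Sketch: gather an executable's shared-library dependencies into one
# directory that the compute nodes can reach. /bin/ls stands in for
# your own executable, and ./shared-libs for a /ccs/proj directory.
APP=/bin/ls
DEST=./shared-libs
mkdir -p "$DEST"

# ldd prints lines of the form "libname.so => /path/to/libname.so (0x...)";
# field 3 of each "=>" line is the resolved library path.
ldd "$APP" | awk '/=>/ { print $3 }' | while read -r lib; do
  [ -f "$lib" ] && cp "$lib" "$DEST/"
done

# Tell the runtime loader to search the new directory as well:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$DEST
ls "$DEST"
```

Note that, as the article warns below, libraries harvested from a login node this way may not be safe to run on the compute nodes; prefer libraries provided through the module system.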
Compiling with shared libraries can be further complicated by the fact that Titan’s login nodes do not run the same operating system as the compute nodes, and thus many shared libraries are available on the login nodes which are not available on the compute nodes. This means that an executable may appear to compile correctly on a login node, but will fail to start on a compute node because it will be unable to locate the shared libraries it needs.
It may appear that this could be resolved by locating the shared libraries on the login node and copying them to Lustre or /ccs/proj for use on the compute nodes. This is inadvisable because these shared libraries were not compiled for the compute nodes, and may perform erratically. Also, referring directly to these libraries circumvents the module system, and may jeopardize your deployment environment in the event of system upgrades.
For performance considerations, it is important to bear in mind that each node in your job will need to search through $LD_LIBRARY_PATH for each missing dynamic library, which could cause a bottleneck with the Lustre Metadata Server. Lastly, calls to functions in dynamic libraries will not benefit from compiler optimizations that are available when using static libraries.
Compiling for Login or Service Nodes
When you log into Titan you are placed on a login node. When you submit a job for execution, your job script is initially launched on one of a small number of shared service nodes. All tasks not launched through aprun will run on the service node. Users should note that there are only a small number of these login and service nodes, and they are shared by all users. Because of this, long-running or memory-intensive work should not be performed on login or service nodes.
By default, code built with the cc, CC, and ftn wrappers targets the compute nodes. If you wish to build code for the Titan login nodes or service nodes, you must do one of the following:
- Add the -target-cpu=mc8 flag to your compile line
- Use the craype-mc8 module: module swap craype-interlagos craype-mc8
- Call the underlying compilers directly
- Use the craype-network-none module to remove the network and MPI libraries: module swap craype-network-gemini craype-network-none
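Putting the module-based options together, a login/service-node build session might look like the following sketch; the source and output file names are placeholders:

```
titan-ext$ module swap craype-interlagos craype-mc8
titan-ext$ module swap craype-network-gemini craype-network-none
titan-ext$ cc serial_tool.c -o serial_tool
```

The resulting executable targets the Istanbul-class login/service nodes and omits the compute-network and MPI libraries, so it should not be launched with aprun.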
XK7 Service/Compute Node Incompatibilities
On the Cray XK7 architecture, service nodes differ greatly from the compute nodes. The difference between XK7 compute and service nodes may cause cross-compiling issues that did not exist on Cray XT5 and prior systems.
For XK7, login and service nodes use AMD’s Istanbul-based processor, while compute nodes use the newer Interlagos-based processor. Interlagos-based processors include instructions not found on Istanbul-based processors, so executables compiled for the compute nodes will not run on the login or service nodes, typically crashing with an illegal instruction error. Additionally, codes compiled specifically for the login or service nodes will not run optimally on the compute nodes.
Optimization Target Warning
Because of the difference between the login/service nodes (on which code is built) and the compute nodes (on which code is run), a software package’s build process may inject optimization flags incorrectly targeting the login/service nodes. Users are strongly urged to check makefiles for CPU-specific optimization flags (e.g., -tp, -hcpu, -march). Users should not need to set such flags; the Cray compiler wrappers will automatically add CPU-specific flags to the build. Choosing the incorrect processor optimization target can negatively impact code performance.
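One quick way to audit a build tree for such flags is a recursive grep over its makefiles. A sketch follows; the demo-pkg directory and sample Makefile are created purely for illustration:

```shell
# Create a demo makefile containing a CPU-specific flag (illustrative only):
mkdir -p demo-pkg
printf 'CFLAGS = -O2 -march=native\n' > demo-pkg/Makefile

# Scan all makefiles for CPU-targeting flags that the Cray compiler
# wrappers should be allowed to manage instead; -e guards the pattern's
# leading dash from being parsed as an option.
grep -rnE --include='Makefile*' -e '-(tp|hcpu|march)[= ]' demo-pkg
```

Any line this prints names a file and flag worth removing before rebuilding with the wrappers.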