Crest is an IBM Power8 testbed resource that serves several purposes:
- Staff can develop and test codes for both IBM POWER CPUs and NVIDIA GPUs in anticipation of Summit.
- Staff can evaluate systems software, including the OS, batch systems, OFED, etc.
- Vendors can develop third-party products needed for Summit, such as debuggers, compilers, etc.
It consists of one management node, one login node and four compute nodes:
- crest-mgmt1.ccs.ornl.gov
- crest-login1.ccs.ornl.gov
- crest[1-4].ccs.ornl.gov
Users should log in to crest-login1.ccs.ornl.gov for access to Crest. Access is not available from outside the ORNL NCCS network, so external users will need to bounce through home.ccs.ornl.gov to reach Crest.
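For example, a typical login path from outside the NCCS network looks like the following (username is a placeholder for your NCCS user ID):
ssh username@home.ccs.ornl.gov
ssh crest-login1.ccs.ornl.gov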
Crest Hardware Specification
(4) Crest Compute Nodes
- IBM Power 8 CPUs (2 sockets, 10 cores per socket, 8 hardware threads per core)
- 256GB memory
- (4) NVIDIA Tesla K40m GPUs
- (2) Mellanox Connect-IB InfiniBand: FDR (56 Gb/s)
- (1) 4-port Gigabit Ethernet
(1) Crest Login Node
- IBM Power 8 CPUs (2 sockets, 10 cores per socket, 8 hardware threads per core)
- 128GB memory
- (2) Mellanox Connect-IB InfiniBand: FDR (56 Gb/s)
- (1) 4-port Gigabit Ethernet
(1) Crest Management Node
- IBM Power 8 CPUs (2 sockets, 10 cores per socket, 8 hardware threads per core)
- 128GB memory
- (1) Mellanox Connect-IB InfiniBand: FDR (56 Gb/s)
- (1) 4-port Gigabit Ethernet
Crest Software
Current versions can be found with module avail.
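For example, to list the installed software and load a package (the xlf module name here is illustrative; use the names reported by module avail):
module avail
module load xlf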
Compilers
- xlc/xlC: IBM XL C/C++ for Linux (little endian distributions). Documentation can be found here.
- xlf: IBM XL Fortran for Linux (little endian distributions). Documentation can be found here.
- clang: Information regarding IBM’s implementation of clang can be found here.
- gcc: gcc-4.9 is available.
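As a quick check that a compiler is working, a minimal compile might look like this (hello.c is a placeholder source file):
xlc -O2 -o hello hello.c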
IBM Parallel Environment (PE) Runtime Edition
- poe: The poe command is used to launch parallel applications. Documentation can be found here. Programming samples for MPI and PAMI can be found here.
You may see the following libnuma warning:
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory
You can safely ignore this error for now. The bug has been identified, and we are working with IBM to resolve it.
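As a sketch, a parallel binary can be launched directly with poe; outside of a batch job the task count can be set with the -procs option (the binary name and task count here are illustrative):
poe ./alltoall -procs 4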
IBM Parallel Environment (PE) Developer Edition
Documentation can be found here.
IBM Engineering and Scientific Subroutine Library (ESSL)
Documentation can be found here.
IBM Parallel Engineering and Scientific Subroutine Library (PESSL)
Documentation can be found here.
Java Runtime Edition
Java 8.0 is available.
CUDA
CUDA 7.5 is the currently installed version. CUDA documentation can be found here.
IBM Platform LSF
The scheduler on Crest is IBM Platform LSF. More information about LSF can be found here.
LSF: Submitting a job
The LSF command used to submit jobs is bsub. It operates a bit differently than qsub in that a customary job script must be passed via STDIN. For instance, given the following job script in a file called job_script:
#BSUB -q batch
#BSUB -o alltoall.o%J
#BSUB -e alltoall.e%J
#BSUB -J alltoall
#BSUB -n 40
#BSUB -W 180
#BSUB -network "type=sn_all:protocol=mpi:mode=us"
poe ~/alltoall
The command needed to submit it would be:
bsub < job_script
bsub without a redirect from STDIN simply expects a command. That command can be a script that contains other commands such as poe, etc., but it needs to be in PATH, and any BSUB directives need to be passed as command line options.
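For example, the job_script above could equivalently be submitted with its directives given as command line options (assuming poe is in PATH):
bsub -q batch -o alltoall.o%J -e alltoall.e%J -J alltoall -n 40 -W 180 -network "type=sn_all:protocol=mpi:mode=us" poe ~/alltoall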
A -network directive must be passed to a job to avoid error messages. A default -network directive has been configured and is defined as:
type=sn_single:protocol=mpi:mode=us:usage=dedicated:instance=2
More information about submitting jobs can be found here or in the bsub man page.
LSF: Submitting an interactive job
To submit an interactive job, pass your login shell to bsub:
bsub -Is $SHELL
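Other bsub options combine with -Is as usual; for example, to request a specific task count and walltime for the interactive session (the values shown are illustrative):
bsub -n 20 -W 120 -Is $SHELL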
LSF: Submitting an MPS enabled job
On Crest nodes, the CUDA Multi-Process Service (MPS) is enabled via the GPUMPS application profile. To submit an MPS enabled job, use the -app option in your submission command. For example, an interactive job can be started with:
bsub -app GPUMPS -Is $SHELL
More information about CUDA MPS can be found in the NVIDIA Multi-Process Service overview document. When enabled, MPS will log its activity to the control and server logs located in /var/tmp/mps_log_[0-4].
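The GPUMPS profile can also be requested from a batch script via a BSUB directive; a minimal sketch reusing directives from the job_script example above (the job name and limits are illustrative):
#BSUB -q batch
#BSUB -J mps_test
#BSUB -n 20
#BSUB -W 60
#BSUB -app GPUMPS
poe ~/alltoall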
LSF: Walltime Limits
To help ensure equal opportunity of use, the following walltime limits are enforced:
Job type | Walltime limit
Batch | 12 hours
Interactive | 2 hours
LSF useful commands
bjobs | Display your jobs in the queue
bkill | Kill a job
bsub | Submit a job
bhosts | Display hosts and their resources
lsload | Display load information for hosts
badmin | Administrative tool for LSF
badmin hclose -C | Close batch services for the specified host, with a comment (-C)
badmin hopen | Open batch services for the specified host
To display a list of all the jobs in the queue:
bjobs -u all
To find more information about jobs in PENDING status:
bjobs -p
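To display detailed information for a single job (jobid is a placeholder for the numeric job ID reported by bsub):
bjobs -l jobid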