Crest is an IBM POWER8 testbed resource that serves several purposes:

  • Staff can develop and test codes for both IBM PowerPC CPUs and NVIDIA GPUs in anticipation of Summit.
  • Staff can evaluate systems software, including the OS, batch systems, OFED, etc.
  • Vendors can develop third-party products needed for Summit, such as debuggers, compilers, etc.

It consists of one management node, one login node and four compute nodes:

  • crest-mgmt1.ccs.ornl.gov
  • crest-login1.ccs.ornl.gov
  • crest[1-4].ccs.ornl.gov

Users should log in to crest-login1.ccs.ornl.gov for access to Crest. Access is not available from outside the ORNL NCCS network, so users will need to bounce through home.ccs.ornl.gov to reach Crest.
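For example, a login from outside the network might look like the following (replace userid with your NCCS username):

ssh userid@home.ccs.ornl.gov
ssh crest-login1.ccs.ornl.gov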

Crest Hardware Specification

(4) Crest Compute Nodes

  • IBM POWER8 CPUs (2 sockets, 10 cores per socket, 8 hardware threads per core)
  • 256GB memory
  • (4) NVIDIA Tesla K40m GPUs
  • (2) Mellanox Connect-IB InfiniBand: FDR (56 Gb/s)
  • (1) 4-port Gigabit Ethernet

(1) Crest Login Node

  • IBM POWER8 CPUs (2 sockets, 10 cores per socket, 8 hardware threads per core)
  • 128GB memory
  • (2) Mellanox Connect-IB InfiniBand: FDR (56 Gb/s)
  • (1) 4-port Gigabit Ethernet

(1) Crest Management Node

  • IBM POWER8 CPUs (2 sockets, 10 cores per socket, 8 hardware threads per core)
  • 128GB memory
  • (1) Mellanox Connect-IB InfiniBand: FDR (56 Gb/s)
  • (1) 4-port Gigabit Ethernet

Crest Software

Current versions can be found with module avail.
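For example (the module name shown is illustrative; run module avail to see what is actually installed):

module avail          # list all available software modules
module load xlf       # load the IBM XL Fortran compiler module
module list           # show the modules currently loaded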

Compilers

  • xlc/xlC: IBM XL C/C++ for Linux (little endian distributions). Documentation is available from IBM.
  • xlf: IBM XL Fortran for Linux (little endian distributions). Documentation is available from IBM.
  • clang: Information regarding IBM's implementation of Clang is available from IBM.
  • gcc: gcc-4.9 is available.
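As a quick sanity check, a simple program can be built with any of the compilers above (the flags and file names are illustrative, not required):

xlc -O2 -o hello_xlc hello.c       # IBM XL C
xlf -O2 -o hello_xlf hello.f90     # IBM XL Fortran
gcc -O2 -o hello_gcc hello.c       # GNU C 4.9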

IBM Parallel Environment (PE) Runtime Edition

  • poe: The poe command is used to launch parallel applications, as shown in the sketch below. Documentation and programming samples for MPI and PAMI are provided by IBM.
  • Note: You may see the following warning when running parallel jobs:
    libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory

    You can safely ignore this warning for now. The bug has been identified, and we are working with IBM to resolve it.
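A minimal build-and-run sketch, assuming PE provides the mpcc compiler wrapper (wrapper names can vary by installation; check your loaded modules). Inside an LSF job the task count comes from the job's -n option, while a standalone run can set it with -procs:

mpcc -O2 -o alltoall alltoall.c    # compile an MPI program (wrapper name assumed)
poe ./alltoall -procs 4            # launch 4 tasks outside the batch system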

IBM Parallel Environment (PE) Developer Edition

Documentation is available from IBM.

IBM Engineering and Scientific Subroutine Library (ESSL)

Documentation is available from IBM.

IBM Parallel Engineering and Scientific Subroutine Library (PESSL)

Documentation is available from IBM.

Java Runtime Edition

Java 8.0 is available.

CUDA

CUDA 7.5 is the currently installed version. CUDA documentation is available from NVIDIA.
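A minimal compile for the Tesla K40m GPUs in the compute nodes (compute capability 3.5, hence sm_35; the file name is illustrative):

nvcc -arch=sm_35 -o saxpy saxpy.cu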

IBM Platform LSF

The scheduler on Crest is IBM Platform LSF. More information about LSF is available from IBM.

LSF: Submitting a job

The LSF command used to submit jobs is bsub. It operates a bit differently from qsub in that a conventional job script must be passed to it via STDIN. For instance, given the following job script in a file called job_script:

#BSUB -q batch 
#BSUB -o alltoall.o%J
#BSUB -e alltoall.e%J
#BSUB -J alltoall
#BSUB -n 40
#BSUB -W 180
#BSUB -network "type=sn_all:protocol=mpi:mode=us"

poe ~/alltoall

The command needed to submit it would be:

bsub < job_script

Without a redirect from STDIN, bsub simply expects a command. That command can be a script that contains other commands such as poe, but it must be in your PATH, and any BSUB directives must be passed as command-line options.
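For example, a sketch of an equivalent submission without STDIN redirection (run_alltoall.sh is a hypothetical script in your PATH that invokes poe ~/alltoall):

bsub -q batch -J alltoall -n 40 -W 180 -o alltoall.o%J -e alltoall.e%J run_alltoall.sh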

A -network directive must be passed to a job to avoid error messages. A default -network directive has been configured and is defined as:

type=sn_single:protocol=mpi:mode=us:usage=dedicated:instance=2
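The default can be overridden either with a #BSUB -network directive in the job script, as in the example above, or on the command line:

bsub -network "type=sn_all:protocol=mpi:mode=us" < job_script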

More information about submitting jobs can be found in the bsub man page.

LSF: Submitting an interactive job

To submit an interactive job, pass your login shell to bsub:

bsub -Is $SHELL
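Other scheduler options can be combined with -Is as usual; for example, requesting 20 slots for a two-hour interactive session (values illustrative):

bsub -Is -n 20 -W 120 $SHELL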

LSF: Submitting an MPS enabled job

On Crest nodes, the CUDA Multi-Process Service (MPS) is enabled via the GPUMPS application profile. To submit an MPS enabled job, use the -app option in your submission command. For example, an interactive job can be started with:

bsub -app GPUMPS -Is $SHELL

More information about CUDA MPS can be found in the NVIDIA Multi-Process Service overview document. When enabled, MPS will log its activity to the control and server logs located in /var/tmp/mps_log_[0-4].
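From inside an MPS-enabled job, you can confirm the service is running by inspecting those logs (the directory index and the control.log/server.log file names follow the standard MPS convention; adjust as needed):

ls /var/tmp/mps_log_0
tail /var/tmp/mps_log_0/control.log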

LSF: Walltime Limits

To help ensure equal use opportunity, the following walltime batch limits are enforced:

Batch        12 hours
Interactive   2 hours

LSF useful commands

bjobs       Display your jobs in the queue
bkill       Kill a job
bsub        Submit a job
bhosts      Display hosts and their resources
lsload      Display load information for hosts
badmin      Administrative tool for LSF
     hclose -C   Close batch services on a specified host, with a comment (-C).
     hopen       Reopen batch services on a specified host.
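A typical monitoring workflow, assuming bsub reports a job ID of 12345:

bsub < job_script    # prints: Job <12345> is submitted to queue <batch>.
bjobs 12345          # check the job's status
bkill 12345          # kill the job if needed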

To display a list of all the jobs in the queue:

bjobs -u all

To find more information about jobs in PENDING status:

bjobs -p