Job Execution on Eos

Running jobs on Eos is similar to running jobs on Titan, with some important differences:

  • Each compute node has 16 physical cores and no GPUs.
  • Intel’s Hyper-Threading (HT) technology allows each physical core to appear as two logical cores, so each node can function as if it has 32 cores.
  • Jobs on Eos run with Hyper-Threading by default. To disable HT explicitly, pass the -j1 option to the aprun command.
  • Each code should be tested to see how HT impacts its performance before HT is used.
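
The core counts above can be turned into task counts for an aprun line. The following is a minimal sketch, not an official OLCF tool: the function names eos_cores and eos_aprun_line and the executable ./a.out are hypothetical, and the script only prints the aprun lines rather than running them. It assumes the -n option gives the total number of tasks.

```shell
#!/bin/bash
# Hypothetical helper: cores usable on a given number of Eos nodes.
# mode is "ht" (Hyper-Threading on, the Eos default) or "noht" (aprun -j1).
eos_cores() {
    local nodes=$1 mode=$2
    local physical_per_node=16                      # from the list above
    if [ "$mode" = "noht" ]; then
        echo $(( nodes * physical_per_node ))
    else
        echo $(( nodes * physical_per_node * 2 ))   # two logical cores per physical core
    fi
}

# Hypothetical helper: print (do not run) an aprun line that fills the allocation.
eos_aprun_line() {
    local nodes=$1 mode=$2 exe=$3
    local n
    n=$(eos_cores "$nodes" "$mode")
    if [ "$mode" = "noht" ]; then
        echo "aprun -n $n -j1 $exe"                 # -j1 disables HT
    else
        echo "aprun -n $n $exe"                     # HT is the default on Eos
    fi
}

eos_aprun_line 2 ht ./a.out      # prints: aprun -n 64 ./a.out
eos_aprun_line 2 noht ./a.out    # prints: aprun -n 32 -j1 ./a.out
```

Running the same code once with each line and comparing wall-clock times is one way to see how HT affects it.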

Once resources have been allocated through the batch system, users can:

  • Run commands in serial on the resource pool’s primary service node
  • Run executables in parallel across compute nodes in the resource pool

Serial Execution

The executable portion of a batch script is interpreted by the shell specified on the first line of the script. If a shell is not specified, the submitting user’s default shell will be used. This portion of the script may contain comments, shell commands, executable scripts, and compiled executables. These can be used in combination to, for example, navigate file systems, set up job execution, run executables, and even submit other batch jobs.
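
As an illustration, the executable portion of a batch script might look like the sketch below. It is an assumption-laden example, not a template from the Eos documentation: PBS_O_WORKDIR is assumed to be set by a PBS-style batch system (the script falls back to the current directory when it is not), and the eos_run scratch directory name is hypothetical.

```shell
#!/bin/bash
# The first line selects the shell that interprets the rest of the script.

# Navigate file systems: start from the submission directory.
# PBS_O_WORKDIR is an assumption about the batch system; fall back to $PWD.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

# Set up job execution: create a scratch directory and record the start.
run_dir=$(mktemp -d "${TMPDIR:-/tmp}/eos_run.XXXXXX")   # hypothetical name
echo "setup finished in $workdir" > "$run_dir/setup.log"

# Everything up to this point runs serially on the service node.
# Executable scripts and compiled executables would follow here.
cat "$run_dir/setup.log"
```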

Parallel Execution

By default, commands in the job submission script will be executed on the job’s primary service node. The aprun command is used to execute a binary on one or more compute nodes within a job’s allocated resource pool.

Note: On Eos, the only way to access a compute node is via the aprun command within a batch job.
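
The difference between the two execution contexts can be sketched as follows. The launch wrapper is hypothetical and exists only so the example can be read and run off-system: where aprun is not installed, it prints the command instead of executing it. The -n option is assumed to set the number of tasks.

```shell
#!/bin/bash
# No aprun in front: this runs on the job's primary service node.
echo "service node: $(uname -n)"

# Hypothetical wrapper: use aprun when it exists (inside a batch job on Eos),
# otherwise print the command that would have been run.
launch() {
    if command -v aprun >/dev/null 2>&1; then
        aprun "$@"
    else
        echo "would run: aprun $*"
    fi
}

# With aprun in front: the same command runs on a compute node instead.
launch -n 1 uname -n
```

Inside a batch job, the two lines of output would be expected to name different hosts, since the second command is placed on a compute node.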