OLCF User Assistance Center

Can't find the information you need below? Need advice from a real person? We're here to help.

OLCF support consultants are available to respond to your emails and phone calls from 9:00 a.m. to 5:00 p.m. EST, Monday through Friday, exclusive of holidays. Emails received outside of regular support hours will be addressed the next business day.

Job Execution on Titan

Once resources have been allocated through the batch system, users can:

  • Run commands in serial on the resource pool’s primary service node
  • Run executables in parallel across compute nodes in the resource pool

Serial Execution

The executable portion of a batch script is interpreted by the shell specified on the first line of the script. If a shell is not specified, the submitting user’s default shell will be used. This portion of the script may contain comments, shell commands, executable scripts, and compiled executables. These can be used in combination to, for example, navigate file systems, set up job execution, run executables, and even submit other batch jobs.
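
As a minimal sketch of the above, assuming a PBS-style batch system and the bash shell, the executable portion of a script might mix comments, variable setup, and ordinary shell commands as follows. The project ID, job name, and resource limits are placeholders rather than values from this guide, and the fallback to the current directory is only there so the sketch also runs outside a batch job.

```shell
#!/bin/bash
# Hypothetical batch directives; PROJ123, the job name, and the limits are placeholders.
#PBS -A PROJ123
#PBS -N serial_demo
#PBS -l nodes=1
#PBS -l walltime=00:05:00

# Everything below is interpreted by the shell named on the first line
# and runs serially on the job's primary service node.

# PBS_O_WORKDIR is normally set by the batch system to the submission directory;
# fall back to the current directory when it is not set.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

# Ordinary shell commands: record where and when the job started.
start_host="$(uname -n)"
start_time="$(date +%Y-%m-%dT%H:%M:%S)"
echo "Job started on ${start_host} at ${start_time}"
echo "Working directory: ${workdir}"
```

None of these commands reach a compute node; they only prepare the environment on the service node.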

Warning: On Titan, each batch job is limited to 200 simultaneous processes. Attempting to run more than 200 processes at once will result in "No space left on device" errors.

Parallel Execution

By default, commands in the job submission script will be executed on the job’s primary service node. The aprun command is used to execute a binary on one or more compute nodes within a job’s allocated resource pool.

Note: On Titan, the only way to access a compute node is via the aprun command within a batch job.
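
As a rough illustration, a batch script that launches a parallel executable with aprun might look like the sketch below. It assumes a PBS-style batch system, 16 cores per compute node, and an executable named ./a.out; the project ID, node count, and executable name are placeholders. The -n option sets the total number of processing elements and -N the number per node. The check for aprun is not part of a normal job script; it is included only so the sketch can be tried on a machine without aprun.

```shell
#!/bin/bash
# Hypothetical batch directives; PROJ123 and the limits are placeholders.
#PBS -A PROJ123
#PBS -l nodes=2
#PBS -l walltime=00:10:00

nodes=2              # should match the node count requested above
tasks_per_node=16    # assumed number of cores per compute node
total_tasks=$(( nodes * tasks_per_node ))

exe="./a.out"        # placeholder name for a compiled executable

# These lines run on the primary service node; only the command
# launched through aprun runs on the compute nodes.
if command -v aprun >/dev/null 2>&1; then
    # -n: total processing elements, -N: processing elements per node
    aprun -n "$total_tasks" -N "$tasks_per_node" "$exe"
else
    echo "aprun not found; inside a batch job this would run:"
    echo "aprun -n $total_tasks -N $tasks_per_node $exe"
fi
```

Deriving the total task count from the node count keeps the aprun request consistent with the resources the batch system allocated.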