Interactive Batch Jobs on Commodity Clusters

See this article in context within the following user guides: Lens

Batch scripts are useful when one has a pre-determined group of commands to execute, the results of which can be viewed at a later time. However, it is often necessary to run tasks on compute resources interactively.

Users are not allowed to access cluster compute nodes directly from a login node. Instead, users must use an interactive batch job to allocate and gain access to compute resources. This is done by using the -I option to qsub. Other PBS options are passed to qsub on the command line as well:

  $ qsub -I -A abc123 -q qname -V -l nodes=4 -l walltime=00:30:00

This request will:

-I Start an interactive session
-A abc123 Charge to the abc123 project
-q qname Run in the qname queue
-V Export the user’s shell environment to the job’s environment
-l nodes=4 Request (4) nodes…
-l walltime=00:30:00 …for (30) minutes
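PBS reads a walltime written as HH:MM:SS as hours, minutes, and seconds, so walltime=30:00:00 asks for 30 hours rather than 30 minutes. A minimal sketch that builds the string from a number of minutes, assuming a POSIX shell; to_walltime is a hypothetical helper, not part of PBS:

```shell
# Hypothetical helper (not a PBS command): convert whole minutes to the
# HH:MM:SS form accepted by "qsub -l walltime=...".
to_walltime() {
  printf '%02d:%02d:00\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Example: the 30-minute request above (abc123 and qname are placeholders).
# qsub -I -A abc123 -q qname -V -l nodes=4 -l walltime="$(to_walltime 30)"
```

With this helper, to_walltime 30 produces 00:30:00, while to_walltime 1800 produces 30:00:00, which makes the hours/minutes mix-up easy to spot.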

After running this command, the job will wait until enough compute nodes are available, just as any other batch job must. However, once the job starts, the user will be given an interactive prompt on the primary compute node within the allocated resource pool. Commands may then be executed directly (instead of through a batch script).
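Once the prompt appears, Torque/PBS sets environment variables such as PBS_O_WORKDIR (the directory qsub was run from) and PBS_NODEFILE (a file listing the allocated hosts). A small sketch for checking the allocation, assuming a POSIX shell; count_nodes is a hypothetical helper, and a host may appear in the node file more than once (for example, once per requested processor), so duplicates are removed before counting:

```shell
# Hypothetical helper: count the distinct hosts listed in a PBS node file.
# Hosts can repeat in the file, so sort -u collapses duplicates first;
# tr strips the padding some wc implementations add.
count_nodes() {
  sort -u "$1" | wc -l | tr -d ' '
}

# Typical first steps at the interactive prompt (commented out so the
# sketch can be sourced anywhere):
# cd "$PBS_O_WORKDIR"            # return to the directory qsub was run from
# count_nodes "$PBS_NODEFILE"    # should match the nodes= request
```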

Using Interactive Batch to Debug

A common use of interactive batch is to aid in debugging. Interactive access to compute resources lets you run a process to the point of failure and then, unlike in a batch job, restart it after making quick changes without giving up the allocated compute nodes. Because each attempt does not have to wait in the queue again, the debugging effort moves faster.
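The edit, rebuild, and rerun cycle can be wrapped in a small shell function so that every attempt reuses the same allocation. This is a sketch under assumptions: rebuild_and_run is hypothetical, the build step is whatever command string you pass (make is only an example), and mpirun and ./my_app stand in for your cluster's documented launcher and your own executable.

```shell
# Hypothetical helper for the debug loop inside one interactive job.
# $1 is the build command (run via sh -c); the remaining arguments are the
# launch command. A failed build returns 1 without launching anything.
rebuild_and_run() {
  build_cmd=$1
  shift
  if ! sh -c "$build_cmd"; then
    echo "build failed; the allocation is still held, fix and retry" >&2
    return 1
  fi
  "$@"   # the function's exit status is the program's exit status
}

# Example (assumes a Makefile and an MPI launcher; adjust for your cluster):
# rebuild_and_run "make" mpirun -np 4 ./my_app
```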

Choosing a Job Size

Because interactive jobs must wait in the queue until enough resources become available, it is useful to base the requested job size on the number of currently unallocated cores or nodes; a request that fits in what is free now spends less time waiting.

Use the showbf (“show backfill”) command to see the resource limits that would allow your job to be backfilled, and thus started immediately, by the scheduler. For example, the snapshot below shows that (8) nodes are currently free.

  $ showbf

  Partition   Tasks  Nodes  StartOffset   Duration   StartDate
  ---------   -----  -----  ------------  ---------  --------------
  lens        4744   8      INFINITY      00:00:00   HH:MM:SS_MM/DD

See the output of the showbf --help command for additional options.
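To script job sizing, the Nodes column can be extracted from showbf output with awk. A minimal sketch, assuming the column layout shown in the snapshot above (Partition, Tasks, Nodes, ...); free_nodes is a hypothetical helper, and it prints one value per matching row, so a partition with several backfill windows yields several lines:

```shell
# Hypothetical helper: read showbf output on stdin and print the Nodes
# column (3rd whitespace-separated field) for rows whose Partition matches $1.
free_nodes() {
  awk -v part="$1" '$1 == part { print $3 }'
}

# Example: size an interactive request to what is free right now
# (abc123 and qname are placeholders).
# n=$(showbf | free_nodes lens)
# qsub -I -A abc123 -q qname -V -l nodes="$n" -l walltime=00:30:00
```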