
Eos Scheduling Policy

Queue Policy

Queues are used by the batch scheduler to aid in the organization of jobs. Users typically have access to multiple queues, and each queue may allow different job limits and have different priorities. Unless otherwise notified, users have access to the following queues on Eos:

Name Usage Description
batch No explicit request required Default queue; most Eos work runs in this queue. 700 nodes available. See the limits table in the batch Queue section below.
debug #PBS -q debug Quick-turnaround queue for short software generation, verification, and debugging jobs. 36 nodes available. Users are limited to 1 job in any state in this queue.
The batch Queue

The batch queue is the default queue for work on Eos and has 700 nodes available. Most work on Eos is handled through this queue. The job time limit is based on job size as follows:

Size in Nodes Wall Clock Limit
1 to 175 24 hours
176 to 350 12 hours
351 to 700 4 hours
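
The size-based limits above amount to a simple lookup. The following Python sketch is purely illustrative (it is not an OLCF tool, and the function name is invented here); it encodes the table so the tiers are easy to check programmatically:

```python
# Illustrative sketch: the Eos batch-queue wall clock limit, in hours,
# for a given job size in nodes, per the policy table above.
def batch_walltime_limit_hours(nodes: int) -> int:
    if not 1 <= nodes <= 700:
        raise ValueError("Eos batch jobs may use 1 to 700 nodes")
    if nodes <= 175:    # 1 to 175 nodes
        return 24
    if nodes <= 350:    # 176 to 350 nodes
        return 12
    return 4            # 351 to 700 nodes
```

For example, a 200-node job falls in the middle tier and may request at most 12 hours of wall time.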

The batch queue enforces the following policies:

  • Unlimited running jobs per user.
  • Limit of (2) eligible-to-run jobs per user.
  • Jobs in excess of the per-user limit above will be placed in a held state and will become eligible to run at the appropriate time.
The debug Queue

The debug queue is intended to provide faster turnaround times for the code verification and debugging cycle. For example, interactive parallel work is an ideal use for the debug queue. 36 nodes are set aside exclusively for debug use, although a debug job can request more nodes and also use nodes from the compute partition. The debug queue has a wall-time limit of 2 hours and a limit of 1 job per user in any state.

Queue Priority

INCITE, ALCC, NOAA, and Director's Discretionary projects enter the queue system with equal priority by default on Eos.

The basic priority-setting mechanism for jobs waiting in the queue is the time a job has been waiting relative to other jobs in the queue. However, several factors are applied by the batch system to modify the apparent time a job has been waiting. These factors include:

  • The number of nodes requested by the job.
  • The queue to which the job is submitted.
  • The 8-week history of usage for the project associated with the job.
  • The 8-week history of usage for the user associated with the job.

If your jobs require resources outside these queue policies, please complete the relevant request form on the Special Requests page. If you have any questions or comments on the queue policies below, please direct them to the User Assistance Center.

Allocation Overuse Policy

Projects that overrun their allocation are still allowed to run on OLCF systems, although at a reduced priority. The reduction is implemented as an adjustment to the job's apparent submit time: jobs from overrun projects appear much younger than jobs submitted under projects that have not exceeded their allocation. In addition to the priority change, these jobs are also limited in the amount of wall time they can use.

For example, consider that job1 is submitted at the same time as job2. The project associated with job1 is over its allocation, while the project for job2 is not. The batch system will consider job2 to have been waiting for a longer time than job1.

The adjustment to the apparent submit time depends upon the percentage that the project is over its allocation, as shown in the table below:

% Of Allocation Used Priority Reduction
< 100% 0 days
100% to 125% 30 days
> 125% 365 days
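
The reduction table above can be written as a short conditional. This Python sketch is illustrative only (the function name is invented here), encoding the three tiers from the table:

```python
# Illustrative sketch: priority reduction, in days of apparent submit-time
# aging, given the percentage of the project's allocation already used.
def overuse_priority_reduction_days(percent_used: float) -> int:
    if percent_used < 100:     # under allocation: no penalty
        return 0
    if percent_used <= 125:    # 100% to 125% over
        return 30
    return 365                 # more than 125%
```
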
Impact of Overuse on Separately Allocated Resources

Running in excess of the allocated time on one resource will not impact the priority on separately allocated resources. Eos allocations are granted separately from Titan allocations; overuse of a project's allocation on Titan will not impact that project's priority on Eos if there is time remaining in the project's Eos allocation.

FairShare Scheduling Policy

FairShare, as its name suggests, tries to push each user and project towards their fair share of the system’s utilization: in this case, 5% of the system’s utilization per user and 10% of the system’s utilization per project.

To do this, the job scheduler adds (30) minutes priority aging per user and (1) hour of priority aging per project for every (1) percent the user or project is under its fair share value for the prior (8) weeks. Similarly, the job scheduler subtracts priority in the same way for users or projects that are over their fair share.

For instance, a user who has personally used 0.0% of the system’s utilization over the past (8) weeks who is on a project that has also used 0.0% of the system’s utilization will get a (12.5) hour bonus (5 * 30 min for the user + 10 * 1 hour for the project).

In contrast, a user who has personally used 0.0% of the system’s utilization on a project that has used 12.5% of the system’s utilization would get no bonus (5 * 30 min for the user – 2.5 * 1 hour for the project).
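
The two worked examples above follow from one formula: (5 − user%) × 30 minutes plus (10 − project%) × 1 hour, with negative terms when a share is exceeded. The Python sketch below is illustrative only (the function name is invented here):

```python
# Illustrative sketch of the FairShare priority bonus, in hours, given the
# user's and project's percentage of system utilization over the prior
# 8 weeks. Targets: 5% per user, 10% per project. Terms go negative when
# a user or project is over its fair share.
def fairshare_bonus_hours(user_pct: float, project_pct: float) -> float:
    user_term = (5.0 - user_pct) * 0.5       # 30 min per percent
    project_term = (10.0 - project_pct) * 1.0  # 1 hour per percent
    return user_term + project_term
```

Plugging in the document's examples: a user at 0.0% on a project at 0.0% gets 2.5 + 10 = 12.5 hours of bonus, while a user at 0.0% on a project at 12.5% gets 2.5 − 2.5 = 0.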