Eos Scheduling Policy
Queues are used by the batch scheduler to aid in the organization of jobs. Users typically have access to multiple queues, and each queue may allow different job limits and have different priorities. Unless otherwise notified, users have access to the following queues on Eos:
|Queue||Description|
|batch||Default queue; no explicit request required. Most Eos work runs in this queue. 700 nodes available. See the walltime limits in the table below.|
|debug||Quick-turnaround queue for short software generation, verification, and debugging jobs. 36 nodes available. Users are limited to 1 job in any state for this queue.|
The batch queue is the default queue for work on Eos and has 700 nodes available. Most work on Eos is handled through this queue. The job time limit is based on job size as follows:
|Size in Nodes||Wall Clock Limit|
|1 to 175||24 hours|
|176 to 350||12 hours|
|351 to 700||4 hours|
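The size-based limit can be read as a simple lookup. The sketch below is purely illustrative of the table above (the function name is hypothetical, not an OLCF tool):

```python
def batch_walltime_limit(nodes):
    """Return the Eos batch-queue wall-clock limit in hours for a job size.

    Illustrative sketch of the policy table above; not an official utility.
    """
    if not 1 <= nodes <= 700:
        raise ValueError("Eos batch jobs may request 1 to 700 nodes")
    if nodes <= 175:
        return 24
    if nodes <= 350:
        return 12
    return 4
```

For example, a 200-node job falls in the 176-to-350 band and is limited to 12 hours.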
The batch queue enforces the following policies:
- Unlimited running jobs
- Limit of (2) eligible-to-run jobs per user.
- Jobs in excess of the per-user limit above will be placed into a held state and will become eligible-to-run at the appropriate time.
The debug queue is intended to provide faster turnaround times for the code verification and debugging cycle; interactive parallel work, for example, is an ideal use for this queue. 36 nodes are set aside for debug use only, although a debug job can request more nodes and use nodes in the compute partition. The debug queue has a walltime limit of 2 hours and a limit of 1 job per user in any state.
INCITE, ALCC, NOAA, and Director’s Discretionary projects enter the queue system with equal priority by default on Eos.
The basic priority-setting mechanism for jobs waiting in the queue is the time a job has been waiting relative to other jobs in the queue. However, several factors are applied by the batch system to modify the apparent time a job has been waiting. These factors include:
- The number of nodes requested by the job.
- The queue to which the job is submitted.
- The 8-week history of usage for the project associated with the job.
- The 8-week history of usage for the user associated with the job.
If your jobs require resources outside these queue policies, please complete the relevant request form on the Special Requests page. If you have any questions or comments on the queue policies, please direct them to the User Assistance Center.
Allocation Overuse Policy
Projects that overrun their allocation are still allowed to run on OLCF systems, although at a reduced priority. The reduction is applied as an adjustment to the apparent submit time of the job, which makes the job appear much younger than jobs submitted under projects that have not exceeded their allocation. In addition to the priority change, these jobs are also limited in the amount of wall time that can be used.
For example, consider that job1 is submitted at the same time as job2. The project associated with job1 is over its allocation, while the project for job2 is not. The batch system will consider job2 to have been waiting for a longer time than job1.
The adjustment to the apparent submit time depends upon the percentage that the project is over its allocation, as shown in the table below:
|% Of Allocation Used||Priority Reduction|
|< 100%||0 days|
|100% to 125%||30 days|
|> 125%||365 days|
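The table above maps directly to a threshold check. The following sketch is illustrative only (the function name is hypothetical):

```python
def overuse_priority_reduction(percent_used):
    """Return the priority reduction, in days of apparent submit-time
    adjustment, for a project's allocation usage percentage.

    Illustrative sketch of the overuse table above; not an OLCF tool.
    """
    if percent_used < 100:
        return 0
    if percent_used <= 125:
        return 30
    return 365
```

So a project at 110% of its allocation has its jobs' apparent submit times pushed back by 30 days, and a project at 130% by a full year.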
Impact of Overuse on Separately Allocated Resources
Running in excess of the allocated time on one resource will not impact the priority on separately allocated resources. For example, Eos allocations are given separately from Titan allocations; overuse of a project’s allocation on Titan will not impact that project’s priority on Eos if time remains in the project’s Eos allocation.
FairShare
FairShare, as its name suggests, tries to push each user and project towards their fair share of the system’s utilization: in this case, 5% of the system’s utilization per user and 10% of the system’s utilization per project.
To do this, the job scheduler adds (30) minutes of priority aging per user and (1) hour of priority aging per project for every (1) percent the user or project is under its fair share value for the prior (8) weeks. Similarly, the job scheduler subtracts priority in the same way for users or projects that are over their fair share.
For instance, a user who has personally used 0.0% of the system’s utilization over the past (8) weeks who is on a project that has also used 0.0% of the system’s utilization will get a (12.5) hour bonus (5 * 30 min for the user + 10 * 1 hour for the project).
In contrast, a user who has personally used 0.0% of the system’s utilization on a project that has used 12.5% of the system’s utilization would get no bonus (5 * 30 min for the user – 2.5 * 1 hour for the project).
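The two worked examples above follow from a single formula: 30 minutes per percent the user is under 5%, plus 1 hour per percent the project is under 10%, with over-share usage subtracting in the same proportions. A hedged sketch of that arithmetic (function and parameter names are hypothetical):

```python
def fairshare_bonus_hours(user_pct, project_pct,
                          user_share=5.0, project_share=10.0):
    """Return the FairShare priority-aging bonus in hours.

    30 min (0.5 h) per percent the user is under the 5% user share,
    plus 1 h per percent the project is under the 10% project share;
    negative contributions when either is over its share.
    Illustrative sketch of the policy described above; not an OLCF tool.
    """
    return (user_share - user_pct) * 0.5 + (project_share - project_pct) * 1.0
```

For the first example, a user at 0.0% on a project at 0.0% gets 5 * 0.5 + 10 * 1.0 = 12.5 hours; for the second, a user at 0.0% on a project at 12.5% gets 2.5 - 2.5 = 0 hours, matching the text.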