OLCF User Assistance Center

The center's normal support hours are 9 a.m. until 5 p.m. (Eastern time) Monday through Friday, exclusive of holidays. Outside of normal business hours, calls are directed to the ORNL Computer Operations staff. If you require immediate assistance outside of normal business hours, you may contact them at the phone number listed above. If your request is not urgent, you may send an email to help@nccs.gov, where it will be answered by an NCCS User Assistance staff member on the next business day.

OLCF Policy Guide

This guide presents all official policies of the OLCF.


1. Computing Policy

Note: This details an official policy of the OLCF, and must be agreed to by the following persons as a condition of access to and use of OLCF computational resources:
  • Principal Investigators (Non-Profit)
  • Principal Investigators (Industry)
  • All Users
Title: Computing Policy
Version: 12.10
Computer Use
Computers, software, and communications systems provided by the OLCF are to be used for work associated with and within the scope of the approved project. The use of OLCF resources for personal or non-work-related activities is prohibited. All computers, networks, E-mail, and storage systems are property of the United States Government. Any misuse or unauthorized access is prohibited, and is subject to criminal and civil penalties. OLCF systems are provided to our users without any warranty. OLCF will not be held liable in the event of any system failure or data loss or corruption for any reason including, but not limited to: negligence, malicious action, accidental loss, software errors, hardware failures, network losses, or inadequate configuration of any computing resource or ancillary system.
Data Use
Prohibited Data
The OLCF computer systems are operated as research systems. They contain only data related to scientific research and do not contain personally identifiable information (data that falls under the Privacy Act of 1974, 5 U.S.C. 552a). Use of OLCF resources to store, manipulate, or remotely access any national security information is strictly prohibited. This includes, but is not limited to: classified information, unclassified controlled nuclear information (UCNI), naval nuclear propulsion information (NNPI), and information on the design or development of nuclear, biological, or chemical weapons or any weapons of mass destruction. Authors/generators/owners of information are responsible for its correct categorization as sensitive or non-sensitive. Owners of sensitive information are responsible for its secure handling, transmission, processing, storage, and disposal on OLCF systems. Principal investigators, users, or project delegates who use OLCF resources, or are responsible for overseeing projects that use OLCF resources, are strictly responsible for knowing whether their project generates any of these prohibited data types or information that falls under Export Control. For questions, contact help@nccs.gov.
Confidentiality, Integrity, and Availability
The OLCF systems provide protections to maintain the confidentiality, integrity, and availability of user data. Measures include the availability of file permissions, archival systems with access control lists, and parity and CRC checks on data paths and files. It is the user’s responsibility to set access controls appropriately for the data. In the event of system failure or malicious actions, the OLCF makes no guarantee against loss of data, nor any guarantee that a user’s data cannot be accessed, changed, or deleted by another individual. It is the user’s responsibility to ensure the appropriate level of backup and integrity checks on critical data and programs.
Data Modification/Destruction
Users are prohibited from taking unauthorized actions to intentionally modify or delete information or programs.
Data Retention
The OLCF reserves the right to remove any data at any time and/or transfer data to other users working on the same or similar project once a user account is deleted or a person no longer has a business association with the OLCF. After a sensitive project has ended or has been terminated, all data related to the project must be purged from all OLCF computing resources within 30 days.
Software Use
All software used on OLCF computers must be appropriately acquired and used according to the appropriate software license agreement. Possession, use, or transmission of illegally obtained software is prohibited. Likewise, users shall not copy, store, or transfer copyrighted software, except as permitted by the owner of the copyright. Only export-controlled codes approved by the Export Control Office may be run by parties with sensitive data agreements.
Malicious Software
Users must not intentionally introduce or use malicious software such as computer viruses, Trojan horses, or worms.
Reconstruction of Information or Software
Users are not allowed to reconstruct information or software for which they are not authorized. This includes but is not limited to any reverse engineering of copyrighted software or firmware present on OLCF computing resources.
User Accountability
Users are accountable for their actions and may be subject to applicable administrative or legal sanctions.
Monitoring and Privacy
Users are advised that there is no expectation of privacy for their activities on any system that is owned, leased, or operated by UT-Battelle on behalf of the U.S. Department of Energy (DOE). The Company retains the right to monitor all activities on these systems, to access any computer files or electronic mail messages, and to disclose all or part of information gained to authorized individuals or investigative agencies, all without prior notice to, or consent from, any user, sender, or addressee. This access to information or a system by an authorized individual or investigative agency is in effect during the period of the user's access to information on a DOE computer and for a period of three years thereafter. OLCF personnel and users are required to address, safeguard against, and report misuse, abuse, and criminal activities. Misuse of OLCF resources can lead to temporary or permanent disabling of accounts, loss of DOE allocations, and administrative or legal actions. Accounts of users who have not accessed an OLCF computing resource in at least 6 months will be disabled; those users will need to reapply to regain access to their account. All users must reapply annually.
Authentication and Authorization
All users are required to use a one-time password for authentication. Tokens will be distributed to OLCF users. Users will be required to create a Personal Identification Number (PIN). This is used in conjunction with a generated token code as part of a two-factor authentication implementation. Accounts on the OLCF machines are for the exclusive use of the individual user named in the account application. Users should not share accounts or tokens with anyone. If evidence is found that more than one person is using an account, that account will be disabled immediately. Users are not to attempt to receive unintended messages or access information by unauthorized means, such as imitating another system, impersonating another user or other person, misusing legal user credentials (usernames, tokens, etc.), or causing some system component to function incorrectly. Users are prohibited from changing or circumventing access controls to allow themselves or others to perform actions outside their authorized privileges. Users must notify the OLCF immediately when they become aware that any of the accounts used to access OLCF have been compromised. Users should inform the OLCF promptly of any changes in their contact information (E-mail, phone, affiliation, etc.). Updates should be sent to accounts@ccs.ornl.gov.
Foreign National Access
Applicants who appear on a restricted foreign country listing in section 15 CFR 740.7 License Exceptions for Computers are denied access based on US Foreign Policy. The countries cited are Cuba, Iran, North Korea, Sudan, and Syria. Additionally, no work may be performed on OLCF computers on behalf of foreign nationals from these countries.
Denial of Service
Users may not deliberately interfere with other users accessing system resources.


2. Data Management Policy

Note: This details an official policy of the OLCF, and must be agreed to by the following persons as a condition of access to or use of OLCF computational resources:
  • Principal Investigators (Non-Profit)
  • Principal Investigators (Industry)
  • All Users
Title: Data Management Policy
Version: 14.01
Introduction
The OLCF provides a comprehensive suite of hardware and software resources for the creation, manipulation, and retention of scientific data. This document comprises guidelines for acceptable use of those resources. It is an official policy of the OLCF, and as such, must be agreed to by relevant parties as a condition of access to and use of OLCF computational resources.
Data Storage Resources
The OLCF provides an array of data storage platforms, each designed with a particular purpose in mind. Storage areas are broadly divided into two categories: those intended for user data and those intended for project data. Within each of the two categories, we provide different sub-areas, each with an intended purpose:
| Purpose | Storage Area | Path |
| --- | --- | --- |
| Long-term data for routine access that is unrelated to a project | User Home | $HOME |
| Long-term data for archival access that is unrelated to a project | User Archive | /home/$USER |
| Long-term project data for routine access that's shared with other project members | Project Home | /ccs/proj/[projid] |
| Short-term project data for fast, batch-job access that you don't want to share | Member Work | $MEMBERWORK/[projid] |
| Short-term project data for fast, batch-job access that's shared with other project members | Project Work | $PROJWORK/[projid] |
| Short-term project data for fast, batch-job access that's shared with those outside your project | World Work | $WORLDWORK/[projid] |
| Long-term project data for archival access that's shared with other project members | Project Archive | /proj/[projid] |
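As a rough illustration of how the paths in the table fit together, the sketch below builds the NFS and Lustre storage locations for one project. The function name `storage_paths` and the project ID `abc123` used in the usage note are hypothetical; the environment variables and the `/ccs/proj` prefix are taken from the table. The two HPSS archive areas are left out because the table lists them as archive locations rather than NFS or Lustre directories.

```python
import os


def storage_paths(projid, env=None):
    """Return the NFS and Lustre storage locations for one project.

    `env` defaults to the process environment; pass a plain dict to
    try the function on a machine where the OLCF variables are not set.
    """
    env = os.environ if env is None else env
    return {
        "User Home": env["HOME"],
        "Project Home": f"/ccs/proj/{projid}",
        "Member Work": f"{env['MEMBERWORK']}/{projid}",
        "Project Work": f"{env['PROJWORK']}/{projid}",
        "World Work": f"{env['WORLDWORK']}/{projid}",
    }
```

Calling `storage_paths("abc123")` on a system where `$MEMBERWORK`, `$PROJWORK`, and `$WORLDWORK` are defined returns a dictionary keyed by storage-area name; a missing variable raises `KeyError`, which makes it obvious when the code is run somewhere those areas do not exist.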
User Home
Home directories for each user are NFS-mounted on all OLCF systems and are intended to store long-term, frequently-accessed user data. User Home areas are backed up on a daily basis. This file system does not generally provide the input/output (I/O) performance required by most compute jobs, and is not available to compute jobs on most systems. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
User Archive
The High Performance Storage System (HPSS) is the tape-archive storage system at the OLCF and is the storage technology that supports the User Archive areas. HPSS is intended for data that do not require day-to-day access. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
Project Home
Project Home directories are NFS-mounted on selected OLCF systems and are intended to store long-term, frequently-accessed data that is needed by all collaborating members of a project. Project Home areas are backed up on a daily basis. This file system does not generally provide the input/output (I/O) performance required by most compute jobs, and is not available to compute jobs on most systems. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
Member Work
Project members get an individual Member Work directory for each associated project; these reside in the center-wide, high-capacity Lustre® file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Member Work directories are provided commonly across all systems. Because of the scratch nature of the file system, it is not backed up and files are automatically purged on a regular basis. Files should not be retained in this file system for long, but rather should be migrated to Project Home or Project Archive space as soon as the files are not actively being used. If a file system associated with your Member Work directory is nearing capacity, the OLCF may contact you to request that you reduce the size of your Member Work directory. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
Project Work
Individual Project Work directories reside in the center-wide, high-capacity Lustre file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. Project Work directories are provided commonly across most systems. Because of the scratch nature of the file system, it is not backed up. If a file system associated with Project Work storage is nearing capacity, the OLCF may contact the PI of the project to request that he or she reduce the size of the Project Work directory. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
World Work
Each project has a World Work directory that resides in the center-wide, high-capacity Lustre file system on large, fast disk areas intended for global (parallel) access to temporary/scratch storage. World Work directories are provided commonly across most systems. Because of the scratch nature of the file system, it is not backed up. If a file system associated with World Work storage is nearing capacity, the OLCF may contact the PI of the project to request that he or she reduce the size of the World Work directory. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
Project Archive
The High Performance Storage System (HPSS) is the tape-archive storage system at the OLCF and is the storage technology that supports the Project Archive areas. HPSS is intended for data that do not require day-to-day access. Project Archive areas are shared between all users of the project. Users should not store data unrelated to OLCF projects on HPSS. Project members should also periodically review files and remove unneeded ones. See the section “Data Retention, Purge, & Quota Summary” for more details on applicable quotas, backups, purge, and retention timeframes.
Local Scratch Storage
A large, fast disk area intended for parallel access to temporary storage in the form of scratch directories may be provided on a limited number of systems. This area is local to a specific system. This directory is, for example, intended to hold output generated by a user's job. Because of the scratch nature of the file system, it is not backed up and files are automatically purged on a regular basis. Files should not be retained in this file system and should be migrated to archival storage as soon as the files are not actively being used. Quotas may be instituted on a machine-by-machine basis if deemed necessary.
Data Retention, Purge, & Quotas
Summary
The following table details quota, backup, purge, and retention information for each user-centric and project-centric storage area available at the OLCF.
User-Centric Storage Areas

| Area | Path | Type | Permissions | Quota | Backups | Purged | Retention |
| --- | --- | --- | --- | --- | --- | --- | --- |
| User Home | $HOME | NFS | User-controlled | 10 GB | Yes | No | 90 days |
| User Archive | /home/$USER | HPSS | User-controlled | 2 TB [1] | No | No | 90 days |

Project-Centric Storage Areas

| Area | Path | Type | Permissions | Quota | Backups | Purged | Retention |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Project Home | /ccs/proj/[projid] | NFS | 770 | 50 GB | Yes | No | 90 days |
| Member Work | $MEMBERWORK/[projid] | Lustre® | 700 [2] | 10 TB | No | 14 days | [4] |
| Project Work | $PROJWORK/[projid] | Lustre® | 770 | 100 TB | No | 90 days | [4] |
| World Work | $WORLDWORK/[projid] | Lustre® | 775 | 10 TB | No | 90 days | [4] |
| Project Archive | /proj/[projid] | HPSS | 770 | 100 TB [3] | No | No | 90 days |

Area: The general name of the storage area.
Path: The path (symlink) to the storage area's directory.
Type: The underlying software technology supporting the storage area.
Permissions: UNIX permissions enforced on the storage area's top-level directory.
Quota: The limits placed on the total number of bytes and/or files in the storage area.
Backups: States whether the data is automatically duplicated for disaster recovery purposes.
Purged: Period of time, post-file-access, after which a file will be marked as eligible for permanent deletion.
Retention: Period of time, post-account-deactivation or post-project-end, after which data will be marked as eligible for permanent deletion.
Important! Files within "Work" directories (i.e., Member Work, Project Work, World Work) are not backed up and are purged on a regular basis according to the timeframes listed above.

[1] In addition, there is a quota/limit of 2,000 files on this directory.

[2] Permissions on Member Work directories can be controlled to an extent by project members. By default, only the project member has access, but access can be granted to other project members by setting group permissions accordingly on the Member Work directory. As a security measure, the parent directory of the Member Work directory prevents access by "UNIX-others", and this cannot be changed.

[3] In addition, there is a quota/limit of 100,000 files on this directory.

[4] Retention is not applicable as files will follow purge cycle.
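Footnote [2] says that a project member can open a Member Work directory to the rest of the project by setting group permissions. A minimal sketch of that step follows, assuming that group read and search permission (for example 700 becoming 750) is the access you want to grant; the function name `grant_group_access` is hypothetical and the policy does not prescribe a particular mode.

```python
import os
import stat


def grant_group_access(member_work_dir):
    """Add group read and search permission to a directory (e.g. 700 -> 750).

    Returns the resulting permission bits so the caller can confirm them.
    """
    mode = stat.S_IMODE(os.stat(member_work_dir).st_mode)
    os.chmod(member_work_dir, mode | stat.S_IRGRP | stat.S_IXGRP)
    return stat.S_IMODE(os.stat(member_work_dir).st_mode)
```

Only the directory's own bits are changed. Files and subdirectories inside it keep their existing permissions and would need the same treatment before other project members can read them.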

Data Retention Overview
By default, there is no lifetime retention for any data on OLCF resources. The OLCF specifies a limited post-deactivation timeframe during which user and project data will be retained. When the retention timeframe expires, the OLCF retains the right to delete data. If you have data retention needs outside of the default policy, please notify the OLCF.
User Data Retention
The user data retention policy exists to reclaim storage space after a user account is deactivated, e.g., after the user’s involvement on all OLCF projects concludes. By default, the OLCF will retain data in user-centric storage areas only for a designated amount of time after the user’s account is deactivated. During this time, a user can request a temporary user account extension for data access. See the section “Data Retention, Purge, & Quota Summary” for details on retention timeframes for each user-centric storage area.
Project Data Retention
The project data retention policy exists to reclaim storage space after a project ends. By default, the OLCF will retain data in project-centric storage areas only for a designated amount of time after the project end date. During this time, a project member can request a temporary user account extension for data access. See the section “Data Retention, Purge, & Quota Summary” for details on purge and retention timeframes for each project-centric storage area.
Sensitive Project Data Retention
For sensitive projects only, all data related to the project must be purged from all OLCF computing resources within 30 days of the project’s end or termination date.
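The retention timeframes reduce to simple date arithmetic. The sketch below assumes the 90-day retention value listed in the summary tables for the non-scratch areas and the 30-day limit for sensitive projects; the constant and function names are hypothetical. Note that the two dates mean different things: after the default deadline data becomes eligible for deletion, whereas sensitive data must already be gone by its deadline.

```python
from datetime import date, timedelta

DEFAULT_RETENTION_DAYS = 90  # retention value from the summary tables
SENSITIVE_PURGE_DAYS = 30    # limit for sensitive projects


def retention_deadline(end_date, sensitive=False):
    """Return the date on which the applicable timeframe expires.

    `end_date` is the account-deactivation or project-end date.
    """
    days = SENSITIVE_PURGE_DAYS if sensitive else DEFAULT_RETENTION_DAYS
    return end_date + timedelta(days=days)
```

Work areas are not covered by this calculation, since their files follow the purge cycle instead of a retention period.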
Data Purges
Data purge mechanisms are enabled on some OLCF file system directories in order to maintain sufficient disk space availability for job execution. Files in these scratch areas are automatically purged on a regular purge timeframe. If a file system with an active purge policy is nearing capacity, the OLCF may contact you to request that you reduce the size of a directory within that file system, even if the purge timeframe has not been exceeded. See the section “Data Retention, Purge, & Quota Summary” for details on purge timeframes for each storage area, if applicable.
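To see which files are at risk before a purge runs, a user could scan a scratch directory for files whose last access is older than the purge timeframe. The sketch below is an illustration only: it assumes eligibility is judged from each file's access time (`st_atime`), which matches the "post-file-access" wording in the summary but is not a description of the OLCF's actual purge tooling. The function name `purge_candidates` is hypothetical.

```python
import os
import time


def purge_candidates(root, purge_days, now=None):
    """List files under `root` last accessed more than `purge_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - purge_days * 86400  # seconds in a day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat reads metadata only, so the scan does not itself
            # refresh the access time of the files it inspects.
            if os.lstat(path).st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

For a Member Work directory the call would use `purge_days=14`; for Project Work or World Work, `purge_days=90`.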
Storage Space Quotas
Each user-centric and project-centric storage area has an associated quota, which could be a hard (systematically-enforceable) quota or a soft (policy-enforceable) quota. Storage usage will be monitored continually. When a user or project exceeds a soft quota for a storage area, the user or project PI will be contacted and will be asked if at all possible to purge data from the offending area. See the section “Data Retention, Purge, & Quota Summary” for details on quotas for each storage area.
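Because a soft quota is enforced by policy rather than by the file system, it can be useful to check usage yourself. The following sketch totals apparent file sizes and file counts under a directory and compares them with limits you supply. It assumes that quota usage corresponds to the sum of `st_size` values, which may differ from how the OLCF measures usage; the names `usage` and `over_quota` are hypothetical.

```python
import os


def usage(root):
    """Return (total_bytes, file_count) for all files under `root`."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
            file_count += 1
    return total_bytes, file_count


def over_quota(root, max_bytes, max_files=None):
    """True if `root` exceeds the byte limit or the optional file limit."""
    total_bytes, file_count = usage(root)
    if total_bytes > max_bytes:
        return True
    return max_files is not None and file_count > max_files
```

The byte value to pass for a quota such as "10 GB" depends on whether decimal or binary units are meant, which the policy does not state.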
Data Prohibitions & Safeguards
Prohibited Data
The OLCF computer systems are operated as research systems. They contain only data related to scientific research and do not contain personally identifiable information (data that falls under the Privacy Act of 1974, 5 U.S.C. 552a). Use of OLCF resources to store, manipulate, or remotely access any national security information is strictly prohibited. This includes, but is not limited to: classified information, unclassified controlled nuclear information (UCNI), naval nuclear propulsion information (NNPI), and information on the design or development of nuclear, biological, or chemical weapons or any weapons of mass destruction. Authors/generators/owners of information are responsible for its correct categorization as sensitive or non-sensitive. Owners of sensitive information are responsible for its secure handling, transmission, processing, storage, and disposal on OLCF systems. Principal investigators, users, or project delegates who use OLCF resources, or are responsible for overseeing projects that use OLCF resources, are strictly responsible for knowing whether their project generates any of these prohibited data types or information that falls under Export Control. For questions, contact help@olcf.ornl.gov.
Unauthorized Data Modification
Users are prohibited from taking unauthorized actions to intentionally modify or delete information or programs.
Data Confidentiality, Integrity, & Availability
The OLCF systems provide protections to maintain the confidentiality, integrity, and availability of user data. Measures include the availability of file permissions, archival systems with access control lists, and parity/CRC checks on data paths/files. It is the user’s responsibility to set access controls appropriately for data. In the event of system failure or malicious actions, the OLCF makes no guarantee against loss of data, nor any guarantee that a user’s data cannot be accessed, changed, or deleted by another individual. It is the user’s responsibility to ensure the appropriate level of backup and integrity checks on critical data and programs.
Administrator Access to Data
OLCF resources are federal computer systems, and as such, users should have no explicit or implicit expectation of privacy. OLCF employees and authorized vendor personnel with “root” privileges have access to all data on OLCF systems. Such employees can also log in to OLCF systems as other users. As a general rule, OLCF employees will not discuss your data with any unauthorized entities nor grant access to data files to any person other than the UNIX “owner” of the data file, except in the following situations:
  • When the owner of the data requests a change of ownership for any reason, e.g., the owner is leaving the project and grants the PI ownership of the data.
  • In situations of suspected abuse/misuse of computational resources, criminal activity, or cyber-security violations.
Note that the above applies even to project PIs. In general, the OLCF will not overwrite existing UNIX permissions on data files owned by project members for the purpose of granting access to the project PI. Project PIs should work closely with project members throughout the duration of the project to ensure UNIX permissions are set appropriately.
Software
Software Licensing
All software used on OLCF computers must be appropriately acquired and used according to the appropriate software license agreement. Possession, use, or transmission of illegally obtained software is prohibited. Likewise, users shall not copy, store, or transfer copyrighted software, except as permitted by the owner of the copyright. Only export-controlled codes approved by the Export Control Office may be run by parties with sensitive data agreements.
Malicious Software
Users must not intentionally introduce or use malicious software, including but not limited to, computer viruses, Trojan horses, or computer worms.
Reconstruction of Information or Software
Users are not permitted to reconstruct information or software for which they are not authorized. This includes but is not limited to any reverse engineering of copyrighted software or firmware present on OLCF computing resources.


3. Cyber Security Policy

Note: This details an official policy of the OLCF, and must be agreed to by the following persons as a condition of access to or use of OLCF computational resources:
  • Principal Investigators (Non-Profit)
  • Principal Investigators (Industry)
  • All Users
Title: Cyber Security Policy
Version: 12.10
The Oak Ridge Leadership Computing Facility (OLCF) computing resources are provided to users for research purposes. All users must agree to abide by all security measures described in this document. Failure to comply with security procedures will result in termination of access to OLCF computing resources and possible legal actions.
Scope
The requirements outlined in this document apply to all individuals who have an OLCF account. It is your responsibility to ensure that all individuals have the proper need-to-know before allowing them access to the information on OLCF computing resources. This document will outline the main security concerns. Specific use policies are covered in the OLCF Computing Policy.
Personal Use
OLCF computing resources are for business use only. Installation or use of software for personal use is not allowed. Incidents of abuse will result in account termination. Inappropriate uses include, but are not limited to:
  • Sexually oriented information
  • Downloading, copying, or distributing copyrighted materials without prior permission from the owner
  • Downloading or storing large files or utilizing streaming media for personal use (e.g., music files, graphic files, internet radio, video streams, etc.)
  • Advertising, soliciting, or selling
Accessing OLCF Computational Resources
Access to systems is provided via Secure Shell version 2 (sshv2). You will need to ensure that your ssh client supports keyboard-interactive authentication. The method of setting up this authentication varies from client to client, so you may need to contact your local administrator for assistance. Most new implementations support this authentication type, and many ssh clients are available on the web. Login sessions will be automatically terminated after a period of inactivity. When you apply for an account, you will be mailed an RSA SecurID token. You will also be sent a request to complete identity verification. When your account is approved, your RSA SecurID token will also be enabled. Please refer to Authenticating to OLCF Systems for more information on setting your PIN and logging in; refer to OLCF System Hostnames for more information on host access specifics. DO NOT share your PIN or RSA SecurID token with anyone. Sharing of accounts will result in termination. If your SecurID token is stolen or misplaced, contact the OLCF immediately and report the missing token. Upon termination of your account access, return the token to the OLCF in person or via mail.
Data Management
The OLCF uses a standard file system structure to assist users with data organization on OLCF systems. Complete details about all file systems available to OLCF users can be found in the OLCF Data Management Policy.
Sensitive Data
Additional file systems and file protections may be employed for sensitive data. If you are a user on a project producing sensitive data, further instructions will be given by the OLCF. The following guidelines apply to sensitive data:
  • Only store sensitive data in designated locations. Do not store sensitive data in your User Home directory.
  • Never allow access to your sensitive data to anyone outside of your group.
  • Transfer of sensitive data must use encrypted methods (scp, sftp, etc.).
  • All sensitive data must be removed from all OLCF resources when your project has concluded.
Data Transfer
The OLCF offers two dedicated data transfer nodes to users. The nodes have been tuned specifically for wide area data transfers, and also perform well on the local area. There are also several utilities that the OLCF recommends for data transfer. Please refer to our article on Employing Data Transfer Nodes for information about the DTNs and available utilities.


4. Titan Scheduling Policy

Note: This details an official policy of the OLCF, and must be agreed to by the following persons as a condition of access to or use of OLCF computational resources:
  • Principal Investigators (Non-Profit)
  • Principal Investigators (Industry)
  • All Users
Title: Titan Scheduling Policy
Version: 13.02
In a simple batch queue system, jobs run in a first-in, first-out (FIFO) order. This often does not make effective use of the system. A large job may be next in line to run. If the system is using a strict FIFO queue, many processors sit idle while the large job waits to run. Backfilling would allow smaller, shorter jobs to use those otherwise idle resources, and with the proper algorithm, the start time of the large job would not be delayed. While this does make more effective use of the system, it indirectly encourages the submission of smaller jobs.
The DOE Leadership-Class Job Mandate
As a DOE Leadership Computing Facility, the OLCF has a mandate that a large portion of Titan's usage come from large, leadership-class (aka capability) jobs. To ensure the OLCF complies with DOE directives, we strongly encourage users to run jobs on Titan that are as large as their code will warrant. To that end, the OLCF implements queue policies that enable large jobs to run in a timely fashion.
Note: The OLCF implements queue policies that encourage the submission and timely execution of large, leadership-class jobs on Titan.
The basic priority-setting mechanism for jobs waiting in the queue is the time a job has been waiting relative to other jobs in the queue. However, several factors are applied by the batch system to modify the apparent time a job has been waiting. These factors include:
  • The number of nodes requested by the job.
  • The queue to which the job is submitted.
  • The 8-week history of usage for the project associated with the job.
  • The 8-week history of usage for the user associated with the job.
If your jobs require resources outside these queue policies, please complete the relevant request form on the Special Requests page. If you have any questions or comments on the queue policies below, please direct them to the User Assistance Center.
Job Priority by Processor Count
Jobs are aged according to the job's requested processor count (older age equals higher queue priority). Each job's requested processor count places it into a specific bin. Each bin has a different aging parameter, which all jobs in the bin receive.
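| Bin | Min Nodes | Max Nodes | Max Walltime (Hours) | Aging Boost (Days) |
| --- | --- | --- | --- | --- |
| 1 | 11,250 | -- | 24.0 | 15 |
| 2 | 3,750 | 11,249 | 24.0 | 5 |
| 3 | 313 | 3,749 | 12.0 | 0 |
| 4 | 126 | 312 | 6.0 | 0 |
| 5 | 1 | 125 | 2.0 | 0 |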
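The bin table can be read as a simple lookup from requested node count to walltime limit and aging boost. The sketch below encodes the table values directly; the names `BINS` and `bin_for_nodes` are hypothetical and this is not the scheduler's own implementation.

```python
# (bin, min_nodes, max_walltime_hours, aging_boost_days), largest bin first
BINS = [
    (1, 11250, 24.0, 15),
    (2, 3750, 24.0, 5),
    (3, 313, 12.0, 0),
    (4, 126, 6.0, 0),
    (5, 1, 2.0, 0),
]


def bin_for_nodes(nodes):
    """Return (bin, max_walltime_hours, aging_boost_days) for a node count."""
    if nodes < 1:
        raise ValueError("a job must request at least one node")
    for bin_id, min_nodes, max_walltime, boost in BINS:
        if nodes >= min_nodes:
            return bin_id, max_walltime, boost
```

For example, a 4,000-node request falls into bin 2, so it may ask for up to 24 hours and is aged as if it had already waited 5 days.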
FairShare Scheduling Policy
FairShare, as its name suggests, tries to push each user and project towards their fair share of the system's utilization: in this case, 5% of the system's utilization per user and 10% of the system's utilization per project. To do this, the job scheduler adds (30) minutes of priority aging per user and (1) hour of priority aging per project for every (1) percent the user or project is under its fair share value for the prior (8) weeks. Similarly, the job scheduler subtracts priority in the same way for users or projects that are over their fair share. For instance, a user who has personally used 0.0% of the system's utilization over the past (8) weeks and who is on a project that has also used 0.0% of the system's utilization will get a (12.5) hour bonus (5 * 30 min for the user + 10 * 1 hour for the project). In contrast, a user who has personally used 0.0% of the system's utilization on a project that has used 12.5% of the system's utilization would get no bonus (5 * 30 min for the user - 2.5 * 1 hour for the project).
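The FairShare arithmetic above can be written out as a short function. This is a sketch of the stated rule only: the constant and function names are hypothetical, and it assumes the adjustment scales linearly with fractional percentages, as the 12.5% example suggests.

```python
USER_TARGET_PCT = 5.0         # fair share per user
PROJECT_TARGET_PCT = 10.0     # fair share per project
USER_HOURS_PER_PCT = 0.5      # 30 minutes of aging per percent
PROJECT_HOURS_PER_PCT = 1.0   # 1 hour of aging per percent


def fairshare_adjustment_hours(user_pct, project_pct):
    """Priority aging, in hours, from 8-week user and project usage.

    Positive values are a bonus (under fair share); negative values are
    a penalty (over fair share).
    """
    user_part = (USER_TARGET_PCT - user_pct) * USER_HOURS_PER_PCT
    project_part = (PROJECT_TARGET_PCT - project_pct) * PROJECT_HOURS_PER_PCT
    return user_part + project_part
```

With the figures from the text, `fairshare_adjustment_hours(0.0, 0.0)` gives 12.5 hours and `fairshare_adjustment_hours(0.0, 12.5)` gives 0.0 hours.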
batch Queue Policy
The batch queue is the default queue for production work on Titan. Most work on Titan is handled through this queue. It enforces the following policies:
  • Limit of (4) eligible-to-run jobs per user.
  • Jobs in excess of the per-user limit above will be placed into a held state, but will change to eligible-to-run at the appropriate time.
  • Users may have only (2) jobs in bin 5 running at any time. Any additional jobs will be blocked until one of the running jobs completes.
Note: The eligible-to-run state is not the running state. Eligible-to-run jobs have not started and are waiting for resources. Running jobs are actually executing.
killable Queue Policy
At the start of a scheduled system outage, a queue reservation is used to ensure that no jobs are running. In the batch queue, the scheduler will not start a job if it expects that the job would not complete (based on the job's user-specified max walltime) before the reservation's start time. In contrast, the killable queue allows the scheduler to start a job even if it will not complete before a scheduled reservation. It enforces the following policies:
  • Jobs will be killed if still running when a system outage begins.
  • The scheduler will stop scheduling jobs in the killable queue (1) hour before a scheduled outage.
  • Maximum-jobs-per-user limits are the same as, and are applied in conjunction with, those of the batch queue.
  • Any killed jobs will be automatically re-queued after a system outage completes.
debug Queue Policy
The debug queue is intended to provide faster turnaround times for the code development, testing, and debugging cycle. For example, interactive parallel work is an ideal use for the debug queue. It enforces the following policies:
  • Production jobs are not allowed.
  • Maximum job walltime of (1) hour.
  • Limit of (1) job per user regardless of the job's state.
  • Jobs receive a (2)-day priority aging boost for scheduling.
Warning: Users who misuse the debug queue may have further access to the queue denied.
Allocation Overuse Policy
Projects that overrun their allocation are still allowed to run on OLCF systems, although at a reduced priority. Like the adjustment for the number of processors requested above, this is an adjustment to the apparent submit time of the job. However, this adjustment has the effect of making jobs appear much younger than jobs submitted under projects that have not exceeded their allocation. In addition to the priority change, these jobs are also limited in the amount of wall time that can be used. For example, consider that job1 is submitted at the same time as job2. The project associated with job1 is over its allocation, while the project for job2 is not. The batch system will consider job2 to have been waiting for a longer time than job1. Also, projects that have used more than 125% of their allocated time will be limited to only one running job at a time. The adjustment to the apparent submit time depends upon the percentage of its allocation that the project has used, as shown in the table below:
% of Allocation Used Priority Reduction Eligible-to-Run Jobs Running Jobs
< 100% 0 days 4 jobs unlimited jobs
100% to 125% 30 days 4 jobs unlimited jobs
> 125% 365 days 4 jobs 1 job
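As a rough illustration, the table above can be read as a simple lookup on the percentage of the allocation used. The sketch below is hypothetical (the function name and the dictionary keys are not part of any OLCF tool), and it treats the "100% to 125%" row as inclusive at both ends, which is an assumption.

```python
def overuse_limits(used_pct):
    """Return the scheduling limits for a project given its allocation usage.

    used_pct is the percentage of the allocation consumed so far.
    A "running" value of None means no limit on running jobs is listed.
    """
    if used_pct < 100:
        return {"priority_reduction_days": 0, "eligible_to_run": 4, "running": None}
    if used_pct <= 125:  # assumed inclusive: the table says "100% to 125%"
        return {"priority_reduction_days": 30, "eligible_to_run": 4, "running": None}
    return {"priority_reduction_days": 365, "eligible_to_run": 4, "running": 1}
```

Under this reading, a project at 110% of its allocation has its jobs aged as if they were submitted 30 days later, while a project at 130% sees a 365-day reduction and may have only one job running at a time.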
System Reservation Policy
Projects may request to reserve a set of processors for a period of time through the reservation request form, which can be found on the Special Requests page. If the reservation is granted, the reserved processors will be blocked from general use for a given period of time. Only users that have been authorized to use the reservation can utilize those resources. Since no other users can access the reserved resources, it is crucial that groups given reservations take care to ensure the utilization on those resources remains high. To prevent reserved resources from remaining idle for an extended period of time, reservations are monitored for inactivity. If activity falls below 50% of the reserved resources for more than (30) minutes, the reservation will be canceled and the system will be returned to normal scheduling. A new reservation must be requested if this occurs. Since a reservation makes resources unavailable to the general user population, projects that are granted reservations will be charged (regardless of their actual utilization) a CPU-time equivalent to (# of cores reserved) * (length of reservation in hours).
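The reservation charge and the inactivity rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the charge is expressed in core-hours, and the inactivity check assumes utilization is measured as a fraction of the reserved resources that has stayed below the threshold for the given number of minutes.

```python
def reservation_charge_core_hours(cores_reserved, hours_reserved):
    """Charge for a reservation, independent of how much of it was used."""
    return cores_reserved * hours_reserved


def reservation_subject_to_cancel(utilization_fraction, minutes_below_threshold):
    """True if activity has been under 50% of the reservation for over 30 minutes.

    How the scheduler actually samples utilization is not described in the
    policy, so this check is only an approximation of the stated rule.
    """
    return utilization_fraction < 0.5 and minutes_below_threshold > 30
```

For instance, reserving 16,000 cores for 6 hours would be charged as 96,000 core-hours even if the reserved cores sat partly idle.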


5. INCITE Allocation Under-utilization Policy

Note: This details an official policy of the OLCF, and must be agreed to by the following persons as a condition of access to and use of OLCF computational resources:
  • INCITE Principal Investigators
Title: INCITE Allocation Under-utilization Policy Version: 12.10
The OLCF has a pull-back policy for under-utilization of INCITE allocations. Under-utilized INCITE project allocations will have core-hours removed from their outstanding core-hour project balance at specific times during the INCITE calendar year. The following table summarizes the current under-utilization policy:
Date Utilization to-Date Forfeited Amount
May 1 < 10% Up to 30% of remaining allocation
May 1 < 15% Up to 15% of remaining allocation
September 1 < 10% Up to 75% of remaining allocation
September 1 < 33% Up to 50% of remaining allocation
September 1 < 50% Up to 33% of remaining allocation
For example, a 1,000,000 core-hour INCITE project that has utilized only 50,000 core-hours (5% of the allocation) on May 1st would forfeit (0.30 * 950,000) = 285,000 core-hours from their remaining allocation.
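The example above can be checked with a short Python sketch of the pull-back table. The names `PULLBACK` and `max_forfeit` are hypothetical, and because the policy says "up to", the value returned is an upper bound rather than the amount that will necessarily be removed.

```python
# Pull-back thresholds from the policy table, checked in order:
# (utilization threshold in percent, forfeited percent of remaining allocation).
PULLBACK = {
    "May 1": [(10, 30), (15, 15)],
    "September 1": [(10, 75), (33, 50), (50, 33)],
}


def max_forfeit(allocation, used, checkpoint):
    """Return the maximum core-hours forfeited at the given checkpoint date."""
    utilization_pct = 100 * used / allocation
    remaining = allocation - used
    for threshold_pct, forfeit_pct in PULLBACK[checkpoint]:
        if utilization_pct < threshold_pct:
            return remaining * forfeit_pct / 100
    return 0.0  # utilization is at or above every threshold: nothing forfeited
```

Calling `max_forfeit(1_000_000, 50_000, "May 1")` reproduces the 285,000 core-hour figure from the example.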


6. Project Reporting Policy

Note: This details an official policy of the OLCF, and must be agreed to by the following persons as a condition of access to and use of OLCF computational resources:
  • Principal Investigators (Non-Profit)
  • Principal Investigators (Industry)
Title: Project Reporting Policy Version: 12.10
Principal Investigators of current OLCF projects must submit a quarterly progress report. The quarterly reports are essential as the OLCF must diligently track the use of the center's resources. In keeping with this, the OLCF (and DOE Leadership Computing Facilities in general) imposes the following penalties for late submission:
Timeframe Penalty
1 Month Late Job submissions against offending project will be suspended.
3 Months Late Login privileges will be suspended for all OLCF resources for all users associated with offending project.