OLCF-6 Benchmark Run Rules
Note: The following run rules are intentionally similar to, and adapted from, the NERSC-10 Benchmarks Run Rules. There are, however, some differences that should be noted.
Application Benchmark Descriptions
The application benchmarks are representative of the OLCF workload and were chosen to span a variety of algorithmic and scientific spaces. Each benchmark distribution contains a README file that provides links to the reference source code distribution, along with instructions for compiling, executing, verifying numerical correctness, and reporting results. Multiple input problems and sample output from existing OLCF systems (i.e., Summit and/or Frontier) are included to facilitate profiling at reduced scale. The README files specify a target problem size that must be used when reporting benchmark performance.
Allowed Modifications
Two tiers of modification, ported and optimized, are permitted for the benchmarks. The purpose of the tiering is to understand the level of developer effort needed to achieve high performance on the target architecture(s). In addition to the rules for each tier, each benchmark may provide benchmark-specific rules or amendments that supersede the rules described here. In all cases, benchmark performance will be accepted only from runs that exhibit correct execution.
Ported results are intended to reflect out-of-the-box performance with minimal developer effort needed to run the benchmark on the target system. Limited source code modification is permitted (elaborated below) and must be described. Compiler options, libraries, and concurrency may also be modified as follows.
- Only publicly available and documented compiler flags shall be used.
- Library substitutions are permitted. Proprietary library routines may be used as long as they currently exist in a supported set of general or scientific libraries and remain in such a set when the system is delivered. Publicly available or open-source libraries may be used if they can be built and used by the installed compilation system. The libraries must not specialize or limit the applicability of the benchmark, nor violate the measurement goals of the particular benchmark.
- Parallel-construct substitutions (e.g., replacing a for-loop with Kokkos parallel_for or with other library calls) are not considered library substitutions and are permitted only in the optimized category (a sketch of such a substitution follows this list).
- Concurrency (e.g., node type, node count, process count, and accelerator count) may be modified to produce the best results on the target system. The rationale for these choices must be explained. Note that the number of MPI tasks that can be used for a particular benchmark may be constrained by any domain-decomposition rules inherent in the code, as described in the benchmark's README file.
- Source code modifications for correct execution are permitted after a discussion with OLCF. A change will be accepted only if the Offeror shows that the original source has a software bug.
- Batch scripts may be modified in order to execute on the target system. The script may include optimizations appropriate for the original executable, e.g., setting environment variables and specifying storage-system file placement.
- Replacement of existing architecture-specific language constructs (e.g., CUDA, HIP, DPC++) with another well-documented language or interface is permitted (a sketch of such a replacement follows this list). This may also include API and library substitutions that necessitate limited, well-scoped changes in the source code.
- Addition or modification of directives or pragmas is permitted (a sketch follows this list).
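To illustrate the replacement of architecture-specific constructs, the sketch below shows a CUDA-style kernel launch rewritten against the HIP runtime; the kernel body itself is unchanged. The axpy kernel, sizes, and values here are hypothetical illustrations, not code from any OLCF-6 benchmark.

```
// Minimal sketch (hypothetical kernel, not from an OLCF-6 benchmark):
// a CUDA axpy kernel ported to HIP. The kernel body is unchanged; only
// the runtime header and launch syntax differ.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void axpy(int n, double a, const double* x, double* y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;  // same index math as CUDA
  if (i < n) y[i] += a * x[i];
}

int main() {
  const int n = 1 << 20;
  std::vector<double> hx(n, 1.0), hy(n, 2.0);
  double *dx, *dy;
  hipMalloc(&dx, n * sizeof(double));             // was cudaMalloc
  hipMalloc(&dy, n * sizeof(double));
  hipMemcpy(dx, hx.data(), n * sizeof(double), hipMemcpyHostToDevice);
  hipMemcpy(dy, hy.data(), n * sizeof(double), hipMemcpyHostToDevice);

  const int block = 256, grid = (n + block - 1) / block;
  // was: axpy<<<grid, block>>>(n, 3.0, dx, dy);
  hipLaunchKernelGGL(axpy, dim3(grid), dim3(block), 0, 0, n, 3.0, dx, dy);
  hipMemcpy(hy.data(), dy, n * sizeof(double), hipMemcpyDeviceToHost);
  printf("y[0] = %f\n", hy[0]);                   // expect 5.0
  hipFree(dx); hipFree(dy);
  return 0;
}
```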
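The next sketch illustrates the addition of a directive: an OpenMP pragma is placed on an existing serial loop, and the loop body and iteration space are left untouched. The stencil loop is likewise a hypothetical illustration.

```
// Minimal sketch (hypothetical loop, not from an OLCF-6 benchmark):
// an OpenMP directive added to an existing serial stencil loop.
#include <vector>
#include <cstdio>

int main() {
  const int nx = 1024, ny = 1024;
  std::vector<double> in(nx * ny, 1.0), out(nx * ny, 0.0);

  // The directive below is the only change relative to the original code;
  // the loop body and iteration space are untouched.
  #pragma omp parallel for collapse(2) schedule(static)
  for (int j = 1; j < ny - 1; ++j)
    for (int i = 1; i < nx - 1; ++i)
      out[j * nx + i] = 0.25 * (in[(j - 1) * nx + i] + in[(j + 1) * nx + i]
                              + in[j * nx + i - 1] + in[j * nx + i + 1]);

  printf("out[center] = %f\n", out[(ny / 2) * nx + nx / 2]);  // expect 1.0
  return 0;
}
```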
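By contrast, the following sketch shows a parallel-construct substitution, in which a plain for-loop is replaced by a Kokkos::parallel_for. Because this changes the parallel programming model rather than adding to it, it is permitted only in the optimized tier. The loop is again hypothetical.

```
// Minimal sketch (hypothetical loop, not from an OLCF-6 benchmark):
// a serial for-loop replaced by Kokkos::parallel_for. This is a
// parallel-construct substitution and is allowed only in the optimized tier.
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 1 << 20;
    const double a = 3.0;
    Kokkos::View<double*> x("x", n), y("y", n);
    Kokkos::deep_copy(x, 1.0);
    Kokkos::deep_copy(y, 2.0);

    // Original: for (int i = 0; i < n; ++i) y[i] += a * x[i];
    Kokkos::parallel_for("axpy", n, KOKKOS_LAMBDA(const int i) {
      y(i) += a * x(i);
    });
    Kokkos::fence();

    auto hy = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), y);
    printf("y[0] = %f\n", hy(0));  // expect 5.0
  }
  Kokkos::finalize();
  return 0;
}
```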
Optimized results are intended to showcase system capabilities that are not realized by the baseline code. Aggressive code changes that enhance performance are permitted as long as the full capabilities of the code are maintained, the code can still pass validation tests, and the underlying purpose of the benchmark is not compromised. Changes to the source code may be made so long as the following conditions are met:
- The rationale for, and the relative performance effect of, each optimization are described;
- Algorithms fundamental to the program are not replaced (since replacing algorithms may violate correctness, program requirements, or other deliberate software-design decisions);
- Simulation parameters, such as grid size and number of particles, are not changed;
- The optimized code must still produce correct numerical results; and
- Any code optimizations must be made available to the general user community, either through a system library or through a well-documented explanation of the code improvements.
For the optimized tier, the Offeror is strongly encouraged to optimize the source code in a variety of ways, including, but not limited to:
- Aggressive code changes that enhance performance are permitted. Performance improvements from pragma-style guidance in C, C++, and Fortran source files are preferred (a sketch follows this list). Wholesale algorithm changes or manual rewriting of loops that become strongly architecture-specific are of less value.
- Newer versions of the benchmark source code obtained from the upstream repository and branch may be used without providing additional rationale or analysis. The source code revision ID (commit hash) must be provided.
- Source preprocessors, execution profile feedback optimizers, etc., are allowed as long as they are, or will be, available and supported as part of the compilation system for the delivered systems.
- Optimizations that accelerate data-movement between stages of a workflow are permitted.
- Specialized code that optimizes for specific hardware accelerators is permitted.
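As one example of the preferred pragma-style guidance, the hypothetical sketch below offloads an existing loop to an accelerator using OpenMP target directives; the loop body remains portable C++ and still compiles and runs on hosts without a device.

```
// Minimal sketch (hypothetical loop, not from an OLCF-6 benchmark):
// pragma-style optimization that offloads an existing loop to an
// accelerator via OpenMP target directives. Without device support the
// directives degrade gracefully and the loop runs on the host.
#include <cstdio>
#include <vector>

int main() {
  const int n = 1 << 20;
  const double a = 3.0;
  std::vector<double> x(n, 1.0), y(n, 2.0);
  double* px = x.data();
  double* py = y.data();

  // The directive is the only change; the loop body remains portable C++.
  #pragma omp target teams distribute parallel for \
      map(to: px[0:n]) map(tofrom: py[0:n])
  for (int i = 0; i < n; ++i)
    py[i] += a * px[i];

  printf("y[0] = %f\n", y[0]);  // expect 5.0
  return 0;
}
```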
If multiple code optimizations are implemented, the Offeror is encouraged to report the performance improvement from each optimization at a granularity that enables OLCF reviewers to understand its relative importance as well as its potential transferability to other codes in the OLCF workload.