ORNL’s Oak Ridge Leadership Computing Facility (OLCF) is introducing the AMD Lux AI+HPC supercomputer to enable its users to dramatically accelerate AI-driven research. OLCF has deployed and operated leadership supercomputers since 2004, including the first supercomputer to use GPUs at scale (Titan in 2012), the first to use HBM and NVLink (Summit in 2018), and the first to achieve 1 ExaFLOP in FP64 performance (Frontier in 2022).

In a unique public-private partnership model, AMD is co-investing with DOE to deliver Lux. AMD brings world-leading x86 CPU performance and unmatched GPU memory capacity and bandwidth, offering excellent performance for both AI and HPC (modeling and simulation).

The AMD Lux AI Cluster will be deployed at ORNL within the next six months to expand DOE’s near-term AI capacity and accelerate progress on critical problems. Credit: AMD and ORNL

Built for AI and HPC

Lux will use AMD’s MI355X GPUs, each with 288 GB of HBM3E and 8 TB/s of memory bandwidth. Each MI355X provides 5 PF of AI training (FP8) performance and 78 TF of HPC (modeling and simulation using FP64) performance.

Lux will have both Slurm and Kubernetes support for resource scheduling. The size of the partitions will depend on demand and will adjust over time.

Storage is Included

Each node has over 24 TB of NVMe SSDs for high-performance reading and writing. Lux allocations will also include access to Orion, OLCF’s Lustre file system, with over 600 PB of capacity. Lux partners who also hold Frontier allocations can write data sets from Frontier to Orion and then train on those data sets on Lux.

DOE Moderate Security Controls

Lux will use DOE Moderate controls, the same level of security controls as OLCF’s leadership system, Frontier. DOE Moderate controls allow for export-controlled applications and data. OLCF plans to add support for DOE Moderate-Enhanced controls. When available, this would also enable support for International Traffic in Arms Regulations (ITAR) and Protected Health Information (PHI) applications and data.

Flexible Allocation Dates

Lux is not tied to DOE’s INCITE, ALCC, or OLCF Director’s Discretionary (DD) allocation programs. A new Lux partner’s allocation can start in any month and run from six months to five years.

Unique Partnering Opportunity for Select OLCF Users

As with previous OLCF supercomputers, OLCF will operate Lux on behalf of DOE. Half of Lux’s annual 3.5 million node-hours are dedicated to serving the Genesis Mission to accelerate scientific discovery. The other half are reserved for paid access by public and private sector partners. AMD will use these cycles for development and testing. In addition, AMD will make a portion of the reserved time available for new partners, public or private, and work with them to get the most performance for their AI and HPC workloads.

Lux partners purchase a block of node-hours and can consume it at any scale, from single-node jobs to full-system jobs[1]. No other system offers access to GPUs at large scale without requiring a long-term, reserved commitment.

A minimum commitment of 175K node-hours per six-month period is required. This would allow, for example, 400-node (3,200-GPU) jobs for a total of eighteen days spread over six months, or approximately 40 nodes of continuous use for six months.
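The two usage examples above follow from simple node-hour arithmetic. The sketch below reproduces them in Python. The 175K minimum, the 400-node job size, and the 8 GPUs per node implied by "400 nodes, 3,200 GPUs" come from the text; the function names and the choice of 182.5 days for "six months" are assumptions made for illustration.

```python
# Node-hour budgeting sketch for the minimum Lux commitment.
MIN_NODE_HOURS = 175_000        # minimum per six-month period (from the text)
GPUS_PER_NODE = 8               # implied by 400 nodes = 3,200 GPUs
HOURS_PER_DAY = 24
DAYS_IN_SIX_MONTHS = 365 / 2    # assumption: half a calendar year


def days_at_scale(node_hours: float, nodes: int) -> float:
    """Days of wall-clock time a budget covers when every job uses `nodes` nodes."""
    return node_hours / nodes / HOURS_PER_DAY


def sustained_nodes(node_hours: float, days: float) -> float:
    """Average number of nodes a budget can keep busy continuously for `days` days."""
    return node_hours / (days * HOURS_PER_DAY)


big_job_days = days_at_scale(MIN_NODE_HOURS, nodes=400)
steady_nodes = sustained_nodes(MIN_NODE_HOURS, DAYS_IN_SIX_MONTHS)

print(f"400-node jobs ({400 * GPUS_PER_NODE} GPUs): {big_job_days:.1f} days")
print(f"Continuous use for six months: {steady_nodes:.1f} nodes")
```

Under these assumptions the budget covers a little over eighteen days at 400 nodes, or just under 40 nodes running around the clock, which matches the figures quoted above.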

Interested partners can contact [email protected] to get started. OLCF representatives will walk you through the application process.

Lux System Specifications

4,000+ GPUs: MI355X GPUs in 500+ nodes
1.1 PB HBM: To solve large problems
288 GB/GPU: High memory per GPU for better scaling
40 ExaFLOPS: For AI inference using FP4
20 ExaFLOPS: For AI training using FP8
317 PetaFLOPS: For HPC (modeling and simulation) using FP64
Ultra Ethernet: 400G NIC per GPU
AI + HPC: Slurm and Kubernetes; robust user environment supporting AI and HPC workloads
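
The system-level figures above can be cross-checked against the per-GPU numbers given earlier (288 GB of HBM3E, 5 PF FP8, and 78 TF FP64 per MI355X). The following is a rough consistency check, not an official derivation. It assumes decimal units (1 PB = 10^6 GB) and uses 4,000 GPUs as a lower bound, since the page only states "4,000+". The per-GPU FP4 rate is not given on the page, so it is left out. The function name is hypothetical.

```python
# Cross-check of aggregate Lux figures from per-GPU specifications.
HBM_GB_PER_GPU = 288     # from the text
FP8_PF_PER_GPU = 5       # from the text
FP64_TF_PER_GPU = 78     # from the text
GPUS_PER_NODE = 8        # implied by "400-node (3,200 GPUs)"


def system_totals(gpu_count: int) -> dict:
    """Aggregate memory and peak throughput for `gpu_count` GPUs (decimal units)."""
    return {
        "hbm_pb": gpu_count * HBM_GB_PER_GPU / 1e6,    # GB -> PB
        "fp8_ef": gpu_count * FP8_PF_PER_GPU / 1e3,    # PF -> EF
        "fp64_pf": gpu_count * FP64_TF_PER_GPU / 1e3,  # TF -> PF
    }


lower_bound = system_totals(4_000)
print(lower_bound)

# GPU count implied by the quoted 317 PF of FP64 (an inference, not a published number).
implied_gpus = 317_000 / FP64_TF_PER_GPU
implied_nodes = implied_gpus / GPUS_PER_NODE
print(f"Implied by 317 PF: about {implied_gpus:.0f} GPUs in about {implied_nodes:.0f} nodes")
```

At exactly 4,000 GPUs the totals come to about 1.15 PB of HBM, 20 EF of FP8, and 312 PF of FP64. The quoted 317 PF is consistent with a GPU count slightly above 4,000, in line with the "4,000+" and "500+" wording.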

Latest Lux Highlights

ORNL, AMD, and HPE to Deliver DOE’s Newest AI Supercomputers: Discovery and Lux

The U.S. Department of Energy announced today its newest supercomputers, Discovery and Lux, at Oak Ridge National Laboratory that will expand America’s leadership in artificial intelligence for scientific computing, strengthen national security, and drive the…
Katie Bethea, October 27, 2025, 6 min