Staff Section Head - Kevin Thach
The Systems Section supports the NCCS’s computing, networking, and storage systems, including general support of critical computational and facilities-related infrastructure and systems. The section administers and supports high-speed parallel file systems and archive capabilities, and it develops tools and administers data management platforms to ensure compliance with security, operational, and laboratory policies.
|HPC Clusters||
The HPC Clusters Group administers and supports the division’s large-scale cluster computing infrastructure, including system installation, deployment, acceptance, performance testing, upgrades, problem diagnosis, and troubleshooting.
|HPC Cybersecurity & Information Engineering||
The HPC Cybersecurity & Information Engineering Group develops tools and administers data management platforms to extract and analyze telemetry, event logs, and system state information to ensure security and laboratory policy compliance.
|HPC Infrastructure & Networking||
The HPC Infrastructure & Networking Group administers and supports the networking capabilities that underpin the overall mission of leadership-class and scalable computing programs.
|HPC Infrastructure Operations||
The HPC Infrastructure Operations Group provides continuous monitoring, issue triaging and escalation, and general support of critical computational and facilities-related infrastructure.
|HPC Scalable Systems||
The HPC Scalable Systems Group administers and supports system installation, deployment, acceptance, performance testing, upgrades, problem diagnosis, and troubleshooting of HPC computational resources. This group has deployed several Top500 #1 systems over the past decade, including Jaguar, Titan, and Summit, and will now deploy Frontier.
|HPC Storage & Archive||
The HPC Storage & Archive Group is responsible for the high-performance scratch and archival storage systems across the various programs in NCCS. Our work spans the entire lifecycle of high-performance storage systems: requirements gathering, design, procurement, installation, acceptance, data lifecycle management, system upkeep, on-call support, and decommissioning. We collaborate outside ORNL with other DOE labs, vendors, and industrial partners to anticipate and solve future storage demands. We also collaborate within ORNL to help evaluate new storage platforms for production feasibility and to operate the TechInt testbed.