Sudharshan Vazhkudai leads the Technology Integration (TechInt) group in the National Center for Computational Sciences (NCCS), which builds and deploys solutions for the nation’s premier supercomputing center, the Oak Ridge Leadership Computing Facility (OLCF). OLCF is home to some of the fastest systems, including Titan and Summit, along with the Spider storage system, and provides billions of core hours to a scientific user base from academia, government, and industry to perform breakthrough research in a variety of scientific domains. As a member of OLCF management, he contributes to the strategic direction of the center and leads TechInt in areas such as file and storage systems, non-volatile memory (NVRAM), data management, system architecture, and distributed systems.

Before that, he spent over nine years as a Research Scientist in the Computer Science Research group of the Computer Science and Mathematics Division at ORNL, which he joined in August 2003. In this role, his work straddled both basic and applied distributed systems research. He has worked on projects in the following areas: large-scale data management, HPC I/O, storage systems, non-volatile memory, end-to-end data transfers, and multicore architectures.

Sudharshan received a Ph.D. and a master's degree in Computer Science from the University of Mississippi in 2003 and 1998, respectively. His doctoral work (partly at Argonne National Laboratory, in Chicago, on a Wallace Givens Fellowship) addressed data access issues in the Globus Data Grid environment. He obtained a bachelor's degree in computer science from Karnatak University in India in 1996.


R&D Activities and Contributions

HPSS Development - High Performance Storage System (HPSS) is the result of over two decades of collaboration among five Department of Energy laboratories and IBM, with significant contributions…

Spider Storage System - The Spider Lustre-based Parallel File System Development and Deployment: The OLCF has deployed multiple large-scale parallel file systems (PFS) to support its operations. During this…

Improving Parallel IO Efficiency & Usability - Spider PFS Metadata Snapshot Capture and Analysis: (i) LustreDU: The development of a scalable tool to capture daily snapshots of the 1-billion-entry Spider PFS’s metadata.…

Exploiting Node-Local, Non-Volatile Memory (NVM) - Spectral is a transparently applied library for taking advantage of the Summit Burst Buffer architecture. Applications using per-process output simply write to the node-local burst…

Reliability and Resiliency - The project involves analysis of the reliability characteristics of Titan’s 299,008 CPUs and 18,688 GPUs to understand trends in machine failure, MTBF, single-bit errors,…

HPC Systems Scheduling Improvements - Resource selection can have profound impacts on the performance and reliability of applications running on the supercomputer. On Titan, there are ongoing efforts to…

Grand Unified Information Directory Environment (GUIDE) - GUIDE is a framework used to collect, federate, and analyze log data from OLCF, and to derive insights into facility operations based on the log…

Constellation DOI Framework and Portal - Constellation is a digital object identifier (DOI) based science network for supercomputing data. Constellation makes it possible for OLCF researchers to obtain DOIs for large…

Data Jockey - Data Jockey is a workflow-aware data management service that helps users automate the orchestration of data movement and placement across multiple storage tiers in…


2017 — Significant Event Award – ORNL

2015 — Significant Event Award – ORNL

2012 — Significant Event Award – ORNL

2008 — Outstanding Mentor Award – Oak Ridge Institute for Science and Education

2003 — Dissertation Fellowship – The University of Mississippi

2001 — Doctoral Fellowship – Argonne National Laboratory

2000 — Wallace Givens Fellowship – Argonne National Laboratory