Undergraduate researcher helps develop data-encoding tools for future supercomputers

When Jess Woods enrolled at the University of North Carolina at Chapel Hill (UNC), he was sure he’d pursue a degree in studio art. He loved oil painting, and it seemed like the right career path. Then he took a math class.

Now, 4 years later and armed with a bachelor’s degree in computer science, Woods is immersed in a project to optimize the encoding and decoding of data for future generations of supercomputers.

How did abstract algebra become such an unexpectedly important part of this 22-year-old’s life? It may have begun with his discovery at UNC of how much he enjoyed math, but it’s found full flower at the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL). As part of the DOE Science Undergraduate Laboratory Internships (SULI) program, Woods has been tasked at ORNL with exploring different ways of representing data and enabling operations to be performed on that transformed data.

“As we move to zettascale computing [1,000 exaflops], the way we store and move data around is getting harder and harder. We’re kind of pushing the limits of our physics right now with how fast we can go while staying within a power envelope,” Woods said. “So you can compress data and put it in a different format, and you can encode data to make it more secure. Wouldn’t it be awesome if you could do operations on data encodings without having to decode it first?”

According to Oscar Hernandez, PhD, a tools developer in the lab’s Computer Science Research Group who serves as Woods’s mentor on the project, future high-performance computing systems will require new approaches to how data is represented to optimize memory, speed, security, and power usage. The use of parallel computing methods—such as those enabled by the Oak Ridge Leadership Computing Facility’s IBM AC922 Summit supercomputer and its NVIDIA Tensor Core GPUs—will be integral to speeding up the team’s algorithms behind encoding and decoding data. But although these forthcoming supercomputers may use more cores, it’s likely they will have less memory per core, he said.

“Improving the way we’re utilizing the memory in those future systems is important,” Hernandez said. “Jess is going through the background for the mathematics to do that and exploring what will be the parallelization strategy we need to follow. At the same time, he’s exploring programming models—which languages to use to do these implementations and to do them efficiently.”

One month into the project, Woods has coded a working implementation of one such encoding scheme in Python, with C++, CUDA, and other programming languages to follow. He’s started with some very simple operations, addition and multiplication, that can be applied to functions such as database searches or comparisons.
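To give a sense of what operating on encoded data can look like, here is a minimal Python sketch. It is an illustration only and not the scheme Woods implemented, which the article does not describe. It assumes a residue number system, a textbook construction from abstract algebra, and the names `MODULI`, `encode`, `add`, `mul`, and `decode` are hypothetical. Each integer is stored as its remainders modulo a few pairwise-coprime numbers, addition and multiplication act on the remainders independently, and the Chinese remainder theorem recovers the result.

```python
from math import prod

# Assumed, illustrative moduli: pairwise coprime, so any integer in
# [0, 7 * 11 * 13) = [0, 1001) has a unique encoding.
MODULI = (7, 11, 13)
RANGE = prod(MODULI)

def encode(x):
    """Represent x by its remainder modulo each modulus."""
    return tuple(x % m for m in MODULI)

def add(a, b):
    """Add two encoded values component by component, without decoding."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def mul(a, b):
    """Multiply two encoded values component by component, without decoding."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def decode(residues):
    """Recover the integer via the Chinese remainder theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = RANGE // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % RANGE

a, b = encode(12), encode(34)
print(decode(add(a, b)))  # 12 + 34, valid while the true result is below RANGE
print(decode(mul(a, b)))  # 12 * 34, valid while the true result is below RANGE
```

Because each component is processed independently, the per-modulus work could in principle be spread across many cores, which is one reason encodings of this general kind are studied alongside parallel hardware.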

“At the end of the day, it will be interesting to have a library that we can use to encode data and perform some basic operations that we use for scientific computing and then decode the information back,” Hernandez said. “The idea is to show the cost of doing that can be offset by the size of the dataset we’re generating.”

Meanwhile, Woods’s SULI internship and experiences at ORNL have encouraged him to pursue a graduate degree in computer science and possibly a career at a national lab. Being entrusted to assist Hernandez with an important project at ORNL has given him a greater sense of confidence in his career choice.

“There’s a lot more independent learning here,” Woods said. “I’ve probably read more textbooks in the time that I’ve been here than in an entire semester of school. I’ve never done research all day every day like this, and I think that’s valuable. I think it will be supremely helpful in grad school.”

From Hernandez’s perspective, the SULI program and Woods’s assistance have been assets.

“They let us explore areas of research that we know are important and doable,” Hernandez said. “In the case of Jess, it has broadened his scope in the context of topics that he’s interested in for doing his research, and it will potentially help him find a career. For us, we know the topics he’s working on—data representations for either compression or security—are big topics we want to explore for systems that will come in 2026 or 2030 time frames. We just need to start looking into it now, and this definitely helps get some ideas started.”

The SULI program is sponsored by the DOE Office of Science’s Office of Workforce Development for Teachers and Scientists and is administered by the Oak Ridge Institute for Science and Education, managed by Oak Ridge Associated Universities. The Oak Ridge Leadership Computing Facility (OLCF) is a DOE Office of Science User Facility located at ORNL.

UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.