The Advanced Data and Workflow group interns are finding solutions for large-scale, data-intensive problems
Through programs offered by the lab and Oak Ridge Associated Universities, these interns gain a unique educational and professional opportunity that introduces them to the arena of “Big Science.”
Of the 1,018 interns for the summer of 2019, 35 are working in the National Center for Computational Sciences (NCCS) and the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility located at ORNL, where they are learning how to conduct research in a high-performance computing (HPC) environment. Seven interns in the NCCS’s Advanced Data and Workflow group shared their experiences so far at ORNL.
Reid “Vinny” Paris is completing his third internship with ORNL this summer before returning to Iowa State University to pursue a PhD in statistics. Under his mentor, George Ostrouchov, Paris works to bring classical statistical programming ideas to supercomputer scale.
During his internship, Paris has worked with EnergyPlus, DOE’s open-source whole-building energy modeling (BEM) engine. The program, which has been under development since 1997, simulates buildings based on an array of characteristics to model energy consumption. The sheer number of potential variables in these models tends to slow down the simulations, which is where Paris’s role comes in. Grouping variables with similar behaviors into larger categories can greatly speed up the simulations.
“This summer my programming skills have improved dramatically,” Paris said. “Working with batch scripting, sending things to supercomputers, virtual machines, even remote logins, all that is basically new, and I’ve had to learn as I go. And, even though grouping experiments is technically part of my subfield, I’ve never dealt with them. I’ve kind of had to read papers and communicate some with my mentor and make it up as I go.”
Shuto Araki has been working under his mentor in the Advanced Data and Workflow group, Junqi Yin, to deploy deep learning frameworks that help Summit fully use resources and run artificial intelligence (AI) programs quickly. For Araki, this means examining the way that Summit’s 4,608 nodes share data and communicate with each other.
Araki says his internship at ORNL has introduced him to the more applicable elements of deep learning and AI.
“I’ve learned so much about the practical aspects of deep learning,” Araki said. “There are so many things you can do; deep learning is everywhere, and the potential is growing every day. Self-driving cars, Google Translate, all of these are achieved the same way. That’s fascinating to me.”
This fall, Araki will begin his senior year at DePauw University in Indiana, where he is pursuing degrees in computer science and mathematics as well as a minor in economics.
Tyesha Ruffin will receive her bachelor’s degree from the University of West Alabama after her internship with ORNL and will then pursue her master’s in computational science and engineering at North Carolina A&T State University.
This is Ruffin’s second summer interning at ORNL, and her current project involves improving data estimation methods for the GOES-13 satellite. Launched in 2010, the satellite is part of a collaborative project between NASA and NOAA that seeks to spot and predict potentially life-threatening weather in and around the United States.
“Last summer I wasn’t as familiar with data science engineering,” she said, “so I was basically learning how to perform the preliminary work, and then this summer I added on to that and started doing calibration and remapping. It’s nice to now be putting things into practice rather than being stuck in the preliminary part. I like actually getting to the gist of what I’m supposed to be doing, and it’s really made me more interested in data science engineering.”
Whitney Nelson is a second-year master’s student in human-computer interaction at Georgia Tech. Her work with ORNL attempts to address the unique challenges that arise through virtual reality–based collaborative visualizations for HPC.
At ORNL, she is learning to design user-focused software that takes into account obstacles like geographic distribution, large-scale data transfers, and multidisciplinary partnerships.
“I decided to go into human-computer interaction because I saw that a lot of people were designing or creating software and completely ignoring what a user would actually want or could actually work with,” she said. “It’s been an issue historically in technology. People will design a really cool app, but then no one wants to use it because it’s not user friendly.”
“I think ORNL teaches you how to work independently and how to answer some of your own questions. Probably the biggest thing I’ve learned here is how to be a leader of your own project, even as an intern. You can really take ownership of the project you have.”
Sajal Dash studies computational science at Virginia Tech, where he is currently a fifth-year PhD student. There, he joined an HPC group and became interested in large-scale applications.
“Virginia Tech has the Advanced Research Institute and we have a few clusters,” Dash said. “Each of them has probably 100–200 nodes per cluster, and that’s where I kind of learned how to scale a problem with accelerators. That kind of motivated me to continue to scale out my problems using the most sophisticated and modern hardware. So I thought ORNL would be a great opportunity to learn and get to do what I want to do.”
Dash’s research at the lab focuses on big data analytics, specifically through the lens of machine learning, and this summer, he is directing his attention toward the problem of catastrophic forgetting.
“When you incrementally train a model based on your data, it will perform on new data really well, but as time goes by, it starts performing really poorly on the old data,” Dash said. “That’s because with incremental training the model keeps getting better on the new data, but it keeps forgetting about the old data. This phenomenon is called ‘catastrophic forgetting.’”
Using clustering and random sampling, he is creating a training paradigm that can slow down the process of catastrophic forgetting in machine learning models.
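For readers curious about what combining clustering with random sampling can look like in practice, the sketch below illustrates one well-known family of mitigations, often called rehearsal or replay: a small, representative subset of old data is kept and mixed into each batch of new data so the model keeps seeing earlier examples. This is only an illustration under stated assumptions, not Dash’s actual method. The function names (`kmeans`, `build_replay_set`, `mixed_batch`) and all parameters are hypothetical, and the tiny k-means routine stands in for whatever clustering a real project would use.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means for a list of float tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumes k <= len(points)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each non-empty cluster's center to the mean of its members.
        groups = defaultdict(list)
        for p, lab in zip(points, labels):
            groups[lab].append(p)
        for c, members in groups.items():
            centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def build_replay_set(old_points, k, per_cluster, seed=0):
    """Hypothetical helper: keep up to `per_cluster` random old samples from each cluster."""
    rng = random.Random(seed)
    labels = kmeans(old_points, k, seed=seed)
    groups = defaultdict(list)
    for p, lab in zip(old_points, labels):
        groups[lab].append(p)
    replay = []
    for members in groups.values():
        replay.extend(rng.sample(members, min(per_cluster, len(members))))
    return replay


def mixed_batch(new_batch, replay_set, n_replay, seed=0):
    """Hypothetical helper: append a random draw of old samples to a batch of new data."""
    rng = random.Random(seed)
    return list(new_batch) + rng.sample(replay_set, min(n_replay, len(replay_set)))


# Made-up "old" data forming two loose groups, plus a batch of "new" data.
old = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
new = [(9.0, 9.0), (9.1, 9.2)]
replay = build_replay_set(old, k=2, per_cluster=2)
batch = mixed_batch(new, replay, n_replay=2)
```

In this sketch, clustering is meant to keep the retained samples spread across the different kinds of old data, while random sampling within each cluster keeps the replay set small. A model trained on `batch` instead of `new` alone would continue to see some earlier examples during each incremental update.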
Emily Costa is a rising junior at Florida International University, where she is pursuing a bachelor’s degree in mathematics and a minor in computer science. After finishing her undergraduate degree, she hopes to pursue a PhD in computer science.
Costa’s current work at ORNL involves contributing to a scientific Python package to help build scalable frameworks for imaging and spectroscopy, as well as developing better ways for researchers to interact with the code.
Although she isn’t majoring in computer science, Costa sees a lot of overlap between it and mathematics.
“Computer science problems are the same format as math: you have a problem, you have to think of multiple approaches, and then you go with one,” she said. “I think the ability to map out how you want to go about the problem, and the focus and the rigor that’s required to solve it, math has prepared me for.”
Yuya Kawakami is a math and computer science major from Grinnell College in Iowa, where he will be a senior this fall. This summer, Kawakami has been working to accelerate the transfer of large amounts of data between CPUs and GPUs within nodes. Although this is his second internship with ORNL, he almost didn’t enter the computer science program at Grinnell.
“I go to a liberal arts school,” Kawakami said, “and computer science was never something I intended on doing. In fact, when I entered college, I was going to be a physics and Spanish major—which is nowhere close to math or computer science. I took a computer science class my first year of college, and it seemed to be a very interesting way to problem solve. I kind of saw the power of computer science and computing, and I enjoyed it, so here I am.”
At ORNL, Kawakami has seen the incredibly diverse applications of computer science research.
“It’s really important to learn what you like, but also what you don’t like,” he said. “I think one of the best things about computer science is that it’s not limiting in any sense. There are so many applications, so many things that people do with it, that the possibilities are endless.”
UT-Battelle LLC manages Oak Ridge National Laboratory for DOE’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. DOE’s Office of Science is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.