CSL students’ paper selected for international reproducibility competition
Each year, computing teams from around the world gather to replicate the results of a carefully selected research paper as part of the Student Cluster Competition, held at the annual Supercomputing Conference. This year, the selected paper was written by CSL students Mert Hidayetoğlu and Simon Garcia de Gonzalo.
“Being selected for this competition means this piece of code is way above the standard quality and that the conference has a great deal of confidence in the stability and documentation we produced in the code,” said Wen-mei Hwu, CSL professor and the students’ adviser. “This is a very good confirmation about the care and integrity of the research.”
Usually, when research is published, the paper includes the techniques and algorithms used so that the research can be reproduced. In computing, however, simulations and algorithms often rely on “fragile” code whose results can’t be replicated. Sometimes, when students graduate, the code infrastructure they've built leaves campus with them. If nobody can reproduce the work, other students can't build on it. This is a recognized problem in the field, which is why writing a paper and code that can be reproduced is a significant achievement.
Participants in the Student Cluster Competition, which is planned to take place in November, will be judged on their ability to reproduce the results and conclusions in the CSL students’ original paper, “MemXCT: Memory-Centric X-ray CT Reconstruction with Massive Parallelization.” Each team will build its own system, with different hardware and software, to replicate the results.
“The reproducibility effort is important in terms of scientific quality. When you claim something scientifically, you need to evaluate that claim in a systematic way, find its limits, and also make it reproducible so other people can verify your work,” said Hidayetoğlu, lead author on the paper and electrical and computer engineering (ECE) student. “It’s nice to contribute to the community’s reproducibility effort and to make an impact.”
The authors not only received the recognition of having their paper chosen from among 45 finalists but will also have 16 teams work to duplicate their research, giving them strong international visibility.
“Usually, if you develop a piece of software, it may be used by your lab, or maybe even externally by a few others,” said Garcia de Gonzalo, a computer science graduate student. “Now our work is essentially being used by people from around the world. It brings a lot of value because now a lot of people will know about the code, they will have hands-on experience using it, and they will take their experience back to their home universities.”
While the paper, originally published at last year’s conference, is making an impact at this year’s conference (as well as in the future careers of the CSL students), the findings in the paper are important in their own right. The research involved creating an X-ray imaging algorithm that can be scaled for supercomputers.
“The paper itself has scientific merit,” said Garcia de Gonzalo. “There are groundbreaking results in terms of the biggest image reconstruction to the date of publishing.”
Not only was the algorithm able to reconstruct the largest image of its kind to date, but it also did so in near real time. This was accomplished using large-scale X-ray datasets collected at project partner Argonne National Laboratory’s synchrotron light source facility, one of only a few such facilities in the U.S. The group applied their scalable algorithm to reconstruct a mouse brain sample that was larger than 5.1 terabytes in less than 10 seconds. This lays the foundation for reconstructing large images more quickly, which will help speed up medical diagnostic results and scientific research. The team used multiple supercomputers of various architectures, including the National Center for Supercomputing Applications’ Blue Waters and Argonne's Theta.
“The production code will be included in the national lab’s inventory and people will be able to use it for scientific experiments,” said Hidayetoğlu. “We are open-sourcing the code to give our service to the community.”
Hidayetoğlu and Garcia de Gonzalo will be on hand for the challenge, scheduled to take place November 16-18 in Atlanta, GA.
Hwu is also the AMD Jerry Sanders Chair of Electrical and Computer Engineering and is affiliated with the Information Trust Institute.