CSL researchers win SASP’09 best paper award for work on translating CUDA for FPGAs
CSL researchers Deming Chen and Wen-mei Hwu have received the best paper award at IEEE’s Symposium on Application Specific Processors 2009. The research focuses on applying the CUDA programming language to field-programmable gate arrays (FPGAs), opening up a new research area where compiler and synthesis techniques intersect.
The paper, “FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs,” outlines the researchers’ novel method for adapting CUDA to work on FPGAs, chips that are highly adaptable and power efficient. CUDA is a programming language written for NVIDIA’s graphics processing units (GPUs). The adapted language, FCUDA, will make it easier for CUDA programmers to write parallel code for FPGAs.
“People like FPGAs because they can be easily molded into what you want them to be,” said Hwu, a professor of electrical and computer engineering at the University of Illinois at Urbana-Champaign. “But the same characteristics that make them desirable also make it difficult to program using the CUDA language.”
The challenge in translating CUDA is that it was originally designed to run on GPUs, which can execute thousands of fine-grained threads (or computational processes) at the same time. FPGAs, however, are not set up to support such a large number of threads running concurrently.
To make CUDA compatible with FPGAs, Hwu and Chen are working with Jason Cong of the University of California at Los Angeles to build sophisticated tools that bridge the gap.
The Illinois team is working on a frontend compilation tool that will translate CUDA into parallel C code, a traditional language that runs on many platforms. During this translation, the thousands of fine-grained threads in the original CUDA code are packed into a smaller set of longer, “heavy-duty” threads in C. The C code will then be synthesized using AutoPilot, a high-level synthesis tool driven by the UCLA team that will enable high-abstraction FPGA programming. The key features of this CUDA-to-FPGA flow include pragma annotation of CUDA code for generating various parallel task functions, efficient synchronization of computation and data communication, burst-mode DMA transfers, double buffering for increased memory bandwidth, and CUDA-to-FPGA memory mapping.
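To give a sense of what packing many fine-grained threads into fewer heavy-duty ones can look like, here is a minimal sketch in plain C. It is an illustration of the general idea only, not output from the FCUDA tool: the vector-add computation, the function names, and the BLOCK_DIM and GRID_DIM constants are all hypothetical. Each CUDA thread block becomes one C function whose loop walks through the thread indices that would have run concurrently on a GPU.

```c
#include <stddef.h>

/* Hypothetical launch geometry, standing in for CUDA's blockDim and gridDim. */
#define BLOCK_DIM 4
#define GRID_DIM  3

/* One coarse-grained task: a loop over thread indices replaces the
   fine-grained threads of a single CUDA thread block. */
static void vec_add_block(int block_idx, const float *a, const float *b,
                          float *c, size_t n)
{
    for (int thread_idx = 0; thread_idx < BLOCK_DIM; ++thread_idx) {
        /* Same index arithmetic a CUDA kernel would use:
           blockIdx.x * blockDim.x + threadIdx.x */
        size_t i = (size_t)block_idx * BLOCK_DIM + (size_t)thread_idx;
        if (i < n)  /* guard for a partially filled last block */
            c[i] = a[i] + b[i];
    }
}

/* The grid is serialized here for simplicity. The block-level tasks are
   independent, so a synthesis flow could in principle map several of them
   to parallel hardware units instead. */
static void vec_add_grid(const float *a, const float *b, float *c, size_t n)
{
    for (int block_idx = 0; block_idx < GRID_DIM; ++block_idx)
        vec_add_block(block_idx, a, b, c, n);
}
```

Calling vec_add_grid on arrays of up to GRID_DIM × BLOCK_DIM elements produces the same result as the corresponding one-thread-per-element kernel would, while exposing only a handful of longer-running tasks.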
The result will be hardware that offers more computational strength with lower power consumption, as well as the ability to support high-performance computing on smaller equipment, such as portable medical imaging scanners.
“Right now, converting CUDA for FPGA is very challenging work and is mostly done by hand,” said Chen, an assistant professor of electrical and computer engineering. “With this work, it is possible to automate the process, which will hopefully stimulate a lot of research activity.”
In fact, research groups are already starting to show interest in using the tool. Colleagues at a nearby university would like to use the tool to develop an application that would help physicians better determine radiation dosage for cancer patients.
The selection of CUDA as the programming interface offers unique advantages. First, CUDA is well suited to expressing parallelism concisely. Second, the FCUDA flow provides a common programming model for computing clusters with nodes that include both GPUs and FPGAs. Finally, the wide adoption of CUDA makes a large body of existing applications available for FPGA acceleration.
“We anticipate this greatly improving productivity,” Chen said. “With this flow, we are able to quickly evaluate whether an application should be accelerated on GPUs or FPGAs.”
The student coauthors are Alex Papakonstantinou and John Stratton of the University of Illinois and Karthik Gururaj of the University of California at Los Angeles. The project is supported by the MARCO/DARPA Gigascale Systems Research Center and the National Science Foundation.