7/25/2017 Mike Koon, College of Engineering
For the second time in three years, a team from the University of Illinois has placed high in the global ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2017).
Honghui Shi, a PhD student in electrical and computer engineering affiliated with the Beckman Institute and the Coordinated Science Lab, led a team that placed second in all four categories of object detection and tracking from video. Shi works in the lab of Thomas Huang, a professor emeritus in ECE and a founding figure in computer vision and image processing. Shi also led a team that placed third in the task of object detection from videos in 2015.
ImageNet is a world-famous computer vision project designed by Stanford professor Fei-Fei Li et al. to provide researchers around the world with a large-scale hierarchical database that is much larger in scale and diversity and much more accurate than previous image datasets. The project began the ILSVRC in 2010, and over the years the challenge has drawn strong interest from industry as well as researchers. It has been best known since 2012, when deep learning first took the world of artificial intelligence and machine learning by storm thanks to its huge success in classifying images more accurately in ILSVRC 2012.
“Since that time, other industry players, including Google and Microsoft, have entered the competition, making it much tougher for academic teams to compete. They have both the talents and the advantage of often using thousands of computers at a time.”
This year, Shi’s team includes collaborators Xinchao Wang and Yuchen Fan from the University’s Image Formation and Processing (IFP) Group as well as members from the National University of Singapore led by Yunchao Wei.
As deep learning methods have advanced, so has the accuracy of the results. For instance, in the category of object detection from videos with provided data, the Illinois team achieved an accuracy rating of about 48 percent in 2015. This year, the team reached 76 percent, second only to the team from Imperial College London. Illinois joined London as the only individual category winners, claiming first in five categories, and outdistanced teams from Tsinghua, Chinese Academy of Sciences, Cal Berkeley, and Michigan.
In addition, when the deep learning models were tested on the validation set of 555 video images, Shi’s team achieved an 84.5 percent result, so far the best reported on the ImageNet video dataset in industry.
“As the neural networks have evolved, the architecture has changed,” Shi said. “In 2015, we used a shallower neural network of 16 layers. This year we used advanced networks with about 100 layers. The recognition capacity and accuracy of the networks have both evolved significantly.”
While the ImageNet project focused on still images in the beginning, demand for detection in video has grown steadily, driven by applications such as autonomous driving. In addition to detection, Shi’s group is also focused on tracking, meaning the ability of the network to identify a specific object it has seen before. Shi’s team again placed second only to the team from London in the object detection/tracking category.
ILSVRC 2017 will mark the last of the ImageNet Challenge competitions. As the performance of state-of-the-art algorithms is surpassing human perception, organizers are focusing on unanswered questions and directions for the future of computer vision and artificial intelligence, for instance, how to advance research from allowing computers to see to making them able to reason as well.
“ImageNet set the start of the revolution of deep learning in 2012 and has been a benchmark dataset for the whole artificial intelligence revolution,” Shi noted. “I have enjoyed pushing the boundaries of computer vision and deep learning models on perceptive vision tasks with other researchers, and I look forward to joining the next wave of revolution beyond ImageNet.”
The Illinois team will officially be recognized at the 2017 Conference on Computer Vision and Pattern Recognition on July 26 in Honolulu.