Illinois team claims victory at NVIDIA AI City Challenge


Christina Como, ECE ILLINOIS

Illinois researchers recently placed first by a large margin in the premier IEEE Smart World NVIDIA AI City Challenge. For two months, they competed against academic labs from around the globe to advance the field of computer vision and engineer the most effective model for object detection in traffic video. They beat research teams from Brazil, China, Greece, India, Italy, Japan, and Turkey, as well as teams from American universities such as UC Berkeley, UW, SUNY, SJSU, and ISU.

The Illinois team consisted of five members of the IFP group at the Beckman Institute: CSL Professor Thomas S. Huang, ECE PhD candidate Honghui Shi, Zhichao Liu, graduate student Yuchen Fan, and postdoc Xinchao Wang. The team’s work was supported in part by IBM-ILLINOIS C3SR, directed by Wen-mei W. Hwu at CSL and Dr. Jinjun Xiong from IBM Research.

The Jetson TX2 module NVIDIA provided to all teams.
The 150 leading researchers and academics were challenged to advance city intelligence, safety, and transportation by applying deep learning frameworks to a large dataset of traffic camera video. With millions of roadway cameras installed for traffic and pedestrian safety, traffic video cameras are among the world’s largest generators of data. Applying AI to this video will enable real-time multi-object detection, classification, and behavior analysis, with a broad array of applications including public safety, traffic and parking management, law enforcement, and city services.

“AI is still, from my point of view as a computer vision researcher and engineer, in its very early stage,” Shi said. “Computer vision is a subarea they are still trying to understand … and object detection is a fundamental challenge and difficult question in computer vision. We are trying to make object detection accurate so that someday vision applications such as autonomous driving will be safe for everybody, at least the 99.99%.”

The NVIDIA AI City Challenge aimed to overcome issues with traffic video datasets such as poor data quality, a shortage of rich labels, and a lack of high-quality models that make sense of the data. NVIDIA’s challenge was, in some sense, an expansion of the revolutionary image recognition challenge ImageNet, where top models have already surpassed human-level performance.

“ImageNet was more fundamental,” said Shi, a competitor in both challenges. “It was doing basic detection and classification of much more general objects. The NVIDIA AI City Challenge is more specific - it’s for object detection from traffic systems, including different type[s] of cars … They have to differentiate between small cars, SUVs, vans, small trucks, motorcycles, etc. … Also they have to separate from pedestrians, the different colors of traffic lights, and different times of day. It’s much more fine-grained. This is a brand-new dataset, and one of the largest available today, actually.”

Competitors vied to detect objects including vehicles of all sizes, pedestrians, and traffic lights in traffic camera footage.
The competition was divided into two phases. In the first month of the challenge, all teams helped to annotate over 1.4 million objects from key frames of more than 80 hours of video from intersections near Silicon Valley, Nebraska, and Virginia. In the second phase, teams competed to build and deploy the model that detected and tracked objects in the video data most effectively and efficiently. All teams were given access to NVIDIA DGX AI supercomputing systems for model training and Jetson TX2 modules for deployment.

“The challenge is about your understanding of deep learning algorithms and implementation skill,” Shi said. “Three weeks was a limited time to produce a final product … There are millions of parameters that the model has to update. To train one model takes a couple of days. You have to make many decisions with limited feedback.”

On August 5, Illinois was awarded first place and a prize of NVIDIA TITAN Xp GPUs at the Silicon Valley workshop. After the researchers disclosed their methods at the workshop, the competition was extended for another two-week trial called “Camera Ready.” During this additional trial, the Illinois team updated a model and further improved on its winning entries.

“A challenge provides a way for people of talents to come together,” Shi said. “A lot of cutting-edge research [is] invented [because] you can have better performance and different kinds of insights … This dataset is real-world data, so it has potential benefits of applying to real things like autonomous driving or traffic surveillance. These are very important tasks.”

Shi has been invited to present Illinois' work from the challenge at the November NVIDIA GPU Tech Conference in Washington, D.C.

For more, read the highlight on the NVIDIA blog.