Current Deep Learning Architectures Grow to Learn from Complex Datasets

Abstract

Current Deep Learning (DL) architectures are growing larger to learn from complex datasets. Training and tuning these astronomically sized models is time- and energy-consuming and stalls progress in AI. Industry is increasingly investing in specialized hardware and deep learning accelerators such as TPUs and GPUs to scale up the process. It is taken for granted that commodity CPU hardware is incapable of outperforming powerful accelerators such as GPUs in a head-to-head comparison on training large DL models. However, GPUs come with additional concerns: expensive infrastructure changes that only a few can afford, difficulty of virtualization, and main-memory limitations. Furthermore, the energy consumption of current AI training is prohibitively expensive. An article in MIT Technology Review noted that training a single deep learning model can generate a larger carbon footprint than five cars over their lifetimes.

In this talk, I will demonstrate the first algorithmic progress that challenges the prevailing belief in the community that specialized processors like GPUs are significantly superior to CPUs for training large neural networks. The algorithm is a novel alternative to traditional matrix-multiplication-based backpropagation. We will show how data structures, particularly hash tables, can reduce the number of multiplications associated with the forward pass of a neural network. The resulting algorithm is orders of magnitude cheaper and hence more energy efficient. The very sparse nature of the updates uniquely allows for an asynchronous data-parallel gradient descent algorithm. A C++ implementation with multi-core parallelism and workload optimization on CPUs is anywhere from 4-15x faster than the most optimized TensorFlow implementations running on the best available V100 GPUs in a head-to-head comparison. The associated task is training a 100-million-parameter neural network on the Kaggle Amazon recommendation datasets. If time permits, I will also illustrate some of the software we have developed in this direction.
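
To make the hash-table idea concrete, below is a minimal, self-contained C++ sketch of locality-sensitive-hash-based neuron selection: each layer's weight vectors are indexed in a SimHash (signed random projection) table, and the forward pass computes dot products only for the neurons whose hash code collides with the input's. This is an illustration under simplifying assumptions (one hash table, one layer, no training loop), not the actual implementation discussed in the talk; all class and function names here are invented for the example.

    // Minimal sketch of hash-table-based neuron selection (SimHash / signed
    // random projections). Illustrative only: names, sizes, and structure are
    // assumptions for this example, not the implementation from the talk.
    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    struct HashTable {
        int bits;                                                // hash length K
        std::vector<std::vector<float>> planes;                  // K random hyperplanes
        std::unordered_map<uint32_t, std::vector<int>> buckets;  // code -> neuron ids

        HashTable(int k, int dim, std::mt19937& rng) : bits(k) {
            std::normal_distribution<float> g(0.f, 1.f);
            planes.assign(bits, std::vector<float>(dim));
            for (auto& p : planes)
                for (auto& x : p) x = g(rng);
        }

        // SimHash code: one sign bit per random hyperplane.
        uint32_t code(const std::vector<float>& v) const {
            uint32_t c = 0;
            for (int b = 0; b < bits; ++b) {
                float dot = 0.f;
                for (size_t i = 0; i < v.size(); ++i) dot += planes[b][i] * v[i];
                if (dot > 0.f) c |= (1u << b);
            }
            return c;
        }
    };

    struct SparseLayer {
        std::vector<std::vector<float>> weights;  // one weight vector per neuron
        HashTable table;

        SparseLayer(int neurons, int dim, int k, std::mt19937& rng)
            : weights(neurons, std::vector<float>(dim)), table(k, dim, rng) {
            std::normal_distribution<float> g(0.f, 0.1f);
            for (auto& w : weights)
                for (auto& x : w) x = g(rng);
            rebuild();
        }

        // (Re)index every neuron's weight vector; done periodically during
        // training because the weights, and hence the hash codes, drift.
        void rebuild() {
            table.buckets.clear();
            for (int n = 0; n < (int)weights.size(); ++n)
                table.buckets[table.code(weights[n])].push_back(n);
        }

        // Forward pass: hash the input once, then compute activations only for
        // the neurons in the colliding bucket instead of all the dot products.
        std::vector<std::pair<int, float>> forward(const std::vector<float>& x) const {
            std::vector<std::pair<int, float>> active;
            auto it = table.buckets.find(table.code(x));
            if (it == table.buckets.end()) return active;  // empty bucket: no work
            for (int n : it->second) {
                float act = 0.f;
                for (size_t i = 0; i < x.size(); ++i) act += weights[n][i] * x[i];
                active.push_back({n, act > 0.f ? act : 0.f});  // ReLU
            }
            return active;
        }
    };

    int main() {
        std::mt19937 rng(42);
        SparseLayer layer(/*neurons=*/4096, /*dim=*/128, /*k=*/9, rng);
        std::vector<float> input(128, 0.5f);
        auto active = layer.forward(input);
        // With 9 hash bits there are 512 buckets, so on average only ~8 of the
        // 4096 neurons are touched per input.
        std::printf("active neurons: %zu of 4096\n", active.size());
        return 0;
    }

In practice, several independent hash tables are queried and their buckets unioned to control recall, and the tables are rebuilt periodically as the weights change. Because each input activates only a small and largely disjoint set of neurons, gradient updates from different training samples rarely touch the same weights, which is what makes the asynchronous data-parallel gradient descent mentioned above feasible on multi-core CPUs.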

Biography

Anshumali Shrivastava is an associate professor in the Department of Computer Science at Rice University. He is also the Founder and CEO of ThirdAI Corp, a company that is democratizing AI on commodity hardware through software innovations. His broad research interests include randomized algorithms for large-scale machine learning. In 2018, Science News named him one of the Top 10 scientists under 40 to watch. He is a recipient of the National Science Foundation CAREER Award, a Young Investigator Award from the Air Force Office of Scientific Research, a Machine Learning Research Award from Amazon, and a Data Science Research Award from Adobe. He has won numerous paper awards, including the Best Paper Award at NIPS 2014 and the Most Reproducible Paper Award at SIGMOD 2019.