Multi-university team works on edge-cloud to help scale big data
According to Cisco Systems, Internet video currently accounts for 66 percent of global Internet traffic, and that number is projected to rise to 79 percent by 2018. In North America, these numbers may be even higher, with YouTube and Netflix alone accounting for more than 50 percent of all Internet traffic. For CSL Professor Pramod Viswanath, those statistics mean there’s an enormous amount of data that must be stored and transported to keep up with the growth of video streaming.
“This is an extremely important problem right now that many researchers are interested in,” Viswanath said. “We can continue adding data to the cloud, but it gets increasingly harder to scale up. If we continue at this pace, I’m not sure how much more data we could keep storing or how efficiently we could do it.”
Viswanath is teaming up with CSL professors Bruce Hajek and R. Srikant, along with professors Muriel Médard of MIT, Kannan Ramchandran of the University of California, Berkeley, and Alex Dimakis of the University of Texas at Austin, to develop algorithms and architectures for moving data in today’s Internet. This research is important because as video continues to consume more bandwidth, the current method for managing and delivering the data is extremely expensive in both capital investment and environmental impact.
The team was recently awarded a three-year, $1.2 million NSF grant ($600,000 for Illinois) for their project, “Content Delivery over Heterogeneous Networks: Fundamental Limits and Distributed Algorithms.” The project will focus on developing the fundamental research that can be applied in the future in a variety of ways to create a more distributed network. Médard and Ramchandran are both former Illinois electrical and computer engineering faculty members and CSL researchers, who have previously collaborated on projects with Viswanath and the rest of the group.
“The Internet used to have point-to-point connections,” Viswanath said. “Whether there’s a centralized server or not, users don’t care. But someone still has to store all those videos, keep them on a hard drive or have the bandwidth to deliver them.”
The group will draw on the complementary expertise of its members, with researchers at the University of Texas at Austin focusing on storage, MIT researchers connecting with media companies, and Hajek contributing his expertise in peer-to-peer networks, as they work together to develop algorithms that address storage and communication issues.
“You need to store the data somewhere and you need to move it around,” Viswanath said. “Both affect each other, so it’s a holistic project.”
They will be developing a comprehensive theory for the design and analysis of content distribution networks (CDNs). CDNs are overlay networks built on top of the Internet and constitute a key part of how different types of content are stored and delivered. The traditional CDN approach relies on massive investments in centralized infrastructure, especially for photo-sharing, video, and social networking sites that manage and deliver large amounts of data. In fact, in 2008, the U.S. Environmental Protection Agency reported that data centers used more than 1.5 percent of total U.S. electricity consumption.
Over the next three years, the team will address this challenge in two ways. They will begin with a centralized architecture and work to make it more distributed while still maintaining quality of service; they will then develop a distributed, server-free architecture that offers comparable quality of service. The end result will be design guidelines, distributed storage and delivery algorithms, and fundamental performance limits for the next generation of distributed CDNs.
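The article doesn’t specify which algorithms the team will design, but one classic building block for distributing content across many caches is consistent hashing, which maps each piece of content to a node so that adding or removing a node relocates only a small fraction of the content. The sketch below is purely illustrative; the node names and content IDs are invented, and this is not the team’s method.

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Toy consistent-hash ring for assigning content chunks to cache nodes."""

    def __init__(self, nodes, replicas=100):
        # Each physical node gets `replicas` virtual points on the ring
        # so that load spreads more evenly across nodes.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        # Any stable hash works; MD5 is used here only for illustration.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, content_id):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect_right(self.keys, self._hash(content_id)) % len(self.ring)
        return self.ring[idx][1]

# Hypothetical node names and chunk ID, for illustration only.
ring = ConsistentHashRing(["cache-east", "cache-west", "cache-eu"])
owner = ring.node_for("video-42/chunk-7")
```

Because the mapping depends only on hashes, every client can compute the owner of a chunk independently, with no central directory, which is one reason schemes like this appear in distributed content delivery.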
“All of us have devices, such as computers or phones, that have connectivity and storage,” Viswanath said. “If the data is spread around among all of us, we could capitalize on that storage capacity and make the data more scalable, but also more complicated. It’s very challenging, but we want to create algorithms that make it possible for all that data to be distributed in as seamless a way as possible in the future.”
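One reason data spread across personal devices can remain reliable, even though any single phone or laptop may go offline, is erasure coding, a technique studied extensively in the distributed-storage literature. The toy sketch below shows the simplest possible erasure code, a single XOR parity chunk; it is an assumption-laden illustration of the general idea, not the team’s actual scheme, and the sample data is invented.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data, k):
    """Split `data` into k chunks plus one XOR parity chunk.

    Any single lost chunk can later be rebuilt from the k survivors,
    so the data tolerates one device going offline.
    """
    size = -(-len(data) // k)           # ceiling division
    padded = data.ljust(size * k, b"\0")  # pad so chunks divide evenly
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = chunks[0]
    for c in chunks[1:]:
        parity = xor_bytes(parity, c)
    return chunks + [parity]             # k data chunks + 1 parity chunk

def recover(chunks, missing_index):
    """Rebuild the chunk at `missing_index` by XOR-ing all survivors."""
    survivors = [c for i, c in enumerate(chunks) if i != missing_index]
    rebuilt = survivors[0]
    for c in survivors[1:]:
        rebuilt = xor_bytes(rebuilt, c)
    return rebuilt

# Invented stand-in for video data, split across 4 devices + 1 parity holder.
pieces = encode(b"a short stand-in for video data", k=4)
lost = pieces[2]                         # pretend device 2 went offline
assert recover(pieces, 2) == lost        # survivors rebuild the lost chunk
```

Production systems use far more powerful codes that tolerate many simultaneous failures with low storage overhead, but the principle is the same: redundancy is spread across devices so no single one is essential.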