People of ACM - Vivienne Sze

September 22, 2020

You have noted that while much of the processing for AI applications currently happens in datacenters and the cloud, there would be benefits to AI applications being processed on local devices. Will you explain this?

AI applications rely on computationally complex algorithms (e.g., deep neural networks) to achieve high accuracy and quality of result. While much of the compute for these applications currently happens in the cloud (i.e., datacenters), there are many compelling reasons to perform the processing on local devices (e.g., smartphones). First, if we want AI to be accessible to people around the world, we need to reduce its dependence on the communication infrastructure. Second, many exciting AI applications involve processing data for which privacy and security are important (e.g., medical data). For these applications, it is often preferable to bring the processing to the device where the data is collected and stored, as opposed to moving the data to the cloud. Finally, real-time interactive applications such as autonomous navigation and augmented reality/virtual reality (AR/VR) require low latency. For instance, when trying to avoid an obstacle, there may not be sufficient time to send the data to the cloud, wait for it to be processed, and wait for the results to be sent back. Instead, the latency can be reduced by performing the compute directly within the robot/vehicle or AR/VR headset itself. However, processing on local devices can be challenging, as the available energy is often limited by the device's battery capacity. Our research aims to address this challenge.
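To make the latency argument concrete, here is a back-of-the-envelope sketch comparing the two paths. All numbers (vehicle speed, detection distance, cloud and on-device latencies) are illustrative assumptions, not measurements from any real system:

```python
# Back-of-the-envelope latency budget for obstacle avoidance.
# All numbers are illustrative assumptions, not measurements.

vehicle_speed_m_s = 10.0       # assumed vehicle speed (10 m/s = 36 km/h)
obstacle_distance_m = 5.0      # assumed distance at which obstacle is detected
time_to_impact_s = obstacle_distance_m / vehicle_speed_m_s   # 0.5 s total budget

# Hypothetical cloud path: uplink + queueing/compute + downlink.
cloud_latency_s = 0.050 + 0.100 + 0.050    # 200 ms round trip (assumed)

# Hypothetical on-device path: local inference only.
local_latency_s = 0.030                    # 30 ms (assumed)

for name, latency in [("cloud", cloud_latency_s), ("on-device", local_latency_s)]:
    margin = time_to_impact_s - latency
    print(f"{name}: latency {latency * 1000:.0f} ms, "
          f"time left to react {margin * 1000:.0f} ms")
```

Under these assumed numbers, moving the compute on-device returns well over a third of the reaction budget to the control loop, which is the essence of the low-latency argument above.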

What prompted you to work on the development of the video coding standard H.265/HEVC? What is the relation between video coding and other applications such as machine learning or computer vision?

For my PhD research at MIT, I worked with Anantha Chandrakasan, who is an expert in energy-efficient circuit design. I was always interested in video processing, so I wanted to apply energy-efficient techniques to video compression—this was before the first iPhone, and the idea of being able to watch or record video on a device that could fit in your pocket was very exciting. While designing energy-efficient hardware for video coding, we soon realized that the algorithms were limiting our ability to reduce the energy consumption. For instance, one approach to reduce the energy consumption is to reduce the supply voltage and then use parallel processing to maintain the overall speed. However, video compression involves removing redundancy in the data to reduce its size, which inherently introduces dependencies that limit parallelism. In order to increase the impact of these energy-efficient techniques, we had to look into how to change the algorithms to make them more hardware friendly. At the same time, we had to maintain the coding efficiency, since that was the main goal of video compression.
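The voltage-scaling tradeoff described here can be sketched with first-order approximations: dynamic energy per operation scales roughly with V², while achievable clock frequency scales roughly linearly with V (a common approximation well above the threshold voltage). The specific voltage values below are assumptions for illustration:

```python
# First-order sketch of voltage scaling with parallelism.
# Dynamic energy per operation ~ C * V^2; frequency ~ V (first-order,
# well above threshold). All constants are illustrative assumptions.

V_nominal = 1.0   # nominal supply voltage (normalized)
V_scaled = 0.6    # scaled supply voltage (assumed feasible)

# Frequency drops roughly in proportion to voltage...
freq_ratio = V_scaled / V_nominal            # ~0.6x speed per unit
# ...so we need ~1/freq_ratio parallel units to keep the same throughput.
parallel_units = 1.0 / freq_ratio            # ~1.67x parallelism

# Energy per operation scales with V^2, independent of how many
# units run in parallel (the total operation count is unchanged).
energy_ratio = (V_scaled / V_nominal) ** 2   # ~0.36x energy per op

print(f"parallel units needed: {parallel_units:.2f}x")
print(f"energy per operation:  {energy_ratio:.2f}x of nominal")

# The catch for video coding: compression removes redundancy, which
# creates serial dependencies (e.g., across neighboring blocks), so the
# required parallelism may simply not exist in the algorithm as specified.
```

This is why the algorithm itself had to change: the hardware technique only pays off if the algorithm exposes enough parallelism to exploit it.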

When I graduated from MIT, the development of the video standard High Efficiency Video Coding (HEVC) was just getting started—timing-wise, this was really fortunate since new video standards are typically developed once a decade. As a representative of Texas Instruments in the standards committee, I focused my efforts on making HEVC hardware friendly to improve its energy efficiency (to support the increased use of video on portable devices such as phones and tablets) and its speed (to support high resolutions and frame rates). This was a great opportunity to translate our research into practice.

While compressing pixels is necessary if a video is stored or transmitted, being able to understand pixels and extract meaningful information from a video opens up many new applications. Accordingly, when I returned to MIT as a faculty member, my research focused on developing energy-efficient techniques for computer vision, machine learning, and robot perception, with the goal of making understanding pixels as ubiquitous as compressing them.

Many computer scientists work exclusively in algorithms, while others work exclusively on hardware and architecture. Your work involves developing software and hardware systems in tandem. What’s an important insight you’ve learned about this holistic approach of building various facets of a system together?

Our holistic approach of jointly designing the algorithm and hardware allows us to achieve significant energy and latency savings that neither hardware-only nor algorithm-only approaches can achieve. It also gives us a deeper understanding of the bottlenecks that limit our ability to achieve our goal, which allows us to better focus our efforts.

For instance, there has been a lot of research on designing efficient algorithms for deep neural networks (DNNs). Many of these efforts focus on reducing the number of weights and/or the number of multiply-and-accumulate (MAC) operations in the DNN. However, by understanding how the DNN is processed by the hardware, it is clear that the number of weights and MACs is not a good proxy for hardware metrics such as energy consumption or latency. This is because not all weights and MACs are created equal; instead, what matters are things like where the weight is stored and how it moves through the memory hierarchy, or the shape of the layers in the DNN, which affects how the MAC operations are mapped onto the hardware. Factoring these hardware aspects into the design of the DNN algorithm, which we refer to as hardware-aware or hardware in the loop, can result in significant improvements in performance. Based on these insights, we have developed techniques such as energy-aware pruning and platform-aware network architecture search using NetAdapt that allow us to improve the tradeoff between energy/latency and accuracy.
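A toy cost model makes the "MACs are a poor proxy" point concrete. The per-access energy numbers below are invented for illustration; only their relative ordering (register file cheapest, on-chip SRAM more expensive, off-chip DRAM far more expensive still) reflects the general trend reported in the energy-efficient DNN literature:

```python
# Toy energy model showing why weight/MAC counts alone are a poor proxy.
# Per-access costs are illustrative assumptions, normalized to 1 MAC = 1 unit;
# only the relative ordering (RF << SRAM << DRAM) reflects the real trend.

ENERGY_PER_ACCESS = {
    "mac": 1.0,
    "register_file": 1.0,
    "sram": 6.0,      # assumed on-chip buffer cost
    "dram": 200.0,    # assumed off-chip cost
}

def layer_energy(num_macs, accesses):
    """accesses: dict mapping memory level -> number of accesses."""
    energy = num_macs * ENERGY_PER_ACCESS["mac"]
    for level, count in accesses.items():
        energy += count * ENERGY_PER_ACCESS[level]
    return energy

# Two hypothetical layers with the SAME number of MACs but different
# data-movement patterns (e.g., due to layer shape and dataflow mapping):
good_reuse = layer_energy(
    1_000_000, {"register_file": 2_000_000, "sram": 100_000, "dram": 10_000})
poor_reuse = layer_energy(
    1_000_000, {"register_file": 2_000_000, "sram": 500_000, "dram": 200_000})

print(f"good reuse: {good_reuse:,.0f} energy units")
print(f"poor reuse: {poor_reuse:,.0f} energy units")  # ~8x higher, same MACs
```

Two layers with identical MAC counts can differ by nearly an order of magnitude in energy under this model, which is the motivation for optimizing data movement directly rather than operation counts.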

Joint design across the different layers of abstraction within the hardware can also be beneficial. For instance, there has recently been a lot of research on “compute-in-memory” processors, where the compute elements are integrated into the memory storage elements to reduce the cost of data movement, which dominates energy consumption and limits processing speed. This area of research cuts across the device, circuit, and computer architecture research communities. In collaboration with Joel Emer, we have been developing tools, such as Accelergy, that allow us to bridge these different efforts so that device technologists and circuit designers can see the impact of their technology (e.g., memristors, optical computing) at an architecture or system level, while computer architects can factor the benefits and limitations of the technology into their designs. Understanding the relationship between these different layers of abstraction is a critical step to enabling joint design.
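The sketch below illustrates the general idea behind this style of estimation — per-component action costs supplied by technologists, action counts supplied by architects — not Accelergy's actual interface or file formats. All component names and energy values are invented for illustration:

```python
# Illustrative sketch of component-level energy estimation (the idea
# behind tools like Accelergy, NOT its actual interface): each component
# exposes per-action energy costs, the architecture supplies action
# counts, and swapping the underlying technology changes the costs
# without changing the counts. All numbers are invented.

COMPONENT_LIBRARIES = {
    "sram_buffer":      {"read": 6.0, "write": 8.0},   # assumed pJ/action
    "memristor_buffer": {"read": 2.0, "write": 30.0},  # assumed: cheap reads, costly writes
    "mac_unit":         {"compute": 1.0},
}

def total_energy(architecture, action_counts):
    """architecture: dict name -> component library key;
    action_counts: dict name -> {action: count}."""
    energy = 0.0
    for name, component in architecture.items():
        costs = COMPONENT_LIBRARIES[component]
        for action, count in action_counts.get(name, {}).items():
            energy += costs[action] * count
    return energy

counts = {"buffer": {"read": 1_000_000, "write": 200_000},
          "pe":     {"compute": 5_000_000}}

for tech in ("sram_buffer", "memristor_buffer"):
    arch = {"buffer": tech, "pe": "mac_unit"}
    print(f"{tech}: {total_energy(arch, counts) / 1e6:.1f} uJ")
```

Note how the read/write asymmetry of the assumed memristor buffer means its system-level benefit depends entirely on the workload's access mix — exactly the kind of cross-layer insight such tools are meant to surface.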

What are one or two exciting applications of energy-efficient AI that might be implemented in the near future?

Energy-efficient AI is already used today for applications such as speech and face recognition on a phone. One of the key challenges of enabling energy-efficient AI is that it often requires some tradeoff in terms of quality of result. In order to get a better understanding of what is an acceptable tradeoff, we have been investigating the use of energy-efficient AI in the health care and robotics space. In collaboration with Thomas Heldt, we have been investigating the use of energy-efficient AI to measure eye movements that can be used for longitudinal in-home data collection to help assess and track the progression of neurodegenerative diseases (e.g., Alzheimer's and Parkinson's disease). In collaboration with Sertac Karaman, we have formed an interdisciplinary research group at MIT called Low-Energy Autonomy and Navigation (LEAN), which investigates the use of energy-efficient AI on a wide range of robotics tasks including visual-inertial navigation, motion planning, mutual-information-based exploration, depth estimation, and robot perception. The aim is to dig deeper into these application spaces to better reach an acceptable tradeoff between energy, speed, and accuracy.

Vivienne Sze is an Associate Professor at the Massachusetts Institute of Technology (MIT), where she directs the Energy-Efficient Multimedia Systems Group. Her research interests include designing and implementing computing systems that enable energy-efficient machine learning, computer vision, and video compression for a wide range of applications, including autonomous navigation, digital health, and the internet of things. At MIT, her group focuses on the joint design of algorithms, architectures, circuits, and systems to enable optimal tradeoffs between energy consumption, speed, and quality of result. Earlier in her career, Sze developed algorithms and hardware for the video coding standard H.265/HEVC. She is a co-editor of High Efficiency Video Coding (HEVC): Algorithms and Architectures. Most recently, she co-authored (with Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer) Efficient Processing of Deep Neural Networks, which discusses approaches that enable the deployment of deep neural networks on a wide range of platforms.

Sze was recognized as the inaugural recipient of the ACM-W Rising Star Award, and has received numerous other awards, including the DARPA Young Faculty Award and faculty awards from Google, Facebook, and Qualcomm. As a member of the Joint Collaborative Team on Video Coding, she received the Primetime Engineering Emmy Award for the development of the HEVC video compression standard.