People of ACM - Heng Tao Shen

March 24, 2020

How did you initially become interested in multimedia search?

Text-based Google search came into popular use in 1999. Since the web hosted all types of media data, that is, multimedia data, I was passionate about working on multimedia retrieval for my Honours year thesis project. A regular paper on web image retrieval, together with its demonstration system, appeared at ACM Multimedia 2000. Nowadays most, if not all, applications and systems deal with multimedia data rather than single-modality data.

In the recent paper “Time-aware Session Embedding for Click-Through-Rate Prediction,” which won the ACM Multimedia 2019 Grand Challenge, you and your co-authors explored how recommender systems based on collaborative filtering might be improved. What is collaborative filtering, and how might someone using a streaming service to watch television programs benefit from the insights presented in your paper?

Collaborative filtering (CF) is a classic algorithm used by recommender systems. It makes predictions for one user by drawing on many other users’ interests. There are several variants of CF; the simplest is item-based CF. An "item" here is a movie or anything else you want to recommend to users. The key idea of item-based CF is to calculate the similarities among all movies and recommend to a user the movies most similar to those he or she previously watched. CF typically measures the similarity between two movies by counting how many users co-watched them.
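As a rough illustration of item-based CF (a generic sketch over hypothetical data, not the authors' implementation), the following Python snippet computes co-watch-based similarities from a small user-movie matrix and recommends the unwatched movie most similar to a user's history:

```python
# A minimal, illustrative sketch of item-based collaborative filtering.
# The data and variable names are hypothetical, not from the paper.
import numpy as np

# Rows are users, columns are movies; 1 means the user watched the movie.
watched = np.array([
    [1, 1, 0, 0],   # user 0 watched movies 0 and 1
    [1, 1, 1, 0],   # user 1 watched movies 0, 1, 2
    [0, 1, 1, 0],   # user 2 watched movies 1 and 2
    [0, 0, 1, 1],   # user 3 watched movies 2 and 3
])

# Co-watch counts: entry (i, j) is how many users watched both movie i and movie j.
co_watch = watched.T @ watched

# Normalize counts to cosine similarity so popular movies do not dominate.
norms = np.sqrt(np.diag(co_watch))
similarity = co_watch / np.outer(norms, norms)
np.fill_diagonal(similarity, 0.0)  # a movie should not recommend itself

def recommend(user_id: int) -> int:
    """Recommend the unwatched movie most similar to the user's history."""
    history = watched[user_id]
    scores = similarity @ history      # total similarity to the watched movies
    scores[history == 1] = -np.inf     # exclude already-watched movies
    return int(np.argmax(scores))

print(recommend(0))  # prints 2: movie 2 is co-watched with movies 0 and 1
```

The cosine normalization is one common design choice here; it keeps very popular movies from dominating every similarity score.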

One big difference between our method and CF is that our method addresses a serious problem with CF known as "cold start." If a movie has never been watched before, CF cannot calculate a meaningful similarity between it and all the other movies. In our method, we enable our Time-aware Session Embedding (TSE) model to characterize the phenomenon of time-aware relevance decay, and to model similarity based not only on users’ interests but also on the content itself. Therefore, if there is new content that has never been watched before and you want to recommend it to the right users, our method can find the users who are likely to enjoy it and improve the click-through rate.
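To make the idea of time-aware relevance decay concrete, here is a hypothetical Python sketch in which items watched longer ago contribute less to a session profile. The exponential decay form, the decay rate, and the toy embeddings are illustrative assumptions, not the TSE model itself:

```python
# A hypothetical illustration of time-aware relevance decay, not the TSE model.
# An item watched t hours ago contributes with weight exp(-decay_rate * t).
import math

def decayed_session_profile(session, item_vectors, decay_rate=0.1):
    """Aggregate item embeddings in a session, discounting older interactions.

    session: list of (item_id, hours_ago) pairs.
    item_vectors: dict mapping item_id -> embedding (list of floats).
    """
    dim = len(next(iter(item_vectors.values())))
    profile = [0.0] * dim
    for item_id, hours_ago in session:
        weight = math.exp(-decay_rate * hours_ago)  # older items count less
        for k, v in enumerate(item_vectors[item_id]):
            profile[k] += weight * v
    return profile

# Toy example: movie_b was watched two days ago, so its weight has decayed.
vectors = {"movie_a": [1.0, 0.0], "movie_b": [0.0, 1.0]}
session = [("movie_a", 1), ("movie_b", 48)]
print(decayed_session_profile(session, vectors))
# -> roughly [0.905, 0.008]: the recent movie_a dominates the profile
```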

You were recently awarded a significant grant from the Ministry of Science and Technology of China for the project “Active Monitoring, Cognition, and Searching for Disaster Environments.” Will you briefly tell us about the goals of this project?

The goal of this project is to improve emergency response to major disasters by developing and applying new artificial intelligence technologies in post-disaster environments, including actively detecting vital signs and fusing visual and auditory information for decision making.

What is an exciting avenue of research in your area that you think will have a significant impact in the coming years?

In the AI era, applications are becoming more interactive and intelligent. Human beings sense the environment through five senses and understand it by fusing and analyzing the sensed data in the brain. For computers, data sensed from different devices are represented in different formats. One exciting avenue in the multimedia field is enabling a computer to recognize and describe what it “sees” in visual data, such as images and video, at the level a human would, and vice versa. This will integrate machine-level visual content understanding, natural language processing, and speech recognition in a natural way, helping computers understand and interact with visual, textual, and auditory data.

Heng Tao Shen is a Professor, Dean of the School of Computer Science and Engineering, Executive Dean of the AI Research Institute, and Director of the Centre for Future Media at the University of Electronic Science and Technology of China. An ACM Distinguished Member, Shen has been involved with the ACM Multimedia Conference and is an Associate Editor of ACM/IMS Transactions on Data Science (TDS).