People of ACM - Hang Li

January 24, 2023

Having previously served as Director of ByteDance’s AI Lab, you were recently named Head of Research. What does your new role involve? In what areas will ByteDance be expanding its research activities?

I am leading teams working on natural language processing and several other areas in AI, including robotics, AI for Science, and Responsible AI.

For example, our robotics team recently won the championship in the Habitat ObjectNav Challenge 2022. In the challenge, a simulated robot is placed in an unseen environment and asked to find an object by navigating the environment. Our AI for Science team, in collaboration with a team at Peking University, has developed a new method for solving the Schrödinger equation of real solid systems using deep learning techniques. The work was recently published in Nature Communications.

Will you give a tangible example of how innovations in using machine learning for search and dialogue have improved the user experience with TikTok or another ByteDance app?

Let me give two examples of machine learning technologies used in ByteDance's products for search and dialogue.

Toutiao is a news recommendation app with hundreds of millions of users in China. It also has a search engine for searching news and the internet in China. The unbiased learning-to-rank algorithm we developed is used in that search engine. Given a query, the algorithm automatically ranks documents based on their relevance, importance, and quality. The algorithm is trained with click data from the search engine. One advantage of this approach is that it can leverage users' clicks as feedback, assuming that a clicked document is more relevant than an unclicked one. One challenge of using click data is that it is intrinsically biased, with position bias being the most noticeable: documents ranked higher tend to be clicked more often, independent of their relevance. Our algorithm automatically eliminates position bias when training the model from click data, as illustrated in the sketch below.
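To make the debiasing step concrete, here is a minimal sketch of inverse propensity weighting, one standard way to correct position bias in learning-to-rank. The propensity values, toy click log, and function names are assumptions for illustration, not our production system:

```python
import numpy as np

# Assumed examination propensities: the probability that a user even looks
# at position k. In practice these are estimated (e.g., via randomization
# experiments); the values here are illustrative only.
propensity = np.array([1.0, 0.65, 0.45, 0.30, 0.20])

def ipw_relevance_labels(clicks, positions):
    """Debias click labels with inverse propensity weighting.

    clicks:    0/1 array, whether each impression was clicked
    positions: rank (0-based) at which each document was shown
    Returns a relevance signal per impression that is unbiased in expectation.
    """
    return clicks / propensity[positions]

# Toy click log: clicks at a high position count for less than clicks
# at a low position, offsetting the tendency to click top results.
clicks = np.array([1, 1, 0])
positions = np.array([0, 3, 1])
print(ipw_relevance_labels(clicks, positions))  # [1.0, 3.33..., 0.0]
```

A ranker trained on these reweighted labels no longer learns to simply favor whatever was already shown at the top.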

The Dali Smart Lamp is a product ByteDance released in China in 2020. It is a learning assistance tool for primary school students. The desk lamp, equipped with cameras, a screen, speakers, and microphones, can talk to the student, answer the student's questions, help the student read, and automatically grade homework. For example, when the student is reading a book and finds an unknown word (either in English or in Chinese), she can point at the word and ask, "Dali Dali, what does it mean?" Dali then explains the meaning of the word. The goal of the product is to help students learn and grow. The lamp's dialogue system was developed by the NLP team using machine learning. The system has natural language understanding, question answering, and dialogue management models trained from data, along the lines of the pipeline sketched below.
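As a rough illustration of how such a pipeline fits together (the intents, rules, and canned answers below are invented stand-ins; the real system uses learned models at each stage):

```python
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    history: list = field(default_factory=list)  # (utterance, reply) turns

def understand(utterance: str) -> str:
    """Toy NLU: map an utterance to an intent label."""
    if "what does it mean" in utterance.lower():
        return "define_word"
    return "chitchat"

def answer(intent: str, utterance: str) -> str:
    """Toy QA: produce an answer for the detected intent."""
    if intent == "define_word":
        return "Here is the meaning of the word you pointed at..."
    return "Let's keep reading together!"

def manage(state: DialogueState, utterance: str) -> str:
    """Toy dialogue manager: route through NLU and QA, track history."""
    intent = understand(utterance)
    reply = answer(intent, utterance)
    state.history.append((utterance, reply))
    return reply

state = DialogueState()
print(manage(state, "Dali Dali, what does it mean?"))
```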

In the article "Language Models: Past, Present and Future” for the July 2022 issue of Communications of the ACM, you argued that to make models closer to human language processing, we should seek inspiration from the human brain. Will you explain what you meant by this?

The neuroscientist David Marr defined three levels of computation: the function level, the algorithm level, and the implementation level. I mainly consider inspiration from the human brain at the function level.

Human language understanding is, by nature, multimodal processing, involving not only language but also vision and other modalities. If we can train a model from multimodal data such as image-text pairs, then the model can be used not only in language processing but also in computer vision and beyond. Such a model should be even more powerful than existing models. My article was written in 2021 and published in 2022. In the past two years, there has been significant progress on pre-trained multimodal models, also called foundation models. I believe there will be more progress in this direction in the near future.
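For concreteness, one common recipe for pretraining on image-text pairs is CLIP-style contrastive learning: matched pairs are pulled together in a shared embedding space, and mismatched pairs are pushed apart. A minimal PyTorch sketch, with toy linear encoders standing in for real vision and text backbones:

```python
import torch
import torch.nn.functional as F

# Placeholder encoders; real systems use deep vision and text backbones.
image_encoder = torch.nn.Linear(2048, 512)
text_encoder = torch.nn.Linear(768, 512)

def contrastive_loss(image_feats, text_feats, temperature=0.07):
    img = F.normalize(image_encoder(image_feats), dim=-1)
    txt = F.normalize(text_encoder(text_feats), dim=-1)
    logits = img @ txt.t() / temperature     # pairwise similarities
    targets = torch.arange(len(logits))      # i-th image matches i-th text
    # Symmetric cross-entropy: align images to texts and texts to images
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

batch_images = torch.randn(8, 2048)  # stand-in image features
batch_texts = torch.randn(8, 768)    # stand-in text features
print(contrastive_loss(batch_images, batch_texts))
```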

The language processing of the human brain is divided between Broca's area and Wernicke's area; the former is responsible for grammar, and the latter for the lexicon. If we can devise a new Transformer model closer to this human organization, then we will be able to further enhance the power of language modeling. This would be very challenging, but it is worth exploring. The new Transformer should be able to solve the problem of controllable generation: its Wernicke module would be responsible for generating semantics, while its Broca module would be responsible for generating the language expressions themselves. Another advantage is that it could improve the sample efficiency and computational efficiency of learning. Language is a combination of linguistic patterns and vocabulary; if you learn them separately, you can make learning more efficient. One speculative way to wire such a split is sketched below.
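Purely as an illustration of the idea (the module names, sizes, and the split itself are assumptions, not a published architecture), a "Wernicke" encoder could build a semantic plan while a "Broca" decoder realizes the surface form by cross-attending to it:

```python
import torch
import torch.nn as nn

d_model, vocab = 256, 1000

class TwoModuleLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        # "Wernicke" module: builds a semantic representation of the context
        self.wernicke = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        # "Broca" module: generates tokens conditioned on the semantic plan
        self.broca = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, context_ids, target_ids):
        plan = self.wernicke(self.embed(context_ids))        # semantics
        mask = nn.Transformer.generate_square_subsequent_mask(
            target_ids.size(1))
        hidden = self.broca(self.embed(target_ids), plan,
                            tgt_mask=mask)                   # surface form
        return self.lm_head(hidden)

model = TwoModuleLM()
logits = model(torch.randint(0, vocab, (2, 16)),
               torch.randint(0, vocab, (2, 8)))
print(logits.shape)  # (2, 8, 1000)
```

In this sketch, steering generation would amount to editing the semantic plan rather than the token-by-token decoding, which is one reading of the controllable-generation claim above.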

The recent success of ChatGPT demonstrates once again the great potential of language modeling for AI. However, the limitations of this approach are also evident, as explained in my article.

What has been a longstanding challenge in machine learning that we will make progress on in the near future?

I am not able to make a prediction, but I can express my hopes for machine learning. I believe that deep learning will continue to play a vital role in the foreseeable future, as it does today. I would welcome progress in research on the following issues.

One challenge for deep learning is that it still lacks a solid mathematical theory as its foundation. Such a theory is needed to precisely explain the workings of deep learning and to effectively guide its development.

Another challenge is that deep learning needs to be more explainable, more robust and reliable, and fairer in how it makes decisions. As deep learning is increasingly employed in practice, AI supported by trustworthy deep learning becomes ever more important.

Deep learning, sometimes combined with reinforcement learning, also needs to become more effective and efficient at solving AI problems. One important way forward is to draw more inspiration from the human brain, the most intelligent computing system we know. Therefore, brain-inspired deep learning and AI should be an important direction to explore. The language modeling idea described above is one example.

Hang Li is the Head of Research at ByteDance, a multinational internet technology company headquartered in Beijing. Among its holdings, ByteDance owns TikTok, a short-form video hosting service, and Douyin, a Chinese counterpart. Li has published five books and more than 150 technical papers in areas including information retrieval, natural language processing, machine learning, and data mining.

Li has been active in service to the field, having served on the program committees of many leading conferences, including ACM SIGIR and ACM WSDM, as well as on the editorial boards of international journals. Li was recently named an ACM Fellow for contributions to machine learning for search and dialogue.