People of ACM - Qi He

February 18, 2021

How did you initially become interested in the intersection of data mining, information extraction and AI?

Every child dreams of owning a human-like intelligent system, like the Terminator, E.T., or Doraemon. However, you soon become frustrated when you realize you have to clean data for months just to feed one good feature into your machine learning model and move it incrementally closer to human intelligence. You feel worse if you realize you don’t even have sufficient data at hand. I joined LinkedIn eight years ago to pursue big data. Big data (more precisely, the useful knowledge extracted from it, not the noise) makes AI very powerful: more training examples significantly reduce the variance of predictions, and more useful features reduce their bias. In turn, AI gives us new opportunities to extract more knowledge from big data more accurately. Big data, information extraction and AI form a virtuous cycle, which fascinates me deeply.

What is the LinkedIn Knowledge Graph and what is a challenge you and your colleagues are working on now?

LinkedIn’s Knowledge Graph is a large knowledge base built upon “entities” on LinkedIn, such as members, jobs, titles, skills, companies, geographical locations, schools, etc. These entities and the relationships among them form the ontology of the professional world and are used by LinkedIn to enhance its recommender systems, search, monetization and consumer products, and business and consumer analytics. It’s important that LinkedIn’s Knowledge Graph scales as new members register, new jobs are posted, and new companies/skills/titles emerge. A dynamic graph reflects the macro/micro trends of the economy.

We primarily derive LinkedIn’s Knowledge Graph from a large amount of content contributed by members, recruiters, advertisers and company administrators. The data is sparse, noisy, dynamic and hard to measure per user. We recognize entities accurately by using deep neural networks with humans in the loop, and link entities by representing them in a way that aligns with their semantics and graph structures via graph neural networks. We also leverage cross-domain and cross-language transfer learning to mitigate the data sparsity. We continuously refine the knowledge graph via the user feedback loop, and we pay careful attention to ensuring that the knowledge graph represents the diversity of user communities, and that data coverage and accuracy do not unfairly skew towards one gender or ethnicity.
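The entity-linking step described above can be illustrated with a minimal sketch: once a neural encoder has mapped both raw mentions (e.g., a free-text job title) and standardized entities into a shared vector space, linking reduces to a nearest-neighbor lookup with a confidence threshold, below which a human-in-the-loop review could take over. All names, vectors, and the threshold here are illustrative assumptions, not LinkedIn's actual system.

```python
import numpy as np

# Hedged sketch of embedding-based entity linking, assuming mentions and
# entities have already been encoded as vectors by some neural model.
# Entity names, vectors, and the threshold are toy assumptions.

def link_entity(mention_vec, entity_vecs, entity_names, threshold=0.7):
    """Link a mention to the nearest known entity by cosine similarity.

    Returns (name, score), or (None, score) if no entity is close enough,
    which is where a human review step could take over.
    """
    m = mention_vec / np.linalg.norm(mention_vec)
    E = entity_vecs / np.linalg.norm(entity_vecs, axis=1, keepdims=True)
    scores = E @ m                      # cosine similarity to each entity
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None, float(scores[best])
    return entity_names[best], float(scores[best])
```

For example, a mention vector close to the "Software Engineer" entity vector links confidently, while an ambiguous mention falls below the threshold and is deferred.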

Will you tell us about a key insight you and your co-authors presented in the paper “TwitterRank: finding topic-sensitive influential twitterers,” for which you received the WSDM 10-Year Test of Time Award?

Thanks to Jianshu and the other co-authors’ years of diligent study of Singaporean Twitter users, this paper demonstrated that the “following” relationship on Twitter is not random and carries a strong content-based “influence” signal from followees to followers. The paper identified the opinion leaders in the social network and then leveraged them to influence the target audience, paving the way for Twitter-based audience targeting in internet marketing.
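The core of TwitterRank is a topic-sensitive random walk over the follow graph: a follower's attention is split among followees in proportion to how much they tweet, weighted by topical similarity, with a teleportation term biased toward users active in the topic. The sketch below follows that idea but simplifies the paper's exact formulation (in particular, it re-normalizes each row into a distribution); the graph and numbers are toy examples, not data from the paper.

```python
import numpy as np

# Hedged sketch of TwitterRank-style topic-sensitive influence ranking.
# The follow graph, tweet counts, and topic distributions are toy inputs.

def twitterrank(follows, tweet_counts, topic_dist, topic, gamma=0.85, iters=100):
    """Rank users by topic-specific influence via a biased random walk.

    follows[i][j] = True if user i follows user j (the walk moves from
    follower i to followee j, since influence flows followee -> follower).
    tweet_counts[j] = number of tweets by user j.
    topic_dist[i][t] = probability that user i tweets about topic t.
    """
    n = len(tweet_counts)
    P = np.zeros((n, n))
    for i in range(n):
        followees = [j for j in range(n) if follows[i][j]]
        total = sum(tweet_counts[j] for j in followees)
        for j in followees:
            # Weight combines how much j tweets with how topically
            # similar i and j are on this topic.
            sim = 1.0 - abs(topic_dist[i][topic] - topic_dist[j][topic])
            P[i, j] = (tweet_counts[j] / total) * sim if total else 0.0
        s = P[i].sum()
        if s > 0:
            P[i] /= s          # re-normalize the row into a distribution
        else:
            P[i] = 1.0 / n     # user follows no one: jump uniformly
    # Teleportation vector biased toward users active in this topic.
    e = np.array([topic_dist[i][topic] for i in range(n)], dtype=float)
    e /= e.sum()
    tr = np.full(n, 1.0 / n)
    for _ in range(iters):
        tr = gamma * P.T @ tr + (1 - gamma) * e
    return tr
```

On a toy graph where two users follow a prolific third user, that third user accumulates the most topic-specific rank, matching the intuition that opinion leaders are followees who attract topically similar followers.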

What recent computing innovations will significantly shape the LinkedIn platform in the coming years?

LinkedIn connects the world’s professionals to make them more productive and successful, whether that entails finding a job or expanding their network. In today’s job marketplace, skills are the currency. Job seekers want to know what skills they need to land their dream jobs, and companies need to hire people with the right skills to succeed in the role. Thus it’s imperative to identify what skills are needed for the future, and help companies quickly, efficiently and accurately hire for and develop those skills. As we see expanded distribution of the COVID vaccine, we predict a new world of work, in which there’s a stronger need for new skills and new hiring approaches.

To support this trend, I’m most excited about our work innovating a new LinkedIn skill ontology. It is rich in semantics (e.g., skills are connected to relevant signals like titles, credentials, products and services, fields of study and other skills) and updated frequently. It objectively and accurately measures a member’s skill proficiency via AI models on top of LinkedIn Learning and skill assessment products. Such a comprehensive skill ontology can’t be curated purely by humans, although humans still play a vital role in ensuring that the language defined by this ontology is highly accurate. We’re developing many deep NLP models to automatically create skills and their relationships to other relevant signals, and multitask learning models to capture context-based skill proficiency across different LinkedIn products.
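The multitask idea mentioned above can be sketched minimally: a single shared representation of a member-skill pair feeds several per-context scoring heads, so proficiency estimates can differ by product context while sharing most parameters. Everything here (dimensions, context names, random weights) is an illustrative assumption, not LinkedIn's model.

```python
import numpy as np

# Toy sketch of a multitask setup: one shared encoder feeds several
# context-specific heads, so proficiency scores can vary by product
# context while most parameters are shared. All weights and context
# names are illustrative assumptions.

rng = np.random.default_rng(0)
d_in, d_shared = 8, 4
W_shared = rng.normal(size=(d_shared, d_in))        # shared encoder weights
heads = {ctx: rng.normal(size=d_shared)             # one head per context
         for ctx in ("learning", "assessment")}

def proficiency(features, context):
    """Score skill proficiency for one member-skill feature vector."""
    h = np.tanh(W_shared @ features)                # shared representation
    logit = heads[context] @ h                      # context-specific head
    return 1.0 / (1.0 + np.exp(-logit))             # probability-like score
```

In a real system the shared encoder and heads would be trained jointly on labeled data from each product surface; the sketch only shows the parameter-sharing structure.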

Qi He is Senior Director of Engineering at LinkedIn, where he leads a team of more than 150 machine learning specialists, software engineers and linguists to standardize LinkedIn data and build the LinkedIn Knowledge Graph. He has authored 70 publications (with 6,000 citations to date) in areas including information extraction, data mining, artificial intelligence and natural language processing (NLP). His honors include the 2008 SIGKDD Best Application Paper Award, as well as the ACM International Conference on Web Search and Data Mining (WSDM) 10-Year Test of Time Award in 2020.

Currently, He is on the Board of Directors of the ACM International Conference on Information and Knowledge Management (CIKM), having previously served as a General Chair and Program Chair of the conference. He has also regularly served on the Program Committees of ACM conferences such as SIGKDD, SIGIR, WWW and WSDM. He was recently named an ACM Distinguished Member.