People of ACM - Bernhard Schölkopf

April 10, 2018

Why is our understanding of the foundations of empirical inference central to machine learning?

The fundamental problem of machine learning is how to generalize from past observations to future cases we haven’t seen yet. If past and future data are generated from the same probability model, we can guarantee that this works, under certain assumptions. These foundations thus tell us something structural about the possibility of learning and generalization.

Your research group has worked in areas as diverse as astronomy and microbiology, investigating the underlying principles and methods of empirical inference. Can you share an example from current research where an underlying principle or method was brought to light that applies in seemingly diverse domains?

We have recently developed a method termed “half-sibling regression” that combines causal modeling and machine learning to reduce systematic errors, helping with the discovery of 14 previously unknown exoplanets in collaboration with astronomers. While working on this, we noticed that our approach generalized methods that were already used in bioinformatics. The basic idea in both cases is to explain and model systematic errors by exploiting statistical dependencies between observations that are causally affected by the same error source.
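The idea can be sketched in a few lines. This is a minimal illustration, not the actual exoplanet pipeline: the simulated data, variable names, and the use of ordinary least squares are all assumptions for the sake of the example. A target observation y is contaminated by a systematic error n; other observations X ("half-siblings") are affected by the same n but are causally independent of the target's true signal q. Regressing y on X can only pick up the shared systematic, so the residual recovers q.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 500

n = rng.normal(size=n_obs)                 # shared systematic error source
q = np.sin(np.linspace(0.0, 20.0, n_obs))  # true signal of interest

# Half-sibling observations: driven by the same systematic n (with small
# independent measurement noise), but carrying no trace of q.
X = np.outer(n, rng.normal(size=5)) + 0.1 * rng.normal(size=(n_obs, 5))

# Observed target: true signal plus the systematic contamination.
y = q + 2.0 * n

# Half-sibling regression: predict y from X, then keep the residual.
# Since X can only explain the systematic part of y, the residual is
# an estimate of the true signal q.
X1 = np.column_stack([np.ones(n_obs), X])        # add an intercept column
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)    # ordinary least squares
q_hat = y - X1 @ beta

# The residual should track the true signal closely.
corr = np.corrcoef(q_hat, q)[0, 1]
```

The causal assumption doing the work here is that the half-siblings are not affected by q itself; otherwise the regression would subtract away part of the signal along with the systematic error.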

Recently, you have been working in the field of causal inference, an area that has not traditionally received as much attention from researchers. How might advances in our understanding of causal inference impact the development of intelligent systems such as autonomous vehicles and robots?

While standard machine learning is concerned with generalizing within a setting where the data-generating process does not change (loosely speaking, the “IID” setting, in which observations are drawn independently from one fixed distribution), causal modeling tries to build a richer model of the world, including notions of intervention and distribution shift. Basically, each causal model encodes and implies a whole class of probabilistic models that can describe a range of related problems, and that let an agent reason about the effect of actions. We cannot build IID models for each task an agent is ever going to encounter, so we must understand how to simultaneously learn a whole class of models—causal models may be the next step in this endeavor, but probably not the last one.
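The point that one causal model implies a whole class of distributions can be made concrete with a toy structural causal model. This is a hedged sketch of my own, not an example from the interview: the two-variable model X := N_x, Y := 2X + N_y and the intervention do(X := 3) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n, do_x=None):
    """Draw n samples from the structural causal model
        X := N_x,   Y := 2*X + N_y.
    With do_x=None we get the observational distribution; otherwise
    the intervention do(X := do_x) replaces X's mechanism while the
    mechanism for Y stays unchanged."""
    if do_x is None:
        x = rng.normal(size=n)          # X generated by its own mechanism
    else:
        x = np.full(n, float(do_x))     # intervention: X is set from outside
    y = 2.0 * x + rng.normal(size=n)    # Y's mechanism is invariant
    return x, y

x_obs, y_obs = sample(10_000)           # observational distribution
x_int, y_int = sample(10_000, do_x=3.0) # interventional distribution

# The same causal model answers both questions: because Y's mechanism is
# shared across all interventions, E[Y | do(X=3)] = 2 * 3 = 6, even
# though the distribution of X itself has shifted completely.
mean_y_int = y_int.mean()
```

A purely probabilistic model of the observational data alone would have no principled way to predict what happens under the shifted, interventional distribution; the causal factorization is what carries over.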

Most of the successes that we currently see in machine learning applications are due to our making sure that, for a given task, things are stationary and the data are independent and identically distributed (IID). But there’s a whole world out there where that assumption is violated, and that’s the world where animate intelligence excels. The recent observations surrounding “adversarial attacks” point exactly at this lack of robustness which comes from IID specialization—the next level of generalization thus has to be not within a single IID setting, but between different but related problems.

How do you feel about the impressive surge in popularity of machine learning methods amid the recent progress in the field of artificial intelligence?

I am genuinely excited about this and I am grateful to be living in a time where ideas of science fiction like instantaneous translation and global communication are turning into reality.

Some people call it the digital revolution, some the AI revolution. What’s more interesting than the name is what characterizes it, and it’s instructive to compare it to the first two industrial revolutions. Both of those were about the industrialization of energy: we first learned how to generate energy at scale using steam and water power, and then how to convert it into electrical energy and de-localize it. I think we are now seeing a similar revolution with information instead of energy, which is exactly the paradigm shift started by people like Wiener and Shannon when they founded the field of cybernetics. While information can be processed by humans, to do it at scale we needed computers, and to do it intelligently, we had to come up with machine learning. By the way, just like energy strictly speaking cannot be generated (since it’s a conserved quantity), the same is probably true on a fundamental level for information, so we can only extract and convert it.

Bernhard Schölkopf is Director of the Department of Empirical Inference at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, and is also Chief Machine Learning Scientist at Amazon Retail. His research interests include machine learning and inference from empirical data. Schölkopf has been recognized for key contributions to theory and algorithms for kernel machines used to classify data and estimate relationships among variables, as well as causal inference. He has co-authored Elements of Causal Inference: Foundations and Learning Algorithms with Jonas Peters and Dominik Janzing.

Schölkopf is a recipient of the Royal Society Milner Award (2014), which is given for outstanding achievement in computer science by a European researcher, and the Gottfried Wilhelm Leibniz Prize (2018), the most important research award in Germany. In 2017, he was named an ACM Fellow for contributions to the theory and practice of machine learning.