May 16, 2013: People of ACM: Jeff Dean
Jeff Dean is a Google Fellow in the Systems Infrastructure Group. Before joining Google, he was at Digital Equipment Corporation's Western Research Laboratories. In 1990 and 1991, Dean worked for the World Health Organization's Global Programme on AIDS, developing software for statistical modeling, forecasting, and analysis of the HIV pandemic. A Fellow of ACM, he is a co-recipient (with Sanjay Ghemawat) of the 2012 ACM-Infosys Foundation Award in the Computing Sciences. He was elected to the National Academy of Engineering in 2009. A summa cum laude graduate of the University of Minnesota with a B.S. degree in Computer Science and Economics, he received M.S. and Ph.D. degrees in Computer Science from the University of Washington.
Research areas include large-scale distributed systems, performance monitoring, compression techniques, information retrieval, application of machine learning to search and other related problems, microprocessor architecture, compiler optimizations, and development of new products that organize existing information in new and interesting ways. Products Jeff has developed for Google include AdSense, MapReduce, BigTable, and Google Translate.
From your perspective as a pioneering software engineer, what do you see as the next big thing in Internet-scale computing?
There's a confluence of factors that I believe will enable very different kinds of applications. First, large-scale machine learning is making significant improvements for perceptual tasks, such as speech recognition, computer vision, and natural language understanding. Second, mobile devices are becoming more powerful and are more likely to have highly available, high-bandwidth, low-latency connections to the Internet. Taken together, this means that mobile devices can take advantage of large-scale processing capabilities in datacenters to understand more about the user's environment, and to offer assistance and additional information based on the user's current context at (hopefully) the precise moment that the information is wanted.
What impact did your research on scalable infrastructure have on the advent of cloud computing?
Much of our work on scalable infrastructure arose out of necessity in building and operating Google's crawling, indexing, and query serving systems. We were dealing with large datasets, and the most cost-effective way to get enough computational power was to buy lots and lots of relatively inexpensive, "commodity-class" computers, and to build software abstraction layers that make reliable systems out of these somewhat unreliable and modestly sized individual machines. The systems that we built solved significant problems we were encountering along the way. Several of my colleagues built a large-scale distributed file system (Google File System, or GFS), Sanjay Ghemawat and I developed an abstraction called MapReduce for doing reliable computations across thousands of machines, and later many other colleagues and I worked on higher-level storage systems such as BigTable and Spanner.
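The MapReduce abstraction mentioned above can be illustrated with a toy, single-machine sketch. This is only an illustration of the programming model (a user-supplied map function emitting key/value pairs, a shuffle that groups values by key, and a user-supplied reduce function); the function names and driver here are hypothetical, and the real system distributes these phases across thousands of machines with fault tolerance and data locality.

```python
from collections import defaultdict
from itertools import chain

def map_word_count(doc_id, text):
    """Map phase: emit a (word, 1) pair for every word in a document."""
    for word in text.split():
        yield (word.lower(), 1)

def reduce_word_count(word, counts):
    """Reduce phase: sum all counts emitted for a given word."""
    return (word, sum(counts))

def run_mapreduce(inputs, mapper, reducer):
    """Toy sequential driver: map, shuffle (group by key), then reduce."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(k, v) for k, v in inputs):
        groups[key].append(value)
    return dict(reducer(k, vs) for k, vs in groups.items())

docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog")]
counts = run_mapreduce(docs, map_word_count, reduce_word_count)
# counts["the"] == 2; each other word appears once
```

The appeal of the model is that the user writes only the two pure functions; the framework owns distribution, retries, and data movement, which is what made computations reliable on unreliable commodity machines.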
Our work has had some impact because other organizations, groups, and even other academic disciplines face many of the same problems that we faced when trying to store and process large datasets, and our approaches have turned out to be useful in these other environments.
Do you regard the open publication of your work as an opportunity for indirect technology transfer, and if so, how has it influenced the way other software engineers design and program large-scale IT systems?
Certainly. In some cases we can release open source versions of the software systems we've built, but sometimes we aren't able to easily open-source entire systems. Publishing papers that describe such systems in enough detail that others can use the general principles to create open source versions is one effective means of technology transfer. Publications can also sometimes get others to take similar strategies to tackle problems in other domains.
What advice would you give to budding technologists in the era of big data who are considering careers in computing?
Large datasets and computational challenges are cropping up in more and more domains. For example, the current modest-sized stream of genetic data is going to turn into an incredible flood because the costs of DNA sequencing are dropping exponentially. Computer systems and algorithms that can find interesting correlations in such data are going to enable important scientific discoveries about the genetic basis of diseases, and will make medicine much better targeted and personalized. That's just one example. Having a strong computer science background and being able to apply it to solve important computational problems is an incredibly exciting way for computer scientists to have tremendous impact on the world.