People of ACM - Jeffrey Heer

March 8, 2017

How did you come to be interested in the area of data visualization?

My interest began as an undergraduate computer science student at UC Berkeley. In addition to my engineering coursework, I was fascinated by psychology, and I minored in cognitive science. So I was already predisposed to projects at the intersection of perception, technology and design. During a human-computer interaction (HCI) course, a TA showed us the hyperbolic tree, a visualization technique developed at Xerox PARC. I was enthralled by the elegance of the technique and the experience of "whipping through" massive hierarchies. This piqued my interest in data visualization techniques, later leading to research on the topic with Stu Card, Jock Mackinlay, and others at PARC. These experiences bootstrapped my future research career. Funnily enough, some of our later findings actually challenged the usefulness of that inspiring hyperbolic tree technique—among other things, people don't read in circles, so visualizations that afford linear scanning can actually perform better!

Beyond the astronomical increase in the amount of data we can now collect and process, what changes in the field in the past five to 10 years are most responsible for shaping where data visualization is today?

Increasing volumes of data are of little value if they are not responsive to driving questions, or are stored in forms that are not amenable to computational processing. This realization led Sean Kandel, Joe Hellerstein, and me to research visual, interactive tools for "data wrangling," ultimately leading to the founding of Trifacta. The goal is to help analysts rapidly transform data so that it can be used by visualization, statistics and database tools, while also gaining initial insights into data content and quality that are vital to gauging the data's relevance. While the sheer amount of data is indeed growing and drives a number of systems-building issues, it is the accessibility and appropriateness of data that should fundamentally shape the field.

(On this note, I believe the advent of rich browser-based tools for widespread publishing and dissemination of visualizations has had a more profound impact on the state of the field today than so-called "Big Data"!)
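
To make the "data wrangling" step concrete, here is a minimal sketch in pandas that reshapes a hypothetical messy export into a tidy table ready for visualization or statistics tools. The column names and values are invented for illustration, and the code is a hand-rolled toy, not a reflection of how Trifacta itself works:

    # Toy "data wrangling" example: a messy, hypothetical export with one
    # column per year, numbers stored as text, and ad hoc missing values
    # is reshaped into a tidy table with one observation per row.
    import pandas as pd

    raw = pd.DataFrame({
        "region": ["North", "South", "East"],
        "2015":   ["1,200", "950",  "n/a"],
        "2016":   ["1,310", "n/a",  "870"],
    })

    tidy = (
        raw.melt(id_vars="region", var_name="year", value_name="sales")  # wide -> long
           .assign(
               year=lambda d: d["year"].astype(int),
               sales=lambda d: pd.to_numeric(
                   d["sales"].str.replace(",", ""),  # strip thousands separators
                   errors="coerce",                  # "n/a" becomes NaN
               ),
           )
    )
    print(tidy)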

Going forward, the increasing use of statistical and machine learning methods provides both opportunities and pitfalls for interactive analysis tools. More powerful models can lead to more accurate predictions, and a judicious use of automation may help users more effectively explore data, develop hypotheses, and produce perceptually effective visualizations. On the other hand, if we set loose into the wild models that we don't sufficiently understand, or sacrifice appropriate human direction for "automagically" produced suggestions, we open ourselves up to great risks. An important and exciting research agenda lies in better understanding and designing analysis tools in the face of these tradeoffs.

“ReVision,” a recent system that you developed, uses machine learning to redesign existing data visualizations to enhance our understanding of data. How will advances in machine learning and artificial intelligence improve our data visualizations (and especially interactive data visualizations) in the years ahead?

The ReVision project (originally developed in collaboration with Manolis Savva, Nick Kong, Fei-Fei Li and Maneesh Agrawala, and now being advanced by UW post-doc Jorge Poco) uses computer vision techniques to automatically interpret bitmap images of visualizations. Computational chart interpretation is valuable for "unlocking" data that is only available from chart images, and enables automatic redesign to improve perceptual effectiveness or make charts more accessible. We're also excited to use these techniques to better understand how visualizations have been used, evolved and disseminated in the scientific literature.
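
As a rough illustration of this inverse problem, the toy sketch below renders a synthetic bar chart as a binary bitmap and then recovers the bar values by scanning pixel columns. ReVision's actual pipeline classifies chart types and extracts marks with learned models and far more robust image processing; this code only conveys the flavor of reading data back out of pixels:

    # Toy version of chart interpretation: recover bar heights from the
    # pixels of a clean, synthetic bar-chart bitmap.
    import numpy as np

    HEIGHT, BAR_WIDTH, GAP = 100, 10, 5
    values = [30, 55, 80, 45]  # ground truth known only to the "renderer"

    # Render a minimal bar chart as a binary image (1 = ink, 0 = background).
    image = np.zeros((HEIGHT, len(values) * (BAR_WIDTH + GAP)), dtype=np.uint8)
    for i, v in enumerate(values):
        x0 = i * (BAR_WIDTH + GAP)
        image[HEIGHT - v:, x0:x0 + BAR_WIDTH] = 1

    # "Interpret" the bitmap: group inked pixel columns into bars and read
    # each bar's value off as the number of inked rows in its columns.
    ink_per_column = image.sum(axis=0)
    recovered, in_bar = [], False
    for count in ink_per_column:
        if count > 0 and not in_bar:   # left edge of a new bar
            recovered.append(int(count))
            in_bar = True
        elif count == 0:
            in_bar = False

    print(recovered)  # -> [30, 55, 80, 45]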

ReVision focuses on the "inverse problem" of recovering data and visualization specifications from images. However, most data visualization research concerns the forward direction: how to best generate visual representations of data. Artificial intelligence and machine learning methods can be productively applied to help automate visualization design decisions. In fact, as early as 1986, Jock Mackinlay's doctoral thesis examined how to use logical AI methods to automatically generate more effective visualizations based on models of human visual decoding. Such approaches are ripe for a renaissance!
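
The sketch below gives a flavor of such automated design: it greedily assigns data fields to visual channels using a hard-coded effectiveness ranking for each data type. The rankings paraphrase commonly cited orderings, the field names are invented, and real systems rely on much richer models and constraints than this toy:

    # Toy encoding recommender in the spirit of automated presentation tools:
    # rank channels by assumed perceptual effectiveness per data type, then
    # greedily give the most important fields the best remaining channels.
    EFFECTIVENESS = {
        "quantitative": ["x", "y", "size", "color_saturation"],
        "ordinal":      ["x", "y", "color_saturation", "size"],
        "nominal":      ["x", "y", "color_hue", "shape"],
    }

    def assign_encodings(fields):
        """fields: (name, data_type) pairs in decreasing order of importance."""
        used, assignment = set(), {}
        for name, dtype in fields:
            for channel in EFFECTIVENESS[dtype]:
                if channel not in used:
                    assignment[name] = channel
                    used.add(channel)
                    break
        return assignment

    print(assign_encodings([
        ("price",   "quantitative"),
        ("mileage", "quantitative"),
        ("brand",   "nominal"),
    ]))
    # -> {'price': 'x', 'mileage': 'y', 'brand': 'color_hue'}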

Of particular interest to my own group is not just how to aid the design of a single chart, but rather how to help guide data exploration processes. For example, our Voyager system (led by UW PhD student Kanit Wongsuphasawat) combines manual chart creation with a visualization recommender system. Too often I see students and experts alike "fixate" on a hypothesis of interest, circumventing a more comprehensive analysis that might reveal underlying data quality issues or unexpected latent factors. By automatically presenting visualizations that lie one step ahead in an analyst's search frontier, Voyager promotes broader consideration of a data set in an effort to combat "tunnel vision."
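
As a toy sketch of this "one step ahead" idea, the code below enumerates candidate charts that add exactly one field to whatever the analyst is currently viewing, written out as Vega-Lite-style dictionaries. The field names are invented, and Voyager's actual engine (CompassQL) also chooses marks, completes encodings, and ranks the results; this only builds the search frontier:

    # Toy "search frontier": every way to extend the current chart by one field.
    from itertools import product

    FIELDS = {
        "price": "quantitative",
        "mileage": "quantitative",
        "brand": "nominal",
        "year": "ordinal",
    }

    def frontier(current_fields, free_channels=("y", "color", "column")):
        """Yield candidate specs that add exactly one field to the current chart."""
        unused = [f for f in FIELDS if f not in current_fields]
        for field, channel in product(unused, free_channels):
            yield {
                "mark": "point",
                "encoding": {
                    "x": {"field": current_fields[0], "type": FIELDS[current_fields[0]]},
                    channel: {"field": field, "type": FIELDS[field]},
                },
            }

    for spec in frontier(["price"]):
        print(spec)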

As I alluded to earlier, a key challenge is to design analysis tools that strike a productive balance between automation and user control. If our tools become overly automated, we risk reducing ourselves to passive receptacles of sometimes dubious recommendations, with human domain expertise and intuition dropping out of the loop. With an appropriate design, balanced systems should instead entwine with analyst-directed goals, leveraging automation to help counterbalance human biases and accelerate the rate of exploration.

Some have contended that simplicity should be a goal in data visualizations and that designs are becoming too complicated. From your years of research, what are some key insights you have gained about how the human brain processes visual information?

Simplicity is a relative, and often abused, term. Discussions of simplicity may also elide critical considerations of both audience and task. Einstein's (attributed) quote seems apt here: "Make things as simple as possible, but no simpler."

A central goal for visualization research is to design visual representations and attendant interaction techniques that become "invisible in use," such that people can think primarily in terms of the data and problems at hand, not the particular encodings and widgets by which they interpret and manipulate the display. Perhaps "clarity" is a better buzzword for this than "simplicity."

There is a substantial literature across psychology, design, statistics, cartography and human-computer interaction studying how to achieve these goals. Though pithy remarks hardly do justice to this work, the basic skills and concerns are not hard to learn. I highly recommend picking up a book (or course!) on visualization design; the benefits for understanding and communicating data should more than pay back the investment of effort. Suffice it to say that representation matters!

Jeffrey Heer is an Associate Professor of Computer Science and Engineering at the University of Washington, where he directs the Interactive Data Lab and conducts research on data visualization, human-computer interaction and social computing. The lab developed tools including D3.js, Vega, Protovis and Prefuse.

In 2012, Heer co-founded the company Trifacta. Trifacta’s software application combines visual interaction with intelligent inference for the process of data transformation. As part of ACM’s participation in SXSW 2017, Heer will deliver a presentation titled “Interactive Data Analysis: Visualization and Beyond.”