

This paper presents a novel image editing program emphasizing easy selection and manipulation of material found in informal, casual documents such as sketches, handwritten notes, whiteboard images, screen snapshots, and scanned documents. The program, called ScanScribe, offers four significant advances. First, it presents a new, intuitive model for maintaining image objects and groups, along with underlying logic for updating these in the course of an editing session. Second, ScanScribe takes advantage of newly developed image processing algorithms to separate foreground markings from a white or light background, and thus can automatically render the background transparent so that image material can be rearranged without occlusion by background pixels. Third, ScanScribe introduces new interface techniques for selecting image objects with a pointing device without resorting to a palette of tool modes. Fourth, ScanScribe presents a platform for exploiting image analysis and recognition methods to make perceptually significant structure readily available to the user. As a research prototype, ScanScribe has proven useful in the work of members of our laboratory, and has been released on a limited basis for user testing and evaluation.
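
The background-transparency step can be illustrated with a minimal sketch. The abstract does not specify ScanScribe's foreground/background separation algorithm, so a plain global brightness threshold stands in for it here; the function name `background_to_alpha` and the default `threshold` value are hypothetical.

```python
def background_to_alpha(gray, threshold=200):
    """Convert a grayscale image (rows of 0-255 ints) to RGBA tuples.

    Assumption: pixels at or above `threshold` count as light background
    and become fully transparent; darker pixels are kept opaque as
    foreground markings. A real system would use a more robust,
    locally adaptive separation.
    """
    rgba = []
    for row in gray:
        out_row = []
        for v in row:
            alpha = 0 if v >= threshold else 255
            out_row.append((v, v, v, alpha))
        rgba.append(out_row)
    return rgba


if __name__ == "__main__":
    page = [[255, 250, 30],
            [240, 20, 25]]
    for row in background_to_alpha(page):
        print(row)
```

With the background made transparent in this way, moving a group of foreground strokes over other material would not hide it behind a rectangle of white pixels.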

We present an evaluation of three mouse-based techniques for aligning digital images. We investigate the physical image alignment task and discuss the implications for interacting with virtual images. In a formal evaluation we show that a symmetric bimanual technique outperforms an asymmetric bimanual technique which in turn outperforms a unimanual technique. We show that even after mode switching times are removed, the symmetric technique outperforms the single mouse technique. Subjects also exhibited more parallel interaction using the symmetric technique than when using the asymmetric technique.

PhotoMesa is a zoomable image browser that uses a novel treemap algorithm to present large numbers of images grouped by directory or other available metadata. It uses a new interaction technique for zoomable user interfaces, designed for novices and family use, that makes it straightforward to navigate through the space of images and impossible to get lost. PhotoMesa groups images using one of two new algorithms that lay out groups of objects in a 2D space-filling manner. Quantum treemaps are designed for laying out images or other objects of indivisible (quantum) size. They are a variation on existing treemap algorithms in that they guarantee that every generated rectangle will have a width and height that are an integral multiple of an input object size. Bubblemaps also fill space with groups of quantum-sized objects, but generate non-rectangular blobs and utilize space more efficiently.
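
The quantum constraint (every rectangle's width and height is an integral multiple of the object size) can be sketched as follows. This shows only the quantization of a single group's rectangle, not the full quantum treemap layout; the function names and the `aspect` parameter are hypothetical.

```python
import math


def quantum_grid(n_items, aspect=1.0):
    """Pick a (cols, rows) grid with cols * rows >= n_items.

    `aspect` is the desired width/height ratio measured in cells.
    This is an illustrative heuristic, not PhotoMesa's algorithm.
    """
    if n_items <= 0:
        return (0, 0)
    cols = max(1, math.ceil(math.sqrt(n_items * aspect)))
    rows = math.ceil(n_items / cols)
    return (cols, rows)


def quantum_rect(n_items, cell_w, cell_h, aspect=1.0):
    """Rectangle size in pixels; always a whole number of cells."""
    cols, rows = quantum_grid(n_items, aspect)
    return (cols * cell_w, rows * cell_h)


if __name__ == "__main__":
    print(quantum_grid(7))          # grid for seven thumbnails
    print(quantum_rect(7, 64, 64))  # pixel size with 64x64 cells
```

Because the rectangle is built from whole cells, some cells may stay empty (seven images in a 3 x 3 grid leave two unused), which is the kind of wasted space the abstract says bubblemaps reduce.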

Thumbnail images provide users of image retrieval and browsing systems with a method for quickly scanning large numbers of images. Recognizing the objects in an image is important in many retrieval tasks, but thumbnails generated by shrinking the original image often render objects illegible. We study the ability of computer vision systems to detect key components of images so that automated cropping, prior to shrinking, can render objects more recognizable. We evaluate automatic cropping techniques 1) based on a general method that detects salient portions of images, and 2) based on automatic face detection. Our user study shows that these methods result in small thumbnails that are substantially more recognizable and easier to find in the context of visual search.
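
The idea of cropping before shrinking can be sketched under simple assumptions. Here a precomputed saliency map (a grid of non-negative scores) is taken as given, and the crop is the bounding box of all cells scoring at least a fraction of the maximum. The function name `salient_crop_box` and the `frac` parameter are hypothetical, and the saliency and face detectors evaluated in the study are not reproduced.

```python
def salient_crop_box(saliency, frac=0.5):
    """Return (top, left, bottom, right), inclusive, covering every
    cell whose score is >= frac * max score.

    Assumption: `saliency` is a non-empty rectangular list of rows of
    non-negative numbers. If all scores are zero, the whole image is kept.
    """
    peak = max(max(row) for row in saliency)
    cutoff = frac * peak
    rows = [r for r, row in enumerate(saliency)
            if any(v >= cutoff for v in row)]
    cols = [c for row in saliency
            for c, v in enumerate(row) if v >= cutoff]
    return (min(rows), min(cols), max(rows), max(cols))


if __name__ == "__main__":
    sal = [[0, 0, 0, 0],
           [0, 5, 9, 0],
           [0, 4, 1, 0],
           [0, 0, 0, 0]]
    print(salient_crop_box(sal))
```

The thumbnail would then be produced by shrinking only the cropped region, so the salient object occupies more of the available pixels.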

The human visual system makes a great deal more of images than the elemental marks on a surface. In the course of viewing, creating, or editing a picture, we actively construct a host of visual structures and relationships as components of sensible interpretations. This paper shows how some of these computational processes can be incorporated into perceptually supported image editing tools, enabling machines to better engage users at the level of their own percepts. We focus on the domain of freehand sketch editors, such as an electronic whiteboard application for a pen-based computer. By using computer vision techniques to perform covert recognition of visual structure as it emerges during the course of a drawing/editing session, a perceptually supported image editor gives users access to visual objects as they are perceived by the human visual system. We present a flexible image interpretation architecture based on token grouping in a multiscale blackboard data structure. This organization supports multiple perceptual interpretations of line drawing data, domain-specific knowledge bases for interpretable visual structures, and gesture-based selection of visual objects. A system implementing these ideas, called PerSketch, begins to explore a new space of WYPIWYG (What You Perceive Is What You Get) image editing tools.

This paper describes a novel physical icon [3] (“phicon”) based system that can be programmed to issue a range of commands about what the user wishes to do with hand-drawn whiteboard content. Through the phicon's UI, a command to process whiteboard context is issued using infrared signaling in combination with image processing and a ceiling-mounted camera system. We leverage camera systems that are already used for capturing whiteboard content [4], augmenting them to detect the presence and location of IR beacons within an image. An HDLC-based protocol and a built-in IR transmitter are used to send these signals.

Cameras are a useful source of input for many interactive applications, but computer vision programming is difficult and requires specialized knowledge that is out of reach for many HCI practitioners. In an effort to learn what makes a useful computer vision design tool, we created Eyepatch, a tool for designing camera-based interactions, and evaluated the Eyepatch prototype through deployment to students in an HCI course. This paper describes the lessons we learned about making computer vision more accessible, while retaining enough power and flexibility to be useful in a wide variety of interaction scenarios.

Image retargeting is the problem of adapting images for display on devices other than those originally intended. This paper presents a method for adapting large images, such as those taken with a digital camera, for a small display, such as a cellular telephone. The method uses a non-linear fisheye-view warp that emphasizes parts of an image while shrinking others. Like previous methods, fisheye-view warping uses image information, such as low-level salience and high-level object recognition, to find important regions of the source image. However, unlike prior approaches, a non-linear image warping function emphasizes the important aspects of the image while retaining the surrounding context. The method has advantages in preserving information content, alerting the viewer to missing information, and providing robustness.
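
A one-dimensional focus-plus-context mapping gives a feel for how a fisheye warp emphasizes one region while compressing the rest. The sketch below uses the well-known graphical fisheye transfer function g(t) = (d + 1)t / (dt + 1) as a stand-in; the abstract does not give the paper's actual warping function, and the name `fisheye_1d` and the distortion parameter `d` are assumptions.

```python
def fisheye_1d(x, focus, size, d=3.0):
    """Map a coordinate x in [0, size] so that the neighborhood of
    `focus` is stretched and the periphery is compressed.

    t is the normalized distance from the focus to the nearer image
    edge on x's side; g(t) = (d + 1) * t / (d * t + 1) keeps the
    endpoints fixed (g(0) = 0, g(1) = 1) and has slope d + 1 at the focus.
    """
    if x >= focus:
        span, sign = size - focus, 1.0
    else:
        span, sign = focus, -1.0
    if span == 0:
        return float(x)
    t = abs(x - focus) / span
    g = (d + 1) * t / (d * t + 1)
    return focus + sign * g * span


if __name__ == "__main__":
    for x in (0, 25, 45, 50, 55, 75, 100):
        print(x, round(fisheye_1d(x, focus=50, size=100), 2))
```

Applying such a mapping along both axes and then scaling the result down uniformly to the target display would keep the focus region relatively large while the surrounding context remains visible in compressed form.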

ScreenCrayons is a system for collecting annotations on any type of document or visual information from any application. The basis for the system is a screen capture upon which the user can highlight the relevant portions of the image. The user can define any number of topics for organizing notes. Each topic is associated with a highlighting "crayon." In addition, the user can supply annotations in digital ink or text. We describe algorithms that summarize captured images based on the highlight strokes, providing overviews of many annotations while allowing the user to "zoom in" on particular information about a given note and its context.
