Although mobile, tablet, large display, and tabletop computers increasingly present opportunities for using pen, finger, and wand gestures in user interfaces, implementing gesture recognition has largely been the privilege of pattern matching experts, not user interface prototypers. Although some user interface libraries and toolkits offer gesture recognizers, such infrastructure is often unavailable in design-oriented environments like Flash, scripting environments like JavaScript, or brand-new off-desktop prototyping environments. To enable novice programmers to incorporate gestures into their UI prototypes, we present a "$1 recognizer" that is easy, cheap, and usable almost anywhere in about 100 lines of code. In a study comparing our $1 recognizer, Dynamic Time Warping (DTW), and the Rubine classifier on user-supplied gestures, we found that $1 obtains over 97% accuracy with only 1 loaded template and 99% accuracy with 3+ loaded templates. These results were nearly identical to DTW and superior to Rubine. In addition, we found that medium-speed gestures, in which users balanced speed and accuracy, were recognized better than slow or fast gestures for all three recognizers. We also discuss the effect that the number of templates or training examples has on recognition, the score falloff along recognizers' N-best lists, and results for individual gestures. We include detailed pseudocode of the $1 recognizer to aid development, inspection, extension, and testing.
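The abstract above does not spell out the recognizer's steps, so the following is only a minimal sketch of a $1-style template matcher as the technique is commonly described: resample the stroke, rotate it by its indicative angle, scale it to a reference square, translate it to the origin, then pick the template with the smallest average point-to-point distance after a golden-section search over rotation. The constants (64 points, a 250-unit square, a search range of ±45° with a 2° tolerance) and all function names are assumptions made for illustration, not values taken from this text.

```python
import math

N_POINTS = 64                      # assumed resampling resolution
SQUARE = 250.0                     # assumed reference square side
PHI = 0.5 * (math.sqrt(5.0) - 1)   # golden ratio constant for the angle search

def path_length(pts):
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))

def resample(pts, n=N_POINTS):
    """Return n points spaced equally along the stroke's path."""
    interval = path_length(pts) / (n - 1)
    pts, out, acc, i = list(pts), [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)       # q starts the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:            # rounding can leave the last point off
        out.append(pts[-1])
    return out[:n]

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def rotate_by(pts, angle):
    cx, cy = centroid(pts)
    c, s = math.cos(angle), math.sin(angle)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in pts]

def normalize(pts):
    """Resample, rotate to indicative angle zero, scale, move centroid to origin."""
    pts = resample(pts)
    cx, cy = centroid(pts)
    pts = rotate_by(pts, -math.atan2(cy - pts[0][1], cx - pts[0][0]))
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w = (max(xs) - min(xs)) or 1.0     # guard against a zero-size bounding box
    h = (max(ys) - min(ys)) or 1.0
    pts = [(x * SQUARE / w, y * SQUARE / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def path_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def distance_at_best_angle(pts, tmpl, lo=-math.radians(45),
                           hi=math.radians(45), tol=math.radians(2)):
    """Golden-section search for the rotation that minimizes path distance."""
    x1 = PHI * lo + (1 - PHI) * hi
    x2 = (1 - PHI) * lo + PHI * hi
    f1 = path_distance(rotate_by(pts, x1), tmpl)
    f2 = path_distance(rotate_by(pts, x2), tmpl)
    while abs(hi - lo) > tol:
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = PHI * lo + (1 - PHI) * hi
            f1 = path_distance(rotate_by(pts, x1), tmpl)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = (1 - PHI) * lo + PHI * hi
            f2 = path_distance(rotate_by(pts, x2), tmpl)
    return min(f1, f2)

def recognize(stroke, templates):
    """templates: list of (name, normalized_points). Returns (name, score)."""
    cand = normalize(stroke)
    best_name, best_d = None, math.inf
    for name, tmpl in templates:
        d = distance_at_best_angle(cand, tmpl)
        if d < best_d:
            best_name, best_d = name, d
    score = 1.0 - best_d / (0.5 * math.sqrt(2 * SQUARE ** 2))
    return best_name, score
```

Templates would be prepared once with `normalize(...)` and passed to `recognize(...)` together with a raw stroke given as a list of `(x, y)` tuples. Because scaling is non-uniform, nearly one-dimensional strokes such as straight lines are a known weak spot for this kind of normalization.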
We describe OctoPocus, an example of a dynamic guide that combines on-screen feedforward and feedback to help users learn, execute and remember gesture sets. OctoPocus can be applied to a wide range of single-stroke gestures and recognition algorithms and helps users progress smoothly from novice to expert performance. We provide an analysis of the design space and describe the results of two experiments that show that OctoPocus is significantly faster and improves learning of arbitrary gestures, compared to conventional Help menus. It can also be adapted to a mark-based gesture set, significantly improving input time compared to a two-level, four-item Hierarchical Marking menu.
We present Bonfire, a self-contained mobile computing system that uses two laptop-mounted laser micro-projectors to project an interactive display space to either side of a laptop keyboard. Coupled with each micro-projector is a camera to enable hand gesture tracking, object recognition, and information transfer within the projected space. Thus, Bonfire is neither a pure laptop system nor a pure tabletop system, but an integration of the two into one new nomadic computing platform. This integration (1) enables observing the periphery and responding appropriately, e.g., to the casual placement of objects within its field of view, (2) enables integration between physical and digital objects via computer vision, (3) provides a horizontal surface in tandem with the usual vertical laptop display, allowing direct pointing and gestures, and (4) enlarges the input/output space to enrich existing applications. We describe Bonfire's architecture, and offer scenarios that highlight Bonfire's advantages. We also include lessons learned and insights for further development and use.
We present the design of Lineogrammer, a diagram-drawing system motivated by the immediacy and fluidity of pencil-drawing. We designed Lineogrammer to feel like a modeless diagramming "medium" in which stylus input is immediately interpreted as a command, a text label, or a drawing element, and drawing elements snap to or sculpt from existing elements. An inferred dual representation allows geometric diagram elements, no matter how they were entered, to be manipulated at granularities ranging from vertices to lines to shapes. We also integrate lightweight tools, based on rulers and construction lines, for controlling higher-level diagram attributes, such as symmetry and alignment. We include preliminary usability observations to help identify areas of strength and weakness with this approach.
New input technologies (such as touch), recognition-based input (such as pen gestures), and next-generation interactions (such as inexact interaction) all hold the promise of more natural user interfaces. However, these techniques all create inputs with some uncertainty. Unfortunately, conventional infrastructure lacks a method for easily handling uncertainty, and as a result input produced by these technologies is often converted to conventional events as quickly as possible, leading to a stunted interactive experience. We present a framework for handling input with uncertainty in a systematic, extensible, and easy-to-manipulate fashion. To illustrate this framework, we present several traditional interactors that have been extended to provide feedback about uncertain inputs and to allow for the possibility that the input will ultimately be judged wrong (or end up going to a different interactor). Our six demonstrations include tiny buttons that are manipulable using touch input, a text box that can handle multiple interpretations of spoken input, a scrollbar that can respond to inexactly placed input, and buttons that are easier to click for people with motor impairments. Our framework supports all of these interactions by carrying uncertainty forward all the way through selection of possible target interactors, interpretation by interactors, generation of (uncertain) candidate actions to take, and a mediation process that decides (in a lazy fashion) which actions should become final.
Internet usage on mobile devices continues to grow as users seek anytime, anywhere access to information. Because users frequently search for businesses, directory assistance has been the focus of many voice search applications utilizing speech as the primary input modality. Unfortunately, mobile settings often contain noise which degrades performance. As such, we present Search Vox, a mobile search interface that not only facilitates touch and text refinement whenever speech fails, but also allows users to assist the recognizer via text hints. Search Vox can also take advantage of any partial knowledge users may have about the business listing by letting them express their uncertainty in an intuitive way using verbal wildcards. In simulation experiments conducted on real voice search data, leveraging multimodal refinement resulted in a 28% relative reduction in error rate. Providing text hints along with the spoken utterance resulted in even greater relative reduction, with dramatic gains in recovery for each additional character.