

We describe the use of non-verbal features of voice for direct control of interactive applications. Traditional speech recognition interfaces are based on an indirect, conversational model: the user first gives a direction, and the system then performs a certain operation. Our goal is to achieve more direct, immediate interaction, like using a button or joystick, by using lower-level features of voice such as pitch and volume. We are developing several interaction techniques based on this idea, such as "control by continuous voice," "rate-based parameter control by pitch," and "discrete parameter control by tonguing." We have implemented several prototype systems, and they suggest that voice-as-sound techniques can enhance traditional voice recognition approaches.
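
As an illustration of what "rate-based parameter control by pitch" might look like, the following is a minimal sketch and not the authors' implementation. It assumes that a pitch tracker (not shown) delivers one pitch estimate per audio frame, that pitch above a reference frequency raises the parameter and pitch below it lowers the parameter, and that silence leaves the parameter unchanged. The function name, the reference frequency, the gain, and the frame length are all hypothetical.

```python
import math
from typing import Iterable, List, Optional


def rate_control(
    pitches_hz: Iterable[Optional[float]],
    start: float = 0.5,
    reference_hz: float = 200.0,   # assumed neutral pitch (hypothetical)
    gain_per_octave: float = 1.0,  # parameter units per second per octave (hypothetical)
    frame_s: float = 0.01,         # assumed duration of one analysis frame
    lo: float = 0.0,
    hi: float = 1.0,
) -> List[float]:
    """Integrate pitch deviation into a bounded parameter value.

    Each element of pitches_hz is the estimated pitch of one frame,
    or None when no voice is detected in that frame.
    """
    value = start
    trace = []
    for pitch in pitches_hz:
        if pitch is not None and pitch > 0:
            # Deviation from the reference, measured in octaves.
            octaves = math.log2(pitch / reference_hz)
            # The deviation sets the rate of change, not the value itself.
            value += gain_per_octave * octaves * frame_s
            value = min(hi, max(lo, value))
        # Silence: the parameter holds its current value.
        trace.append(value)
    return trace
```

Under these assumptions, holding a tone one octave above the reference for ten frames moves the parameter up by ten small steps, while stopping the voice freezes it, which is the joystick-like behavior the abstract alludes to.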

We introduce the Boom Chameleon, a novel input/output device consisting of a flat-panel display mounted on a tracked mechanical boom. The display acts as a physical window into 3D virtual environments, through which a one-to-one mapping between real and virtual space is preserved. The Boom Chameleon is further augmented with a touch-screen and a microphone/speaker combination. We present a 3D annotation application that exploits this unique configuration to simultaneously capture viewpoint, voice, and gesture information. Design issues are discussed, and results of an informal user study on the device and annotation software are presented. The results show that the Boom Chameleon annotation facilities have the potential to be an effective 3D design review system that is easy to learn and operate.

IVR (interactive voice response) menu navigation has long been recognized as a frustrating interaction experience. We propose an IM-based system that sends a coordinated visual IVR menu to the caller's computer screen. The visual menu is updated in real time in response to the caller's actions. With this automatically opened supplementary channel, callers can take advantage of different modalities over different devices and interact with the IVR system with the ease of graphical menu selection. Our approach of utilizing existing network infrastructure to pinpoint the caller's virtual location, and of coordinating multiple devices and multiple channels based on users' ID registration, can also be applied more generally to create integrated user experiences across a group of devices.
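
To make the idea of a visual menu that stays coordinated with the caller's actions more concrete, here is a minimal sketch under stated assumptions. It is not the system described above: the menu tree, the class names, and the use of "*" as a back key are all hypothetical, and the IM transport that would deliver the rendered lines to the caller's screen is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MenuNode:
    """One IVR menu level: a prompt plus options keyed by keypad digit."""
    prompt: str
    children: Dict[str, "MenuNode"] = field(default_factory=dict)


class VisualIvrSession:
    """Tracks the caller's position in the menu and renders it as text lines."""

    def __init__(self, root: MenuNode) -> None:
        self.path: List[MenuNode] = [root]

    @property
    def current(self) -> MenuNode:
        return self.path[-1]

    def press(self, key: str) -> List[str]:
        """Apply one keypad press and return the refreshed visual menu."""
        if key == "*" and len(self.path) > 1:
            self.path.pop()  # hypothetical convention: "*" goes back one level
        elif key in self.current.children:
            self.path.append(self.current.children[key])
        # Unknown keys leave the menu where it is.
        return self.render()

    def render(self) -> List[str]:
        """Lines that would be pushed to the caller's screen (transport omitted)."""
        lines = [self.current.prompt]
        for key, child in sorted(self.current.children.items()):
            lines.append(f"{key}. {child.prompt}")
        return lines


# Hypothetical menu used only for illustration.
billing = MenuNode("Billing", {"1": MenuNode("Check balance"), "2": MenuNode("Pay bill")})
root = MenuNode("Main menu", {"1": billing, "2": MenuNode("Technical support")})
session = VisualIvrSession(root)
```

Each call to `session.press(...)` stands in for a keypad event reported by the IVR system; the returned lines are what the supplementary visual channel would display after that event.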