

Distributed client/server models are becoming increasingly prevalent in multimedia systems and advanced user interface design. A multimedia application, for example, may play and record audio, use speech recognition input, and use a window system for graphical I/O. The software architecture of such a system can be simplified if the application communicates to multiple servers (e.g., audio servers, recognition servers) that each manage different types of input and output. This paper describes tools for rapidly prototyping distributed asynchronous servers and applications, with an emphasis on supporting highly interactive user interfaces, temporal media, and multi-modal I/O.
The Socket Manager handles low-level connection management and device I/O by supporting a callback mechanism for connection initiation, shutdown, and for reading incoming data. The Byte Stream Manager consists of an RPC compiler and run-time library that supports synchronous and asynchronous calls, with both a programmatic interface and a telnet interface that allows the server to act as a command interpreter. This paper details the tools developed for building asynchronous servers, several audio and speech servers built using these tools, and applications that exploit the features provided by the servers.

Impromptu is a mobile audio device which uses wireless Internet Protocol (IP) to access novel computer-mediated voice communication channels. These channels show the richness of IP-based communication as compared to conventional mobile telephony, adding audio processing and storage in the network, and flexible, user-centered call control protocols. These channels may be synchronous, asynchronous, or event-triggered, or even change modes as a function of other user activity. The demands of these modes plus the need to navigate with an entirely non-visual user interface are met with a number of audio-oriented user interaction techniques.

This paper describes a new music-playback interface for trial listening, SmartMusicKIOSK. In music stores, short trial listening of CD music is not usually a passive experience -- customers often search out the chorus or "hook" of a song using the fast-forward button. Listening of this type, however, has not been traditionally supported. This research achieves a function for jumping to the chorus section and other key parts of a song plus a function for visualizing song structure. These functions make it easier for a listener to find desired parts of a song and thereby facilitate an active listening experience. The proposed functions are achieved by an automatic chorus-section detecting method, and the results of implementing them as a listening station have demonstrated their usefulness.

ENO is an audio server designed to make it easy for applications in the Unix environment to incorporate non-speech audio cues. At the physical level, ENO manages a shared resource, namely the audio hardware. At the logical level, it manages a sound space that is shared by various client applications. Instead of dealing with sound in terms of its physical description (i.e., sampled sounds), ENO allows sounds to be presented and controlled in terms of higher-level descriptions of sources, interactions, attributes, and sound space. Using this structure, ENO can facilitate the creation of consistent, rich systems of audio cues. In this paper, we discuss the justification, design, and implementation of ENO.

A system that uses an ear proximity sensor to actively manage periods of distraction during telephone conversations is described. We detect when the phone is removed from the ear, record any incoming audio, and play it back when the phone is returned to the ear. By dropping silent intervals and speeding up playback with a pitch-preserving algorithm, we quickly return to real-time without the loss of information. This real-time audio buffering technique also allows us to create a user-activated, lossless instant replay function.