1994 VR Conference Proceedings


Augmented Audio Reality: Design for a Spatial Sound GPS PGS

By: Michael Cohen
Human Interface Lab
University of Aizu 965-80 Japan
37° 29' 1" North, 139° 54' 8" East; 212.5 m.
voice: [+81](242)37-2537
fax: [+81](242)37-2549
E-mail: mcohen@u-aizu.ac.jp


"Augmented reality" describes hybrid presentations that overlay computer-generated imagery on top of real scenes. "Augmented audio reality" extends this notion to include sonic effects, superimposing artificially synthesized sounds on top of a naturally sampled soundscape. Spatial sound is the presentation of audio channels with positional attributes. DSP-synthesized spatial sound, driven by even a simple positional database, can denote directional cues useful to a visually disabled user. Maw (acronymic for multidimensional audio windows) is a NextStep-based audio windowing system deployed as a binaural directional mixing console, capable of presenting such augmented audio reality spatial sound cues. A design for a PGS (personal guidance system), coupled with a GPS (global positioning system), is described, using dynamically selected HRTFs (head related transfer functions) to directionalize arbitrary audio signals. The system is intended to provide capability for wayfinding (as an audio compass or homing beacon), or general sonic cursor or telepointer for team communication.

Keywords: augmented audio reality, global positioning system (GPS), wayfinding, wearable computing, personal guidance system (PGS)


"Augmented reality" [Caudell and Mizell, 1992] [Feiner et al., 1993] is used to describe hybrid presentations that overlay computer-generated imagery on top of real scenes. For example, a wiring schematic might be projected onto see-through goggles, aligned (via head position sensor) with the physical cable ducts, allowing a technician to easily lay wires. Augmented audio reality [Cohen et al., 1993] extends this notion to include sonic effects, overlaying computer-generated sounds on top of more directly acquired audio signals (as in Figure 1). (One common example of augmented audio reality is sound reinforcement, as in a public address system.) We are exploring the alignability of binaural signals with artificially spatialized sources, synthesized by convolving monaural signals with left/right pairs of directional transfer functions. These techniques (algorithms and hardware) can stimulate localization effects of sound sources at an arbitrary position with respect to sinks (listeners). Azimuth, elevation, and distance can all be conveyed by these effects. We are using Maw (acronymic for multidimensional audio windows) [Cohen and Koizumi, 1992] [Cohen, 1993a], a NextStep-based audio windowing system, as a binaural directional mixing console. Since the arrangement of moveable objects (like those shown in Figure 2) is used to dynamically select transfer functions, a user may specify the virtual location of a sound source, throwing the source into perceptual space, using exocentric graphical control to drive egocentric auditory display. As a concept demonstration, we muted a telephone, and then used Maw to spatialize a ringing signal at its location, putting the sonic image of the phone into the office environment. (This recalls [Naimark, 1991]'s visual analog of projecting a picture of a room on the same space after it was painted white.)

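The synthesis step described above, selecting a left/right pair of directional transfer functions and convolving a monaural signal with it, can be sketched as follows. This is a minimal illustration and not Maw's implementation: the table `HRIR_TABLE`, its four azimuths, and its impulse-response values are invented placeholders (measured HRTF sets are far denser and longer), and nearest-neighbor selection is an assumed lookup policy.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy impulse-response pairs (left ear, right ear), keyed by
# source azimuth in degrees clockwise from straight ahead. Values are made up.
HRIR_TABLE: Dict[int, Tuple[List[float], List[float]]] = {
    0:   ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # ahead: both ears alike
    90:  ([0.0, 0.0, 0.4], [1.0, 0.0, 0.0]),  # right: left ear later and quieter
    180: ([0.7, 0.0, 0.0], [0.7, 0.0, 0.0]),  # behind
    270: ([1.0, 0.0, 0.0], [0.0, 0.0, 0.4]),  # left: right ear later and quieter
}


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def select_hrir(azimuth_deg: float) -> Tuple[List[float], List[float]]:
    """Pick the stored pair whose azimuth is closest to the requested one."""
    nearest = min(HRIR_TABLE, key=lambda az: angular_difference(az, azimuth_deg))
    return HRIR_TABLE[nearest]


def convolve(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Direct-form linear convolution (fine for short toy signals)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def spatialize(mono: Sequence[float], azimuth_deg: float) -> Tuple[List[float], List[float]]:
    """Return (left, right) channels for a mono signal placed at azimuth_deg."""
    left_ir, right_ir = select_hrir(azimuth_deg)
    return convolve(mono, left_ir), convolve(mono, right_ir)
```

Calling `spatialize(samples, 80.0)` would use the 90-degree pair here. Moving a source, or turning the listener, amounts to calling it again with a new azimuth, which is the sense in which the transfer functions are selected dynamically.
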
Design of a GPS PGS

In the style of [Loomis et al., 1993], we hope to extend this system to include provisions for assisting vision-impaired navigation. In the case of blind users, we hope that such a system might be used to augment the user's cane or guide dog. We plan to use a GPS (global positioning system) tracker to monitor outdoor user position and calculate the vector expressing Cartesian direction and distance to a reference point. The Aisin GPS [Oguri/Kawamura, 1993] has 2D RMS resolution of 100 m, which should suffice for many ambulatory goals. Its low power consumption (380 mA at 10-16 V DC), manageable weight (850 g), and tolerable latency will allow pedestrian operation, powered by a battery and carried in a backpack. As shown in Figure 3, this sensor can be connected to a notebook computer, responsible for programming the sensor and using the user's offset vector to generate appropriate binaural audio cues. In teleconferencing applications, in which users are typically seated, a transaural chair, perhaps with a chair tracker, can be used to support a stereo headset with controllable crosstalk as well as monitor the user's orientation. But for ambulatory users, a more portable HMD (head-mounted [auditory] display) system is needed. We plan to use either "nearphones," headphones placed close to but not touching the user's ears, or open air headphones, acoustically nearly transparent to critical ambient sounds of traffic, voices, etc. The antenna might be mounted in the user's hat. An important goal is not to sacrifice dynamic cues, in particular soundscape stabilization: the invariance of exocentrically perceived acoustic object location under head turning.
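
The offset vector and its role in soundscape stabilization can be sketched as follows. This is an illustration under stated assumptions, not the system's implementation: it treats the Earth as a sphere with a flat local approximation (reasonable over walking distances), it assumes a head yaw reading is available from some orientation sensor (hypothetical here), and all function names are invented for the example.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation


def offset_vector(lat: float, lon: float,
                  ref_lat: float, ref_lon: float) -> Tuple[float, float]:
    """(east_m, north_m) from the user to the reference point.

    Inputs are in decimal degrees. Uses an equirectangular (locally flat)
    approximation, which is an assumption of this sketch.
    """
    lat1, lat2 = math.radians(lat), math.radians(ref_lat)
    north = EARTH_RADIUS_M * (lat2 - lat1)
    east = EARTH_RADIUS_M * math.radians(ref_lon - lon) * math.cos((lat1 + lat2) / 2.0)
    return east, north


def distance_and_bearing(east: float, north: float) -> Tuple[float, float]:
    """Distance in meters and compass bearing in degrees clockwise from North."""
    distance = math.hypot(east, north)
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    return distance, bearing


def head_relative_azimuth(bearing_deg: float, head_yaw_deg: float) -> float:
    """Target azimuth relative to the facing direction, in (-180, 180].

    Positive means to the listener's right. head_yaw_deg is the facing
    direction, clockwise from North, from a hypothetical head tracker.
    """
    rel = (bearing_deg - head_yaw_deg) % 360.0
    return rel - 360.0 if rel > 180.0 else rel
```

Subtracting the head yaw from the world-frame bearing is what keeps a virtual source fixed in the world as the head turns: a target due East is at +90 degrees for a listener facing North and at 0 degrees once the listener faces East. With a position resolution of about 100 m, the computed bearing should be expected to become unreliable as the user comes within roughly that distance of the reference point.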


The technology described above suggests several applications, based on the fact that the user has binaural sensation from real as well as synthetic sources, which might include another pilot's voice, an arbitrary signal deployed as a telepointer (like a sonic cursor [Cohen and Ludwig, 1991] [Cohen, 1993b], audio compass, or homing beacon) or an alarm.

Team Communication

If several users are working together, they will likely want to communicate with each other, transmitting a user's utterance to the others directly, directionalizing to preserve the spatial consistency of the telepresence [Aoki et al., 1992] [Koizumi et al., 1992] [Cohen et al., 1992], as shown in Figure 4. Such a paradigm is also useful in asynchronous applications like voicemail.

Sonic Cursor

Wayfinding (beacon or compass): Deployed as part of a PGS (personal guidance system), such a signal can be used as a beacon, signaling a direction. As a navigational cue, a directing voice could emanate from a particular direction, like North (an audio compass) or a destination (like a mythical siren). A natural extension might be to just have the voice emanate from the place where the next action must occur (corner to turn, etc.).
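
The two wayfinding modes can be sketched as follows, as a rough illustration rather than a specification. The sketch assumes positions already expressed in local (east, north) meters, a head yaw measured clockwise from North, and a hypothetical arrival radius of 100 m chosen only to match the receiver resolution quoted earlier. The function names are invented.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (east_m, north_m) in a local frame


def compass_azimuth(head_yaw_deg: float) -> float:
    """Head-relative azimuth of North, in (-180, 180]; positive is to the right.

    Audio compass mode: the cue always sits toward North, so its azimuth
    is simply the negated facing direction.
    """
    rel = (-head_yaw_deg) % 360.0
    return rel - 360.0 if rel > 180.0 else rel


def next_beacon(position: Point, waypoints: Sequence[Point], next_index: int,
                arrival_radius_m: float = 100.0) -> Tuple[int, Optional[Point]]:
    """Beacon mode: skip waypoints already reached and return the next one.

    Returns (index, waypoint), or (index, None) once the route is finished.
    """
    i = next_index
    while i < len(waypoints):
        wx, wy = waypoints[i]
        if math.hypot(wx - position[0], wy - position[1]) > arrival_radius_m:
            break  # this waypoint still lies ahead; place the beacon here
        i += 1
    return (i, waypoints[i]) if i < len(waypoints) else (i, None)
```

A listener facing East (yaw 90) would hear the compass cue at -90 degrees, that is, to the left. In beacon mode the returned waypoint is the place "where the next action must occur", and it becomes the position at which the voice is spatialized.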

Telepointer: A dynamic telepointer can be used to direct the attention of a user. For instance, to indicate directionality, a direction-giving voice or beacon might move in the direction the walker is meant to turn (the audio analog of some kinds of signal blinkers).

Synesthetic Alarm: Synesthesia is the act of experiencing one sense modality as another, and can be used to further blur the distinction between synthetic and transmitted worlds. A telerobot equipped to enter hazardous areas might have infrared or radiation meters. These sensors could be thresholded, rendering the danger points as auditory alarms, easily superimposed on the auditory soundscape captured by the robot's ears. Of course such a system can be used by real or synthetic navigation tools [Zyda et al., 1994].
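
Thresholding sensor readings into spatialized alarms might look like the following sketch. The `Reading` and `Alarm` types, the arbitrary sensor units, and the choice to scale alarm gain with the reading are all assumptions made for illustration; the text only states that the sensors could be thresholded and rendered as alarms.

```python
from typing import List, NamedTuple, Sequence


class Reading(NamedTuple):
    azimuth_deg: float  # direction of the measurement relative to the robot
    level: float        # sensor value in arbitrary units


class Alarm(NamedTuple):
    azimuth_deg: float  # where to spatialize the alarm sound
    gain: float         # 0..1 loudness scale


def alarms_from_readings(readings: Sequence[Reading], threshold: float,
                         full_scale: float) -> List[Alarm]:
    """Turn readings at or above threshold into alarms at the same azimuth.

    Gain grows with the reading and is clipped at 1.0 (an assumed mapping).
    full_scale must be positive.
    """
    alarms = []
    for r in readings:
        if r.level >= threshold:
            alarms.append(Alarm(r.azimuth_deg, min(1.0, r.level / full_scale)))
    return alarms
```

Each resulting alarm can then be mixed into the captured soundscape at its azimuth, so that the danger point is heard in the direction where it was measured.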

Audio Sources: The audio source might be any (broadband) signal, including audio alarms, sound effects, musical themes (like auditory icons [Gaver, 1986] or earcons [Blattner and Greenberg, 1989]), other users' voices, and synthesized voice suggestions (eventually as part of an integrated speech dialogue system). Such functionality might be linked into the phone network, with cellular phones connected via ISDN or B-ISDN networks [Jolley, 1994].

Maw can adjust gain as a function of distance, so a happy side-effect is that the closeness (perceived proximity) of the audio signal corresponds to the urgency of the action. Alternatively, resetting attenuation creates a distance-independent signal.
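
A distance-dependent gain with a switch for the distance-independent case can be sketched as below. The inverse-distance law and the 1 m reference distance are assumptions of this example; the text does not say which attenuation curve Maw uses.

```python
def distance_gain(distance_m: float, reference_m: float = 1.0,
                  attenuate: bool = True) -> float:
    """Gain in (0, 1] for a source at distance_m.

    With attenuate=True, gain falls off as reference_m / distance_m (an
    assumed inverse-distance law), clamped to 1.0 inside the reference
    distance. With attenuate=False the signal is distance-independent.
    """
    if not attenuate:
        return 1.0
    return reference_m / max(distance_m, reference_m)
```

Under this assumed law a cue 10 m away plays at one tenth of full gain and grows louder as the user approaches, which gives the urgency effect described above. Passing `attenuate=False` corresponds to resetting attenuation.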


Spatial sound interfaces, as manifested by systems like audio windowing, are fertile areas of research into potentially enabling technologies for visually disabled users. Advances in psychoacoustics, allowing simulation of spatial effects via DSP, combined with lighter and cheaper portable computers, make a PGS feasible. Further synergy should eventually come from leveraging the emerging ubiquity of telecommunication networks (with more bandwidth, for instance for high-fidelity audio) and GIS (geographic information system) databases, giving access to terrain or street data.


Figure 1 Augmented audio reality (grey sections indicate future research)

Figure 2 Audio window screenshot: Top-down view of virtual room, with control panels and menus.
Figure 3 Design of a GPS PGS

Figure 4 Geographic positions: 1 (wave) in Sapporo, 2 (brick) in Tokyo, 3 (lattice) in Hiroshima


Reprinted with author(s) permission. Author(s) retain copyright.