2002 Conference Proceedings



Shinichi Torihara
IBM Research, Tokyo Research Laboratory
Keio University, the Graduate School of Media and Governance
E-mail: torihara@jp.ibm.com

1. Introduction

Even persons with a severe visual disability can walk alone freely, without a white cane or a guide dog, if they are familiar with their surroundings. They can move from one area to another and can handle several tasks at once, such as listening to the radio, watching television, and talking with friends over a cup of coffee. This is possible because they use all of their senses other than sight. In front of a PC, on the other hand, they feel handicapped or at a disadvantage. Current screen readers use monaural, consecutive sounds in a rather linear way and ignore the positions of information on the screen. In this paper, we propose a 2D (height and width) sound screen reader that switches to, or selects, the speaker or speakers in a speaker-matrix that correspond to the position of the information. That is to say, the user can learn where a piece of information is located from the position of the speaker that reads it out by text-to-speech. Persons with low vision would be able to find the mouse pointer very easily. Blind persons would be able to make greater use of their other senses, as they do in familiar places. We consider our approach one step in fundamental cognitive research on auditory virtual reality for the blind and visually impaired.

2. Conventional Screen Reader

2.1 Monaural Sound of Speech

Although every PC has a stereo speaker system, most screen readers speak in monaural sound. These monaural screen readers tend to represent the Microsoft Windows environment in a consecutive, rather linear way. They ignore the visual X-Y coordinates of information on the PC screen. Their limitations are clear, because they reduce a two-dimensional world to one channel of sound.

2.2 Stereo Sound of Speech

As far as we know, "World One" is the only screen reader that represents the Windows environment in stereo speech [1]. Tables, lists, and streams of text are presented in the right, middle, and left directions. Compared with monaural screen readers, "World One" has an advantage in expressing information that depends on right-middle-left location. It is easy to imagine how much more understandable a table is when read in stereo speech than in monaural speech. The system, however, represents only the left-right direction and ignores height.

3. 2D Sound Screen Reader

3.1 2D Sound Card and Four Speaker System

We first attempted to use multiple two-channel sound cards. We were not successful, because we could not obtain exactly synchronized output from each sound card (each speaker). Therefore, we propose a four-channel sound card and a four-speaker system for reading a screen, arranged in the following way (see Fig. 1).

(1) Front L-channel speaker as top left speaker.
(2) Front R-channel speaker as top right speaker.
(3) Rear L-channel speaker as bottom left speaker.
(4) Rear R-channel speaker as bottom right speaker.

The sound level of each speaker is adjusted so that the mixed sound reproduces the location of the sound source. In this way, a screen reader can deliver speech from any of the four corner directions and from positions between them. We illustrate the usefulness of this system with the Start menu on the taskbar.

(1) Press Ctrl + Esc. (You will hear "start menu" from the bottom-left corner.)
(2) Press the up cursor key. (You will hear "shut down" from the bottom-left corner.)
(3) Continue to press the up cursor key. (You will hear "programs" from the center-left side.)
(4) Press the right cursor key. (You will hear "accessories" from the center.)

We consider this system practical and within a feasible range of development for general blind users.
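
The level adjustment described above can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions, not the implementation of the proposed system: the function name corner_gains, the normalized screen coordinates, and the bilinear weighting law are all hypothetical choices, and a real system might prefer a different panning law such as constant-power weighting.

```python
def corner_gains(x, y):
    """Return one gain per speaker for a sound source at screen position (x, y).

    Assumption: x runs from 0.0 (left edge) to 1.0 (right edge) and
    y runs from 0.0 (top edge) to 1.0 (bottom edge) of the screen.
    Assumption: bilinear weights are used, so the four gains sum to 1.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must lie in [0, 1]")
    return {
        "front_left": (1.0 - x) * (1.0 - y),  # top-left speaker
        "front_right": x * (1.0 - y),         # top-right speaker
        "rear_left": (1.0 - x) * y,           # bottom-left speaker
        "rear_right": x * y,                  # bottom-right speaker
    }


if __name__ == "__main__":
    # "start menu" at the bottom-left corner of the screen
    print(corner_gains(0.0, 1.0))
    # "accessories" at the center of the screen
    print(corner_gains(0.5, 0.5))
```

Under these assumptions, the "start menu" announcement at the bottom-left corner is sent entirely to the rear L-channel speaker, while "accessories" at the center is spread equally over the four speakers.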

Fig. 1. Diagram of the four-speaker system.

3.2 2D Sound Card and Speaker-Matrix

For our cognitive research on auditory virtual reality for the blind and visually impaired, we have just started designing an "audible display" that consists of a four-channel sound card, a speaker-matrix on a 100-inch board, and speaker switching hardware (see Fig. 2).

(1) The speaker-matrix consists of "mmm speakers" x "nnn speakers". It is a two-channel (L-channel, R-channel) system. That is to say, the left side is for the L-channel and the right side is for the R-channel.
(2) There are two switching hardware devices, one for the L-channel and the other for the R-channel.
(3) The front L-channel speaker-out is connected to the L-channel speaker switching hardware device as the L-channel audio output.
(4) The rear L-channel speaker-out is connected to the L-channel speaker switching hardware device as the control signal for selecting speakers.
(5) The front R-channel speaker-out is connected to the R-channel speaker switching hardware device as the R-channel audio output.
(6) The rear R-channel speaker-out is connected to the R-channel speaker switching hardware device as the control signal for selecting speakers.

This is a two-dimensional speaker-matrix, analogous to the visual dot matrix of a monitor screen. The selected speakers can reproduce the exact location of the sound source. This "audible display" would give blind people a more precise distinction of X-Y coordinates. People with low vision would find the mouse cursor very easily. Blind people would also understand tables, lists, and streams of text (including with the method from our previous research at CSUN 2001, "An 'Oblique' Listening Method for the Blind and Visually Impaired" [2]).
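
The selection of a speaker from an X-Y coordinate can be sketched as follows. This is a hedged illustration only: the matrix dimensions are not fixed in our design yet, so rows and cols are hypothetical parameters, and the function name select_speaker, the pixel-based coordinates, and the rule that the left half of the columns belongs to the L-channel are assumptions made for the example.

```python
def select_speaker(px, py, screen_w, screen_h, rows, cols):
    """Map a screen pixel to a (row, col, channel) entry of the speaker-matrix.

    Assumption: the matrix covers the whole screen evenly, row 0 is the
    top row and column 0 is the leftmost column.
    Assumption: columns in the left half are driven by the L-channel
    switching device, the others by the R-channel switching device.
    """
    if not (0 <= px < screen_w and 0 <= py < screen_h):
        raise ValueError("pixel lies outside the screen")
    col = px * cols // screen_w   # integer column index, 0 .. cols - 1
    row = py * rows // screen_h   # integer row index, 0 .. rows - 1
    channel = "L" if col < cols / 2 else "R"
    return row, col, channel


if __name__ == "__main__":
    # Hypothetical 1024 x 768 screen with a 4-row by 8-column matrix.
    print(select_speaker(0, 767, 1024, 768, rows=4, cols=8))    # bottom-left pixel
    print(select_speaker(1023, 0, 1024, 768, rows=4, cols=8))   # top-right pixel
```

With these hypothetical dimensions, the bottom-left pixel maps to the bottom-left speaker on the L-channel, and the top-right pixel maps to the top-right speaker on the R-channel.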

Fig. 2. Diagram of the two-channel speaker-matrix system.

4. Conclusion

As far as the speaker-matrix screen reader, or "audible display", is concerned, we are still in the design and prototyping phase. We aim at a display that is audible, touchable (tactile), executable (by double touch), and visible on a large projector. With our system, visually disabled people would be able to access the PC world as they do familiar places in the real world.


[1] http://www.aagi.com/
[2] http://www.torihara.com/research/


Reprinted with author(s) permission. Author(s) retain copyright.