2001 Conference Proceedings



AN "OBLIQUE" LISTENING METHOD
FOR THE BLIND AND VISUALLY IMPAIRED

Shinichi Torihara
IBM Research, Tokyo Research Laboratory
Keio University, Media and Governance
1623-14 Shimotsuruma Yamato, Kanagawa, Japan 242-8502
E-mail torihara@jp.ibm.com 

1. Introduction

Normally sighted people can increase their scanning and reading speed with a method known as "diagonal" reading. The blind and visually impaired, however, must rely on text converted to speech, and speech generated from text is sequential and linear. When such speech is played faster than normal, it becomes hard to understand. In this study, we propose a rapid "oblique" listening method for the blind and visually impaired that controls playback speed based on the linguistic information in a text (syntax, and new versus old information). Verbs, nouns, negative adverbs, and new information are played at a relatively slow speed, and the remaining parts at a much faster speed. The blind and visually impaired clearly need better access to a wide range of interdisciplinary knowledge. This method will help them share the advantages of "diagonal" reading, which is currently available only to sighted people.

2. Conventional Methods and Their Problems

2.1 Fast Playback Mode

A "Fast Playback Mode" function is found in IBM Home Page Reader (HPR), an Internet speech browser for the blind and visually impaired, and in video cassette recorders (VCRs). The function is implemented by decimating the speech signal at a constant rate, regardless of its content, so the faster HPR speaks, the harder it is for users to understand. In HPR, links are read in a female voice, and users can use the fast playback function to skip over the content to the previous or next link. This mode therefore works like fast-forward or rewind on a tape; it does not support fast listening.
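To make the contrast with our method concrete, the following Python sketch shows what content-independent decimation could look like. It is an illustration under stated assumptions, not HPR's actual algorithm, which is not described here: the signal is cut into fixed-length frames (the 441-sample frame length, 10 ms at 44.1 kHz, is a hypothetical choice) and frames are kept at a constant stride, so every word loses the same proportion of its sound whatever its importance. The function name `decimate_constant` is likewise hypothetical.

```python
def decimate_constant(samples, speedup, frame_len=441):
    """Shorten audio by keeping frames at a fixed, content-blind stride.

    samples   -- sequence of audio samples (e.g. a list of ints)
    speedup   -- playback-speed factor; 2.0 keeps about half of the frames
    frame_len -- frame length in samples (hypothetical default: 10 ms at 44.1 kHz)
    """
    if speedup < 1.0:
        raise ValueError("speedup must be >= 1.0")
    # Split the signal into consecutive fixed-length frames (the last may be shorter).
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    kept = []
    position = 0.0
    # Step through the frames in increments of `speedup`; the removal rate
    # is the same everywhere, important words and filler alike.
    while int(position) < len(frames):
        kept.extend(frames[int(position)])
        position += speedup
    return kept
```

A practical time-compression system would cross-fade neighbouring frames (overlap-add) to avoid clicks at frame boundaries; the point here is only that the rate of removal ignores the content.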

2.2 Text Summarization

Text summarization works by detecting frequently occurring words (keywords) and the verbs and adverbs that lead into a conclusion, selecting sentences on that basis, and joining the selected sentences smoothly. The blind and visually impaired can then listen to the summarized text. This method is excellent for grasping the gist of a text. However, users also want to listen to the whole text efficiently, which a summary does not allow.
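A minimal sketch of frequency-based extractive summarization in Python follows. It is a generic illustration, not the summarization system referred to above: it scores sentences by keyword frequency only (it does not look for verbs or adverbs that lead into a conclusion), and the stop-word list, the averaging of word scores, and the function name `summarize` are all hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
              "that", "for", "on", "with", "as", "are", "was"}

def content_words(text):
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def summarize(text, n_sentences=2):
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))  # keyword frequencies over the whole text

    def score(sentence):
        tokens = content_words(sentence)
        # Average keyword frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # Take the top-scoring sentences, then restore document order so they read smoothly.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)
```

For the three-sentence text "Speech is linear. Speech speed matters for speech users. Cats sleep.", `summarize(text, 1)` keeps only "Speech is linear.", because "speech" is the most frequent keyword.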

3. "Oblique" Listening Method

3.1 Purpose

Reading consists of the following processes.
(1) Scanning.
(2) Reading.
(3) Mastering.
In this paper, we focus on the first process, scanning. That is, the blind and visually impaired listen through the whole text to find out what the author wrote and where.

3.2 Realization by the Control of Speed

First, we detect the more important and less important portions of each sentence using linguistic criteria (Sections 3.3 and 3.4). Important words are played at a relatively slow speed, and the other words at a much faster speed. The less important or redundant words should still be spoken rather than skipped, because listeners can fill in and predict those portions even when they are played extremely fast.
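The effect of this speed control on listening time can be estimated with a short Python sketch. It assumes, for simplicity, that each word takes 60 / rate seconds at a given words-per-minute rate, ignoring pauses and word length, and the importance labels are supplied by hand. The function name `play_time_seconds` is hypothetical; the default rates of 250 and 500 words per minute are taken from the file naming used in Table 2.

```python
def play_time_seconds(tagged_words, slow_wpm=250, fast_wpm=500):
    """Estimate play time when important words use slow_wpm and the rest fast_wpm.

    tagged_words -- list of (word, is_important) pairs
    Simplifying assumption: every word lasts 60 / rate seconds, with no pauses.
    """
    total = 0.0
    for _word, important in tagged_words:
        rate = slow_wpm if important else fast_wpm
        total += 60.0 / rate
    return total

# Verb and nouns marked important; "with" and "a" are played fast.
example = [("John", True), ("killed", True), ("Mary", True),
           ("with", False), ("a", False), ("stone", True)]
mixed = play_time_seconds(example)                # about 1.2 s
uniform = play_time_seconds(example, 250, 250)    # about 1.44 s, all words at 250 wpm
```

Under these assumptions the mixed-speed version of the sentence is about one sixth shorter than playing every word at the slower rate, while the important words keep the same, slower speed.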

3.3 Parts of Speech

Natural languages have an underlying "predicate logic": a predicate (verb) and its arguments (nouns) together express the logical content of a sentence, and a negative adverb such as "not" reverses that content. Therefore, verbs, nouns, and negative adverbs are the most important parts of speech. An example follows.

John killed Mary with a stone.
(Surface Structure)

kill: verb; NP1 (AGENT), NP2 (PATIENT)
(Argument Structure in Natural Language)
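The argument structure above can be written as a small data structure. This Python sketch is illustrative only, and the `Proposition` class and its fields are hypothetical. It shows that the verb, its noun arguments, and a negation flag are enough to recover the logical form of the sentence; the phrase "with a stone" is left out because the argument structure above lists only AGENT and PATIENT.

```python
from dataclasses import dataclass, field

@dataclass
class Proposition:
    """Predicate-argument structure of one clause (hypothetical representation)."""
    predicate: str                                  # the verb, e.g. "kill"
    arguments: dict = field(default_factory=dict)   # thematic role -> noun
    negated: bool = False                           # True if a negative adverb is present

    def logical_form(self) -> str:
        # Arguments are listed in insertion order (AGENT before PATIENT here).
        form = f"{self.predicate}({', '.join(self.arguments.values())})"
        return f"not {form}" if self.negated else form

fact = Proposition("kill", {"AGENT": "John", "PATIENT": "Mary"})
print(fact.logical_form())    # kill(John, Mary)

denial = Proposition("kill", {"AGENT": "John", "PATIENT": "Mary"}, negated=True)
print(denial.logical_form())  # not kill(John, Mary)
```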

3.4 New Information and Old Information

We can also use the concept of new and old information to determine the importance of words. Knowledge shared by the speaker and the hearer is called "old information"; information that the speaker knows but the hearer does not yet know is called "new information". The following is an example:

Mary (new) is a friend (new) of mine.

She (old) is a beauty (new).
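A very rough way to approximate this distinction automatically is to treat a referent as new on its first mention and old afterwards. The Python sketch below does that; the pronoun list, the rule that every pronoun refers back to old information, and the function name `label_information` are simplifying assumptions, not part of the method described here, and a real system would need proper coreference resolution. The input is the nouns and the subject pronoun of the example above, extracted by hand.

```python
# Hypothetical, partial pronoun list; pronouns are assumed to refer to old information.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}

def label_information(words):
    """Label each referring word 'new' on its first mention and 'old' afterwards."""
    seen = set()
    labels = []
    for word in words:
        key = word.lower()
        if key in PRONOUNS or key in seen:
            labels.append((word, "old"))
        else:
            labels.append((word, "new"))
            seen.add(key)
    return labels

print(label_information(["Mary", "friend", "She", "beauty"]))
# [('Mary', 'new'), ('friend', 'new'), ('She', 'old'), ('beauty', 'new')]
```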

4. Prototyping and Test

We developed a prototype that demonstrates our rapid "oblique" listening method by controlling speech speed based on syntactic information (parts of speech). Table 1 shows the relation between parts of speech and speech speed.

Table 1: Parts of Speech and Speed

Parts of Speech                Speed
Verb, Noun, Negative Adverb    Slower
Others                         Faster
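Table 1 can be expressed as a lookup from part-of-speech tags to speeds. The Python sketch below assumes input already tagged with Penn Treebank tags (verb tags begin with VB, noun tags with NN, and "not", "n't" and "never" are tagged RB); the tagger itself, the list of negative adverbs, and the function name `speed_for` are assumptions rather than details of the prototype.

```python
# Hypothetical list of negative adverbs; a real system would need a fuller one.
NEGATIVE_ADVERBS = {"not", "n't", "never"}

def speed_for(word: str, tag: str) -> str:
    """Return 'slower' or 'faster' for a (word, Penn Treebank tag) pair, following Table 1."""
    if tag.startswith("VB") or tag.startswith("NN"):       # verbs and nouns
        return "slower"
    if tag == "RB" and word.lower() in NEGATIVE_ADVERBS:   # negative adverbs
        return "slower"
    return "faster"                                        # all other parts of speech

tagged = [("John", "NNP"), ("never", "RB"), ("killed", "VBD"), ("Mary", "NNP"),
          ("with", "IN"), ("a", "DT"), ("stone", "NN")]
print([(w, speed_for(w, t)) for w, t in tagged])
```

One consequence of matching on tags alone is that auxiliary verbs such as "did" also receive verb tags and would therefore be played at the slower speed.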

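Table 2 shows the efficiency of our method. For example, the file "250500.wav" plays important words at 250 words per minute and the other words at 500 words per minute. Efficiency is the reduction in play time relative to org.wav. The third column gives the time saved relative to the preceding row, and Delta gives the change in efficiency relative to the preceding row.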

Table 2: Comparison of Play Time

Filename    Play time   Time saved   Efficiency   Delta
org.wav     0:05:05     -            -            -
200.wav     0:04:27     0:00:38      12.46%       12.46%
200500.wav  0:03:34     0:00:53      29.84%       17.38%
250.wav     0:03:33     0:00:01      30.16%       0.32%
250500.wav  0:02:58     0:00:35      41.64%       11.48%
300.wav     0:02:56     0:00:02      42.30%       0.66%
300500.wav  0:02:33     0:00:23      49.84%       7.54%
350500.wav  0:02:17     0:00:16      55.08%       5.25%
400.wav     0:02:13     0:00:04      56.39%       1.31%
400500.wav  0:02:04     0:00:09      59.34%       2.95%
500.wav     0:01:46     0:00:18      65.25%       5.90%
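The derived columns of Table 2 follow from the play times alone. Taking org.wav as the baseline (305 seconds), the efficiency of a file with play time t is (305 - t) / 305, the time saved is the difference from the preceding row, and Delta is the difference between consecutive efficiency values. The Python sketch below recomputes them; the helper names are hypothetical. It reproduces the table to within 0.01 percentage points (its Delta for 250.wav is 0.33% rather than 0.32%).

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# Play times copied from Table 2, in table order (dicts keep insertion order).
PLAY_TIMES = {
    "org.wav": "0:05:05", "200.wav": "0:04:27", "200500.wav": "0:03:34",
    "250.wav": "0:03:33", "250500.wav": "0:02:58", "300.wav": "0:02:56",
    "300500.wav": "0:02:33", "350500.wav": "0:02:17", "400.wav": "0:02:13",
    "400500.wav": "0:02:04", "500.wav": "0:01:46",
}

def derived_columns(play_times, baseline="org.wav"):
    """Return (filename, seconds saved vs. previous row, efficiency %, delta %) rows."""
    base = to_seconds(play_times[baseline])
    prev_time, prev_eff = base, 0.0
    rows = []
    for name, hms in play_times.items():
        if name == baseline:
            continue
        t = to_seconds(hms)
        eff = 100.0 * (base - t) / base  # reduction in play time relative to the baseline
        rows.append((name, prev_time - t, round(eff, 2), round(eff - prev_eff, 2)))
        prev_time, prev_eff = t, eff
    return rows

for row in derived_columns(PLAY_TIMES):
    print(row)
```

Because every row is compared with the same baseline, Delta is also simply the time saved divided by 305 seconds.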

5. Conclusion

The blind and visually impaired listen to speech generated by text-to-speech throughout their waking hours. They tend to listen to whole texts linearly and sequentially, often at extremely fast speeds. We proposed a rapid "oblique" listening method, analogous to the diagonal reading of sighted people, that controls speech speed based on linguistic information. To understand text content efficiently, the blind and visually impaired need a system that processes meaning, not one that physically decimates speech.

6. Further Information

You may listen to the sound files generated by our system.
http://www.torihara.com/research/index.html




Reprinted with author(s) permission. Author(s) retain copyright.