2000 Conference Proceedings



Document Reader for the Blind

Chieko Asakawa, Hironobu Takagi and Takashi Itoh
chie@jp.ibm.com, takagih@jp.ibm.com and JL03313@jp.ibm.com 
IBM Japan Ltd., Tokyo Research Laboratory
1624-14, Shimotsuruma, Yamato-shi, Kanagawa-ken 242-8502, Japan

Personal computers have played an important role in enabling the blind to access printed documents. Blind users can communicate with others by e-mail and can read and write electronic documents by using screen readers and other assistive technology software. These days, however, electronic documents are becoming increasingly visual, with various fonts, colors, and illustrations. In addition, their layouts are becoming more complex, since authors are paying more attention to visual appearance. These visual characteristics are making electronic documents increasingly inaccessible to blind users, who have a hard time reading such formatted documents with screen readers and need to know how to operate a variety of applications just to read documents. Consequently, blind users tend to ask sighted people to provide documents in plain text format. However, automatic conversion of formatted documents into plain text format sometimes destroys the original logical structure. Sighted people therefore have to spend time checking whether the conversion was successful.

Another issue that affects blind users is the difficulty of using a screen reader to access presentation materials created with software such as Lotus Freelance and MS PowerPoint(R). Blind users are often unable to include visual elements in their own presentations or to follow presentations given by others without assistance, because they cannot access presentation software designed for sighted people.

We therefore decided to develop a document reader to allow blind users to access various types of formatted document through a single user interface. Our system uses the object model of each application, so it can present the contents of a document without regard to its layout. The logical structure of a document can also be navigated, so it is easy to detect titles, headings, list items, paragraphs, sentences, and so on.

Since the user interface for navigating through a document is universal, users do not need to operate any word processors or presentation software to read formatted documents. In the following section, we will describe the user interface of our prototype document reader.

Overview of the system

The figure above shows the structure of our document reader. Documents should be opened through Explorer. When a document is opened, it is listed in the document selector. The system uses a numeric keypad for command input. When the minus key is pressed, the document selector becomes active. The name of any document in the selector can be announced by pressing the up/down cursor key. When the Enter key is pressed, the selected document becomes ready for reading. Currently, our prototype system allows users to access MS Word (abbreviated as Word) and Lotus WordPro(TM) (WordPro) as word processors, and MS PowerPoint(R) (PowerPoint) and Lotus Freelance (Freelance) as presentation software.
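The behavior of the document selector described above can be sketched roughly as follows. This is only an illustration: the class name `DocumentSelector`, the key names, and the choice to stop at the ends of the list instead of wrapping around are assumptions, not details of the prototype.

```python
class DocumentSelector:
    """Hypothetical model of the document selector: opened documents plus a cursor."""

    def __init__(self):
        self.documents = []   # names of opened documents, in opening order
        self.cursor = 0       # index of the document that would be announced
        self.active = False   # becomes True when the minus key is pressed

    def open_document(self, name):
        """Register a document that was opened through Explorer."""
        self.documents.append(name)

    def press(self, key):
        """Handle one key press and return the text to announce, or None."""
        if key == "minus":
            self.active = True
            return None
        if not self.active or not self.documents:
            return None
        if key == "up":
            # Assumption: the cursor stops at the first entry.
            self.cursor = max(0, self.cursor - 1)
            return self.documents[self.cursor]
        if key == "down":
            # Assumption: the cursor stops at the last entry.
            self.cursor = min(len(self.documents) - 1, self.cursor + 1)
            return self.documents[self.cursor]
        if key == "enter":
            # The selected document becomes ready for reading.
            self.active = False
            return "ready: " + self.documents[self.cursor]
        return None
```

In this sketch the cursor keys are ignored until the minus key has activated the selector, which mirrors the order of operations given in the text.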

When a user presses one of the navigation keys on the numeric keypad, the document reading handler communicates with the object model of the application and, after obtaining the appropriate information through it, sends that information to a TTS engine. Three TTS engines are now available with the system: ViaVoice Outloud, L&H, and ProTalker for Japanese. The engine can be switched by pressing the Enter key followed by the asterisk key. When a braille pin display is connected, the system is also capable of braille output. The Enter key can be used to stop the speech output.
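The flow from key press to object model to TTS engine might be sketched as below. All class and method names (`FakeObjectModel`, `RecordingTTS`, `DocumentReadingHandler`) are hypothetical stand-ins, not the real object models or TTS APIs, and the handling of "Enter followed by asterisk" as a two-key sequence is an assumed interpretation of the text.

```python
class FakeObjectModel:
    """Stand-in for an application's object model (hypothetical, not a real API)."""

    def __init__(self, sentences):
        self.sentences = sentences
        self.index = 0

    def move_sentence(self, step):
        """Move the reading position by `step` sentences and return the text there."""
        self.index = max(0, min(len(self.sentences) - 1, self.index + step))
        return self.sentences[self.index]


class RecordingTTS:
    """Stand-in for a TTS engine; it records text instead of speaking it."""

    def __init__(self, name):
        self.name = name
        self.spoken = []

    def speak(self, text):
        self.spoken.append(text)


class DocumentReadingHandler:
    """Routes keypad input to the object model and forwards the result to a TTS engine."""

    SENTENCE_STEPS = {"4": -1, "5": 0, "6": 1}  # previous / current / next

    def __init__(self, model, engines):
        self.model = model
        self.engines = engines
        self.engine_index = 0
        self.pending_enter = False  # True when the previous key was Enter

    def press(self, key):
        was_pending, self.pending_enter = self.pending_enter, False
        if key == "enter":
            # Enter alone would stop speech; here we only remember it was pressed.
            self.pending_enter = True
            return None
        if key == "*" and was_pending:
            # Enter followed by asterisk switches to the next engine.
            self.engine_index = (self.engine_index + 1) % len(self.engines)
            return None
        if key in self.SENTENCE_STEPS:
            text = self.model.move_sentence(self.SENTENCE_STEPS[key])
            self.engines[self.engine_index].speak(text)
            return text
        return None
```

Keeping the handler unaware of which application sits behind the object model is what would let one set of keys serve several applications.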

Navigation functions for word processors

Table 1 shows navigation keys for word processors. Navigation keys that are frequently used for reading documents in various ways, such as paragraph by paragraph, sentence by sentence, word by word, page by page, and heading by heading, are provided for both Word and WordPro(TM).

The functions of each application are somewhat different, because each application has different characteristics and each object model has different functions. Since there is no function for determining the current character in the WordPro object model, the system could not provide a character jump key for WordPro.
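One way to picture this difference is a jump-mode toggle whose list of units depends on what each object model can report. The sketch below is an assumption about how such a toggle could be organized; the names `WORD_UNITS`, `WORDPRO_UNITS`, and `JumpModeToggle` are illustrative only.

```python
# Units available to the word-level jump keys, per application.
# WordPro has no "character" unit because, as stated above, its object
# model offers no way to determine the current character.
WORD_UNITS = ("word", "character")
WORDPRO_UNITS = ("word",)


class JumpModeToggle:
    """Cycles through the jump units that an application supports."""

    def __init__(self, units):
        self.units = tuple(units)
        self.position = 0

    @property
    def current(self):
        return self.units[self.position]

    def toggle(self):
        """Switch to the next unit; return None when there is nothing to switch to."""
        if len(self.units) < 2:
            return None
        self.position = (self.position + 1) % len(self.units)
        return self.current
```

With this arrangement the same toggle logic serves both applications, and the missing character mode in WordPro is expressed as data, not as a special case in the key handler.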

Table 1: Navigation keys for MS Word & Lotus WordPro

Key        Function
1          Jump to previous page
2          Read current page
3          Jump to next page
4          Jump to previous sentence/paragraph
5          Read current sentence/paragraph
6          Jump to next sentence/paragraph
7          Jump to previous word
8          Read current word
9          Jump to next word
0          Play from current sentence
Plus + 1   Jump to first page
Plus + 2   Toggle jump mode (page/heading)
Plus + 3   Jump to last page
Plus + 4   Play from top of document
Plus + 5   Toggle jump mode (sentence/paragraph)
Plus + 6   Jump to last sentence/paragraph
Plus + 8   [Word only] Toggle jump mode (word/character)
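The key assignments in Table 1 can be represented as a simple lookup table. The sketch below assumes that Plus is pressed as a prefix before the digit; the paper does not say how the combination is actually entered, and the names `WORD_PROCESSOR_KEYS` and `decode` are hypothetical.

```python
# Key assignments copied from Table 1. A leading "+" marks a Plus combination.
WORD_PROCESSOR_KEYS = {
    "1": "jump to previous page",
    "2": "read current page",
    "3": "jump to next page",
    "4": "jump to previous sentence/paragraph",
    "5": "read current sentence/paragraph",
    "6": "jump to next sentence/paragraph",
    "7": "jump to previous word",
    "8": "read current word",
    "9": "jump to next word",
    "0": "play from current sentence",
    "+1": "jump to first page",
    "+2": "toggle jump mode (page/heading)",
    "+3": "jump to last page",
    "+4": "play from top of document",
    "+5": "toggle jump mode (sentence/paragraph)",
    "+6": "jump to last sentence/paragraph",
    "+8": "toggle jump mode (word/character)",  # Word only
}


def decode(keys):
    """Turn a sequence of keypad presses into command names.

    Assumption: "+" acts as a prefix that modifies the next key only.
    """
    commands = []
    prefix = ""
    for key in keys:
        if key == "+":
            prefix = "+"
            continue
        commands.append(WORD_PROCESSOR_KEYS.get(prefix + key, "unassigned"))
        prefix = ""
    return commands
```

For example, `decode(["1", "+", "1"])` yields the commands for "previous page" followed by "first page".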

Navigation functions for presentation software

Table 2 shows navigation keys for presentation software. A unique feature of the system is its support for slideshow mode. Blind presenters usually find it very hard to know which slide is currently on the screen, and they have to pay careful attention to avoid making mistakes. In slideshow mode, the system reads out the title of each new slide when it appears on the screen, so presenters can keep track of the slide being shown. In this way, it creates a much more user-friendly presentation environment.

Presentation packages are especially difficult to read through screen readers. Our system reads all the text information contained in a slide, just as if it were a text document. The logical structure of a slide can be navigated by using each object model's capabilities.

Table 2: Navigation keys for MS PowerPoint & Lotus Freelance

Key        Function
1          Jump to previous slide and announce its title
2          Announce title of current slide
3          Jump to next slide and announce its title
4          Jump to previous shape
5          Read current shape
6          Jump to next shape
7          Jump to previous paragraph in shape
8          Read current paragraph
9          Jump to next paragraph in shape
0          Play from current shape
*          [PowerPoint] Toggle slideshow mode/normal mode
           [Freelance] Slideshow mode (no resume)
Plus + 1   Jump to first slide
Plus + 3   Jump to last slide
Plus + 4   Play from start of slide
Plus + 6   Jump to last shape
Plus + 7   Jump to first paragraph in shape
Plus + 8   [PowerPoint] Toggle jump mode (paragraph/sentence/word/character)
           [Freelance] Toggle jump mode (paragraph/character)
Plus + 9   Jump to last paragraph in shape

Shape: a group of graphic objects
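The three navigation levels in Table 2 (slide, shape, paragraph) can be illustrated with a small cursor over nested lists. This is a minimal sketch under stated assumptions: the slide data layout, the name `SlideDeckCursor`, the rule that every slide has at least one shape with at least one paragraph, and the reset of lower-level positions after a jump are all illustrative choices, not the prototype's behavior.

```python
def _clamp(index, length):
    """Keep an index inside the range 0 .. length - 1."""
    return max(0, min(length - 1, index))


class SlideDeckCursor:
    """Tracks the current slide, shape, and paragraph for keys 1 to 9 of Table 2."""

    def __init__(self, slides):
        # slides: list of {"title": str, "shapes": [[paragraph, ...], ...]}
        self.slides = slides
        self.slide = 0
        self.shape = 0
        self.paragraph = 0

    def press(self, key):
        """Handle one key and return the text that would be read out, or None."""
        if key in ("1", "2", "3"):
            step = {"1": -1, "2": 0, "3": 1}[key]
            if step:
                self.slide = _clamp(self.slide + step, len(self.slides))
                self.shape = self.paragraph = 0  # assumed: restart at the top of the slide
            return self.slides[self.slide]["title"]
        shapes = self.slides[self.slide]["shapes"]
        if key in ("4", "5", "6"):
            step = {"4": -1, "5": 0, "6": 1}[key]
            if step:
                self.shape = _clamp(self.shape + step, len(shapes))
                self.paragraph = 0  # assumed: restart at the first paragraph
            return " ".join(shapes[self.shape])
        if key in ("7", "8", "9"):
            step = {"7": -1, "8": 0, "9": 1}[key]
            paragraphs = shapes[self.shape]
            self.paragraph = _clamp(self.paragraph + step, len(paragraphs))
            return paragraphs[self.paragraph]
        return None
```

Announcing the title on every slide change, as keys 1 and 3 do here, is the same cue that lets a presenter confirm which slide is on the screen.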

Plans

The prototype document reader provides a nonvisual universal user interface for four applications. Users do not need to know how to operate each application, and they can read documents formatted for those applications by using only the numeric keypad. This approach will enable even computer novices to read formatted documents quickly and easily.

We will keep studying object models for other applications such as spreadsheets, mail software, databases, and Web browsers, to provide the same user interface for as many applications as possible. We will also add other navigation functions to make it possible to read a document more quickly and precisely by taking advantage of the object model's capabilities.

Our next goal is to provide a universal nonvisual writing method for editing and writing documents formatted for any type of application, such as Word, WordPro(TM), PowerPoint(R), Freelance, Lotus 1-2-3, or MS Excel, including rich text information. Such a method will enable blind users to create visual presentation packages and various kinds of document without any assistance, through the universal user interface. After creating a document, they will be able to check how it looks by using our system's document-reading capabilities.




Reprinted with author(s) permission. Author(s) retain copyright.