2001 Conference Proceedings
CONSIDERATIONS FOR USER INTERACTION WITH TALKING SCREEN
READERS
Paul Blenkhorn and Gareth Evans
Centre for Rehabilitation Engineering, Speech and Sensory
Technology (CRESST)
Department of Computation, UMIST
PO Box 88, Manchester, M60 1QD, United Kingdom.
p.blenkhorn@co.umist.ac.uk
Introduction
Talking screen readers are widely used by blind people to access
modern 'windowed' Graphical User Interfaces (GUIs). In many
respects the operation of these systems is standardized. However,
there are a number of areas in which there are alternative
methods of providing the user with access to information and
methods by which additional information can be given. This paper
examines some of the pertinent user interaction issues in the
context of a new, low-cost Windows screen reader called LookOUT,
which has been developed by the authors.
There are three principles of screen reader design that are
widely accepted. These are:
- The screen reader should support quick and effective working
achieving the goal of 'maximum information, minimum speech'.
- The speech synthesizer should respond quickly and produce
high-speed speech messages.
- Screen readers should be designed with the goal that a
standard application accessed through the screen reader should
appear, to the blind user, as if it were a specialized talking
application written especially for blind users.
Many commercially available screen readers are very effective and
go a long way toward achieving these goals. However, there are some
areas that warrant further investigation and discussion. These
include:
- The provision of additional information that allows a user to
easily obtain formatting and positional information. For example,
when reading or writing a document using a word processor the
user may wish to know where the cursor/caret is placed in
relation to the window's margins and the 'character type' (e.g.
upper or lower case, space or punctuation) of the character under
the cursor.
- The concept of maximum information, minimum speech varies
from user to user. Users have different levels of expertise in
using the operating system and novices require greater amounts of
information. Users have different preferences for the amount and
type of information spoken. Users need to be supported by having
actions confirmed in a sensible way and also need to be informed
of changes on the screen in a way that is appropriate to their
requirements.
- The interface to a standard application needs to be configured
so that it meets the goal of making the application appear as if
it were a specialized application designed for blind users. There
are a number of issues. These include: using screen markers
and/or scripts; the use of scripts to produce interfaces for the
blind user that are radically different from those provided by the
application; and trying to minimize the number of esoteric
keystroke combinations that a blind user has to remember.
Each of these areas is considered in more detail below with
particular reference to the way the issues are addressed in the
LookOUT screen reader.
Additional Information
When a user moves the cursor around the screen, using the cursor
keys, most screen readers speak information appropriate to the
action. Consider reading a document within a word processor.
When the up and down cursor keys are used, the complete line is
spoken. When the left and right cursor keys are used, the next
character is spoken. When the Control key is used with the left
and right cursor keys, movement is from word to word, and in this
case the new word is spoken. These modes are well established and
give good access to the text. However, they do not tell the user
the position of the cursor on the screen, which can be useful for
determining the layout of a document. Moreover, they do
not indicate to the user the type of character under the cursor.
The screen reader could easily speak this position and character
type information, but this would compromise the maxim of 'maximum
information, minimum speech'. In LookOUT this position and
character type is optionally provided by tones that are played at
the same time as the speech.
LookOUT takes advantage of a modern sound card's ability to play
wave and MIDI information at the same time. The MIDI channel is
used for the tones. To allow the user to distinguish between
vertical and horizontal movement, different MIDI instruments are
used. When the user is familiar with the tones, he/she can
determine screen position and action (horizontal or vertical
movement) quite easily.
The modes for position are:
- When the user presses the up or down cursor key a tone is
played whose pitch is proportional to the vertical position of
the cursor on the screen. Thus, as the user 'cursors up', an
ascending musical scale is played.
- When the user presses the left or right cursor key (with
Control) a tone is played whose pitch is proportional to the
horizontal position of the cursor on the screen. Thus, as the
user 'cursors right', ascending musical notes are played. This
makes it very easy for the user to know when he/she moves to a
new line - the pitch drops.
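
The position-to-pitch mapping described above might be sketched as
follows. This is a minimal illustration and not LookOUT's actual
code: the function names, the note range (MIDI notes 48 to 84) and
the choice of a mapping that is linear in MIDI note number are all
assumptions made for the example.

```python
def position_to_midi_note(position, extent, low_note=48, high_note=84):
    """Map a 0-based position within `extent` cells to a MIDI note number.

    The note range is an assumed default, not a value from the paper.
    """
    if extent <= 1:
        return low_note
    # Clamp the position so out-of-range values still give a valid note.
    position = max(0, min(position, extent - 1))
    return low_note + round(position * (high_note - low_note) / (extent - 1))


def vertical_note(row, rows):
    # Row 0 is taken to be the top of the screen, so the row is inverted:
    # moving the cursor up the screen raises the pitch.
    return position_to_midi_note(rows - 1 - row, rows)


def horizontal_note(column, columns):
    # Column 0 is the left edge: moving the cursor right raises the pitch.
    return position_to_midi_note(column, columns)
```

With this mapping, moving from the end of one line to the start of
the next gives a sudden drop in the horizontal note, which is the
cue for a new line mentioned above.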
LookOUT can also use tones to indicate 'character type' in
parallel with speech. In this case different tones are used to
indicate different character types and the user can quickly
review the document.
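
As an illustration of the idea, the sketch below classifies each
character and looks up a tone for it. The category names follow the
character types listed in the Introduction; the 'other' category,
the function names and the particular MIDI note numbers are
hypothetical and are not taken from LookOUT.

```python
import string

# Hypothetical tone table: character type -> MIDI note number.
TYPE_TONES = {
    "upper": 72,
    "lower": 60,
    "space": 48,
    "punctuation": 67,
    "other": 55,  # digits and anything else; an assumption of this sketch
}


def character_type(ch):
    """Classify a single character into one of the types above."""
    if ch.isspace():
        return "space"
    if ch.isupper():
        return "upper"
    if ch.islower():
        return "lower"
    if ch in string.punctuation:
        return "punctuation"
    return "other"


def tones_for_line(text):
    """Return the tone for each character, in order, for a quick review."""
    return [TYPE_TONES[character_type(ch)] for ch in text]
```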
This approach can be extended to give additional information;
for example, it can be used to indicate whether a character is
bold, underlined, or italic by changing the instrument
accordingly. It could also be augmented to indicate changes in
font, with different fonts being represented by different
instruments.
Another approach to giving character type information in
parallel with speech is to use 'force feedback' joysticks or
mice, which will allow the user to 'feel' the format of the
document. It is believed that considerable research and
evaluation work is necessary before this type of interface
becomes commercially available.
Varying the Amount of Speech
What satisfies the concept of 'maximum information, minimum
speech' depends on the experience of the user. The key issue
is how familiar the user is with the operating system. For
example, when presented with a check box in Windows the
experienced user needs to know the text associated with the
checkbox, the fact that it is a checkbox (rather than, say, a radio
button) and its status (whether it is checked or not). The novice
user will also wish to know the type of operation he/she can
perform with the interface element. For example, in the case of a
checkbox he/she needs to know that the status of the checkbox can
be changed by the space bar. LookOUT supports two modes of
operation: standard, which assumes that the user is familiar with
Windows, and novice, which is a much more verbose mode that
explains the options for each possible operation. For a novice
user, LookOUT first speaks the information that it presents in the
same circumstances to a standard user before providing the
additional information. This strategy is adopted to support a
novice user's migration to a standard user.
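
The ordering of the two kinds of information can be sketched as
below. The function name and the exact wording of the spoken
messages are assumptions made for illustration; only the ordering
(standard message first, extra guidance afterwards) comes from the
description above.

```python
def describe_checkbox(label, checked, novice=False):
    """Build the spoken message for a checkbox.

    The standard message always comes first, so a novice hears exactly
    what a standard user hears before any additional guidance.
    """
    state = "checked" if checked else "not checked"
    message = f"{label}, check box, {state}"
    if novice:
        # Hypothetical wording for the extra novice guidance.
        message += ". Press the space bar to change."
    return message
```

Because the novice message starts with the standard one, a user who
switches modes later hears nothing unfamiliar, only less.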
When a user types, most screen readers support either character
or word echo; the user decides which. LookOUT, in addition to
supporting these modes, provides a combined character and word
echo mode, so that as the user types the characters are spoken
and, when the space bar is hit, the word is spoken. This mode has been
provided for novice users and was prompted by the experience of
teachers and trainers.
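
A minimal sketch of the three echo modes is given below. The mode
names and the function are hypothetical, and for simplicity the
sketch does not announce the space character itself.

```python
def echo_events(keys, mode="both"):
    """Return, in order, the strings that would be spoken for typed keys.

    mode is "character", "word" or "both" (names assumed for this sketch).
    """
    spoken = []
    word = []
    for key in keys:
        if key == " ":
            # The space bar ends the current word.
            if mode in ("word", "both") and word:
                spoken.append("".join(word))
            word = []
        else:
            word.append(key)
            if mode in ("character", "both"):
                spoken.append(key)
    return spoken
```

For example, typing "hi " in the combined mode yields "h", "i" and
then "hi".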
Speech is sometimes required to confirm a user's action. For
example, the Insert key toggles the mode of operation for a word
processor between inserting text at the caret and overtyping
text. The user needs to be aware that he/she has hit the key and
also needs to be aware of the mode that has been entered.
Of course, other keys can modify the operation of the Insert key,
for example the Shift key. If Shift and Insert are pressed, a
paste operation will be performed and the user is informed of the
paste operation and the text that is to be pasted is read.
Finally, the user needs to be aware of screen changes,
especially new windows that appear. New windows may appear due to
user action (for example, when the user has entered the command
to save a document) or due to system messages (such as lost
network connection). LookOUT tries to deal with the different
types of windows in an appropriate manner. When a window results
from a user action, its title is read and the user can then use
the 'screen review' keys to further investigate the window. When
a system message appears, it reads the whole window. LookOUT
distinguishes between user-prompted windows and system messages
based on size. A window smaller than a certain size is deemed to
be a system message.
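
The size-based rule might look like the sketch below. The paper
does not state the threshold or how size is measured, so the use of
window area, the threshold value and the function name are all
assumptions.

```python
# Hypothetical threshold in square pixels; the real value is not given.
SYSTEM_MESSAGE_MAX_AREA = 400 * 200


def announce_new_window(title, text, width, height,
                        max_area=SYSTEM_MESSAGE_MAX_AREA):
    """Decide what to speak when a new window appears."""
    if width * height < max_area:
        # Small window: treat it as a system message and read all of it.
        return f"{title}. {text}"
    # Larger window: read only the title and leave the rest to the
    # user's screen review keys.
    return title
```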
Configuration
As stated above, one goal when designing a screen reader is to
make any standard application appear to be a specialized
application that was written for blind users. An important issue
to be addressed, therefore, is how, given a particular
application, the screen reader can be configured to give this
illusion.
Screen readers typically use two approaches to this problem. One
is to use screen markers and the other is to use scripts.
A screen marker is a location or area of the screen that is
associated with a particular combination of keys. When the user
presses the key combination, the cursor is moved to the location.
Given an application, screen markers can be set up relatively
easily. This can be achieved either by the blind user (exploring
the screen using his/her screen reader in screen reading mode) or
by a sighted colleague or friend. Thus, for an application that
is not supported by the screen reader, a relatively unskilled
person can develop a reasonably efficient interface. However,
markers only examine appropriate places on the screen and, whilst
useful, they do not wholly fulfill the goal of making the
application appear to be a specialized talking application.
LookOUT supports the use of markers, and markers can be saved for
subsequent use with an application.
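
One way to picture a saved set of markers is as a table from key
combinations to screen locations, stored per application. The
sketch below is an illustration only: the class, the key-combination
strings and the JSON storage format are assumptions, not LookOUT's
marker format.

```python
import json


class MarkerSet:
    """Screen markers for one application (hypothetical structure)."""

    def __init__(self, application):
        self.application = application
        self.markers = {}  # key combination -> (x, y) screen location

    def set_marker(self, keys, x, y):
        self.markers[keys] = (x, y)

    def lookup(self, keys):
        """Return the location for a key combination, or None if unset."""
        return self.markers.get(keys)

    def to_json(self):
        """Serialize the markers so they can be saved for later use."""
        return json.dumps({
            "application": self.application,
            "markers": {k: list(v) for k, v in self.markers.items()},
        })

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        marker_set = cls(data["application"])
        for keys, (x, y) in data["markers"].items():
            marker_set.set_marker(keys, x, y)
        return marker_set
```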
One can view scripting as a more general method of specializing
the behavior of a screen reader so that it supports a particular
application. Scripts are written in a programming language and
are loaded together with an application. When the user interacts
with an application through the keyboard, code in the script is
executed before information is passed to the screen reader. Thus,
the functionality of the screen reader can be adapted for a
particular application. The big advantage of this approach is
that, given a well-written script, the standard application can
really appear as if it were written for a blind user. In effect,
the interface has been rewritten. Functionality far in excess of
simply reading the screen can be incorporated. For example,
LookOUT's Microsoft Excel script distinguishes between numbers
and formulae. A cell with a number will have its row and column
location and then its value read, whereas a cell with a formula
will have its location, the result and the formula used to
compute the result read. If the cell has a comment, this will also be
read. Another example of a LookOUT script is the one that is used
to control the CD player in Windows. Graphically this has command
buttons that allow the player to be started, stopped, paused,
etc. This interface is completely remapped to the numeric
keyboard by the LookOUT script, which uses the '4' key to start,
the '5' key to stop, etc. In this way a completely new interface
is created. Script files can also contain help information that
explains the interface.
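
The spreadsheet behavior described above can be sketched as a
function that builds the spoken text for a cell. LookOUT's scripts
are written in Visual Basic Script; Python is used here purely for
illustration, and the function name, its parameters and the wording
of the output are assumptions.

```python
def describe_cell(address, value, formula=None, comment=None):
    """Build the spoken description of a spreadsheet cell.

    A plain number is read as location then value; a formula cell is
    read as location, result, then the formula itself.
    """
    parts = [address]
    if formula is not None:
        parts.append(f"result {value}")
        parts.append(f"formula {formula}")
    else:
        parts.append(str(value))
    if comment:
        # Comments are read last, after the cell's contents.
        parts.append(f"comment {comment}")
    return ", ".join(parts)
```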
The problem with scripts is that scripting is relatively
complicated and, unless the vendor supplies a script, few people
have the expertise to write new scripts. This problem is
partially ameliorated in LookOUT by using Microsoft Visual Basic
Script rather than a proprietary scripting language. Because
Visual Basic is a very widely used programming language, it is
thought that a reasonably competent programmer can develop new
scripts.
As a final point, a number of blind people are employed in 'call
sites' where they interact with customers via the telephone and a
computer system using a screen reader. The software used by call
sites can be complex and require an operator to navigate a large
number of complex screens. This poses a problem for the screen
reader user who needs to locate information and screens quickly
and efficiently. Scripting can alleviate these problems by
providing an alternative interface that allows the user to
reference forms and fields directly through the keyboard.
However, given that call site software may be very complex, there
may be a large number of operations that an operator needs to
perform. Generally, scripts re-map interfaces to keystroke
combinations; however, with many operations, remembering the
keystroke combinations can be difficult. LookOUT supports the
idea of 'script strings', which allow memorable command names
rather than esoteric keystroke chains to be associated with
operations and should make operation much easier.
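
The paper does not describe how script strings are entered or
stored, so the following is only a sketch of the general idea of
dispatching on a memorable name instead of a keystroke chain. The
class, its methods and the example command name are hypothetical.

```python
class CommandTable:
    """Associates memorable command names with operations."""

    def __init__(self):
        self._commands = {}

    def register(self, name, action):
        # Names are stored in lower case so lookup is case-insensitive.
        self._commands[name.strip().lower()] = action

    def run(self, name):
        """Run the named operation and return the text to be spoken."""
        action = self._commands.get(name.strip().lower())
        if action is None:
            return f"Unknown command: {name}"
        return action()
```

A script for a call site application could then register a name such
as "customer address" for the operation that moves to and reads the
corresponding field, in place of a multi-key combination.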
As an aside, it should be noted that call sites may be based on
platforms other than Windows. However, in this case the blind
user can use a terminal emulator running on a Windows PC together
with his/her screen reader.
Reprinted with author(s) permission. Author(s) retain copyright.