2003 Conference Proceedings



Bronstad, P.M., Lewis, K., Slatin, J.
The University of Texas at Austin
Institute for Technology and Learning
Email: bmatt@mail.utexas.edu


The World Wide Web is rapidly becoming an indispensable resource, and it holds great potential for people with visual impairments. Navigating HTML documents and extracting information from them, however, is often very taxing for people who use screenreaders. They must listen both to the text of a document and to interleaved information about its navigational and structural elements, and all of this information must be interpreted correctly to form an accurate mental model of the document.

For example, screenreaders often label hyperlinks by preceding them with the word "link." The word is meant to communicate something about the text that follows, but a listener could interpret it as just another word in the sentence, which makes the task more complex.

Researchers have investigated the effectiveness of aural elements such as "earcons" (Brewster, Wright, & Edwards, 1992) and other non-speech audio cues (James, 1996). We investigated whether non-speech audio cues can reduce cognitive workload to the extent that users can perform very complex tasks that they would otherwise find impossible.


We designed and piloted an experiment to test whether listeners discriminate hyperlinks better when the links are identified by the spoken word "link" or when they are identified by a non-speech sound.

Participants listened to a digitized voice speaking a string of digits. Some of the digits were randomly assigned to be "hyperlinks." A hyperlink was either preceded by the word "link" or marked by a tone that played concurrently with the digit. Participants indicated which digits were hyperlinks by typing those digits on a keyboard and pressing Enter. We instructed participants to ignore non-hyperlinks. For example, if a participant heard "five link one four nine," the correct response was "1."
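The trial structure described above can be sketched in a few lines of Python. This is only an illustration, not the Matlab code used in the study: the function names, the `DIGIT_WORDS` table, and the `link_probability` parameter are all hypothetical, and the text states only that links were assigned at random, not with what probability.

```python
import random

# Spoken form of each digit, indexed by its value.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def make_trial(n_digits, link_probability=0.3, rng=random):
    """Return (digits, is_link) for one trial.

    link_probability is an assumed value; the source does not give
    the rate at which digits were designated as hyperlinks.
    """
    digits = [rng.randrange(10) for _ in range(n_digits)]
    is_link = [rng.random() < link_probability for _ in range(n_digits)]
    return digits, is_link


def spoken_sequence(digits, is_link):
    """Word sequence heard in the spoken-cue condition."""
    words = []
    for digit, link in zip(digits, is_link):
        if link:
            words.append("link")  # the cue precedes the target digit
        words.append(DIGIT_WORDS[digit])
    return " ".join(words)


def correct_response(digits, is_link):
    """Digits the participant should type: the hyperlinks, in order."""
    return "".join(str(d) for d, link in zip(digits, is_link) if link)
```

Applied to the example in the text, `spoken_sequence([5, 1, 4, 9], [False, True, False, False])` gives "five link one four nine" and `correct_response` on the same arguments gives "1". In the tone condition the word sequence would omit "link" and the cue would be carried by a concurrent sound instead.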

In one condition, links were preceded by the word "link"; in the other, they were identified by a tone that played concurrently with the target digit. Conditions were counterbalanced. Participants began with strings one digit long to learn the task. We used a staircase procedure: once a participant had completed 20 trials, performance over the most recent 20 trials was assessed after every subsequent trial, and whenever it exceeded a criterion (90% successful identification of target links in the previous 20 trials), the task was made more complex by adding one digit to the string.
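One way to picture the staircase rule is the following Python sketch. It rests on several assumptions that the text does not settle: each trial is scored as simply correct or incorrect, "exceeds" is read as strictly greater than 90%, and the 20-trial window is emptied after each increase in string length. The class and attribute names are hypothetical.

```python
from collections import deque


class Staircase:
    """Sliding-window staircase: lengthen the string when recent
    performance exceeds the criterion."""

    def __init__(self, window=20, criterion=0.9, start_length=1):
        self.window = window
        self.criterion = criterion
        self.length = start_length          # current digits per string
        self.recent = deque(maxlen=window)  # outcomes of the last trials

    def record(self, correct):
        """Log one trial outcome and return the length for the next trial."""
        self.recent.append(bool(correct))
        # No assessment until a full window of trials is available.
        if len(self.recent) == self.window:
            proportion = sum(self.recent) / self.window
            if proportion > self.criterion:
                self.length += 1
                # Assumption: start a fresh window at the new length.
                self.recent.clear()
        return self.length
```

With these assumptions, a participant who answers the first 20 one-digit trials correctly moves to two-digit strings on trial 21, whereas 18 correct out of 20 (exactly 90%) does not trigger an increase.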

We used Matlab 6.5 to randomize and present stimuli and record participant responses.


Pilot data indicate that links and nonlinks were almost equally discriminable in the two conditions. We used the d-prime statistic to assess the discriminability of targets. D-prime is calculated from the hit rate and the false alarm rate, under the assumption that the two stimulus categories give rise to normal distributions with equal variance. In both conditions, discriminability decreased as the number of words per trial increased.
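Under the equal-variance normal model, d-prime is the difference between the z-transformed hit rate and the z-transformed false alarm rate. The sketch below shows that calculation in Python. The function name and argument layout are hypothetical, and the log-linear correction (adding 0.5 to each count) is an assumption introduced here so that rates of exactly 0 or 1 do not produce infinite values; the text does not say how, or whether, such cases were handled.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Equal-variance d-prime: z(hit rate) - z(false alarm rate).

    Counts are per digit: a hit is a link reported as a link, a false
    alarm is a nonlink reported as a link.
    """
    # Assumed log-linear correction; keeps both rates strictly
    # between 0 and 1 so the inverse normal CDF is defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A listener who reports links and nonlinks as links equally often has a d-prime of zero; the value grows as hits rise and false alarms fall.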

Analysis of number of trials by word condition revealed that participants progressed more rapidly in the tone condition than in the condition in which links were identified with spoken cues (i.e., "link").

Post-experimental interviews with participants suggest that strategies for dealing with links identified by spoken cues require intense concentration but can be verbally articulated. Strategies developed to deal with links indicated by tones involve perception of rhythm and some intuitive sense that cannot be articulated easily.


Our experiment demonstrates one way to reduce cognitive load on screenreader users. We pushed the use of spoken cues about contextual information to the point at which it creates task interference. The task was considerably less difficult, however, in the condition in which the same information was conveyed by a tone played concurrently with the target.


Brewster, S.A., Wright, P.C., & Edwards, A.D.N. (1992). A detailed investigation into the effectiveness of earcons. In G. Kramer (Ed.), Auditory display, sonification, audification and auditory interfaces. The Proceedings of the First International Conference on Auditory Display, Santa Fe Institute, Santa Fe, NM: Addison-Wesley, pp. 471-498.

James, F. (1996). Presenting HTML structure in audio: User satisfaction with audio hypertext. Proceedings of the International Conference on Auditory Display (ICAD), pp. 97-103.


Reprinted with author(s) permission. Author(s) retain copyright.